Semi-Supervised Learning


1 Semi-Supervised Learning
Consider the problem of Prepositional Phrase Attachment: "buy car with money" vs. "buy car with wheel". There are several ways to generate features. Given the limited representation, we can assume that all possible conjunctions of the 4 attributes are used (15 features in each example).
Assume we will use naïve Bayes for learning to decide between [n, v]. Examples are: (x1, x2, ..., xn, [n, v]).

2 Using naïve Bayes
To use naïve Bayes, we need to use the data to estimate:
P(n), P(v), P(x1|n), P(x1|v), P(x2|n), P(x2|v), ..., P(xn|n), P(xn|v)
Then, given an example (x1, x2, ..., xn, ?), compare:
P(n|x) = P(n) P(x1|n) P(x2|n) ... P(xn|n)
and
P(v|x) = P(v) P(x1|v) P(x2|v) ... P(xn|v)

3 Using naïve Bayes
After seeing 10 examples, we have:
P(n) = 0.5; P(v) = 0.5
P(x1|n) = 0.75; P(x2|n) = 0.5; P(x3|n) = 0.5; P(x4|n) = 0.5
P(x1|v) = 0.25; P(x2|v) = 0.25; P(x3|v) = 0.75; P(x4|v) = 0.5
Then, given an example x = (1000), we have:
P_n(x) = 0.5 * 0.75 * 0.5 * 0.5 * 0.5 = 3/64
P_v(x) = 0.5 * 0.25 * 0.75 * 0.25 * 0.5 = 3/256
Now, assume that in addition to the 10 labeled examples, we also have 100 unlabeled examples. Will that help?
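A quick check of this arithmetic, as a minimal sketch in Python using the probabilities estimated above:

    # Class priors and per-attribute probabilities P(x_i = 1 | class), as estimated above
    prior = {"n": 0.5, "v": 0.5}
    p_attr = {"n": [0.75, 0.5, 0.5, 0.5], "v": [0.25, 0.25, 0.75, 0.5]}

    def joint(x, label):
        """Naive Bayes joint probability: P(label) * prod_i P(x_i | label)."""
        prob = prior[label]
        for xi, pi in zip(x, p_attr[label]):
            prob *= pi if xi == 1 else (1 - pi)
        return prob

    x = (1, 0, 0, 0)
    print(joint(x, "n"), 3 / 64)    # 0.046875   = 3/64
    print(joint(x, "v"), 3 / 256)   # 0.01171875 = 3/256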

4 Using naïve Bayes
For example, what can be done with the example (1000)? We have an estimate for its label. But, can we use it to improve the classifier (that is, the estimation of the probabilities that we will use in the future)?
Option 1: We can make predictions, and believe them. Or believe some of them (based on what?).
Option 2: We can assume the example x = (1000) is
an n-labeled example with probability P_n(x)/(P_n(x) + P_v(x)), and
a v-labeled example with probability P_v(x)/(P_n(x) + P_v(x)).
Estimation of probabilities does not require working with integers!

5 Using Unlabeled Data
The discussion suggests several algorithms (sketched in code below):
1. Use a threshold. Choose examples labeled with high confidence. Label them [n, v]. Retrain.
2. Use fractional examples. Label the examples with fractional labels [p of n, (1-p) of v]. Retrain.
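A rough sketch of what the two options might look like in code; `train_naive_bayes` and `predict_proba` are hypothetical stand-ins for whatever estimator and posterior you already have:

    def self_train_threshold(labeled, unlabeled, threshold=0.9):
        """Option 1: believe only high-confidence predictions, label them, and retrain."""
        model = train_naive_bayes(labeled)            # hypothetical estimator on labeled data
        new_labeled = list(labeled)
        for x in unlabeled:
            p_n = predict_proba(model, x)             # hypothetical: returns P(n | x)
            if p_n >= threshold:
                new_labeled.append((x, "n"))
            elif p_n <= 1 - threshold:
                new_labeled.append((x, "v"))
        return train_naive_bayes(new_labeled)

    def self_train_fractional(labeled, unlabeled):
        """Option 2: add each unlabeled example twice with fractional weights [p of n, (1-p) of v]."""
        model = train_naive_bayes(labeled)
        weighted = [(x, y, 1.0) for x, y in labeled]
        for x in unlabeled:
            p_n = predict_proba(model, x)
            weighted += [(x, "n", p_n), (x, "v", 1.0 - p_n)]
        return train_naive_bayes(weighted)            # estimator must accept weighted counts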

6 Comments on Unlabeled Data
Both algorithms suggested can be used iteratively. Both algorithms can be used with other classifiers, not only naïve Bayes. The only requirement is a robust confidence measure in the classification.
There are other approaches to Semi-Supervised learning: see the included papers (co-training; Yarowsky's Decision List/Bootstrapping algorithm; graph-based algorithms that assume similar examples have similar labels, etc.).
What happens if instead of 10 labeled examples we start with 0 labeled examples? Make a guess; continue as above; a version of EM.

7 EM
EM is a class of algorithms that is used to estimate a probability distribution in the presence of missing attributes. Using it requires an assumption on the underlying probability distribution. The algorithm can be very sensitive to this assumption and to the starting point (that is, the initial guess of parameters). In general, it is known to converge to a local maximum of the likelihood function.

8 Three Coin Example
We observe a series of coin tosses generated in the following way. A person has three coins:
Coin 0: probability of Head is α
Coin 1: probability of Head is p
Coin 2: probability of Head is q
Consider the following coin-tossing scenarios:

9 Estimation Problems
Scenario I: Toss one of the coins four times. We observe HHTH. Question: Which coin is more likely to produce this sequence?
Scenario II: Toss Coin 0. If Head, toss Coin 1; otherwise, toss Coin 2. We observe the sequences HHHHT, THTHT, HHHHT, HHTTH produced by Coin 0, Coin 1 and Coin 2. Question: Estimate the most likely values for p, q (the probability of H in each coin) and the probability of using each of the coins (α).
Scenario III: Toss Coin 0. If Head, toss Coin 1; otherwise, toss Coin 2. We observe the sequences HHHT, HTHT, HHHT, HTTH produced by Coin 1 and/or Coin 2, but we do not see Coin 0's tosses. Question: Estimate the most likely values for p, q and α.
(Figure: Coin 0 selects which coin generates the 1st toss, 2nd toss, ..., nth toss.)
There is no known analytical solution to this problem (general setting). That is, it is not known how to compute the values of the parameters so as to maximize the likelihood of the data.

10 Key Intuition (1)
If we knew which of the data points (HHHT), (HTHT), (HTTH) came from Coin 1 and which from Coin 2, there would be no problem. Recall that the simple estimation is the ML estimation: assume that you toss a (p, 1-p) coin m times and get k Heads and m-k Tails.
log[P(D|p)] = log[p^k (1-p)^(m-k)] = k log p + (m-k) log(1-p)
To maximize, set the derivative w.r.t. p equal to 0:
d log P(D|p)/dp = k/p - (m-k)/(1-p) = 0
Solving this for p gives: p = k/m

11 Key Intuition (2)
If we knew which of the data points (HHHT), (HTHT), (HTTH) came from Coin 1 and which from Coin 2, there would be no problem. Instead, use an iterative approach for estimating the parameters:
Guess the probability that a given data point came from Coin 1 or Coin 2; generate fictional labels, weighted according to this probability.
Now, compute the most likely value of the parameters. [Recall the NB example.]
Compute the likelihood of the data given this model.
Re-estimate the initial parameter setting: set them to maximize the likelihood of the data.
(Labels -> Model Parameters -> Likelihood of the data)
This process can be iterated and can be shown to converge to a local maximum of the likelihood function.

12 EM Algorithm (Coins) - I
We will assume (for a minute) that we know the parameters p, q, α and use them to estimate which coin each data point came from (Problem 1). Then, we will use this label estimation of the observed tosses to estimate the most likely parameters, and so on...
Notation: n data points; in each one: m tosses, h_i heads.
What is the probability that the i-th data point came from Coin 1?
STEP 1 (Expectation Step): (here h = h_i)
P_i^1 = P(Coin 1 | D_i) = P(D_i | Coin 1) P(Coin 1) / P(D_i)
      = α p^h (1-p)^(m-h) / [ α p^h (1-p)^(m-h) + (1-α) q^h (1-q)^(m-h) ]

13 EM Algorithm (Coins) - II
Now, we would like to compute the likelihood of the data, and find the parameters that maximize it. We will maximize the log likelihood of the data (n data points):
LL = Σ_{i=1..n} log P(D_i | p, q, α)
But one of the variables, the coin's name, is hidden. We can marginalize:
LL = Σ_{i=1..n} log Σ_{y=0,1} P(D_i, y | p, q, α)
   = Σ_{i=1..n} log Σ_{y=0,1} P(D_i | p, q, α) P(y | D_i, p, q, α)
   = Σ_{i=1..n} log E_y [ P(D_i | p, q, α) ]
   ≥ Σ_{i=1..n} E_y [ log P(D_i | p, q, α) ]
where the inequality is due to Jensen's Inequality. We maximize a lower bound on the likelihood.
However, the sum is inside the log, making the ML solution difficult. Since the latent variable y is not observed, we cannot use the complete-data log likelihood. Instead, we use the expectation of the complete-data log likelihood under the posterior distribution of the latent variable to approximate log P(D_i | p, q, α).
That is, we think of the likelihood log P(D_i | p, q, α) as a random variable that depends on the value y of the coin in the i-th toss. Therefore, instead of maximizing the LL we will maximize the expectation of this random variable (over the coin's name). [Justified using Jensen's Inequality, as above.]

14 EM Algorithm (Coins) - III
We maximize the expectation of this random variable (over the coin's name):
E[LL] = E[ Σ_{i=1..n} log P(D_i | p, q, α) ] = Σ_{i=1..n} E[ log P(D_i | p, q, α) ]
      = Σ_{i=1..n} [ P_i^1 log P(D_i, 1 | p, q, α) + (1 - P_i^1) log P(D_i, 0 | p, q, α) ]
        - P_i^1 log P_i^1 - (1 - P_i^1) log(1 - P_i^1)
(The entropy terms do not matter when we maximize over the parameters.)
This is due to the linearity of the expectation and the definition of the random variable:
log P(D_i, y | p, q, α) = log P(D_i, 1 | p, q, α) with probability P_i^1
                        = log P(D_i, 0 | p, q, α) with probability (1 - P_i^1)

15 EM Algorithm (Coins) - IV
Explicitly, we get:
E[ Σ_i log P(D_i | p, q, α) ]
  = Σ_i [ P_i^1 log P(1, D_i | p, q, α) + (1 - P_i^1) log P(0, D_i | p, q, α) ]
  = Σ_i [ P_i^1 log( α p^{h_i} (1-p)^{m-h_i} ) + (1 - P_i^1) log( (1-α) q^{h_i} (1-q)^{m-h_i} ) ]
  = Σ_i [ P_i^1 ( log α + h_i log p + (m - h_i) log(1-p) ) + (1 - P_i^1)( log(1-α) + h_i log q + (m - h_i) log(1-q) ) ]

16 EM Algorithm (Coins) - V
Finally, to find the most likely parameters, we set the derivatives with respect to α, p, q to zero.
STEP 2: Maximization Step (Sanity check: think of the weighted fictional points.)
dE/dα = Σ_i [ P_i^1/α - (1 - P_i^1)/(1-α) ] = 0   gives   α = (1/n) Σ_i P_i^1
dE/dp = Σ_i P_i^1 [ h_i/p - (m - h_i)/(1-p) ] = 0   gives   p = Σ_i P_i^1 h_i / ( m Σ_i P_i^1 )
dE/dq = Σ_i (1 - P_i^1) [ h_i/q - (m - h_i)/(1-q) ] = 0   gives   q = Σ_i (1 - P_i^1) h_i / ( m Σ_i (1 - P_i^1) )
When computing the derivatives, notice that P_i^1 here is a constant; it was computed using the current parameters in the E step.
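A minimal sketch of these two steps for the coin example, assuming NumPy; the initial guesses and the toy data are made up for illustration:

    import numpy as np

    def coin_em(heads, m, alpha, p, q, iters=50):
        """EM for the three-coin model: heads[i] = number of heads in data point i (m tosses each)."""
        h = np.asarray(heads, dtype=float)
        for _ in range(iters):
            # E step: posterior probability that each data point came from Coin 1
            l1 = alpha * p**h * (1 - p)**(m - h)
            l2 = (1 - alpha) * q**h * (1 - q)**(m - h)
            P1 = l1 / (l1 + l2)
            # M step: the closed-form updates derived on this slide
            alpha = P1.mean()
            p = (P1 * h).sum() / (m * P1.sum())
            q = ((1 - P1) * h).sum() / (m * (1 - P1).sum())
        return alpha, p, q

    # Scenario III style data: HHHT, HTHT, HHHT, HTTH -> head counts out of m = 4 tosses
    print(coin_em([3, 2, 3, 2], m=4, alpha=0.6, p=0.7, q=0.4))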

17 Models with Hidden Variables

18 EM: General Setting
The EM algorithm is a general-purpose algorithm for finding the maximum likelihood estimate in latent variable models. In the E-Step, we fill in the latent variables using the posterior, and in the M-Step, we maximize the expected complete log likelihood with respect to the complete posterior distribution.
Let D = (x_1, ..., x_N) be the observed data, and let Z denote the hidden random variables. (We are not committing to any particular model.) Let µ be the model parameters. Then
µ* = argmax_µ log p(x | µ) = argmax_µ log Σ_z p(x, z | µ) = argmax_µ log Σ_z [ p(z | µ) p(x | z, µ) ]
This expression is called the complete log likelihood.

19 EM: General Setting (2)
To derive the EM objective function, we re-write the complete log likelihood function by multiplying it by q(z)/q(z), where q(z) is an arbitrary distribution for the random variable z:
log p(x | µ) = log Σ_z p(x, z | µ) = log Σ_z p(z | µ) p(x | z, µ) q(z)/q(z)
             = log E_q [ p(z | µ) p(x | z, µ) / q(z) ]
             ≥ E_q log [ p(z | µ) p(x | z, µ) / q(z) ],
where the inequality is due to Jensen's inequality applied to the concave function log. (Jensen's inequality for convex functions: E(f(x)) ≥ f(E(x)); but log is concave, so E(log(x)) ≤ log(E(x)).)
We get the objective:
L(µ, q) = E_q [ log p(z | µ) ] + E_q [ log p(x | z, µ) ] - E_q [ log q(z) ]
The last component is an entropy component; it is also possible to write the objective so that it includes a KL divergence (a distance function between distributions) between q(z) and p(z | x, µ).

20 EM: General Setting (3)
EM now continues iteratively, as an ascent algorithm, where we choose q = p(z | x, µ). At the t-th step, we have q^(t) and µ^(t).
E-Step: update the posterior q, while holding µ^(t) fixed: q^(t+1) = argmax_q L(q, µ^(t)) = p(z | x, µ^(t)).
M-Step: update the model parameters to maximize the expected complete log-likelihood: µ^(t+1) = argmax_µ L(q^(t+1), µ).
Other q's can be chosen [Samdani & Roth] to give other EM algorithms. Specifically, you can choose a q that picks the most likely z in the E-step, and then continue to estimate the parameters (called Truncated EM, or Hard EM). (Think back to the semi-supervised case.)
To wrap it up, with the right q:
L(µ, q) = E_q log [ p(z | µ) p(x | z, µ) / q(z) ]
        = Σ_z p(z | x, µ) log [ p(x, z | µ) / p(z | x, µ) ]
        = Σ_z p(z | x, µ) log [ p(x, z | µ) p(x | µ) / p(z, x | µ) ]
        = Σ_z p(z | x, µ) log p(x | µ)
        = log p(x | µ) Σ_z p(z | x, µ) = log p(x | µ)
So, by maximizing the objective function, we are also maximizing the log likelihood function.
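For the coin example, the difference between the standard (soft) E-step and the Hard/Truncated variant is just a thresholding step; a sketch, assuming NumPy and the notation of the coin slides:

    import numpy as np

    def e_step_soft(h, m, alpha, p, q):
        """Posterior P(Coin 1 | D_i) for each data point, as on the coin slides."""
        h = np.asarray(h, dtype=float)
        l1 = alpha * p**h * (1 - p)**(m - h)
        l2 = (1 - alpha) * q**h * (1 - q)**(m - h)
        return l1 / (l1 + l2)

    def e_step_hard(h, m, alpha, p, q):
        """Hard (Truncated) EM: commit to the most likely coin instead of keeping the posterior."""
        return (e_step_soft(h, m, alpha, p, q) >= 0.5).astype(float)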

21 The General EM Procedure
(Figure: the general EM procedure, alternating between the E step and the M step.)

22 EM Summary (so far)
EM is a general procedure for learning in the presence of unobserved variables. We have shown how to use it in order to estimate the most likely density function for a mixture of (Bernoulli) distributions.
EM is an iterative algorithm that can be shown to converge to a local maximum of the likelihood function. It depends on assuming a family of probability distributions; in this sense, it is a family of algorithms, and the update rules you derive depend on the model assumed.
It has been shown to be quite useful in practice, when the assumptions made on the probability distribution are correct, but can fail otherwise.

23 EM Summary (so far)
EM is a general procedure for learning in the presence of unobserved variables. The (family of) probability distributions is known; the problem is to estimate its parameters.
In the presence of hidden variables, we can often think about it as a problem of a mixture of distributions: the participating distributions are known, and we need to estimate the parameters of the distributions and the mixture policy.
Our previous example: a mixture of Bernoulli distributions.

24 Example: K-Means Algorithm
K-means is a clustering algorithm. We are given data points, known to be sampled independently from a mixture of k Normal distributions, with means µ_i, i = 1, ..., k, and the same standard deviation σ.
(Figure: p(x) as a function of x, a mixture of k Gaussian bumps.)

25 Example: K-Means Algorithm
First, notice that if we knew that all the data points are taken from a normal distribution with mean µ, finding its most likely value is easy:
p(x | µ) ∝ exp[ -(x - µ)² / 2σ² ]
We get many data points, D = {x_1, ..., x_m}:
ln L(D | µ) = ln P(D | µ) = Σ_i [ -(x_i - µ)² / 2σ² ]   (up to a constant)
Maximizing the log-likelihood is equivalent to minimizing:
µ_ML = argmin_µ Σ_i (x_i - µ)²
Calculating the derivative with respect to µ, we get that the minimal point, that is, the most likely mean, is µ = (1/m) Σ_i x_i.
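A tiny numeric check of this claim, assuming NumPy; the sample is made up:

    import numpy as np

    x = np.array([1.0, 2.0, 4.0, 7.0])           # made-up sample
    grid = np.linspace(0, 10, 10001)             # candidate means
    loss = ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
    print(grid[loss.argmin()], x.mean())         # both 3.5: the ML mean is the sample mean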

26 A Mixture of Distributions
As in the coin example, the problem is that the data is sampled from a mixture of k different normal distributions, and we do not know, for a given data point x_i, where it was sampled from.
Assume that we observe data point x_i; what is the probability that it was sampled from the distribution with mean µ_j?
P_ij = P(µ_j | x_i) = P(x_i | µ_j) P(µ_j) / P(x_i)
     = (1/k) exp[ -(x_i - µ_j)² / 2σ² ] / Σ_{n=1..k} (1/k) exp[ -(x_i - µ_n)² / 2σ² ]

27 A Mixture of Distributions
As in the coin example, the problem is that the data is sampled from a mixture of k different normal distributions, and we do not know, for each data point x_i, where it was sampled from.
For each data point x_i, define k binary hidden variables, z_i1, z_i2, ..., z_ik, such that z_ij = 1 iff x_i is sampled from the j-th distribution.
E[z_ij] = 1 * P(x_i was sampled from µ_j) + 0 * P(x_i was not sampled from µ_j) = P_ij
(Recall: E[Y] = Σ_i y_i P(Y = y_i), and E[X + Y] = E[X] + E[Y].)

28 Example: K-Means Algorithms
Expectation: (here h = <µ_1, µ_2, ..., µ_k>)
p(y_i | h) = p(x_i, z_i1, ..., z_ik | h) ∝ exp[ -(1/2σ²) Σ_j z_ij (x_i - µ_j)² ]
Computing the likelihood given the observed data D = {x_1, ..., x_m} and the hypothesis h (w/o the constant coefficient):
ln P(Y | h) = Σ_{i=1..m} [ -(1/2σ²) Σ_j z_ij (x_i - µ_j)² ]
E[ ln P(Y | h) ] = E[ Σ_{i=1..m} -(1/2σ²) Σ_j z_ij (x_i - µ_j)² ] = Σ_{i=1..m} -(1/2σ²) Σ_j E[z_ij] (x_i - µ_j)²

29 Example: K-Means Algorithms
Maximization: Maximizing
Q(h | h') = Σ_{i=1..m} -(1/2σ²) Σ_j E[z_ij] (x_i - µ_j)²
with respect to µ_j we get:
dQ/dµ_j = C Σ_{i=1..m} E[z_ij] (x_i - µ_j) = 0
which yields:
µ_j = Σ_{i=1..m} E[z_ij] x_i / Σ_{i=1..m} E[z_ij]

30 Summary: K-Means Algorithms
Given a set D = {x_1, ..., x_m} of data points, guess initial parameters µ_1, µ_2, ..., µ_k.
Compute (for all i, j):
p_ij = E[z_ij] = exp[ -(x_i - µ_j)² / 2σ² ] / Σ_{n=1..k} exp[ -(x_i - µ_n)² / 2σ² ]
and a new set of means:
µ_j = Σ_{i=1..m} E[z_ij] x_i / Σ_{i=1..m} E[z_ij]
Repeat to convergence.
Notice that this algorithm will find the best k means in the sense of minimizing the sum of square distances.
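A compact sketch of the whole loop, assuming NumPy and a fixed, shared σ; the data and the initialization are made up for illustration:

    import numpy as np

    def soft_kmeans(x, k, sigma=1.0, iters=100, seed=0):
        """EM for a mixture of k 1-D Gaussians with equal, fixed variance (the 'soft k-means' above)."""
        rng = np.random.default_rng(seed)
        x = np.asarray(x, dtype=float)
        mu = rng.choice(x, size=k, replace=False)            # initial guess for the means
        for _ in range(iters):
            # E step: responsibilities p_ij = E[z_ij]
            logits = -((x[:, None] - mu[None, :]) ** 2) / (2 * sigma**2)
            w = np.exp(logits - logits.max(axis=1, keepdims=True))
            w /= w.sum(axis=1, keepdims=True)
            # M step: weighted means
            mu = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
        return mu

    data = np.concatenate([np.random.normal(0, 1, 200), np.random.normal(5, 1, 200)])
    print(soft_kmeans(data, k=2))    # should come out near 0 and 5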

31 Summary: EM
EM is a general procedure for learning in the presence of unobserved variables. We have shown how to use it in order to estimate the most likely density function for a mixture of probability distributions.
EM is an iterative algorithm that can be shown to converge to a local maximum of the likelihood function; thus, it might require many restarts. It depends on assuming a family of probability distributions. It has been shown to be quite useful in practice, when the assumptions made on the probability distribution are correct, but can fail otherwise.
As an example, we have derived an important clustering algorithm, the k-means algorithm.

32 More Thoughts about EM
Training: a sample of data points, (x_0, x_1, ..., x_n) in {0,1}^(n+1).
Task: predict the value of x_0, given assignments to all n other variables.

33 More Thoughts about EM
Assume that a set of data points x in {0,1}^(n+1) is generated as follows:
Postulate a hidden variable Z, with k values, 1 ≤ z ≤ k, occurring with probability α_z, where Σ_{z=1..k} α_z = 1.
Having randomly chosen a value z for the hidden variable, we choose the value x_i for each observable X_i to be 1 with probability p_i^z and 0 otherwise [i = 0, 1, 2, ..., n].
Training: a sample of data points, (x_0, x_1, ..., x_n) in {0,1}^(n+1).
Task: predict the value of x_0, given assignments to all n other variables.
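A short sketch of this generative story, assuming NumPy; alpha and P (the table of p_i^z values) are made-up illustrative values:

    import numpy as np

    def generate(alpha, P, num_samples, seed=0):
        """Sample points as described above: pick z ~ alpha, then x_i ~ Bernoulli(P[z, i])."""
        rng = np.random.default_rng(seed)
        k, n_plus_1 = P.shape
        z = rng.choice(k, size=num_samples, p=alpha)         # hidden mixture component per point
        x = (rng.random((num_samples, n_plus_1)) < P[z]).astype(int)
        return x, z

    alpha = np.array([0.3, 0.7])                             # made-up mixture weights (k = 2)
    P = np.array([[0.9, 0.8, 0.1], [0.2, 0.3, 0.7]])         # made-up p_i^z for n+1 = 3 observables
    x, z = generate(alpha, P, num_samples=5)
    print(x)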

34 More Thoughts about EM
Two options:
Parametric: estimate the model using EM. Once a model is known, use it to make predictions. Problem: we cannot use EM directly without an additional assumption on the way the data is generated.
Non-Parametric: learn x_0 directly as a function of the other variables. Problem: which function to try and learn? x_0 turns out to be a linear function of the other variables when k = 2 (what does that mean?).
When k is known, the EM approach performs well; if an incorrect value is assumed, the estimation fails and the linear method (Perceptron, say) performs better [Grove & Roth 2001].
Another important distinction to attend to is that, once you have estimated all the parameters with EM, you can answer many prediction problems, e.g., p(x_0, x_7, ..., x_8 | x_1, x_2, ..., x_n), while with Perceptron (say) you need to learn a separate model for each prediction problem.

35 EM

36 The EM Algorithm
Algorithm:
Guess initial values for the hypothesis h = <µ_1, µ_2, ..., µ_k>.
Expectation: Calculate Q(h', h) = E(log P(Y | h') | h, X) using the current hypothesis h and the observed data X.
Maximization: Replace the current hypothesis h by the h' that maximizes the Q function (the likelihood function): set h = h', such that Q(h', h) is maximal.
Repeat: Estimate the Expectation again.
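The recipe above can be written as a generic driver; `e_step` and `m_step` are hypothetical per-model callbacks (for instance, the coin or soft k-means updates sketched earlier):

    def em(h0, x, e_step, m_step, iters=100):
        """Generic EM driver matching the recipe above (a schematic sketch, not tied to any model).

        e_step(h, x) should return the posteriors / expected sufficient statistics q,
        m_step(q, x) should return the hypothesis h' maximizing the expected log likelihood.
        """
        h = h0
        for _ in range(iters):
            q = e_step(h, x)      # Expectation: fill in hidden variables under the current h
            h = m_step(q, x)      # Maximization: re-estimate the parameters
        return h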
