1/10/18. Definitions. Probabilistic models. Why probabilistic models. Example: a fair 6-sided dice. Probability




I529: Machine Learning in Bioinformatics
1/10/18

Yuzhen Ye
School of Informatics, Computing, and Engineering
Indiana University, Bloomington
Spring 2018

Definitions: Probabilistic models

Probabilistic models
A model means a system that simulates the object under consideration.
A probabilistic model is one that produces different outcomes with different probabilities (BSA).

Why probabilistic models
The biological system being analyzed is:
- stochastic, or
- noisy, or
- completely deterministic, but because a number of hidden variables affecting its behavior are unknown, the observed data might be best explained with a probabilistic model.

Figure 1. The Organization of the ENCODE Consortium. Source: The ENCODE Project Consortium (2011) A User's Guide to the Encyclopedia of DNA Elements (ENCODE). PLoS Biol 9(4): e[...]. doi:10.1371/journal.pbio[...]

Probability
Experiment: a procedure involving chance that leads to different results.
Outcome: the result of a single trial of an experiment.
Event: one or more outcomes of an experiment.
Probability: the measure of how likely an event is, between 0 (will not occur) and 1 (will occur).
For equally likely outcomes:

$P(A) = \frac{\text{number of ways event } A \text{ can occur}}{\text{total number of possible outcomes}}$

Example: a fair 6-sided dice
Outcome: the possible outcomes of this experiment are 1, 2, 3, 4, 5 and 6.
Events: 1; 6; even.
Probability: outcomes are equally likely to occur, so $P(1) = P(6) = 1/6$ and $P(\text{even}) = 3/6 = 1/2$.
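The counting rule for equally likely outcomes can be checked with a few lines of Python. This is only a minimal sketch: the function name `event_probability` is a hypothetical helper introduced here, not something from the slides, and it assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction


def event_probability(event, outcomes):
    """P(A) = (ways A can occur) / (total outcomes), assuming equally likely outcomes."""
    outcomes = set(outcomes)
    favourable = outcomes & set(event)  # ignore anything outside the sample space
    return Fraction(len(favourable), len(outcomes))


# Sample space of a fair 6-sided dice.
dice = range(1, 7)

p_one = event_probability({1}, dice)         # event "1"
p_six = event_probability({6}, dice)         # event "6"
p_even = event_probability({2, 4, 6}, dice)  # event "even"

print(p_one, p_six, p_even)  # 1/6 1/6 1/2
```

Using `Fraction` keeps the results exact, so they can be compared directly with the values 1/6 and 1/2 given in the example.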

Random variable
Random variables are functions that assign a unique number to each possible outcome of an experiment.
An example:
Experiment: tossing a coin.
Outcome space: {heads, tails}.
$X = 1$ if heads, $X = 0$ if tails.
More exactly, $X$ is a discrete random variable with $P(X=1) = 1/2$ and $P(X=0) = 1/2$.

Probability distribution
Probability distribution: the assignment of a probability $P(x)$ to each outcome $x$.
A fair dice: outcomes are equally likely to occur → the probability distribution over all six outcomes is $P(x) = 1/6$, $x = 1, 2, 3, 4, 5$ or $6$.
A loaded dice: outcomes are unequally likely to occur → the probability distribution over all six outcomes is $P(x) = f(x)$, $x = 1, 2, 3, 4, 5$ or $6$, but $\sum_x f(x) = 1$.

Probability mass function (pmf)
A probability mass function is a function that gives the probability that a discrete random variable is exactly equal to some value; it is often the primary means of defining a discrete probability distribution.
An example:

$P(X) = \begin{cases} 1/2 & \text{heads} \\ 1/2 & \text{tails} \\ 0 & \text{others} \end{cases}$

Probability density function (pdf)
Probability density functions (pdf) are for continuous rather than discrete random variables. A pdf $f(x)$ must be integrated over an interval to yield a probability, since $P(X = x) = 0$:

$P(a \le X \le b) = \int_a^b f(x)\,dx$

Cumulative distribution function (cdf):

$P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$

Joint probability
Two experiments (random variables) $X$ and $Y$.
$P(X,Y)$ → joint probability (distribution) of $X$ and $Y$.
$P(X,Y) = P(X \mid Y)P(Y) = P(Y \mid X)P(X)$
$P(X \mid Y) = P(X)$: $X$ and $Y$ are independent.
Example: experiment 1 (selecting a dice), experiment 2 (rolling the selected dice).
$P(y)$: $y = D_1$ or $D_2$
$P(i, D_k) = P(i \mid D_k)P(D_k)$
$P(i \mid D_1) = P(i \mid D_2)$: independent events.

The probability of a DNA sequence
Event: observing a DNA sequence $S = s_1 s_2 \ldots s_n$, $s_i \in \{A, C, G, T\}$.
Random sequence model (or independent and identically distributed, i.i.d., model): $s_i$ occurs at random with the probability $P(s_i)$, independent of all other residues in the sequence:

$P(S) = \prod_{i=1}^{n} P(s_i)$

This model will be used as a background model (also called a null hypothesis).
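As an illustration of the random sequence (i.i.d.) model, the following sketch multiplies per-residue probabilities for a short sequence. The function name `iid_sequence_log_prob` and the uniform background frequencies are assumptions made for this example; the slides do not fix either. The product is accumulated in log space, a common precaution because the raw product underflows for long sequences.

```python
import math


def iid_sequence_log_prob(seq, base_probs):
    """log P(S) = sum_i log P(s_i) under the i.i.d. (random sequence) model."""
    return sum(math.log(base_probs[s]) for s in seq)


# Assumed background frequencies (uniform); real genomes are usually not uniform.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

log_p = iid_sequence_log_prob("ACGT", uniform)
p = math.exp(log_p)  # equals 0.25 ** 4 up to floating-point error
print(p)
```

With non-uniform `base_probs`, the same function gives the background (null) probability against which another model can be compared.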

Marginal probability
The distribution of the marginal variables (the marginal distribution) is obtained by marginalizing over the distribution of the variables being discarded (so the discarded variables are marginalized out).
Marginalizing means considering all possible values the unknown variables may take, and averaging over them:

$P(X) = \sum_Y P(X \mid Y)P(Y)$ (discrete), $P(x) = \int P(x, y)\,dy$ (continuous)

Example: experiment 1 (selecting a dice), experiment 2 (rolling the selected dice).
$P(y)$: $y = D_1$ or $D_2$
$P(i) = P(i \mid D_1)P(D_1) + P(i \mid D_2)P(D_2)$
If $P(i \mid D_1) = P(i \mid D_2)$ (independent events), then $P(i) = P(i \mid D_1)(P(D_1) + P(D_2)) = P(i \mid D_1)$.

Conditional probability
Conditioning the joint distribution on a particular observation.
Conditional probability $P(X \mid Y)$: the measure of how likely an event $X$ is to happen under the condition $Y$:

$P(x \mid y) = \frac{P(x, y)}{P(y)} = \frac{P(x, y)}{\int P(x, y)\,dx}$

Example: two dice $D_1$, $D_2$.
$P(i \mid D_1)$ → probability of rolling $i$ using dice $D_1$.
$P(i \mid D_2)$ → probability of rolling $i$ using dice $D_2$.

Probability models
A system that produces different outcomes with different probabilities. It can simulate a class of objects (events), assigning each an associated probability.
Simple objects (processes) → probability distributions.
Typical probability distributions:
- Binomial distribution
- Gaussian distribution
- Multinomial distribution
- Poisson distribution
- Dirichlet distribution

Binomial distribution
An experiment with binary outcomes: 0 or 1.
Probability distribution of a single experiment: $P(\text{'1'}) = p$ and $P(\text{'0'}) = 1 - p$.
Probability distribution of $N$ tries of the same experiment:

$B(k \text{ '1's out of } N \text{ tries}) \sim \binom{N}{k} p^k (1 - p)^{N-k}$

Gaussian distribution
When $N \to \infty$, $B \to$ Gaussian distribution.
The Gaussian (normal) distribution is a continuous probability distribution with probability density function defined as:

$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

$\mu$: mean (expectation); $\sigma^2$: variance ($\sigma$: the standard deviation).
If we define a new variable $u = (x - \mu)/\sigma$, its density is the standard normal:

$f(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$
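The statement that the binomial approaches a Gaussian for large N can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes the usual matching of moments, mean Np and variance Np(1-p), which the slides do not spell out.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of k '1's out of n tries: C(n, k) p^k (1-p)^(n-k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def gaussian_pdf(x, mu, sigma2):
    """Gaussian density with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


n, p = 100, 0.5
mu, sigma2 = n * p, n * p * (1 - p)  # assumed moment matching: mean Np, variance Np(1-p)

# Compare the two around the mean; the gap shrinks as n grows.
for k in (40, 45, 50, 55, 60):
    print(k, binomial_pmf(k, n, p), gaussian_pdf(k, mu, sigma2))
```

At N = 100 the two columns already agree to roughly three decimal places near the mean; the agreement is weaker in the tails and for p far from 0.5.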

Gaussian distribution
Figure (from Wikipedia): the standard normal distribution, obtained when $\mu = 0$ and $\sigma^2 = 1$.

Multinomial distribution
An experiment with $K$ independent outcomes with probabilities $\theta_i$, $i = 1, \ldots, K$, $\sum_i \theta_i = 1$.
Probability distribution of $N$ tries of the same experiment, getting $n_i$ occurrences of outcome $i$, $\sum_i n_i = N$ ($n = \{n_i\}$):

$P(n \mid \theta) = M^{-1}(n) \prod_{i=1}^{K} \theta_i^{n_i}$

$M(n) = \frac{n_1!\, n_2! \cdots n_K!}{(\sum_k n_k)!} = \frac{\prod_i n_i!}{(\sum_k n_k)!}$

Example: a fair dice
Probability: outcomes (1, 2, ..., 6) are equally likely to occur.
Probability of rolling a dozen times (12) and getting each outcome twice:

$\frac{12!}{(2!)^6} \left(\frac{1}{6}\right)^{12} \approx 3.4 \times 10^{-3}$

Example: a loaded dice
Probability: outcomes (1, 2, ..., 6) are unequally likely to occur: $P(6) = 0.5$, $P(1) = P(2) = \cdots = P(5) = 0.1$.
Probability of rolling a dozen times (12) and getting each outcome twice:

$\frac{12!}{(2!)^6} (0.5)^2 (0.1)^{10} \approx 1.9 \times 10^{-4}$

Poisson distribution
Poisson gives the probability of seeing $n$ events over some interval, when there is a probability $p$ of an individual event occurring in that period.

Poisson distribution for sequencing coverage modeling
Assuming a uniform distribution of reads:
Length of genomic segment: $L$
Number of reads: $n$
Length of each read: $l$
Coverage: $\lambda = n\,l / L$
How much coverage is enough (or what is sufficient oversampling)?
Lander-Waterman model:

$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$, $P(x = 0) = e^{-\lambda}$, where $\lambda$ is the coverage.
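Both the multinomial dice examples and the Lander-Waterman coverage formula are easy to evaluate directly. The sketch below is a rough check rather than a reference implementation: the function names are hypothetical, and the read count, read length and genome length are made-up numbers chosen only to give a coverage of 5.

```python
import math


def multinomial_pmf(counts, probs):
    """P(n | theta) = (sum n_i)! / prod(n_i!) * prod(theta_i ^ n_i)."""
    coefficient = math.factorial(sum(counts))
    for n_i in counts:
        coefficient //= math.factorial(n_i)  # stays an exact integer at every step
    return coefficient * math.prod(p**n_i for p, n_i in zip(probs, counts))


def poisson_pmf(x, lam):
    """Lander-Waterman: probability that a base is covered by exactly x reads."""
    return lam**x * math.exp(-lam) / math.factorial(x)


# Twelve rolls, each face seen twice.
each_twice = [2] * 6
p_fair = multinomial_pmf(each_twice, [1 / 6] * 6)
p_loaded = multinomial_pmf(each_twice, [0.1] * 5 + [0.5])
print(p_fair, p_loaded)

# Hypothetical sequencing run: coverage lambda = n * l / L.
num_reads, read_length, genome_length = 50_000, 100, 1_000_000
lam = num_reads * read_length / genome_length
uncovered_fraction = poisson_pmf(0, lam)  # e^(-lambda)
print(lam, uncovered_fraction)
```

Under these assumed numbers, `uncovered_fraction` is the expected fraction of bases that no read touches, which is the quantity the "how much coverage is enough" question is about.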

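Dirichlet distribution
Outcomes: $\theta = (\theta_1, \theta_2, \ldots, \theta_K)$
Density:

$D(\theta \mid \alpha) = Z^{-1}(\alpha) \prod_{i=1}^{K} \theta_i^{\alpha_i - 1}\, \delta\!\left(\sum_{i=1}^{K} \theta_i - 1\right)$

$Z(\alpha) = \int \prod_{i=1}^{K} \theta_i^{\alpha_i - 1}\, \delta\!\left(\sum_{i=1}^{K} \theta_i - 1\right) d\theta = \frac{\prod_i \Gamma(\alpha_i)}{\Gamma(\sum_i \alpha_i)}$

$(\alpha_1, \alpha_2, \ldots, \alpha_K)$ are constants → different $\alpha$ gives different probability distributions over $\theta$.
$K = 2$ → Beta distribution.

Example: dice factories
Dice factories produce all kinds of dice: $\theta(1), \theta(2), \ldots, \theta(6)$.
A dice factory distinguishes itself from the others by parameters $\alpha = (\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6)$.
The probability of producing a dice $\theta$ in the factory $\alpha$ is determined by $D(\theta \mid \alpha)$.

Probabilistic model
Selecting a model: a model can be anything from a simple distribution to a complex stochastic grammar with many implicit probability distributions.
- Probabilistic distributions (Gaussian, binomial, etc.)
- Probabilistic graphical models: Markov models, hidden Markov models (HMM), Bayesian models
- Stochastic grammars
Data → model (learning): the parameters of the model have to be inferred from the data, by MLE (maximum likelihood estimation) or MAP (maximum a posteriori probability).
Model → data (inference/sampling).

MLE
Estimating the model parameters (learning): from large sets of trusted examples.
Given a set of data $D$ (training set), find a model with parameters $\theta$ with the maximal likelihood $P(D \mid \theta)$:

$\hat{\theta}_{MLE} = \arg\max_{\theta} P(D \mid \theta)$

MLE: learning from counts
Example: a loaded dice.
Loaded dice: to estimate parameters $\theta_1, \theta_2, \ldots, \theta_6$, based on $N$ observations $D = d_1, d_2, \ldots, d_N$.
$\theta_i = n_i / N$, where $n_i$ is the occurrence of outcome $i$ (observed frequencies), is the maximum likelihood solution (BSA 11.5):

$P(n \mid \theta^{MLE}) > P(n \mid \theta)$ for any $\theta \ne \theta^{MLE}$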
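Learning from counts amounts to normalizing observed frequencies. The following is a small sketch of that idea; the function name `mle_from_rolls` and the ten rolls are invented for illustration and are not data from the slides.

```python
from collections import Counter


def mle_from_rolls(rolls, faces=range(1, 7)):
    """Maximum likelihood estimate theta_i = n_i / N from observed outcomes."""
    counts = Counter(rolls)
    n_total = len(rolls)
    return {face: counts[face] / n_total for face in faces}


# Hypothetical observations from a dice that seems to favour 6.
rolls = [6, 6, 3, 6, 1, 6, 2, 6, 5, 6]
theta = mle_from_rolls(rolls)
print(theta)
```

In this made-up sample the face 4 never appears, so its estimate is exactly 0: with only ten rolls, the MLE treats an unobserved outcome as impossible.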

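When to use MLE
A drawback of MLE is that it can give poor estimates when the data are scarce.
E.g., if you flip a coin twice, you may get only heads, and then $P(\text{tail}) = 0$.
It may be wiser to apply prior knowledge (e.g., we assume $P(\text{tail})$ is close to 0.5).
Use MAP instead.

MAP
Bayesian statistics:

$P(\theta \mid D) = \frac{P(D \mid \theta)P(\theta)}{P(D)} = \frac{P(D \mid \theta)P(\theta)}{\sum_{\theta'} P(D \mid \theta')P(\theta')}$

$P(\theta)$ → prior probability; $P(\theta \mid D)$ → posterior probability; $P(D \mid \theta)$ → likelihood.

$\hat{\theta}_{MAP} = \arg\max_{\theta} P(\theta \mid D) = \arg\max_{\theta} \frac{P(D \mid \theta)P(\theta)}{P(D)} = \arg\max_{\theta} P(D \mid \theta)P(\theta)$

Example: two dice
Prior probabilities: fair dice 0.99; loaded dice 0.01.
Loaded dice: $P(6) = 0.5$, $P(1) = \cdots = P(5) = 0.1$.
Data: 3 consecutive '6's. Writing $C = P(\text{3 '6's})$:
$P(\text{loaded} \mid \text{3 '6's}) = P(\text{loaded}) \cdot [P(\text{3 '6's} \mid \text{loaded}) / P(\text{3 '6's})] = 0.01 \cdot (0.5^3 / C)$
$P(\text{fair} \mid \text{3 '6's}) = P(\text{fair}) \cdot [P(\text{3 '6's} \mid \text{fair}) / P(\text{3 '6's})] = 0.99 \cdot ((1/6)^3 / C)$
Model comparison by using the likelihood ratio: $P(\text{loaded} \mid \text{3 '6's}) / P(\text{fair} \mid \text{3 '6's}) \approx 0.27 < 1$.
So the fair dice is more likely to have generated the observation.

Learning from counts: including prior
Use prior knowledge when the data are scarce.
Use the Dirichlet distribution as prior for the multinomial distribution.
Posterior:

$P(\theta \mid n) = \frac{P(n \mid \theta)P(\theta)}{P(n)} = \frac{P(n \mid \theta)\,D(\theta \mid \alpha)}{P(n)}$

Posterior mean estimator (PME):

$\theta^{PME} = \int \theta\, D(\theta \mid n + \alpha)\, d\theta$

$\theta_i^{PME} = Z^{-1}(n + \alpha) \int \theta_i \prod_k \theta_k^{n_k + \alpha_k - 1}\, d\theta = \frac{n_i + \alpha_i}{N + A}$, where $A = \sum_i \alpha_i$

Equivalently (in Bayesian statistics): add $\alpha_i$ as pseudo-counts to the observations $n_i$ (BSA 11.5) (add-one smoothing; Laplace estimator).
We can forget about statistics and use pseudo-counts in the parameter estimation!

Sampling
Probabilistic model with parameter $\theta$ → $P(x \mid \theta)$ for event $x$.
Sampling: generate a large set of events $x$ with probability $P(x \mid \theta)$.
Random number generator: the function rand() picks a number randomly from the interval [0,1) with uniform density.
Sampling from a probabilistic model → transforming $P(x \mid \theta)$ to a uniform distribution.
For a finite set $X$ ($x_i \in X$), find $i$ such that

$P(x_1) + \cdots + P(x_{i-1}) < \text{rand}(0,1) < P(x_1) + \cdots + P(x_{i-1}) + P(x_i)$

Entropy
Probability distribution $P(x_i)$ over $K$ events:

$H(x) = -\sum_i P(x_i) \log P(x_i)$

Maximized for the uniform distribution $P(x_i) = 1/K$.
A measure of average uncertainty.
A sample application of entropy in bioinformatics: as a measurement of conservation.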
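The sampling rule (find the index whose cumulative probability first exceeds a uniform random number) and the entropy formula can both be sketched briefly. The helper names below are hypothetical, the loaded-dice probabilities are the ones used earlier in the lecture, and the logarithm is taken base 2 as an assumption, since the slides do not state the base.

```python
import math
import random


def sample_index(probs, u):
    """Return the first index i with P(x_1) + ... + P(x_i) > u, for u in [0, 1)."""
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off in the sum


def entropy(probs):
    """H = -sum p log2 p (in bits, by assumption); zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


loaded = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]  # P(1)..P(5) = 0.1, P(6) = 0.5
fair = [1 / 6] * 6

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_index(loaded, rng.random()) + 1 for _ in range(10_000)]
print(draws.count(6) / len(draws))   # should be near 0.5
print(entropy(fair), entropy(loaded))
```

The fair dice attains the maximum entropy log2(6), while the loaded dice scores lower, matching the statement that entropy is maximized by the uniform distribution.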

Mutual information
Measure of independence of two random variables $X$ and $Y$.
$P(X \mid Y) = P(X)$: $X$ and $Y$ are independent → $P(X,Y) / (P(X)P(Y)) = 1$.

$M(X;Y) = \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x)P(y)}$

$M(X;Y) = 0$ → independent.
A sample application of mutual information: correlation between two residues, with application in RNA structure prediction.

BRCA1 and BRCA2
A little background: BRCA1 and BRCA2 are human genes that produce tumor suppressor proteins. Specific inherited mutations in BRCA1 and BRCA2 increase the risk of female breast and ovarian cancers, and they have been associated with increased risks of several additional types of cancer. Together, BRCA1 and BRCA2 mutations account for about 20 to 25 percent of hereditary breast cancers and about 5 to 10 percent of all breast cancers.

A simple calculation
A rare mutation in an important gene is observed in only 2% of the population. A person who carries this mutation in his/her genome has a 90% chance of developing a disease. On the other hand, a person who has a normal gene (without the mutation) has only a 5% chance of developing this disease.
Question: if you are found to have this disease, what is your chance of carrying this rare mutation?
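One way to approach the question is Bayes' rule, P(mutation | disease) = P(disease | mutation) P(mutation) / P(disease), with P(disease) obtained by marginalizing over both genotypes. The sketch below takes the percentages exactly as they appear in the problem statement; the function name `posterior` and the dictionary keys are hypothetical.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a finite set of hypotheses: P(h | data) for each h."""
    joint = {h: prior[h] * likelihood[h] for h in prior}  # P(data | h) P(h)
    evidence = sum(joint.values())                        # P(data), by marginalization
    return {h: joint[h] / evidence for h in joint}


# Numbers as stated in the problem.
prior = {"mutation": 0.02, "normal": 0.98}
p_disease_given = {"mutation": 0.90, "normal": 0.05}

post = posterior(prior, p_disease_given)
print(post["mutation"])  # 0.018 / 0.067, roughly 0.27
```

Under these numbers, most people with the disease do not carry the mutation, because the mutation is rare to begin with; the prior matters as much as the likelihood.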
