Week 6: Restriction sites, RAPDs, microsatellites, likelihood, hidden Markov models


1 Week 6: Restriction sites, RAPDs, microsatellites, likelihood, hidden Markov models

Genome 570, February 2012

Week 6: Restriction sites, RAPDs, microsatellites, likelihood, hidden Markov models p.1/63

2 Restriction sites and microsatellite loci

Note: owing to time pressure, we are going to skip the restriction sites and microsatellites material in the lectures. The slides are still included here.

3 Restriction sites

GGGGAATCCCAGTCAGGTTCCGTTAGTGAATCCGTTACCCGTAAT
GGGCAATCCCAGTCAGGTTCCGTTAGTGAATCCGTTACCCGTAAT
GGGCATTCCCAGTCAGGTTCCGTTAGTGAA CCCGTTACCCGTAAT

4 Number of sequences in a restriction site location

The number of r-base sequences that differ from the recognition sequence at exactly k positions is

N_k = \binom{r}{k} 3^k

5 Transition probabilities for restriction sites, aggregated

P_{kl} = \sum_{m=a}^{b} \binom{k}{m} \left(\frac{p}{3}\right)^{m} \left(1 - \frac{p}{3}\right)^{k-m} \binom{r-k}{l-k+m} p^{\,l-k+m} (1-p)^{\,r-k-(l-k+m)}

      = \sum_{m=\max[0,\,k-l]}^{\min[k,\,r-l]} \binom{k}{m} \left(\frac{p}{3}\right)^{m} \left(1 - \frac{p}{3}\right)^{k-m} \binom{r-k}{l-k+m} p^{\,l-k+m} (1-p)^{\,r-l-m}

Here, of the k bases that mismatch the recognition sequence, m change to the matching base (probability p/3 each), and l - k + m of the r - k matching bases change (probability p each).

6 Presence/absence of restriction sites

\pi_0 = \left(\frac{1}{4}\right)^{r}

P_{++} = \pi_0 (1-p)^{r} = \left(\frac{1}{4}\right)^{r} \left[\frac{1 + 3e^{-\frac{4}{3}t}}{4}\right]^{r}

P_{+-} = P_{-+} = \left(\frac{1}{4}\right)^{r} \left(1 - \left[\frac{1 + 3e^{-\frac{4}{3}t}}{4}\right]^{r}\right)

P_{--} = 1 - \left(\frac{1}{4}\right)^{r} \left(2 - \left[\frac{1 + 3e^{-\frac{4}{3}t}}{4}\right]^{r}\right)

7 Conditioning on the site not being entirely absent

Prob(++ | not --) = P_{++} / (P_{++} + P_{+-} + P_{-+})

Prob(-+ | not --) = P_{-+} / (P_{++} + P_{+-} + P_{-+})

Prob(+- | not --) = P_{+-} / (P_{++} + P_{+-} + P_{-+})

8 Likelihoods for two sequences with restriction sites

L = \left(\frac{P_{++}}{P_{++} + P_{+-} + P_{-+}}\right)^{n_{++}} \left(\frac{P_{+-}}{P_{++} + P_{+-} + P_{-+}}\right)^{n_{+-} + n_{-+}}

which is of the form

L = x^{n_{++}} \left(\frac{1-x}{2}\right)^{n_{+-} + n_{-+}}

where

x = \frac{P_{++}}{P_{++} + P_{+-} + P_{-+}}

9 Maximizing the likelihood

The likelihood is maximized when

x = \frac{n_{++}}{n_{++} + n_{+-} + n_{-+}}

Using the equations for the P's and for x,

\left(\frac{1 + 3e^{-\frac{4}{3}t}}{4}\right)^{r} = \frac{n_{++}}{n_{++} + \frac{1}{2}(n_{+-} + n_{-+})}

which can be solved for t to give the distance

D = \hat{t} = -\frac{3}{4} \ln\left\{\frac{4}{3}\left[\frac{n_{++}}{n_{++} + \frac{1}{2}(n_{+-} + n_{-+})}\right]^{1/r} - \frac{1}{3}\right\}
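The distance is easy to compute from the three observed counts. Here is a minimal sketch in Python (the function name and the default r = 6 are my own choices, not from the slides):

```python
import math

def restriction_site_distance(n_pp, n_pm, n_mp, r=6):
    """ML distance from restriction-site sharing between two sequences.

    n_pp       -- number of sites present in both sequences (++)
    n_pm, n_mp -- sites present in only one sequence (+- and -+)
    r          -- length of the recognition sequence
    """
    # MLE of Q = ((1 + 3 exp(-4t/3)) / 4)^r from the counts
    q_hat = n_pp / (n_pp + 0.5 * (n_pm + n_mp))
    # invert Q to obtain D = t-hat
    return -0.75 * math.log((4.0 / 3.0) * q_hat ** (1.0 / r) - 1.0 / 3.0)
```

Identical restriction maps (n_pm = n_mp = 0) give D = 0, and the argument of the logarithm approaches zero (so D grows without bound) as shared sites disappear.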

10 Probability ancestor is +

Are shared restriction sites parallelisms?

[Figure: a restriction site (6 bases) on a tree with branch lengths t, with presence/absence of the site treated as a two-state 0/1 model]

11 A one-step microsatellite model

Computing the probability that there is a net change of i steps, by having i + k steps up and k steps down:

Prob(there are i + 2k steps) = e^{-\mu t} (\mu t)^{i+2k} / (i+2k)!

Prob(i + k of these are steps upwards) = \binom{i+2k}{i+k} \left(\frac{1}{2}\right)^{i+k} \left(\frac{1}{2}\right)^{k} = \binom{i+2k}{i+k} \left(\frac{1}{2}\right)^{i+2k}

We put these together and sum over k to get the result.

12 Transition probabilities in the microsatellite model

The transition probability of a net change of i steps in a branch of length \mu t is then

p_i(\mu t) = \sum_{k=0}^{\infty} e^{-\mu t} \frac{(\mu t)^{i+2k}}{(i+2k)!} \binom{i+2k}{i+k} \left(\frac{1}{2}\right)^{i+2k} = e^{-\mu t} \sum_{k=0}^{\infty} \frac{\left(\frac{\mu t}{2}\right)^{i+2k}}{(i+k)!\,k!}

13 Why the (δµ)² distance works

[Figure: two populations, each coalescing on a time scale of approximately 2N generations, separated by t generations]

14 (δµ)² distance for microsatellites

(\delta\mu)^2 = (\mu_A - \mu_B)^2

where \mu_A and \mu_B are the mean repeat counts in populations A and B.

15 Transition probabilities in a one-step model

[Figure: transition probabilities plotted against the change in number of repeats, for µt = 1 and one other value of µt]

16 A Brownian motion approximation

Prob(m+k \mid m) \approx \min\left[1,\; \frac{1}{\sqrt{2\pi\mu t}} \exp\left(-\frac{1}{2} \frac{k^2}{\mu t}\right)\right]
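As a quick numeric illustration (function name mine), the approximation is just a capped normal density:

```python
import math

def brownian_approx(k, mu_t):
    """Normal-density approximation to Prob(m+k | m), capped at 1."""
    return min(1.0, math.exp(-0.5 * k * k / mu_t) / math.sqrt(2.0 * math.pi * mu_t))
```

For small µt the density at k = 0 would exceed 1, which is why the min[1, ·] cap is there.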

17 Is it accurate? Well...

[Plot: probability against branch length]

18 Likelihood ratios: the odds-ratio form of Bayes' theorem

\underbrace{\frac{\mathrm{Prob}(H_1 \mid D)}{\mathrm{Prob}(H_2 \mid D)}}_{\text{posterior odds ratio}} \;=\; \underbrace{\frac{\mathrm{Prob}(D \mid H_1)}{\mathrm{Prob}(D \mid H_2)}}_{\text{likelihood ratio}} \; \underbrace{\frac{\mathrm{Prob}(H_1)}{\mathrm{Prob}(H_2)}}_{\text{prior odds ratio}}

Given prior odds, we can use the data to compute the posterior odds.

19 With many sites the likelihood wins out

Independence of the evolution in the different sites implies that:

Prob(D | H_i) = Prob(D^{(1)} | H_i)\, Prob(D^{(2)} | H_i) \cdots Prob(D^{(n)} | H_i)

so we can rewrite the odds-ratio formula as

\frac{\mathrm{Prob}(H_1 \mid D)}{\mathrm{Prob}(H_2 \mid D)} = \left(\prod_{i=1}^{n} \frac{\mathrm{Prob}(D^{(i)} \mid H_1)}{\mathrm{Prob}(D^{(i)} \mid H_2)}\right) \frac{\mathrm{Prob}(H_1)}{\mathrm{Prob}(H_2)}

This implies that as n gets large, the likelihood-ratio part will dominate.

20 An example: coin tossing

If we toss a coin 11 times and get HHTTHTHHTTT, the likelihood is:

L = Prob(D | p) = p \cdot p \cdot (1-p) \cdot (1-p) \cdot p \cdot (1-p) \cdot p \cdot p \cdot (1-p) \cdot (1-p) \cdot (1-p) = p^5 (1-p)^6

To find the maximum likelihood estimate of p, we differentiate:

\frac{dL}{dp} = 5p^4(1-p)^6 - 6p^5(1-p)^5

and equating it to zero and solving:

\frac{dL}{dp} = p^4 (1-p)^5 \left(5(1-p) - 6p\right) = 0

gives p = 5/11.
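A grid search over p reproduces the analytic answer (a sketch; the grid resolution is arbitrary):

```python
def coin_likelihood(p, tosses="HHTTHTHHTTT"):
    """Likelihood of an independent coin-toss sequence given heads probability p."""
    like = 1.0
    for toss in tosses:
        like *= p if toss == "H" else 1.0 - p
    return like

# numerically locate the maximum on a fine grid
grid = [i / 10000.0 for i in range(1, 10000)]
p_hat = max(grid, key=coin_likelihood)  # close to 5/11 = 0.4545...
```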

21 Likelihood as a function of p for the 11 coin tosses

[Plot: likelihood against p, peaking at p = 5/11]

22 Maximizing the likelihood

Maximizing the likelihood is the same as maximizing the log-likelihood, because the logarithm is an increasing function, so:

\ln L = 5 \ln p + 6 \ln(1-p)

and

\frac{d(\ln L)}{dp} = \frac{5}{p} - \frac{6}{1-p} = 0,

so that, again, p = 5/11.

23 An example with one site

[Figure: a rooted tree with tip bases A, C, C, C, G, interior nodes x, y, z, w, and branch lengths t_1, ..., t_8]

We want to compute the probability of these data at the tips of the tree, given the tree and the branch lengths. Each site has an independent outcome, all on the same tree.

24 Likelihood sums over states of interior nodes

The likelihood is the product over sites (as they are independent given the tree):

L = Prob(D | T) = \prod_{i=1}^{m} Prob(D^{(i)} | T)

For site i, summing over all possible states at interior nodes of the tree:

Prob(D^{(i)} | T) = \sum_x \sum_y \sum_z \sum_w Prob(A, C, C, C, G, x, y, z, w | T)

25 ... the products over events in the branches

By independence of events in different branches (conditional only on their starting states):

Prob(A, C, C, C, G, x, y, z, w | T) = Prob(x)\, Prob(y | x, t_6)\, Prob(A | y, t_1)\, Prob(C | y, t_2)\, Prob(z | x, t_8)\, Prob(C | z, t_3)\, Prob(w | z, t_7)\, Prob(C | w, t_4)\, Prob(G | w, t_5)

26 The result looks hard to compute

Summing over all x, y, z, and w (each taking values A, G, C, T):

Prob(D^{(i)} | T) = \sum_x \sum_y \sum_z \sum_w Prob(x)\, Prob(y | x, t_6)\, Prob(A | y, t_1)\, Prob(C | y, t_2)\, Prob(z | x, t_8)\, Prob(C | z, t_3)\, Prob(w | z, t_7)\, Prob(C | w, t_4)\, Prob(G | w, t_5)

This could be hard to do on a larger tree. For example, with 20 species there are 19 interior nodes, and thus the number of assignments of states to interior nodes is 4^19 = 274,877,906,944.

27 ... but there's a trick ...

We can move summations in as far as possible:

Prob(D^{(i)} | T) = \sum_x Prob(x) \left(\sum_y Prob(y | x, t_6)\, Prob(A | y, t_1)\, Prob(C | y, t_2)\right) \left(\sum_z Prob(z | x, t_8)\, Prob(C | z, t_3) \left(\sum_w Prob(w | z, t_7)\, Prob(C | w, t_4)\, Prob(G | w, t_5)\right)\right)

The pattern of parentheses parallels the structure of the tree: (A, C)(C, (C, G))

28 Conditional likelihoods in this calculation

Working from the innermost parentheses outwards is the same as working down the tree. We can define a quantity L_j^{(i)}(s), the conditional likelihood at site i of everything at or above point j in the tree, given that point j has state s.

One such is the term:

L_7^{(i)}(w) = Prob(C | w, t_4)\, Prob(G | w, t_5)

Another is the term including that:

L_8^{(i)}(z) = Prob(C | z, t_3) \left(\sum_w Prob(w | z, t_7)\, Prob(C | w, t_4)\, Prob(G | w, t_5)\right)

29 The pruning algorithm

This follows the recursion down the tree:

L_k^{(i)}(s) = \left(\sum_x Prob(x | s, t_l)\, L_l^{(i)}(x)\right) \left(\sum_y Prob(y | s, t_m)\, L_m^{(i)}(y)\right)

At a tip, the quantity is easily seen to be like this (if the tip is in state A):

\left(L^{(i)}(A),\, L^{(i)}(C),\, L^{(i)}(G),\, L^{(i)}(T)\right) = (1, 0, 0, 0)

At the bottom we have a weighted sum over all states, weighted by their prior probabilities:

L^{(i)} = \sum_x \pi_x\, L_0^{(i)}(x)

We can do that because, if evolution has gone on for a long time before the root of the tree, the probabilities of bases there are just the equilibrium probabilities under the DNA model (or whatever model we assume).
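The recursion is short to implement. Here is a minimal sketch for one site under the Jukes-Cantor model (the tree encoding and the branch lengths are my own illustration, not from the slides):

```python
import math

BASES = "ACGT"

def jc_matrix(t):
    """Jukes-Cantor transition probabilities for a branch of length t."""
    same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    diff = 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def conditional(node):
    """Conditional likelihoods L(s). A node is either a tip base ('A')
    or a tuple (left_subtree, t_left, right_subtree, t_right)."""
    if isinstance(node, str):  # tip: 1 for the observed base, 0 otherwise
        return [1.0 if b == node else 0.0 for b in BASES]
    left, t_l, right, t_r = node
    L_left, L_right = conditional(left), conditional(right)
    P_l, P_r = jc_matrix(t_l), jc_matrix(t_r)
    return [sum(P_l[s][x] * L_left[x] for x in range(4)) *
            sum(P_r[s][y] * L_right[y] for y in range(4))
            for s in range(4)]

def site_likelihood(tree):
    """Weight the root conditionals by the equilibrium frequencies (1/4 each)."""
    return sum(0.25 * Ls for Ls in conditional(tree))

# the tree shape from the example, ((A,C),(C,(C,G))), with made-up branch lengths
tree = (("A", 0.1, "C", 0.1), 0.05,
        ("C", 0.2, ("C", 0.1, "G", 0.1), 0.05), 0.05)
lik = site_likelihood(tree)
```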

30 Handling ambiguity and error

If a base is unknown: use (1, 1, 1, 1). If it is known only to be a purine: (1, 0, 1, 0).

Note: do not do something like this: (0.5, 0, 0.5, 0). The entry is not the probability of being that base, but the probability of the observation given that base.

If there is sequencing error, use something like this: (1-ε, ε/3, ε/3, ε/3) (assuming an error is equally likely to be each of the three other bases).

31 Rerooting the tree

[Figure: before and after rerooting — the root x_0 moves along the branch between nodes y (6) and z (8), with branch lengths t_6 and t_8]

32 Unrootedness

L^{(i)} = \sum_x \sum_y \sum_z Prob(x)\, Prob(y | x, t_6)\, Prob(z | x, t_8)

Reversibility of the Markov process guarantees that

Prob(x)\, Prob(y | x, t_6) = Prob(y)\, Prob(x | y, t_6).

Substituting that in:

L^{(i)} = \sum_x \sum_y \sum_z Prob(y)\, Prob(x | y, t_6)\, Prob(z | x, t_8)

... which means that (if the model of change is a reversible one) the likelihood does not depend on where the root is placed in the unrooted tree.

33 To compute likelihood when the root is in one branch

[Figure: prune in the left subtree]

First we get (for the site) the conditional likelihoods for the left subtree ...

34 To compute likelihood when the root is in one branch

[Figure: prune in the right subtree]

... then we get the conditional likelihoods for the right subtree ...

35 To compute likelihood when the root is in one branch

[Figure: prune to the bottom]

... and finally we get the conditional likelihoods for the root, and from those the likelihoods for the site. (Note that it does not matter where in that branch we place the root, as long as the sum of the branch lengths on either side of the root is 0.236.)

36 To do it with a different branch length ...

[Figure: prune to the bottom]

... we can just redo the last step, but now with the new branch length. The other conditional likelihoods were already calculated and have not changed, so we can re-use them. This really speeds up calculating how the likelihood depends on the length of one branch: just put the root there!

37 A data example: 7-species primates mtDNA

[Figure: maximum likelihood tree for Chimp, Human, Orang, Gorilla, Gibbon, Mouse, and Bovine, with its ln L value]

These are noncoding or synonymous sites from the D-loop region (and some adjacent coding regions) of mitochondrial DNA, selected by Masami Hasegawa from sequences done by S. Hayashi and coworkers.

38 Rate variation among sites

Evolution is independent once each site has had its rate specified:

Prob(D | T, r_1, r_2, \ldots, r_p) = \prod_{i=1}^{p} Prob(D^{(i)} | T, r_i)

39 Coping with uncertainty about rates

Using a Gamma distribution, independently assigning rates to sites:

L = \prod_{i=1}^{m} \left[\int_0^\infty f(r; \alpha)\, L^{(i)}(r)\, dr\right]

Unfortunately this is hard to compute on a tree with more than a few species. Yang (1994a) approximated this by a discrete histogram of rates:

L^{(i)} = \int_0^\infty f(r; \alpha)\, L^{(i)}(r)\, dr \approx \sum_{k=1}^{K} w_k\, L^{(i)}(r_k)

Felsenstein (J. Mol. Evol., 2001) has suggested using Gauss-Laguerre quadrature to choose the rates r_k and the weights w_k.
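The discrete approximation replaces the integral by a finite weighted sum. Below is a crude midpoint-rule sketch of that idea; Yang's method chooses equal-probability categories and Gauss-Laguerre quadrature is more accurate, so the grid here is only illustrative:

```python
import math

def gamma_pdf(r, alpha):
    """Gamma density f(r; alpha) with shape alpha, scaled to have mean 1."""
    return alpha ** alpha * r ** (alpha - 1.0) * math.exp(-alpha * r) / math.gamma(alpha)

def discretized_site_likelihood(site_lik, alpha, k=200, r_max=12.0):
    """Approximate the integral of f(r; alpha) * L(r) dr with k rate categories."""
    width = r_max / k
    total = 0.0
    for j in range(k):
        r_j = (j + 0.5) * width              # midpoint rate of category j
        w_j = gamma_pdf(r_j, alpha) * width  # weight of category j
        total += w_j * site_lik(r_j)
    return total
```

With site_lik constant the result approximates 1 (the total probability mass of the rate distribution), and with site_lik(r) = r it approximates the mean rate.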

40 Hidden Markov Models

These are the most widely used models allowing rate variation to be correlated along the sequence. We assume:

There are a finite number of rates, m. Rate i is r_i.

There are probabilities p_i of a site having rate i.

A process not visible to us ("hidden") assigns rates to sites. It is a Markov process working along the sequence. For example, it might have transition probability Prob(j | i) of changing to rate j at the next site, given that it is at rate i at this site.

The probability of our seeing some data is obtained by summing over all possible combinations of rates, weighting appropriately by their probabilities of occurrence.

41 Likelihood with an HMM

Suppose that we have a way of calculating, for each possible rate at each possible site, the probability of the data at that site (i) given that rate (r_j). This is

Prob(D^{(i)} | r_j)

This can be done because the probabilities of change as a function of the rate r and time t are (in almost all models) just functions of their product rt, so a site that has twice the rate is just like a site that has branches twice as long.

To get the overall probability of all the data, we sum over all possible paths through the array of sites and rates, weighting each combination of rates by its probability:

42 Hidden Markov Model of rate variation

[Figure: an alignment of sites, each evolving on the same phylogeny; below it, a hidden Markov process assigns a rate of evolution to each site]

43 Hidden Markov Models

If there are a number of hidden rate states, with state i having rate r_i:

Prob(D | T) = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_p} Prob(r_{i_1}, r_{i_2}, \ldots, r_{i_p})\, Prob(D | T, r_{i_1}, r_{i_2}, \ldots, r_{i_p})

Evolution is independent once each site has had its rate specified:

Prob(D | T, r_1, r_2, \ldots, r_p) = \prod_{i=1}^{p} Prob(D^{(i)} | T, r_i)

44 Seems impossible ...

To compute the likelihood we sum over all ways rate states could be assigned to sites:

L = Prob(D | T) = \sum_{i_1=1}^{m} \sum_{i_2=1}^{m} \cdots \sum_{i_p=1}^{m} Prob(r_{i_1}, r_{i_2}, \ldots, r_{i_p})\, Prob(D^{(1)} | r_{i_1})\, Prob(D^{(2)} | r_{i_2}) \cdots Prob(D^{(p)} | r_{i_p})

Problem: the number of rate combinations is very large. With 100 sites and 3 rates at each, it is 3^100, about 5 × 10^47. This makes the summation impractical.

45 Factorization and the algorithm

Fortunately, the terms can be reordered:

L = Prob(D | T) = \sum_{i_1=1}^{m} \sum_{i_2=1}^{m} \cdots \sum_{i_p=1}^{m} Prob(i_1)\, Prob(D^{(1)} | r_{i_1})\, Prob(i_2 | i_1)\, Prob(D^{(2)} | r_{i_2})\, Prob(i_3 | i_2)\, Prob(D^{(3)} | r_{i_3}) \cdots Prob(i_p | i_{p-1})\, Prob(D^{(p)} | r_{i_p})

46 Using Horner's Rule

... and the summations can be moved each as far rightwards as it can go:

L = \sum_{i_1=1}^{m} Prob(i_1)\, Prob(D^{(1)} | r_{i_1}) \sum_{i_2=1}^{m} Prob(i_2 | i_1)\, Prob(D^{(2)} | r_{i_2}) \sum_{i_3=1}^{m} Prob(i_3 | i_2)\, Prob(D^{(3)} | r_{i_3}) \cdots \sum_{i_p=1}^{m} Prob(i_p | i_{p-1})\, Prob(D^{(p)} | r_{i_p})

47 Recursive calculation of HMM likelihoods

The summations can be evaluated innermost-outwards. The same summations appear in multiple terms, so we can evaluate each of them only once. A huge saving results. The result is this algorithm:

Define P_i(j) as the probability of everything at or to the right of site i, given that site i has the j-th rate. Now we can immediately see for the last site that, for each possible rate category i_p,

P_p(i_p) = Prob(D^{(p)} | r_{i_p})

(as "at or to the right of" simply means "at" for that site).

48 Recursive calculation

More generally, for site l < p and its rates i_l,

P_l(i_l) = Prob(D^{(l)} | r_{i_l}) \sum_{i_{l+1}=1}^{m} Prob(i_{l+1} | i_l)\, P_{l+1}(i_{l+1})

We can compute the P's recursively using this, starting with the last site and moving leftwards down the sequence. Finally we have the P_1(i_1) for all m states. These are simply weighted by the equilibrium probabilities of the Markov chain of rate categories:

L = Prob(D | T) = \sum_{i_1=1}^{m} Prob(i_1)\, P_1(i_1)

An entirely similar calculation can be done from left to right, remembering that the transition probabilities Prob(i_k | i_{k+1}) would be different in that case.
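The whole backward recursion fits in a few lines. A sketch (the argument layout is my own; site_probs would come from running the pruning algorithm once per rate category):

```python
def hmm_likelihood(site_probs, trans, prior):
    """Backward recursion over rate categories.

    site_probs[l][j] -- Prob(D^(l) | rate j) for each site l
    trans[i][j]      -- Prob(rate j at next site | rate i at this site)
    prior[j]         -- equilibrium probability of rate j
    """
    m = len(prior)
    P = list(site_probs[-1])  # last site: P_p(i) = Prob(D^(p) | r_i)
    for l in range(len(site_probs) - 2, -1, -1):  # move leftwards
        P = [site_probs[l][i] * sum(trans[i][j] * P[j] for j in range(m))
             for i in range(m)]
    return sum(prior[i] * P[i] for i in range(m))
```

The cost is of order m² per site, instead of the m^p of the naive summation.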

49 All paths through the array

[Figure: the sites-by-rates array, holding the conditional probabilities of everything at or to the right of each site, given the state at that site]

50 Starting and finishing the calculation

At the end, at site m:

Prob(D^{[m]} | T, r_{i_m}) = Prob(D^{(m)} | T, r_{i_m})

and once we get to site 1, we need only use the prior probabilities of the rates r_i to get a weighted sum:

Prob(D | T) = \sum_{i_1} \pi_{i_1}\, Prob(D^{[1]} | T, r_{i_1})

51 The pruning algorithm is just like ...

[Figure: two species with different bases, connected through their ancestor]

52 ... the forward algorithm

[Figure: the array of different rates at sites 20, 21, and 22]

53 Paths from one state in one site

[Figure]

54 Paths from another state in that site

[Figure]

55 Paths from a third state

[Figure]

56 We can also use the backward algorithm

[Figure: you can also do it the other way]

57 Using both, we can find the likelihood contributed by a state

The "forward-backward" algorithm allows us to get the probability of everything, given one site's state.

58 A particular Markov process on rates

There are many possible Markov processes that could be used in the HMM rates problem. I have used:

Prob(r_i | r_j) = (1-\lambda)\, \delta_{ij} + \lambda\, \pi_i
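In matrix form this process copies the current rate with probability 1-λ and otherwise draws a fresh rate from π. A sketch showing that π is its stationary distribution (function name and the example values are mine):

```python
def rate_transition_matrix(lam, pi):
    """T[j][i] = Prob(rate i at next site | rate j at this site)
             = (1 - lam) * delta_ij + lam * pi[i]."""
    m = len(pi)
    return [[(1.0 - lam) * (1.0 if i == j else 0.0) + lam * pi[i]
             for i in range(m)]
            for j in range(m)]

pi = [0.2, 0.3, 0.5]
T = rate_transition_matrix(0.3, pi)
```

Each row sums to one, and applying T to π returns π, so the prior probabilities of the rates are also their equilibrium probabilities; the expected length of a block of constant rate is on the order of 1/λ.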

59 A numerical example: cytochrome B

We analyze 31 cytochrome B sequences, aligned by Naoko Takezaki, using the Proml protein maximum likelihood program. We assume a Hidden Markov Model with 3 states and an expected block length of 3.

[Table: category, rate, and probability for the three rate categories; the numerical values were lost in extraction]

We get a reasonable, but not perfect, tree, with the best rate combination inferred as shown on the following slides.

60 Phylogeny for Takezaki cytochrome B

[Figure: the inferred tree for the 31 taxa: whalebm, whalebp, borang, sorang, hseal, gseal, gibbon, gorilla2, bovine, gorilla1, cchimp, cat, rhinocer, pchimp, platypus, dhorse, horse, african, caucasian, wallaroo, rat, opossum, mouse, chicken, seaurchin2, xenopus, loach, carp, seaurchin1, trout, lamprey]

61 Rates inferred from cytochrome B

[Alignment figure: the first block of the 31-taxon cytochrome B alignment, with the african sequence written out in full and dots in the other sequences marking identity to it]

62 Rates inferred from cytochrome B

[Alignment figure: the second block of the 31-taxon cytochrome B alignment, in the same format]

63 PhyloHMMs: used in the UCSC Genome Browser

The conservation scores calculated in the Genome Browser use PhyloHMMs, which are just these HMM methods.


Inference of phylogenies, with some thoughts on statistics and geometry p.1/31 Inference of phylogenies, with some thoughts on statistics and geometry Joe Felsenstein University of Washington Inference of phylogenies, with some thoughts on statistics and geometry p.1/31 Darwin s

More information

HIDDEN MARKOV MODELS

HIDDEN MARKOV MODELS HIDDEN MARKOV MODELS Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training Baum-Welch algorithm

More information

Molecular Evolution and Comparative Genomics

Molecular Evolution and Comparative Genomics Molecular Evolution and Comparative Genomics --- the phylogenetic HMM model 10-810, CMB lecture 5---Eric Xing Some important dates in history (billions of years ago) Origin of the universe 15 ±4 Formation

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Molecular Evolution & Phylogenetics

Molecular Evolution & Phylogenetics Molecular Evolution & Phylogenetics Heuristics based on tree alterations, maximum likelihood, Bayesian methods, statistical confidence measures Jean-Baka Domelevo Entfellner Learning Objectives know basic

More information

part 4: phenomenological load and biological inference. phenomenological load review types of models. Gαβ = 8π Tαβ. Newton.

part 4: phenomenological load and biological inference. phenomenological load review types of models. Gαβ = 8π Tαβ. Newton. 2017-07-29 part 4: and biological inference review types of models phenomenological Newton F= Gm1m2 r2 mechanistic Einstein Gαβ = 8π Tαβ 1 molecular evolution is process and pattern process pattern MutSel

More information

Plan for today. ! Part 1: (Hidden) Markov models. ! Part 2: String matching and read mapping

Plan for today. ! Part 1: (Hidden) Markov models. ! Part 2: String matching and read mapping Plan for today! Part 1: (Hidden) Markov models! Part 2: String matching and read mapping! 2.1 Exact algorithms! 2.2 Heuristic methods for approximate search (Hidden) Markov models Why consider probabilistics

More information

order is number of previous outputs

order is number of previous outputs Markov Models Lecture : Markov and Hidden Markov Models PSfrag Use past replacements as state. Next output depends on previous output(s): y t = f[y t, y t,...] order is number of previous outputs y t y

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms  Hidden Markov Models Hidden Markov Models Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training

More information

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 1 Sequence Motifs what is a sequence motif? a sequence pattern of biological significance typically

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms   Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

3 : Representation of Undirected GM

3 : Representation of Undirected GM 10-708: Probabilistic Graphical Models 10-708, Spring 2016 3 : Representation of Undirected GM Lecturer: Eric P. Xing Scribes: Longqi Cai, Man-Chia Chang 1 MRF vs BN There are two types of graphical models:

More information

Statistical Sequence Recognition and Training: An Introduction to HMMs

Statistical Sequence Recognition and Training: An Introduction to HMMs Statistical Sequence Recognition and Training: An Introduction to HMMs EECS 225D Nikki Mirghafori nikki@icsi.berkeley.edu March 7, 2005 Credit: many of the HMM slides have been borrowed and adapted, with

More information

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution Massachusetts Institute of Technology 6.877 Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution 1. Rates of amino acid replacement The initial motivation for the neutral

More information

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II) CISC 889 Bioinformatics (Spring 24) Hidden Markov Models (II) a. Likelihood: forward algorithm b. Decoding: Viterbi algorithm c. Model building: Baum-Welch algorithm Viterbi training Hidden Markov models

More information

Computational Genomics

Computational Genomics omputational Genomics Molecular Evolution: Phylogenetic trees Eric Xing Lecture, March, 2007 Reading: DTW boo, hap 2 DEKM boo, hap 7, Phylogeny In, Ernst Haecel coined the word phylogeny and presented

More information

Lecture 9. Intro to Hidden Markov Models (finish up)

Lecture 9. Intro to Hidden Markov Models (finish up) Lecture 9 Intro to Hidden Markov Models (finish up) Review Structure Number of states Q 1.. Q N M output symbols Parameters: Transition probability matrix a ij Emission probabilities b i (a), which is

More information

CS711008Z Algorithm Design and Analysis

CS711008Z Algorithm Design and Analysis .. Lecture 6. Hidden Markov model and Viterbi s decoding algorithm Institute of Computing Technology Chinese Academy of Sciences, Beijing, China . Outline The occasionally dishonest casino: an example

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

Taming the Beast Workshop

Taming the Beast Workshop Workshop and Chi Zhang June 28, 2016 1 / 19 Species tree Species tree the phylogeny representing the relationships among a group of species Figure adapted from [Rogers and Gibbs, 2014] Gene tree the phylogeny

More information

Discrete & continuous characters: The threshold model

Discrete & continuous characters: The threshold model Discrete & continuous characters: The threshold model Discrete & continuous characters: the threshold model So far we have discussed continuous & discrete character models separately for estimating ancestral

More information

RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG

RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG Department of Biology (Galton Laboratory), University College London, 4 Stephenson

More information

Multiple Sequence Alignment using Profile HMM

Multiple Sequence Alignment using Profile HMM Multiple Sequence Alignment using Profile HMM. based on Chapter 5 and Section 6.5 from Biological Sequence Analysis by R. Durbin et al., 1998 Acknowledgements: M.Sc. students Beatrice Miron, Oana Răţoi,

More information

Lecture 15. Probabilistic Models on Graph

Lecture 15. Probabilistic Models on Graph Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence Page Hidden Markov models and multiple sequence alignment Russ B Altman BMI 4 CS 74 Some slides borrowed from Scott C Schmidler (BMI graduate student) References Bioinformatics Classic: Krogh et al (994)

More information

Molecular evolution. Joe Felsenstein. GENOME 453, Autumn Molecular evolution p.1/49

Molecular evolution. Joe Felsenstein. GENOME 453, Autumn Molecular evolution p.1/49 Molecular evolution Joe Felsenstein GENOME 453, utumn 2009 Molecular evolution p.1/49 data example for phylogeny inference Five DN sequences, for some gene in an imaginary group of species whose names

More information

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057 Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number

More information

Bayesian Models for Phylogenetic Trees

Bayesian Models for Phylogenetic Trees Bayesian Models for Phylogenetic Trees Clarence Leung* 1 1 McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada ABSTRACT Introduction: Inferring genetic ancestry of different species

More information

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies 1 What is phylogeny? Essay written for the course in Markov Chains 2004 Torbjörn Karfunkel Phylogeny is the evolutionary development

More information

BINF6201/8201. Molecular phylogenetic methods

BINF6201/8201. Molecular phylogenetic methods BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

More information

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing Hidden Markov Models By Parisa Abedi Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed data Sequential (non i.i.d.) data Time-series data E.g. Speech

More information

Alignment Algorithms. Alignment Algorithms

Alignment Algorithms. Alignment Algorithms Midterm Results Big improvement over scores from the previous two years. Since this class grade is based on the previous years curve, that means this class will get higher grades than the previous years.

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species Schedule Bioinformatics and Computational Biology: History and Biological Background (JH) 0.0 he Parsimony criterion GKN.0 Stochastic Models of Sequence Evolution GKN 7.0 he Likelihood criterion GKN 0.0

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Lecture 3: Markov chains.

Lecture 3: Markov chains. 1 BIOINFORMATIK II PROBABILITY & STATISTICS Summer semester 2008 The University of Zürich and ETH Zürich Lecture 3: Markov chains. Prof. Andrew Barbour Dr. Nicolas Pétrélis Adapted from a course by Dr.

More information

Bioinformatics 2 - Lecture 4

Bioinformatics 2 - Lecture 4 Bioinformatics 2 - Lecture 4 Guido Sanguinetti School of Informatics University of Edinburgh February 14, 2011 Sequences Many data types are ordered, i.e. you can naturally say what is before and what

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Molecular Evolution, course # Final Exam, May 3, 2006

Molecular Evolution, course # Final Exam, May 3, 2006 Molecular Evolution, course #27615 Final Exam, May 3, 2006 This exam includes a total of 12 problems on 7 pages (including this cover page). The maximum number of points obtainable is 150, and at least

More information

Lecture 8: December 17, 2003

Lecture 8: December 17, 2003 Computational Genomics Fall Semester, 2003 Lecture 8: December 17, 2003 Lecturer: Irit Gat-Viks Scribe: Tal Peled and David Burstein 8.1 Exploiting Independence Property 8.1.1 Introduction Our goal is

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Lecture Notes: Markov chains

Lecture Notes: Markov chains Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity

More information

1 Probabilities. 1.1 Basics 1 PROBABILITIES

1 Probabilities. 1.1 Basics 1 PROBABILITIES 1 PROBABILITIES 1 Probabilities Probability is a tricky word usually meaning the likelyhood of something occuring or how frequent something is. Obviously, if something happens frequently, then its probability

More information

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010 Hidden Markov Models Aarti Singh Slides courtesy: Eric Xing Machine Learning 10-701/15-781 Nov 8, 2010 i.i.d to sequential data So far we assumed independent, identically distributed data Sequential data

More information

Stephen Scott.

Stephen Scott. 1 / 21 sscott@cse.unl.edu 2 / 21 Introduction Designed to model (profile) a multiple alignment of a protein family (e.g., Fig. 5.1) Gives a probabilistic model of the proteins in the family Useful for

More information

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018 Maximum Likelihood Tree Estimation Carrie Tribble IB 200 9 Feb 2018 Outline 1. Tree building process under maximum likelihood 2. Key differences between maximum likelihood and parsimony 3. Some fancy extras

More information

Using algebraic geometry for phylogenetic reconstruction

Using algebraic geometry for phylogenetic reconstruction Using algebraic geometry for phylogenetic reconstruction Marta Casanellas i Rius (joint work with Jesús Fernández-Sánchez) Departament de Matemàtica Aplicada I Universitat Politècnica de Catalunya IMA

More information

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

More information

Comparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey

Comparative Gene Finding. BMI/CS 776  Spring 2015 Colin Dewey Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes

More information

Brief Introduction of Machine Learning Techniques for Content Analysis

Brief Introduction of Machine Learning Techniques for Content Analysis 1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview

More information

Reconstruire le passé biologique modèles, méthodes, performances, limites

Reconstruire le passé biologique modèles, méthodes, performances, limites Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire

More information

9 Forward-backward algorithm, sum-product on factor graphs

9 Forward-backward algorithm, sum-product on factor graphs Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 9 Forward-backward algorithm, sum-product on factor graphs The previous

More information

Hidden Markov models in population genetics and evolutionary biology

Hidden Markov models in population genetics and evolutionary biology Hidden Markov models in population genetics and evolutionary biology Gerton Lunter Wellcome Trust Centre for Human Genetics Oxford, UK April 29, 2013 Topics for today Markov chains Hidden Markov models

More information

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A GAGATC 3:G A 6:C T Common Ancestor ACGATC 1:A G 2:C A Substitution = Mutation followed 5:T C by Fixation GAAATT 4:A C 1:G A AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon AAAATT GAAATT GAGCTC ACGACC

More information

Statistical Methods for NLP

Statistical Methods for NLP Statistical Methods for NLP Information Extraction, Hidden Markov Models Sameer Maskey Week 5, Oct 3, 2012 *many slides provided by Bhuvana Ramabhadran, Stanley Chen, Michael Picheny Speech Recognition

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Hidden Markov Models Barnabás Póczos & Aarti Singh Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed

More information

Phylogenetics. BIOL 7711 Computational Bioscience

Phylogenetics. BIOL 7711 Computational Bioscience Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium

More information

Approximate Inference

Approximate Inference Approximate Inference Simulation has a name: sampling Sampling is a hot topic in machine learning, and it s really simple Basic idea: Draw N samples from a sampling distribution S Compute an approximate

More information

Hidden Markov Models (I)

Hidden Markov Models (I) GLOBEX Bioinformatics (Summer 2015) Hidden Markov Models (I) a. The model b. The decoding: Viterbi algorithm Hidden Markov models A Markov chain of states At each state, there are a set of possible observables

More information

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington. Maximum Likelihood This presentation is based almost entirely on Peter G. Fosters - "The Idiot s Guide to the Zen of Likelihood in a Nutshell in Seven Days for Dummies, Unleashed. http://www.bioinf.org/molsys/data/idiots.pdf

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Slides revised and adapted to Bioinformática 55 Engª Biomédica/IST 2005 Ana Teresa Freitas CG-Islands Given 4 nucleotides: probability of occurrence is ~ 1/4. Thus, probability of

More information

Phylogeny: traditional and Bayesian approaches

Phylogeny: traditional and Bayesian approaches Phylogeny: traditional and Bayesian approaches 5-Feb-2014 DEKM book Notes from Dr. B. John Holder and Lewis, Nature Reviews Genetics 4, 275-284, 2003 1 Phylogeny A graph depicting the ancestor-descendent

More information

Natural Language Processing Prof. Pawan Goyal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Natural Language Processing Prof. Pawan Goyal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Natural Language Processing Prof. Pawan Goyal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 18 Maximum Entropy Models I Welcome back for the 3rd module

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

1 Probabilities. 1.1 Basics 1 PROBABILITIES

1 Probabilities. 1.1 Basics 1 PROBABILITIES 1 PROBABILITIES 1 Probabilities Probability is a tricky word usually meaning the likelyhood of something occuring or how frequent something is. Obviously, if something happens frequently, then its probability

More information

Hidden Markov Models 1

Hidden Markov Models 1 Hidden Markov Models Dinucleotide Frequency Consider all 2-mers in a sequence {AA,AC,AG,AT,CA,CC,CG,CT,GA,GC,GG,GT,TA,TC,TG,TT} Given 4 nucleotides: each with a probability of occurrence of. 4 Thus, one

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of

More information

Hidden Markov Models. x 1 x 2 x 3 x N

Hidden Markov Models. x 1 x 2 x 3 x N Hidden Markov Models 1 1 1 1 K K K K x 1 x x 3 x N Example: The dishonest casino A casino has two dice: Fair die P(1) = P() = P(3) = P(4) = P(5) = P(6) = 1/6 Loaded die P(1) = P() = P(3) = P(4) = P(5)

More information

ECE521 Lecture 19 HMM cont. Inference in HMM

ECE521 Lecture 19 HMM cont. Inference in HMM ECE521 Lecture 19 HMM cont. Inference in HMM Outline Hidden Markov models Model definitions and notations Inference in HMMs Learning in HMMs 2 Formally, a hidden Markov model defines a generative process

More information

Supplementary information

Supplementary information Supplementary information Superoxide dismutase 1 is positively selected in great apes to minimize protein misfolding Pouria Dasmeh 1, and Kasper P. Kepp* 2 1 Harvard University, Department of Chemistry

More information