Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!


1 Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site:

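2 Why do we need models?
[Figure: two curves fitted to the same observations, labeled y = (several terms in x) and y = (one term in x); the coefficients are not recoverable.] Copyright 2007 Paul O. Lewis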

3 Models
- Models help us intelligently interpolate between our observations for purposes of making predictions.
- Adding parameters to a model generally increases its fit to the data.
- Underparameterized models lead to poor fit to observed data points.
- Overparameterized models lead to poor prediction of future observations.
- Criteria for choosing models (likelihood ratio tests, AIC, BIC, Bayes factors, etc.) all provide a way to choose a model that is neither underparameterized nor overparameterized.

4 The Poisson distribution
Probability distribution on the number of events when:
1. events are assumed to be independent,
2. the rate of events is some constant, µ, and
3. the process continues for some duration of time, t.
The expectation of the number of events is ν = µt. Note that ν can be any non-negative number, but the Poisson is a discrete distribution: it gives the probabilities of the number of events (and this number will always be a non-negative integer).

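5 The Poisson distribution
$$\Pr(k \text{ events} \mid \text{expected \# is } \nu) = \frac{\nu^k e^{-\nu}}{k!}$$
$$\Pr(0 \text{ events}) = \frac{\nu^0 e^{-\nu}}{0!} = e^{-\nu} = e^{-\mu t}$$
$$\Pr(\geq 1 \text{ events}) = 1 - e^{-\nu} = 1 - e^{-\mu t}$$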
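The Poisson formulas above can be checked numerically. This is a minimal sketch; the function name `poisson_pmf` and the values of `mu` and `t` are illustrative assumptions, not part of the slides.

```python
import math

def poisson_pmf(k: int, nu: float) -> float:
    """Pr(k events | expected number nu) = nu^k * e^(-nu) / k!"""
    return nu ** k * math.exp(-nu) / math.factorial(k)

mu, t = 0.5, 2.0   # hypothetical rate and duration
nu = mu * t        # expected number of events

p_none = poisson_pmf(0, nu)      # should equal e^(-mu * t)
p_at_least_one = 1.0 - p_none    # should equal 1 - e^(-mu * t)
print(p_none, p_at_least_one)
```

Only the product µt enters the probabilities, which is the same reason rate and time cannot be separated later in the deck.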

6 "Disruptions" vs. substitutions
When a disruption occurs, any base can appear in a sequence.
Note: "disruption" is my term for this make-believe event. You will not see this term in the literature.
If the base that appears is different from the base that was already there, then a substitution event has occurred.
The rate at which any particular substitution occurs will be 1/4 the disruption rate (assuming equal base frequencies).
[Figure: a site with base T is disrupted and replaced by one of A, C, G or T.]

7 Probability of T → G over time t
Let µ be the rate of disruptions, and let the branch be t units of time long. Let's use θ for the rate of any particular disruption:
$$\mu_{T \to A} = \mu_{T \to C} = \mu_{T \to G} = \mu_{T \to T} = \theta, \qquad \mu = 4\theta$$
Furthermore, given that there is a disruption, the chance of any particular change is 1/4.

8 Probability of T → G over time t
$$\Pr(0 \text{ disruptions} \mid t) = e^{-\mu t}$$
$$\Pr(\text{at least 1 disruption} \mid t) = 1 - e^{-\mu t}$$
$$\Pr(\text{last disruption leads to } G) = 0.25$$
$$\Pr(T \to G \mid t) = 0.25\left(1 - e^{-\mu t}\right) = 0.25\left(1 - e^{-4\theta t}\right)$$

9 JC69 model
- Bases are assumed to be equally frequent (all 0.25).
- Assumes rate of substitution (α) is the same for all possible substitutions.
- Usually described as a 1-parameter model (the parameter being α).
- Remember, however, that each edge in a tree can have its own α, so there are really as many parameters in the model as there are edges in the tree!
Jukes, T. H., and C. R. Cantor. Evolution of protein molecules. In H. N. Munro (ed.), Mammalian Protein Metabolism. Academic Press, New York.

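10 JC transition probabilities
$$\Pr(T \to A \mid t) = 0.25\left(1 - e^{-4\theta t}\right)$$
$$\Pr(T \to C \mid t) = 0.25\left(1 - e^{-4\theta t}\right)$$
$$\Pr(T \to G \mid t) = 0.25\left(1 - e^{-4\theta t}\right)$$
$$\Pr(T \to T \mid t) = 0.25\left(1 - e^{-4\theta t}\right)$$
but this only adds up to $\left(1 - e^{-4\theta t}\right)$ instead of 1!

11 We left out the probability of no disruptions: $e^{-4\theta t}$. So:
$$\Pr(T \to A \mid t) = 0.25\left(1 - e^{-4\theta t}\right)$$
$$\Pr(T \to C \mid t) = 0.25\left(1 - e^{-4\theta t}\right)$$
$$\Pr(T \to G \mid t) = 0.25\left(1 - e^{-4\theta t}\right)$$
$$\Pr(T \to T \mid t) = e^{-4\theta t} + 0.25\left(1 - e^{-4\theta t}\right) = 0.25 + 0.75\,e^{-4\theta t}$$

12 JC transition probabilities (i ≠ j)
$$\Pr(i \to j \mid t) = 0.25\left(1 - e^{-4\theta t}\right) \qquad \Pr(i \to i \mid t) = 0.25 + 0.75\,e^{-4\theta t}$$
When t = 0, then $e^{-4\theta t} = 1$, and:
$$\Pr(i \to j \mid t) = 0 \qquad \Pr(i \to i \mid t) = 1$$

13 JC transition probabilities (i ≠ j)
$$\Pr(i \to j \mid t) = 0.25\left(1 - e^{-4\theta t}\right) \qquad \Pr(i \to i \mid t) = 0.25 + 0.75\,e^{-4\theta t}$$
When t = ∞, then $e^{-4\theta t} = 0$, and:
$$\Pr(i \to j \mid t) = 0.25 \qquad \Pr(i \to i \mid t) = 0.25$$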
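The two limiting cases can be confirmed with a few lines of Python. This is only a sketch: `jc_prob` and the value of `theta` are names and numbers assumed for illustration.

```python
import math

def jc_prob(same: bool, theta: float, t: float) -> float:
    """JC probability of ending in the same state (same=True) or in one
    particular different state (same=False) after time t."""
    decay = math.exp(-4.0 * theta * t)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay

theta = 0.1  # hypothetical rate of any particular disruption
for t in (0.0, 1.0, 10.0, 1000.0):
    p_same = jc_prob(True, theta, t)
    p_diff = jc_prob(False, theta, t)
    # One "same" outcome plus three "different" outcomes must sum to 1.
    print(t, p_same, p_diff, p_same + 3.0 * p_diff)
```

At t = 0 the output shows probabilities 1 and 0; for very large t both approach 0.25.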

14 Probability of A present as a function of time
[Figure: two curves of Pr(A) against time, both converging on 0.25.]
- Upper curve assumes we started with A at time 0. Over time, the probability of still seeing an A at this site drops because the rate of changing to one of the other three bases is 3α (so the rate of staying the same is -3α).
- The equilibrium relative frequency of A is 0.25.
- Lower curve assumes we started with some state other than A (T is used here). Over time, the probability of seeing an A at this site grows because the rate at which the current base will change into an A is α.

15 Water analogy (time 0)
[Figure: four connected containers A, C, G, T; A loses water at rate 3α and each other container gains at rate α.]
- Start with container A completely full and the others empty.
- Imagine that all containers are connected by tubes that allow the same rate of flow between any two.
- Initially, A will be losing water at 3 times the rate that C (or G or T) gains water.

16 Water analogy (after some time)
[Figure: containers A, C, G, T with A partly drained.]
A's level is not dropping as fast now because it is now also receiving water from C, G and T.

17 Water analogy (after a very long time)
[Figure: containers A, C, G, T at equal levels.]
Eventually, all containers are one fourth full and there is zero net volume change: stationarity (equilibrium) has been achieved. (Thanks to Kent Holsinger for this analogy.)

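18 JC instantaneous rate matrix: the Q matrix for JC
The 1 parameter is α (sometimes parameterized in terms of µ). This is the rate of replacements ("disruptions" that change the state). Rows are the "from" state and columns are the "to" state:
$$
Q = \begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & -3\alpha & \alpha & \alpha & \alpha \\
C & \alpha & -3\alpha & \alpha & \alpha \\
G & \alpha & \alpha & -3\alpha & \alpha \\
T & \alpha & \alpha & \alpha & -3\alpha
\end{array}
$$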

19 Change probabilities
We can calculate a transition probability matrix as a function of time by:
$$P(t) = e^{Qt}$$
The important thing to note is that the rates (the Q matrix) are multiplied by the time. We can't separate rates and times, since we always see only the effect of their product. Does a medium level of character divergence reflect:
1. a medium rate of change and a medium amount of time,
2. a high rate, but a short time period, or
3. a low rate, but a long time period?
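To make the rate-times-time point concrete, the sketch below exponentiates the JC Q matrix with a truncated Taylor series in plain Python and shows that three different (rate, time) pairs with the same product give the same P(t). The helper names (`mat_mul`, `mat_exp`, `jc_q`, `transition_matrix`) and the numeric values are assumptions for illustration; production software would normally use an eigendecomposition or a library routine rather than a Taylor series.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, terms=40):
    """Truncated Taylor series I + M + M^2/2! + ... (fine for small entries)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def jc_q(alpha):
    """JC rate matrix: -3*alpha on the diagonal, alpha elsewhere."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]

def transition_matrix(alpha, t):
    """P(t) = exp(Q t) for the JC model."""
    return mat_exp([[q * t for q in row] for row in jc_q(alpha)])

# Same product alpha * t = 0.2 reached three different ways (hypothetical values).
p_a = transition_matrix(alpha=0.1, t=2.0)
p_b = transition_matrix(alpha=1.0, t=0.2)
p_c = transition_matrix(alpha=0.01, t=20.0)
print(p_a[0][0], p_b[0][0], p_c[0][0])
```

All three printed values agree with the closed form 0.25 + 0.75 e^{-4αt}, so the data alone cannot distinguish the three scenarios listed on the slide.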

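20 JC instantaneous rate matrix again
What if you do not know the length of time for a branch in the tree? We estimate branch lengths in terms of character divergence: the product of rate and time. What is important is that we know the relative rates of different types of substitutions, so JC can be expressed (rows: from state; columns: to state):
$$
Q = \begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & -3 & 1 & 1 & 1 \\
C & 1 & -3 & 1 & 1 \\
G & 1 & 1 & -3 & 1 \\
T & 1 & 1 & 1 & -3
\end{array}
$$

21 JC instantaneous rate matrix yet again
We estimate branch lengths in terms of the expected number of changes per site. To do this we standardize the total rate of divergence in the Q matrix and estimate ν = µt = 3αt for each branch (rows: from state; columns: to state):
$$
Q = \begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & -1 & \tfrac13 & \tfrac13 & \tfrac13 \\
C & \tfrac13 & -1 & \tfrac13 & \tfrac13 \\
G & \tfrac13 & \tfrac13 & -1 & \tfrac13 \\
T & \tfrac13 & \tfrac13 & \tfrac13 & -1
\end{array}
$$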

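22 Kimura (1980) model or the K80 model
Transitions (rate α) and transversions (rate β) occur at different rates (rows: from state; columns: to state):
$$
Q = \begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & -2\beta-\alpha & \beta & \alpha & \beta \\
C & \beta & -2\beta-\alpha & \beta & \alpha \\
G & \alpha & \beta & -2\beta-\alpha & \beta \\
T & \beta & \alpha & \beta & -2\beta-\alpha
\end{array}
$$

23 Kimura (1980) model or the K80 model. Reparameterized.
Once again, we care only about the relative rates, so we can choose one rate to be the frame of reference. This turns the 2-parameter model into a 1-parameter form:
$$
Q = \begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & -(2+\kappa)\beta & \beta & \kappa\beta & \beta \\
C & \beta & -(2+\kappa)\beta & \beta & \kappa\beta \\
G & \kappa\beta & \beta & -(2+\kappa)\beta & \beta \\
T & \beta & \kappa\beta & \beta & -(2+\kappa)\beta
\end{array}
$$

24 Kimura (1980) model or the K80 model. Reparameterized again.
$$
Q = \begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & -2-\kappa & 1 & \kappa & 1 \\
C & 1 & -2-\kappa & 1 & \kappa \\
G & \kappa & 1 & -2-\kappa & 1 \\
T & 1 & \kappa & 1 & -2-\kappa
\end{array}
$$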

25 Kappa is the transition/transversion rate ratio: $\kappa = \alpha/\beta$ (if κ = 1 then we are back to JC).

26 What is the instantaneous probability of a particular transversion?
$$\Pr(A \to C) = \Pr(A)\,\Pr(\text{change to } C) = \tfrac14(\beta\,dt)$$

27 What is the instantaneous probability of a particular transition?
$$\Pr(A \to G) = \Pr(A)\,\Pr(\text{change to } G) = \tfrac14(\kappa\beta\,dt)$$

28 There are four types of transitions:
A → G, G → A, C → T, T → C
and eight types of transversions:
A → C, A → T, G → C, G → T, C → A, C → G, T → A, T → G
$$\text{Ti/Tv ratio} = \frac{\Pr(\text{any transition})}{\Pr(\text{any transversion})} = \frac{4\left(\tfrac14(\kappa\beta\,dt)\right)}{8\left(\tfrac14(\beta\,dt)\right)} = \frac{\kappa}{2}$$
For K2P (K80), the instantaneous transition/transversion ratio is one-half the instantaneous transition/transversion rate ratio.

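29 Felsenstein 1981 model or F81 model
Rows are the "from" state and columns are the "to" state; each diagonal entry (shown as −) is the negative of the sum of the other entries in its row.
$$
Q = \begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & - & \pi_C & \pi_G & \pi_T \\
C & \pi_A & - & \pi_G & \pi_T \\
G & \pi_A & \pi_C & - & \pi_T \\
T & \pi_A & \pi_C & \pi_G & -
\end{array}
$$

30 HKY 1985 model
$$
Q = \begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & - & \pi_C & \kappa\pi_G & \pi_T \\
C & \pi_A & - & \pi_G & \kappa\pi_T \\
G & \kappa\pi_A & \pi_C & - & \pi_T \\
T & \pi_A & \kappa\pi_C & \pi_G & -
\end{array}
$$

31 F84* vs. HKY85
F84 model:
- µ: rate of process generating all types of substitutions
- kµ: rate of process generating only transitions
- Becomes F81 model if k = 0
HKY85 model:
- β: rate of process generating only transversions
- κβ: rate of process generating only transitions
- Becomes F81 model if κ = 1
*First used in PHYLIP in 1984, first published by Kishino, H., and M. Hasegawa. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in hominoidea. Journal of Molecular Evolution 29:

32 General Time Reversible (GTR) model
$$
Q = \begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & - & a\pi_C & b\pi_G & c\pi_T \\
C & a\pi_A & - & d\pi_G & e\pi_T \\
G & b\pi_A & d\pi_C & - & f\pi_T \\
T & c\pi_A & e\pi_C & f\pi_G & -
\end{array}
$$
In PAUP*, f = 1, indicating that G ↔ T is the reference rate.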
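The pattern shared by the F81, HKY and GTR matrices (off-diagonal entry = exchangeability × frequency of the target base, diagonal = negative row sum) can be sketched in Python. The function name `gtr_q`, the dictionary layout, and all numeric values below are hypothetical choices made for this illustration.

```python
BASES = "ACGT"

def gtr_q(pi, ex):
    """Build a GTR rate matrix as a dict keyed by (from, to).

    pi: base frequencies, e.g. {"A": 0.3, ...}
    ex: exchangeabilities keyed by sorted base pair:
        "AC"=a, "AG"=b, "AT"=c, "CG"=d, "CT"=e, "GT"=f
    """
    q = {}
    for i in BASES:
        for j in BASES:
            if i != j:
                q[i, j] = ex["".join(sorted(i + j))] * pi[j]
        # Diagonal entry makes the row sum to zero.
        q[i, i] = -sum(q[i, j] for j in BASES if j != i)
    return q

# Hypothetical frequencies and exchangeabilities (f = "GT" = 1 as reference).
pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
ex = {"AC": 1.0, "AG": 4.0, "AT": 1.0, "CG": 1.0, "CT": 4.0, "GT": 1.0}
q = gtr_q(pi, ex)

for i in BASES:
    print(i, [round(q[i, j], 3) for j in BASES])
```

With these particular exchangeabilities (a = c = d = f = 1 and b = e = 4) the matrix is the HKY special case with κ = 4; setting all six to 1 gives F81.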

33 References Kimura, M. (1980). A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16:


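36 JC state-comparison probabilities
i and j refer to states (A, C, G, T), with i ≠ j:
$$\Pr(i \to i \mid \nu) = \tfrac14 + \tfrac34\,e^{-4\nu/3} \qquad \Pr(i \to j \mid \nu) = \tfrac14 - \tfrac14\,e^{-4\nu/3}$$

37 CFN transition probabilities
$$\Pr(0 \to 0 \mid \nu) = \Pr(1 \to 1 \mid \nu) = \tfrac12 + \tfrac12\,e^{-2\nu}$$
$$\Pr(0 \to 1 \mid \nu) = \Pr(1 \to 0 \mid \nu) = \tfrac12 - \tfrac12\,e^{-2\nu}$$

38 Mk transition probabilities
k-state version of the one-rate model:
$$\Pr(i \to i \mid \nu) = \frac{1}{k} + \frac{k-1}{k}\,e^{-\left(\frac{k}{k-1}\right)\nu} \qquad \Pr(i \to j \mid \nu) = \frac{1}{k} - \frac{1}{k}\,e^{-\left(\frac{k}{k-1}\right)\nu}$$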
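As a quick consistency check, the Mk formulas should reduce to JC when k = 4 and to CFN when k = 2. The sketch below assumes a helper named `mk_prob` and an arbitrary branch length.

```python
import math

def mk_prob(same: bool, k: int, nu: float) -> float:
    """Mk probability of staying in the same state (same=True) or moving
    to one particular other state (same=False) along a branch of length nu."""
    decay = math.exp(-k * nu / (k - 1))
    if same:
        return 1.0 / k + (k - 1) / k * decay
    return 1.0 / k - decay / k

nu = 0.3  # hypothetical branch length (expected changes per site)
jc_same = 0.25 + 0.75 * math.exp(-4.0 * nu / 3.0)   # JC formula
cfn_same = 0.5 + 0.5 * math.exp(-2.0 * nu)          # CFN formula
print(mk_prob(True, 4, nu), jc_same)
print(mk_prob(True, 2, nu), cfn_same)
```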


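46 Kimura model change probabilities
$$\Pr(A \to A \mid \nu) = \tfrac14\left(1 + e^{-\left(\frac{4}{2+\kappa}\right)\nu} + 2e^{-\left(\frac{2+2\kappa}{2+\kappa}\right)\nu}\right)$$
$$\Pr(A \to G \mid \nu) = \tfrac14\left(1 + e^{-\left(\frac{4}{2+\kappa}\right)\nu} - 2e^{-\left(\frac{2+2\kappa}{2+\kappa}\right)\nu}\right)$$
$$\Pr(A \to C \mid \nu) = \tfrac14\left(1 - e^{-\left(\frac{4}{2+\kappa}\right)\nu}\right)$$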


51 Likelihood of a single sequence
First 32 nucleotides of the ψη-globin gene of gorilla:
GAAGTCCTTGAGAAATAAACTGCACACACTGG
$$L = \pi_G\,\pi_A\,\pi_A\,\pi_G\,\pi_T\,\pi_C \cdots \pi_T\,\pi_G\,\pi_G = \pi_A^{12}\,\pi_C^{7}\,\pi_G^{7}\,\pi_T^{6}$$
$$\ln L = 12 \ln \pi_A + 7 \ln \pi_C + 7 \ln \pi_G + 6 \ln \pi_T$$
We can already see by eyeballing this that the F81 model (which allows unequal base frequencies) will fit better than the JC69 model (which assumes equal base frequencies) because there are about twice as many As as there are Cs, Gs and Ts.

52 How can we calculate the likelihood score?
Under the JC (or K2P) model:
$$\ln L = 12 \ln \pi_A + 7 \ln \pi_C + 7 \ln \pi_G + 6 \ln \pi_T = 12 \ln 0.25 + 7 \ln 0.25 + 7 \ln 0.25 + 6 \ln 0.25 = 32 \ln 0.25 \approx -44.36$$

53 How can we calculate the likelihood score?
Under the F81 (or HKY or GTR) model:
$$\ln L = 12 \ln \pi_A + 7 \ln \pi_C + 7 \ln \pi_G + 6 \ln \pi_T$$
But what are the values for the parameters $\pi_A, \pi_C, \pi_G, \pi_T$? In many cases we refer to these parameters as nuisance parameters. They must be specified in order to calculate the likelihood, but we are not interested in them by themselves.

54 ML parameter estimates
We can find the maximum likelihood estimates of the parameters to give us the ML score, the maximum likelihood obtainable under this model:
$$\ln L = 12 \ln \pi_A + 7 \ln \pi_C + 7 \ln \pi_G + 6 \ln \pi_T = 12 \ln \tfrac{12}{32} + 7 \ln \tfrac{7}{32} + 7 \ln \tfrac{7}{32} + 6 \ln \tfrac{6}{32} \approx -43.09$$
But how did I get the numbers to fill in for the parameters? How do we know that $\pi_A = 12/32$ and $\pi_C = 7/32$?

55 ML parameter estimates
We might guess that:
$$\pi_A = \tfrac{12}{32} = 0.375 \qquad \pi_C = \tfrac{7}{32} \approx 0.219 \qquad \pi_G = \tfrac{7}{32} \approx 0.219 \qquad \pi_T = \tfrac{6}{32} \approx 0.188$$
but how do we prove it?

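56 ML parameter estimates
For simple problems we solve for the point in parameter space for which the derivatives with respect to all parameters are 0 (we also have to consider boundary points). We would have to do constrained optimization because
$$\pi_A + \pi_C + \pi_G + \pi_T = 1$$
and that is a pain.

57 ML parameter estimates
We can reparameterize:
$$r = \pi_A + \pi_G \qquad a = \frac{\pi_A}{\pi_A + \pi_G} \qquad c = \frac{\pi_C}{\pi_C + \pi_T}$$
and always recover the original parameters:
$$\pi_A = ra \qquad \pi_G = r(1-a) \qquad \pi_C = (1-r)c \qquad \pi_T = (1-r)(1-c)$$

58 ML parameter estimates
$$\ln L = 12 \ln \pi_A + 7 \ln \pi_C + 7 \ln \pi_G + 6 \ln \pi_T = 12 \ln[ra] + 7 \ln[(1-r)c] + 7 \ln[r(1-a)] + 6 \ln[(1-r)(1-c)]$$
Recall that:
$$\frac{\partial \ln f(x)}{\partial x} = \frac{\partial f(x)/\partial x}{f(x)}$$

59
$$\frac{\partial \ln L}{\partial a} = \frac{12r}{ra} + \frac{7(-r)}{r(1-a)} = \frac{12}{a} - \frac{7}{1-a}$$
$$0 = \frac{12}{\hat a} - \frac{7}{1-\hat a} \quad\Rightarrow\quad \hat a = \frac{12}{19}$$

60
$$\frac{\partial \ln L}{\partial c} = \frac{7(1-r)}{(1-r)c} + \frac{6\left(-(1-r)\right)}{(1-r)(1-c)} = \frac{7}{c} - \frac{6}{1-c}$$
$$0 = \frac{7}{\hat c} - \frac{6}{1-\hat c} \quad\Rightarrow\quad \hat c = \frac{7}{13}$$

61
$$\frac{\partial \ln L}{\partial r} = \frac{12a}{ra} + \frac{7(-c)}{(1-r)c} + \frac{7(1-a)}{r(1-a)} + \frac{6\left(-(1-c)\right)}{(1-r)(1-c)} = \frac{12}{r} - \frac{7}{1-r} + \frac{7}{r} - \frac{6}{1-r} = \frac{19}{r} - \frac{13}{1-r}$$
$$0 = \frac{19}{\hat r} - \frac{13}{1-\hat r} \quad\Rightarrow\quad \hat r = \frac{19}{32}$$

62 ML estimates are invariant to reparameterization, so we can just transform the ML estimates back into our original parameters:
$$\hat\pi_A = \hat r \hat a = \left(\tfrac{19}{32}\right)\left(\tfrac{12}{19}\right) = \tfrac{12}{32} \qquad \hat\pi_G = \hat r(1-\hat a) = \left(\tfrac{19}{32}\right)\left(\tfrac{7}{19}\right) = \tfrac{7}{32}$$
$$\hat\pi_C = (1-\hat r)\hat c = \left(\tfrac{13}{32}\right)\left(\tfrac{7}{13}\right) = \tfrac{7}{32} \qquad \hat\pi_T = (1-\hat r)(1-\hat c) = \left(\tfrac{13}{32}\right)\left(\tfrac{6}{13}\right) = \tfrac{6}{32}$$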
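The derivation can be checked numerically on the gorilla sequence. This sketch computes the estimates through the (r, a, c) parameterization and compares the log-likelihood under equal and estimated frequencies; the variable and function names are assumptions for illustration.

```python
import math
from collections import Counter

seq = "GAAGTCCTTGAGAAATAAACTGCACACACTGG"
counts = Counter(seq)   # A: 12, C: 7, G: 7, T: 6
n = len(seq)

def log_lik(freqs):
    """ln L = sum over bases of count * ln(frequency)."""
    return sum(counts[b] * math.log(freqs[b]) for b in "ACGT")

# Closed-form solutions of the three score equations.
r_hat = (counts["A"] + counts["G"]) / n                    # 19/32
a_hat = counts["A"] / (counts["A"] + counts["G"])          # 12/19
c_hat = counts["C"] / (counts["C"] + counts["T"])          # 7/13

# Transform back to the original parameters.
pi_hat = {
    "A": r_hat * a_hat,
    "G": r_hat * (1 - a_hat),
    "C": (1 - r_hat) * c_hat,
    "T": (1 - r_hat) * (1 - c_hat),
}

lnl_jc = log_lik({b: 0.25 for b in "ACGT"})
lnl_f81 = log_lik(pi_hat)
print(pi_hat)
print(round(lnl_jc, 2), round(lnl_f81, 2))
```

The recovered frequencies are simply the observed proportions, as guessed on the earlier slide.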

63 Likelihood ratio testing
$$\ln L_{JC} \approx -44.36 \qquad \ln L_{F81} \approx -43.09$$
But the F81 model has 3 more free parameters than JC. The likelihood ratio test is a hypothesis testing approach to model selection.

64 Likelihood ratio testing
- $H_0$: the data were generated under the simpler model.
- $H_A$: the data were generated under the more complex model.
- Test statistic: $2\left(\ln L_{\text{complex}} - \ln L_{\text{simple}}\right)$
The LRT only works if the simple model is nested inside the more complex model (if the free parameters for the simple model are a subset of the free parameters for the more complex model).

65 Likelihood ratio testing
Null distribution: $\chi^2$ distribution with d.f. = difference in the number of free parameters. If the LR test statistic is greater than the critical value from the appropriate chi-square table, then we reject the simple model and prefer the more complex model.

66 Likelihood ratio testing: example
$$\ln L_{JC} \approx -44.36 \qquad \ln L_{F81} \approx -43.09$$
$$\text{LRT} = 2\left(-43.09 - (-44.36)\right) = 2.54 \qquad \text{df} = 3 - 0 = 3$$
$$\chi^2_3(\text{critical}, P = 0.05) = 7.815$$
Not significant. Do not reject the JC model. (If we look up the P-value for this test statistic, it is about 0.47.)
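The test above can be reproduced without a statistics library. The sketch relies on a closed-form upper-tail probability that holds only for a chi-square distribution with exactly 3 degrees of freedom; `chi2_sf_df3` is a name invented here, and a general analysis would use a library routine instead.

```python
import math

def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability Pr(X > x) for chi-square with 3 d.f.
    (closed form valid only for d.f. = 3)."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

# Log-likelihoods of the 32-base gorilla sequence under each model.
lnl_jc = 32 * math.log(0.25)
lnl_f81 = (12 * math.log(12 / 32) + 14 * math.log(7 / 32)
           + 6 * math.log(6 / 32))

lrt = 2.0 * (lnl_f81 - lnl_jc)   # test statistic
p_value = chi2_sf_df3(lrt)       # F81 has 3 more free parameters than JC
print(round(lrt, 2), round(p_value, 3))
```

Because the P-value is far above 0.05, the extra three parameters of F81 are not justified by this short sequence.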

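67 Likelihoods on the simplest possible tree
Two sequences, GA and GG, joined by a single branch of length ν:
$$L = L_1 L_2 = \Pr(G)\Pr(G \to G)\,\Pr(A)\Pr(A \to G) = \Pr(G)\Pr(G \to G \mid \nu)\,\Pr(A)\Pr(A \to G \mid \nu)$$
$$= \left(\tfrac14\right)\left(\tfrac14 + \tfrac34 e^{-4\nu/3}\right)\left(\tfrac14\right)\left(\tfrac14 - \tfrac14 e^{-4\nu/3}\right)$$

68 Let
$$d = \tfrac14 - \tfrac14 e^{-4\nu/3} \qquad\text{so that}\qquad \tfrac14 + \tfrac34 e^{-4\nu/3} = 1 - 3d$$
$$L = \left(\tfrac14\right)(1-3d)\left(\tfrac14\right)d = \frac{(1-3d)\,d}{16} \qquad \frac{\partial L}{\partial d} = \frac{1-6d}{16}$$
$$0 = \frac{1 - 6\hat d}{16} \quad\Rightarrow\quad \hat d = \tfrac16, \qquad \hat\nu = \tfrac34 \ln 3 \approx 0.824, \qquad L = \tfrac{1}{192} \approx 0.0052$$

69 You may recall that the JC distance correction from lecture 8 looked like this:
$$\nu = -\frac34 \ln\left(1 - \frac{4p}{3}\right)$$
If you put in p = 0.5, because half the sites differ in our example, then you get the same branch length: ν ≈ 0.824. Our JC distance correction formula is actually an ML estimator of the branch length between a pair of taxa.

70 The first 30 nucleotides of the ψη-globin gene
gorilla    GAAGTCCTTGAGAAATAAACTGCACACTGG
orangutan  GGACTCCTTGAGAAATAAACTGCACACTGG
The sequences agree at 28 sites and differ at 2:
$$L = \left[\left(\tfrac14\right)\left(\tfrac14 + \tfrac34 e^{-4\nu/3}\right)\right]^{28}\left[\left(\tfrac14\right)\left(\tfrac14 - \tfrac14 e^{-4\nu/3}\right)\right]^{2}$$
$$\hat\nu \approx 0.0698 \qquad \ln L \approx -51.13$$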
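The claim that the JC distance correction is the ML branch length for a pair of sequences can be checked with a coarse grid search. This is a sketch under the JC model only; `jc_distance`, `pair_log_lik` and the grid resolution are choices made for illustration.

```python
import math

def jc_distance(p: float) -> float:
    """JC-corrected distance from the proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def pair_log_lik(nu: float, n_same: int, n_diff: int) -> float:
    """Log-likelihood of a two-sequence alignment under JC."""
    decay = math.exp(-4.0 * nu / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * decay)   # site with identical bases
    p_diff = 0.25 * (0.25 - 0.25 * decay)   # site with different bases
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)

# GA vs. GG: one identical site, one differing site (p = 0.5).
nu_hat_small = jc_distance(0.5)
grid = [i / 1000 for i in range(1, 3000)]
best_on_grid = max(grid, key=lambda nu: pair_log_lik(nu, 1, 1))

# Gorilla vs. orangutan: 28 identical sites, 2 differing sites.
nu_hat_apes = jc_distance(2 / 30)
lnl_apes = pair_log_lik(nu_hat_apes, 28, 2)
print(nu_hat_small, best_on_grid, nu_hat_apes, lnl_apes)
```

The grid maximum lands next to the closed-form estimate, which is what the slide asserts analytically.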

71 Likelihood of a tree (data for only one site shown)
[Figure: four-taxon tree with tip states A, A, C, T and internal node states A and C; one internal node is arbitrarily chosen to serve as the root node.]
Ancestral states like this are not really known; we will address this in a minute.

72 Likelihood for site k
[Figure: the same tree with edge lengths ν1 to ν5; tips A (ν1) and A (ν2) attach to the root A, edge ν3 leads to the internal node C, and tips T (ν4) and C (ν5) attach to that node.]
ν5 is the expected number of substitutions for just this segment of the tree.
$$L_k = \pi_A\,P_{AA}(\nu_1)\,P_{AA}(\nu_2)\,P_{AC}(\nu_3)\,P_{CT}(\nu_4)\,P_{CC}(\nu_5)$$
$$= \left(\tfrac14\right)\left(\tfrac14 + \tfrac34 e^{-4\nu_1/3}\right)\left(\tfrac14 + \tfrac34 e^{-4\nu_2/3}\right)\left(\tfrac14 - \tfrac14 e^{-4\nu_3/3}\right)\left(\tfrac14 - \tfrac14 e^{-4\nu_4/3}\right)\left(\tfrac14 + \tfrac34 e^{-4\nu_5/3}\right)$$

73 Brute force approach would be to calculate $L_k$ for all 16 combinations of ancestral states and sum.

74 Pruning algorithm* (same result, much less time)
Many calculations can be done just once, and then reused many times.
*The pruning algorithm was introduced by: Felsenstein, J. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17:

75 Data for one site
Taxon  Character
1      A
2      C
3      C
4      C
5      G
[Figure: five-taxon tree with root x and internal nodes y, z and w. Edges: x to y (ν6), y to tip A (ν1), y to tip C (ν2), x to z (ν8), z to tip C (ν3), z to w (ν7), w to tip C (ν4), w to tip G (ν5).]

76
$$L = \sum_x \sum_y \sum_z \sum_w \Pr(x, y, z, w, A, C, C, C, G \mid \nu)$$

77
$$L = \sum_x \sum_y \sum_z \sum_w \Pr(x)\Pr(y \mid x, \nu_6)\Pr(A \mid y, \nu_1)\Pr(C \mid y, \nu_2)\,\Pr(z \mid x, \nu_8)\Pr(C \mid z, \nu_3)\Pr(w \mid z, \nu_7)\Pr(C \mid w, \nu_4)\Pr(G \mid w, \nu_5)$$

78
$$L = \sum_x \sum_y \sum_z \Pr(x)\Pr(y \mid x, \nu_6)\Pr(A \mid y, \nu_1)\Pr(C \mid y, \nu_2)\,\Pr(z \mid x, \nu_8)\Pr(C \mid z, \nu_3)\left(\sum_w \Pr(w \mid z, \nu_7)\Pr(C \mid w, \nu_4)\Pr(G \mid w, \nu_5)\right)$$

79
$$L = \sum_x \sum_y \Pr(x)\Pr(y \mid x, \nu_6)\Pr(A \mid y, \nu_1)\Pr(C \mid y, \nu_2)\left(\sum_z \Pr(z \mid x, \nu_8)\Pr(C \mid z, \nu_3)\left(\sum_w \Pr(w \mid z, \nu_7)\Pr(C \mid w, \nu_4)\Pr(G \mid w, \nu_5)\right)\right)$$

80
$$L = \sum_x \Pr(x)\left(\sum_y \Pr(y \mid x, \nu_6)\Pr(A \mid y, \nu_1)\Pr(C \mid y, \nu_2)\right)\left(\sum_z \Pr(z \mid x, \nu_8)\Pr(C \mid z, \nu_3)\left(\sum_w \Pr(w \mid z, \nu_7)\Pr(C \mid w, \nu_4)\Pr(G \mid w, \nu_5)\right)\right)$$
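The factorization above is the pruning algorithm applied to one site. The sketch below evaluates the same five-taxon tree twice under JC: once by brute force over all 4^4 = 256 assignments of (x, y, z, w), and once with the nested sums. The branch lengths in `nu` are made-up values, and the function names are assumptions for illustration.

```python
import math
from itertools import product

STATES = "ACGT"

def p(i: str, j: str, length: float) -> float:
    """JC transition probability from state i to state j along a branch."""
    decay = math.exp(-4.0 * length / 3.0)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay

# Hypothetical branch lengths nu[1] .. nu[8], labeled as on the slides.
nu = {1: 0.10, 2: 0.20, 3: 0.15, 4: 0.05, 5: 0.30, 6: 0.10, 7: 0.20, 8: 0.25}

def brute_force() -> float:
    """Sum the joint probability over every combination of ancestral states."""
    total = 0.0
    for x, y, z, w in product(STATES, repeat=4):
        total += (0.25 * p(x, y, nu[6]) * p(y, "A", nu[1]) * p(y, "C", nu[2])
                  * p(x, z, nu[8]) * p(z, "C", nu[3]) * p(z, w, nu[7])
                  * p(w, "C", nu[4]) * p(w, "G", nu[5]))
    return total

def pruning() -> float:
    """Same likelihood, computing each subtree's conditional likelihoods once."""
    lw = {w: p(w, "C", nu[4]) * p(w, "G", nu[5]) for w in STATES}
    lz = {z: p(z, "C", nu[3]) * sum(p(z, w, nu[7]) * lw[w] for w in STATES)
          for z in STATES}
    ly = {y: p(y, "A", nu[1]) * p(y, "C", nu[2]) for y in STATES}
    return sum(0.25
               * sum(p(x, y, nu[6]) * ly[y] for y in STATES)
               * sum(p(x, z, nu[8]) * lz[z] for z in STATES)
               for x in STATES)

print(brute_force(), pruning())
```

The two results agree to rounding error, while the pruning version evaluates far fewer products because each conditional likelihood is computed once and reused.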

81 Maximum likelihood is a lot of work
- Site likelihoods involve products of transition probabilities, summed over ancestral states.
- Overall log-likelihood for a tree is the sum of site log-likelihoods.
- Overall log-likelihood must be maximized!
  - must find MLEs for all edge lengths and all model parameters
  - this involves computing the overall log-likelihood many, many times (try turning on logiter in PAUP* to get a feel for how much work this involves)
- Maximized lnL can now be compared to maximized lnL from other trees.

82 Is it worth it?
Uses all information:
- Parsimony ignores constant and autapomorphic sites.
- Distance methods ignore information not captured in pairwise comparisons.
Model generality:
- Some models are possible with distance methods, but some quantities cannot be estimated reliably (e.g. variation in rates across sites).
- Many parsimony variants exist, but parsimony does not allow estimation of the step matrix entries, for example.
- Many complex models are only possible under likelihood or Bayesian methods (which have a likelihood foundation).


85 RuBisCO enzyme
- 8 small subunits (white)
- 8 large subunits (colored)
- Responsible for fixing CO2

86 Green plant rbcL
First 88 amino acids; translation is for Zea mays.
[Figure: nucleotide alignment of the first 88 codons of rbcL, with dots marking identity to the first sequence. Taxa: Chara (green alga; land plant lineage), Chlorella (green alga), Volvox (green alga), Conocephalum (liverwort), Bazzania (moss), Anthoceros (hornwort), Osmunda (fern), Lycopodium (club "moss"), Ginkgo (gymnosperm; Ginkgo biloba), Picea (gymnosperm; spruce), Iris (flowering plant), Asplenium (fern; spleenwort), Nicotiana (flowering plant; tobacco).]
All four bases are observed at some sites, while at other sites only one base is observed.

87 Question: Why is rate heterogeneity ubiquitous?
Answer: Differences in mutational rates and (mainly) selective constraint.
Many sites are under purifying (stabilizing) selection:
- Any mutation results in a different amino acid, AND
- An amino acid replacement at the site results in dramatically worse functioning of the protein.
These sites will show low rates of evolution on a tree.
Other sites are less constrained:
- A mutation results in the same amino acid, OR
- Many amino acids will work equally well at that position in the protein.
These sites will show high rates of evolution on a tree.

88 Rate heterogeneity in protein-coding genes: terms
- Synonymous mutations result in the same amino acid.
- Non-synonymous mutations result in a different amino acid.
- Conservative changes are non-synonymous changes that result in a chemically similar amino acid.
- Neutral mutations result in a new genotype that has the same fitness as the genotypes currently fixed in the population.

89 Rate heterogeneity in protein-coding genes: generalities
- Synonymous changes are often neutral (or close to neutral).
- Third base positions and untranslated regions (introns and other non-coding regions) tend to have high rates because changes at these sites often do not alter the encoded amino acid.
- Transitions tend to lead to more synonymous or conservative changes.
- Amino acid residues that are embedded, involved in salt bonding, or part of the active site tend to be more constrained.
- Loops of amino acid residues on the outside of proteins often tolerate a wide range of substitutions (or even indels).

90 The genetic code (rows grouped by 1st base; columns by 2nd base U, C, A, G; * = stop)
UUU F   UCU S   UAU Y   UGU C
UUC F   UCC S   UAC Y   UGC C
UUA L   UCA S   UAA *   UGA *
UUG L   UCG S   UAG *   UGG W

CUU L   CCU P   CAU H   CGU R
CUC L   CCC P   CAC H   CGC R
CUA L   CCA P   CAA Q   CGA R
CUG L   CCG P   CAG Q   CGG R

AUU I   ACU T   AAU N   AGU S
AUC I   ACC T   AAC N   AGC S
AUA I   ACA T   AAA K   AGA R
AUG M   ACG T   AAG K   AGG R

GUU V   GCU A   GAU D   GGU G
GUC V   GCC A   GAC D   GGC G
GUA V   GCA A   GAA E   GGA G
GUG V   GCG A   GAG E   GGG G
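The code table explains why third positions evolve fastest: most third-position changes are synonymous. The sketch below tallies, for each codon position, the fraction of single-base changes between sense codons that leave the amino acid unchanged. It assumes the standard genetic code written with T instead of U, ignores changes to or from stop codons, and uses names (`CODE`, `synonymous_fraction`) invented for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_fraction(position: int) -> float:
    """Fraction of single-base changes at a codon position (0, 1 or 2)
    that keep the amino acid, counting sense-to-sense changes only."""
    same = total = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            if CODE[mutant] == "*":
                continue
            total += 1
            same += CODE[mutant] == aa
    return same / total

fractions = [synonymous_fraction(pos) for pos in range(3)]
print(fractions)
```

Second positions never change synonymously under these counting rules, first positions do so only rarely (leucine and arginine), and third positions do so most of the time.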

91 Rate heterogeneity in RNA-coding genes
Stem regions:
- formed when the RNA strand forms a double helix with itself
- strongly conserved in general
- evidence for compensatory substitutions
Loop regions:
- some strongly conserved
- some entire loops are found in only particular lineages

92 Accommodating rate heterogeneity in substitution models
- Site-specific rates approach: e.g. let 1st, 2nd and 3rd position sites each have their own relative substitution rate.
- Proportion of invariable sites approach: assume that some proportion $p_{invar}$ of sites have rate 0, while a proportion $1 - p_{invar}$ have a rate > 0.
- Discrete gamma distributed relative rates approach: assume that each site is evolving at one of $n_{cat}$ relative rates, where the relative rates are determined using a gamma distribution having mean 1 and shape α.
- Codon models (protein-coding genes only): use the genetic code to determine appropriate relative rates.
- Secondary structure models (RNA-coding genes only): use a separate model for loops vs. stems; the stem model takes account of compensatory substitutions.

93 Site-specific rates
You decide there are 3 classes of sites:
- 1st positions evolve at relative rate $r_1$
- 2nd positions evolve at relative rate $r_2$
- 3rd positions evolve at relative rate $r_3$
$r_1$, $r_2$ and $r_3$ are relative rates, not actual rates:
- their average is 1.0: if each category has the same number of sites, $(r_1 + r_2 + r_3)/3 = 1.0$
- the actual rates are $r_1\alpha$ (for 1st positions), $r_2\alpha$ (for 2nd positions) and $r_3\alpha$ (for 3rd positions)
- note that the average substitution rate over all sites is α: $(r_1\alpha + r_2\alpha + r_3\alpha)/3 = \alpha(1.0) = \alpha$
Assuming k rate classes adds k-1 parameters to the model.

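94 Transition probabilities under the JC69 model with no rate heterogeneity:
$$\Pr(i \to i \mid \nu) = \tfrac14 + \tfrac34\,e^{-4\nu/3} \qquad \Pr(i \to j \mid \nu) = \tfrac14 - \tfrac14\,e^{-4\nu/3}$$

95 Transition probabilities under the JC69 model
First base positions under a site-specific rates model:
$$\Pr(i \to i \mid \nu) = \tfrac14 + \tfrac34\,e^{-4 r_1 \nu/3} \qquad \Pr(i \to j \mid \nu) = \tfrac14 - \tfrac14\,e^{-4 r_1 \nu/3}$$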

96 Site-specific rates in PAUP*
First, define a character partition that puts each site into one of several mutually exclusive categories (the category names are arbitrary):
    charpartition codons = one:1-.\3, two:2-.\3, three:3-.\3;
Then tell PAUP* that you want site-specific rates and provide the partition you defined previously:
    lset rates=sitespec siterates=partition:codons;

97 Pinvar approach
Unlike the site-specific rates approach, this approach does not require you to assign sites to rate categories.
Assumes there are only two classes of sites:
- invariable sites (evolve at relative rate 0)
- variable sites (evolve at relative rate r)
Remarks:
- mean of relative rates = $(p_{invar})(0) + (1 - p_{invar})(r) = 1$
- this means that $r = 1/(1 - p_{invar})$
- if all sites are variable, $p_{invar} = 0$ and r = 1

98 Constant site: a site in which all of the taxa display the same character state.
Invariable site: a site in which only one character state is allowed; a site that cannot change state.
All invariable sites are constant, but not all constant sites have to be invariable.

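99
$$\Pr(i \to i \mid \text{invariable}) = \tfrac14 + \tfrac34\,e^{-4(0)\nu/3} = \tfrac14 + \tfrac34\,e^{0} = 1$$
$$\Pr(i \to j \mid \text{invariable}) = \tfrac14 - \tfrac14\,e^{-4(0)\nu/3} = 0$$

100 A site's likelihood under the JC+I model
$x_i$ is the data pattern for site i. General form:
$$\Pr(x_i \mid \text{JC+I}) = p_{inv}\Pr(x_i \mid \text{inv}) + (1 - p_{inv})\Pr\left(x_i \,\Big|\, \text{JC}, \frac{\nu}{1 - p_{inv}}\right)$$
If $x_i$ is a variable site:
$$\Pr(x_i \mid \text{JC+I}) = (1 - p_{inv})\Pr\left(x_i \,\Big|\, \text{JC}, \frac{\nu}{1 - p_{inv}}\right)$$
If $x_i$ is a constant site:
$$\Pr(x_i \mid \text{JC+I}) = p_{inv}\Pr(x_i \mid \text{inv}) + (1 - p_{inv})\Pr\left(x_i \,\Big|\, \text{JC}, \frac{\nu}{1 - p_{inv}}\right)$$

101 Why $\nu/(1 - p_{inv})$?
We want the mean rate of change to be 1.0 over all sites (so we can interpret the branch lengths in terms of the expected number of changes per site). If r is the rate of change for the variable sites, then:
$$1 = 0 \cdot p_{inv} + r\left(1 - p_{inv}\right) = r\left(1 - p_{inv}\right) \quad\Rightarrow\quad r = \frac{1}{1 - p_{inv}}$$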
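For a two-sequence alignment, the JC+I site likelihood can be written out directly. The sketch below is an illustration only: it assumes exactly two taxa, takes Pr(x_i | inv) for a constant site to be the base frequency 0.25, and uses function names and parameter values chosen here.

```python
import math

def jc_pair_site(same: bool, nu: float) -> float:
    """JC likelihood of one site in a two-sequence alignment."""
    decay = math.exp(-4.0 * nu / 3.0)
    if same:
        return 0.25 * (0.25 + 0.75 * decay)
    return 0.25 * (0.25 - 0.25 * decay)

def jc_i_pair_site(same: bool, nu: float, p_inv: float) -> float:
    """JC+I likelihood of one site; variable sites evolve 1/(1 - p_inv)
    times faster so that the mean rate over all sites stays 1."""
    variable_part = (1.0 - p_inv) * jc_pair_site(same, nu / (1.0 - p_inv))
    if same:
        # A constant site may be invariable (probability p_inv, pattern
        # probability 0.25) or variable but unchanged.
        return p_inv * 0.25 + variable_part
    return variable_part   # an invariable site can never show a difference

nu, p_inv = 0.3, 0.4   # hypothetical branch length and proportion invariable
constant = jc_i_pair_site(True, nu, p_inv)
variable = jc_i_pair_site(False, nu, p_inv)
# 4 constant patterns (AA, CC, GG, TT) and 12 variable patterns.
print(constant, variable, 4 * constant + 12 * variable)
```

The final printed value is 1, confirming that the mixture is a proper probability distribution over the 16 site patterns.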

102 Variable (but unknown) rates
- We expect more shades of grey rather than the on-or-off view of the pinvar model.
- A priori we do not know which sites are fast and which are slow.
- We may be able to characterize the distribution of rates across sites: high variance or low variance.

103 Gamma distributions
[Figure: gamma densities for α = 10, α = 1 and α = 0.1; x-axis: relative rate; y-axis: relative frequency of sites.]
- Larger α means less heterogeneity.
- Smaller α means more heterogeneity.
- The mean equals 1.0 for all three of these distributions.

104 Gamma distribution
$$f(r) = \frac{r^{\alpha - 1}\,\beta^{\alpha}\,e^{-\beta r}}{\Gamma(\alpha)}$$
- mean = α/β; in phylogenetics, mean = 1, so β = α
- variance = α/β²; in phylogenetics, variance = 1/α

105 Using gamma-distributed rates across sites
We usually use a discretized version of the gamma with 4-8 categories (the computation time increases linearly with the number of categories).
$$\Pr(x_i \mid \text{JC+G}) = \sum_{j}^{ncat} \Pr(x_i \mid \text{JC}, r_j \nu)\,\Pr(r_j) \qquad\text{where}\qquad \sum_{j}^{ncat} r_j \Pr(r_j) = 1$$

106 Discrete gamma (continued)
We break up the continuous gamma into intervals, each of which has an equal probability, and use the mean rate within each interval as the representative rate for that rate category:
$$\Pr(r_j) = \frac{1}{ncat}$$
So:
$$\Pr(x_i \mid \text{JC+G}) = \frac{1}{ncat} \sum_{j}^{ncat} \Pr(x_i \mid \text{JC}, r_j \nu)$$
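The equal-probability discretization can be sketched with the standard library alone. The code below finds the category boundaries by bisection on the gamma CDF and obtains each category mean from the identity that, for a gamma with shape α and rate β = α, the integral of r·f(r) from 0 to b equals the CDF of a gamma with shape α + 1 and rate α evaluated at b. All function names are assumptions, and the power-series CDF is adequate for modest arguments but is not a general-purpose routine.

```python
import math

def gamma_cdf(x: float, shape: float, rate: float) -> float:
    """Regularized lower incomplete gamma P(shape, rate * x), power series."""
    if x <= 0.0:
        return 0.0
    z = rate * x
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

def gamma_quantile(prob: float, shape: float, rate: float) -> float:
    """Invert the CDF by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, shape, rate) < prob:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, rate) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def discrete_gamma_rates(alpha: float, ncat: int) -> list:
    """Mean relative rate of each of ncat equal-probability categories."""
    cuts = [0.0] + [gamma_quantile(j / ncat, alpha, alpha)
                    for j in range(1, ncat)]
    # Cumulative share of the mean (which is 1) below each boundary.
    mass = [gamma_cdf(c, alpha + 1.0, alpha) for c in cuts] + [1.0]
    return [ncat * (mass[j + 1] - mass[j]) for j in range(ncat)]

def jc_gamma_pair_site(same: bool, nu: float, alpha: float, ncat: int = 4) -> float:
    """JC+G likelihood of one site in a two-sequence alignment."""
    total = 0.0
    for r in discrete_gamma_rates(alpha, ncat):
        decay = math.exp(-4.0 * r * nu / 3.0)
        total += 0.25 * (0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay)
    return total / ncat

rates = discrete_gamma_rates(alpha=0.5, ncat=4)
print([round(r, 4) for r in rates], sum(rates) / len(rates))
```

The category rates average to 1 by construction, so branch lengths keep their interpretation as expected changes per site.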

107 Relative rates in 4-category case
[Figure: gamma density divided into four categories, with the boundaries between the 1st and 2nd, 2nd and 3rd, and 3rd and 4th categories marked, and the four category rates $r_1$, $r_2$, $r_3$, $r_4$ labeled.]
- Boundaries are placed so that each category represents 1/4 of the distribution (i.e. 1/4 of the area under the curve).
- Relative rates represent the mean of their category.

108 Discrete gamma rate heterogeneity in PAUP*
To use gamma distributed rates with 4 categories:
    lset rates=gamma ncat=4;
To estimate the shape parameter:
    lset shape=estimate;
To combine pinvar with gamma:
    lset rates=gamma shape=0.2 pinvar=0.4;
Note: estimate, previous, or a specific value can be specified for both shape and pinvar.

109 Rate homogeneity in PAUP*
Just tell PAUP* that you want all rates to be equal and that you want all sites to be allowed to vary:
    lset rates=equal pinvar=0;
Note: these are the default settings, but it is useful to know how to go back to rate homogeneity after you have experimented with rate heterogeneity!

110 Likelihood ratio test
- Always compares an unconstrained to a constrained model.
- Constrained model must be nested within the unconstrained model.
- Parameter(s) take on their maximum likelihood estimates (MLEs) in the unconstrained model.
- Parameter(s) are set to some other value of interest in the constrained model.
- Unconstrained model must be able to attain a higher maximum likelihood than the constrained model.

111 Likelihood Ratio Test is the MLE is some other value Coin-flipping example: Data: 6 heads out of 10 flips Constrained model: fair coin (θ = 0.5) Unconstrained model: biased coin (θ = ) Example of likelihood calculation for case of θ = 0.6 Copyright 2007 by Paul O. Lewis 3

112 Likelihood Ratio Test
Coin-flipping example:
Data: 6 heads out of 10 flips
Constrained model: fair coin (θ = 0.5)
Unconstrained model: biased coin (θ = 0.6)
Result: not significant. This means that the simpler, constrained model cannot be rejected.
The LRT statistic is approximately distributed as a chi-square random variable with d.f. equal to the difference in the number of free parameters between the two models.
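A minimal Python sketch of this test for the coin-flipping data is given below. It assumes a binomial likelihood and uses the fact that the upper tail of a chi-square variable with 1 d.f. can be written with the complementary error function. The helper names are illustrative only.

```python
import math

def binom_lnL(theta, heads, flips):
    """Log-likelihood of theta for `heads` successes in `flips` binomial trials."""
    ln_coef = (math.lgamma(flips + 1) - math.lgamma(heads + 1)
               - math.lgamma(flips - heads + 1))
    return ln_coef + heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta)

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))

heads, flips = 6, 10
lnL_unconstrained = binom_lnL(heads / flips, heads, flips)  # theta at its MLE
lnL_constrained = binom_lnL(0.5, heads, flips)              # fair coin
lr_stat = 2.0 * (lnL_unconstrained - lnL_constrained)
p_value = chi2_sf_1df(lr_stat)  # 1 d.f.: one free parameter vs. none
print(round(lnL_unconstrained, 3), round(lnL_constrained, 3), round(lr_stat, 3), p_value)
```

The P-value comes out well above 0.05, which agrees with the conclusion that the fair-coin model cannot be rejected.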

113 Examples of unconstrained vs. constrained model comparisons
1. GTR+G (shape=MLE) vs. GTR (shape=∞)
2. K80 (κ=MLE) vs. JC (κ=1.0)
3. HKY+I+G (p_inv=MLE) vs. HKY+G (p_inv=0)
4. HKY+I+G (p_inv=MLE, shape=MLE) vs. HKY (p_inv=0, shape=∞)
Note: cases in which the constrained model involves setting a parameter to the edge of its valid range (e.g. cases 1, 3 and 4 above) require special consideration (see Ota et al. 2000).
Ota, R., P. J. Waddell, M. Hasegawa, H. Shimodaira, and H. Kishino. 2000. Appropriate likelihood ratio tests and marginal distributions for evolutionary tree models with constraints on parameters. Molecular Biology and Evolution 17.

114 Testing the molecular clock
Example with n = 7 taxa:
Unconstrained model: need to estimate 2n-3 = 11 branch lengths.
Constrained model: need to estimate n-1 = 6 divergence times ($t_1, \ldots, t_6$).
The likelihood ratio test thus has (2n-3) - (n-1) = n-2 = 5 d.f.
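The degrees-of-freedom count can be checked with a few lines of Python. The function name is hypothetical, and the sketch assumes a fully bifurcating tree.

```python
def clock_lrt_df(n_taxa):
    """Degrees of freedom for the molecular clock LRT on a bifurcating tree."""
    unconstrained = 2 * n_taxa - 3  # branch lengths of the unrooted tree
    constrained = n_taxa - 1        # divergence times of the clock tree
    return unconstrained - constrained

print(clock_lrt_df(7))  # 11 - 6 = 5
```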

115 Akaike Information Criterion
$$\mathrm{AIC} = -2\max(\ln L) + 2K$$
K is the number of free model parameters.
Measures relative distance to the true model.
The model with the smallest AIC wins.
Advantage over LRT: can be used to compare non-nested models.
Example: 6 heads/10 flips revisited
Unconstrained model: θ = 0.6, AIC = -2(-1.383) + 2(1) = 4.766
Constrained model: θ = 0.5, AIC = -2(-1.584) + 2(0) = 3.168 (best)
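The AIC arithmetic for the coin example can be reproduced with a short Python sketch. The log-likelihoods are the rounded values quoted on the slide, and the function and constant names are illustrative.

```python
def aic(max_lnL, k):
    """Akaike Information Criterion for a maximized log-likelihood and k free parameters."""
    return -2.0 * max_lnL + 2.0 * k

LNL_UNCONSTRAINED = -1.383  # theta = 0.6, one free parameter
LNL_CONSTRAINED = -1.584    # theta = 0.5, no free parameters

aic_unconstrained = aic(LNL_UNCONSTRAINED, 1)
aic_constrained = aic(LNL_CONSTRAINED, 0)
best = "constrained" if aic_constrained < aic_unconstrained else "unconstrained"
print(round(aic_unconstrained, 3), round(aic_constrained, 3), best)
```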

116 Bayesian Information Criterion
$$\mathrm{BIC} = -2\max(\ln L) + K\log(n)$$
K is the number of free model parameters.
n is the sample size.
The model with the smallest BIC wins.
Advantage over LRT: can be used to compare non-nested models.
Considered superior to both AIC and LRT.
Example: 6 heads/10 flips one more time. Note: log(10) ≈ 2.3
Unconstrained model: θ = 0.6, BIC = -2(-1.383) + (2.3)(1) = 5.066
Constrained model: θ = 0.5, BIC = -2(-1.584) + 0 = 3.168 (best)
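The same comparison under BIC can be sketched in Python, assuming the natural logarithm and the rounded log-likelihoods from the slide. The names are illustrative.

```python
import math

def bic(max_lnL, k, n):
    """Bayesian Information Criterion for k free parameters and sample size n."""
    return -2.0 * max_lnL + k * math.log(n)

LNL_UNCONSTRAINED = -1.383  # theta = 0.6, one free parameter
LNL_CONSTRAINED = -1.584    # theta = 0.5, no free parameters
N_FLIPS = 10

bic_unconstrained = bic(LNL_UNCONSTRAINED, 1, N_FLIPS)
bic_constrained = bic(LNL_CONSTRAINED, 0, N_FLIPS)
print(round(bic_unconstrained, 3), round(bic_constrained, 3))
```

Because the code uses the unrounded value of log(10), the unconstrained BIC differs in the third decimal place from arithmetic done with 2.3. The ranking of the two models is unaffected.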

117 Likelihood ratio test favors more complex models
Assume the simpler, constrained model is the true model.
If the LRT were statistically consistent, it would choose the true model with certainty as n → ∞.
But the simpler model will be rejected 5% of the time (when testing at the 5% significance level), regardless of sample size.
Thus, the LRT is biased toward choosing the more complex, unconstrained model.
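The claim that the rejection rate does not shrink with sample size can be illustrated with the coin example. The sketch below assumes a fair coin is the true model and computes the exact probability that the LRT, at the 5% level with 1 d.f., rejects it for several values of n. The names are hypothetical, and because the data are discrete the rate is only approximately 5%.

```python
import math

CHI2_CRIT_1DF_05 = 3.841458820694124  # 95th percentile of chi-square with 1 d.f.

def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def lrt_rejection_rate(n, theta0=0.5):
    """Exact probability that the LRT rejects theta0 when theta0 is the truth."""
    rate = 0.0
    for k in range(n + 1):
        theta_hat = k / n
        # The binomial coefficient cancels in the likelihood ratio.
        lnL_hat = xlogy(k, theta_hat) + xlogy(n - k, 1.0 - theta_hat)
        lnL_0 = xlogy(k, theta0) + xlogy(n - k, 1.0 - theta0)
        if 2.0 * (lnL_hat - lnL_0) > CHI2_CRIT_1DF_05:
            ln_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                      - math.lgamma(n - k + 1) + lnL_0)
            rate += math.exp(ln_pmf)  # probability of k heads under theta0
    return rate

for n in (100, 1000, 10000):
    print(n, round(lrt_rejection_rate(n), 4))
```

The printed rates stay near 0.05 as n grows rather than tending to zero, which is the sense in which the LRT is not consistent as a model selection method.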

118 References


More information

Reconstruire le passé biologique modèles, méthodes, performances, limites

Reconstruire le passé biologique modèles, méthodes, performances, limites Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire

More information

Advanced topics in bioinformatics

Advanced topics in bioinformatics Feinberg Graduate School of the Weizmann Institute of Science Advanced topics in bioinformatics Shmuel Pietrokovski & Eitan Rubin Spring 2003 Course WWW site: http://bioinformatics.weizmann.ac.il/courses/atib

More information

1. Can we use the CFN model for morphological traits?

1. Can we use the CFN model for morphological traits? 1. Can we use the CFN model for morphological traits? 2. Can we use something like the GTR model for morphological traits? 3. Stochastic Dollo. 4. Continuous characters. Mk models k-state variants of the

More information

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood For: Prof. Partensky Group: Jimin zhu Rama Sharma Sravanthi Polsani Xin Gong Shlomit klopman April. 7. 2003 Table of Contents Introduction...3

More information

PHYLOGENY ESTIMATION AND HYPOTHESIS TESTING USING MAXIMUM LIKELIHOOD

PHYLOGENY ESTIMATION AND HYPOTHESIS TESTING USING MAXIMUM LIKELIHOOD Annu. Rev. Ecol. Syst. 1997. 28:437 66 Copyright c 1997 by Annual Reviews Inc. All rights reserved PHYLOGENY ESTIMATION AND HYPOTHESIS TESTING USING MAXIMUM LIKELIHOOD John P. Huelsenbeck Department of

More information

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS 1 Prokaryotes and Eukaryotes 2 DNA and RNA 3 4 Double helix structure Codons Codons are triplets of bases from the RNA sequence. Each triplet defines an amino-acid.

More information

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). 1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff

More information

types of codon models

types of codon models part 3: analysis of natural selection pressure omega models! types of codon models if i and j differ by > π j for synonymous tv. Q ij = κπ j for synonymous ts. ωπ j for non-synonymous tv. ωκπ j for non-synonymous

More information

Bayesian Phylogenetics

Bayesian Phylogenetics Bayesian Phylogenetics Paul O. Lewis Department of Ecology & Evolutionary Biology University of Connecticut Woods Hole Molecular Evolution Workshop, July 27, 2006 2006 Paul O. Lewis Bayesian Phylogenetics

More information

Consensus Methods. * You are only responsible for the first two

Consensus Methods. * You are only responsible for the first two Consensus Trees * consensus trees reconcile clades from different trees * consensus is a conservative estimate of phylogeny that emphasizes points of agreement * philosophy: agreement among data sets is

More information

Thanks to Paul Lewis and Joe Felsenstein for the use of slides

Thanks to Paul Lewis and Joe Felsenstein for the use of slides Thanks to Paul Lewis and Joe Felsenstein for the use of slides Review Hennigian logic reconstructs the tree if we know polarity of characters and there is no homoplasy UPGMA infers a tree from a distance

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

In Silico Modelling and Analysis of Ribosome Kinetics and aa-trna Competition

In Silico Modelling and Analysis of Ribosome Kinetics and aa-trna Competition In Silico Modelling and Analysis of Ribosome Kinetics and aa-trna Competition D. Bošnački 1,, T.E. Pronk 2,, and E.P. de Vink 3, 1 Dept. of Biomedical Engineering, Eindhoven University of Technology 2

More information