Phylogenetics and Darwin. An Introduction to Phylogenetics. Tree of Life. Darwin's Trees


An Introduction to Phylogenetics
Bret Larget
Departments of Botany and of Statistics
University of Wisconsin-Madison
February 4, 2008

Phylogenetics and Darwin

A phylogeny is a tree diagram that shows the evolutionary relationships among a group of species. The first phylogeny is due to Charles Darwin. In 1837, shortly after his famous five-year voyage as naturalist on the Beagle, Darwin sketched a tree diagram in one of his notebooks. This simple sketch is remarkably similar to modern diagrams of phylogenies. In addition, the sole figure in The Origin of Species is a phylogeny.

Darwin's Trees

[Figures: Darwin's 1837 sketch; the figure from The Origin of Species]

Tree of Life

In The Origin of Species, Darwin describes a Tree of Life that represents the true evolutionary history of life.

"The affinities of all the beings of the same class have sometimes been represented by a great tree. I believe this simile largely speaks the truth.... The limbs divided into great branches, and these into lesser and lesser branches, were themselves once, when the tree was small, budding twigs; and this connexion of the former and present buds by ramifying branches may well represent the classification of all extinct and living species in groups subordinate to groups.... As buds give rise by growth to fresh buds, and these, if vigorous, branch out and overtop on all sides many a feebler branch, so by generation I believe it has been with the Tree of Life, which fills with its dead and broken branches the crust of the earth, and covers the surface with its ever branching and beautiful ramifications."

Early Phylogenetics

Shortly after the 1859 publication of The Origin of Species, many biologists came to accept the truth of a universal Tree of Life. Ernst Haeckel and many others created highly stylized trees that were based on expert opinion. A century passed before the development of formal scientific methods for estimating phylogenies began.

Haeckel's Stylized Trees

[Figure: Haeckel's stylized trees]

Modern Phylogenetics

Phylogenies are usually estimated from aligned DNA sequence data.
Phylogenetics is the primary tool for systematics.
Phylogenetics is used for studying viruses such as HIV.
Phylogenetics has been used in court for forensic purposes.
Phylogenetics is being used increasingly in comparative genomics and the study of gene function.

Phylogenetics and Systematics

Phylogenetic methods, particularly for molecular sequence data, have become the primary tool for systematists to determine evolutionary relationships. These tools have been used to confirm expected relationships (for example, that chimpanzees are the closest living relatives of humans) and have also been key in revealing several more surprising findings, including:
birds are descended from dinosaurs;
polar bears form a monophyletic group within brown bears;
the most closely related land mammal to whales is the hippopotamus.

Phylogenetic Tree of Whales

[Figure: phylogenetic tree of whales]

Phylogenetics and Forensics

Phylogenetic trees have been used in several court cases to provide evidence about the likely transmission of HIV. Examples include:
confirming that a nurse contracted HIV from a mishap with a broken glass blood collection tube from an infected patient and not from an alternative source;
providing evidence of deliberate infection in a criminal case;
indicating that an infected friend was likely not the direct source of infection in a case.

Forensic Phylogenetic Tree

[Figure 2: HIV forensics tree; scale bar 10%]

Figure 2. Neighbor-joining phylogram representing the reconstruction of the phylogenetic relationships between the env (C2-V5) sequences obtained from the index case (A31-44), the alleged recipient (B22-29), three local controls (LC45 and LC48; LC46 and LC47; and LC49 and LC50) and 48 sequences chosen from GenBank. Ten iterations of random sequence addition were used. Scale bar represents 10% genetic distance. Bootstrap values are shown at nodes with greater than 70% support.

DNA Data from a Sample of Birds

First 24 bases of 1558 from the Cox I gene.

Alligator   GTG AAC TTC CAC --- CGT TGA CTC...
Emu         GTG ACA TTC ATT ACT CGA TGA TTT...
Kiwi        GTG ACC TTT ACT ACT CGA TGA CTC...
Ostrich     GTG ACC TTC ATT ACT CGA TGA CTT...
Swan        GTG ACC TTC ATC AAC CGA TGA CTA...
Goose       GTG ACC TTC ATC AAC CGA TGA CTA...
Chicken     GTG ACC TTC ATC AAC CGA TGA TTA...
Woodpecker  GTG ACC TTC ATC AAC CGA TGA TTA...
Finch       ATG ACA TAC ATT AAC CGA TGA TTA...
Ibis        GTG ACC TTC ATC AAC CGA TGA CTA...
Stork       GTG ACC TTC ATT ACC CGA TGA CTA...
Osprey      ATG ACA TTC ATC AAC CGA TGA CTA...
Falcon      GTG ACC TTC ATC AAC CGA TGA CTA...
Vulture     ATG ACA TTC ATC AAT CGA TGA CTA...
Penguin     GTG ACC TTC ATT AAC CGA TGA CTA...

An Estimated Phylogeny

[Figure: estimated phylogeny with tips Penguin, Vulture, Stork, Ibis, Woodpecker, Osprey, Finch, Falcon, Chicken, Goose, Swan, Ostrich, Kiwi, Emu, and Alligator]

Activity 1: Example Tree

[Figure: example tree on taxa A, B, C, D, E, and F]

How many descendant taxa does the common ancestor of taxa A and C have?
Which taxon is sister to A?
Which taxa are more closely related, A and C or C and D?
Which taxa are more closely related, A and E or D and E?

Activity 2: Compare Trees

Which trees have the same tree topology?

[Figure: several trees on taxa A through F]

Activity 3: Unrooted Trees

Some methods estimate unrooted trees.
If C is the outgroup, what is the rooted tree topology?
If taxon C is the outgroup, which node is sister to B?
If taxon A is the outgroup, which node is sister to B?
How many rooted tree topologies are consistent with this unrooted tree topology?

[Figure: unrooted tree on taxa A, B, C, D, and E]

How Many Trees?

[Table: number of unrooted and rooted tree topologies by number of taxa]

Formula for Counting Trees

The number of rooted tree topologies with $n$ taxa is $1 \times 3 \times \cdots \times (2n-3) \equiv (2n-3)!!$ for $n \ge 3$.
There are more rooted trees with 51 species than the estimated number of hydrogen atoms in the universe.
Biologists often estimate trees with more than 100 species.

Probabilistic Framework

"Essentially, all models are wrong, but some are useful." (George Box)

Commonly used models of molecular evolution treat sites as independent. These common models just need to describe the substitutions among the four bases A, C, G, and T at a single site over time. The substitution process is modeled as a continuous-time Markov chain.

Markov Property

Use the notation $X(t)$ to represent the base at time $t$, with $X(t) \in \{A, C, G, T\}$ for DNA.

Formal statement:

$$P\{X(s+t) = j \mid X(s) = i,\ X(u) = x(u) \text{ for } u < s\} = P\{X(s+t) = j \mid X(s) = i\}$$

Informal understanding: given the present, the past is independent of the future.

If the expression does not depend on the time $s$, the Markov process is called homogeneous.
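The double-factorial count of rooted topologies quoted in this section is easy to check numerically. The sketch below is illustrative only: the function names are made up for this example, and the unrooted count $(2n-5)!!$ is the standard companion formula rather than something stated on the slide.

```python
def odd_double_factorial(k: int) -> int:
    """Return 1 * 3 * 5 * ... * k for odd k >= 1."""
    result = 1
    for factor in range(3, k + 1, 2):
        result *= factor
    return result


def num_rooted_trees(n: int) -> int:
    """Number of rooted binary tree topologies on n labeled taxa: (2n-3)!!."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return odd_double_factorial(2 * n - 3)


def num_unrooted_trees(n: int) -> int:
    """Number of unrooted binary tree topologies on n labeled taxa: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return odd_double_factorial(2 * n - 5)


if __name__ == "__main__":
    for n in (3, 4, 5, 10, 51):
        print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Python integers have arbitrary precision, so the count for 51 taxa (99!!, roughly 2.7e78) is computed exactly.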

Rate Matrix

A stationary, homogeneous, continuous-time, finite-state-space Markov chain is parameterized by a rate matrix where:
off-diagonal rates are nonnegative;
diagonal terms are negative row sums of off-diagonal elements;
consequently, row sums are zero.

Example: $Q = \{q_{ij}\}$ [matrix entries not preserved]

Alarm Clock Interpretation

How to simulate a continuous-time Markov chain beginning in state $i$:
the time to the next transition is distributed $\mathrm{Exp}(q_i)$, where $q_i \equiv -q_{ii}$;
the transition is to state $j$ with probability $q_{ij} / \sum_{k \ne i} q_{ik}$.

Path Probability Density Calculation

Example: Begin at A, change to G at time 0.3, change to C at time 0.8, and then no more changes before time $t = 1$.

$$P\{\text{path}\} = P\{\text{begin at A}\} \times \left(1.1\, e^{-(1.1)(0.3)} \times \frac{0.6}{1.1}\right) \times \left(0.9\, e^{-(0.9)(0.5)} \times \frac{0.3}{0.9}\right) \times \left(e^{-(1.1)(0.2)}\right)$$

Probability Transition Matrices

The transition matrix is $P(t) = e^{Qt}$ where

$$e^{A} = \sum_{k=0}^{\infty} \frac{A^k}{k!} = I + A + \frac{A^2}{2} + \frac{A^3}{6} + \cdots$$

A probability transition matrix has non-negative values and each row sums to one. Each row contains the probabilities from a probability distribution on the possible states of the Markov process.
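The alarm-clock simulation and the path density calculation above can be sketched in a few lines. The rate matrix below is hypothetical: only the leaving rates 1.1 (from A), 0.9 (from G), and 1.1 (from C) and the rates $q_{AG} = 0.6$ and $q_{GC} = 0.3$ are taken from the worked path example, and the remaining entries are filled in arbitrarily so that rows sum to zero. The function names are likewise assumptions made for illustration.

```python
import math
import random

STATES = "ACGT"

# Hypothetical rate matrix (rows and columns ordered A, C, G, T).
# Only q_AA = -1.1, q_AG = 0.6, q_GG = -0.9, q_GC = 0.3, q_CC = -1.1
# are implied by the path example; the other entries are made up.
Q_EXAMPLE = [
    [-1.1, 0.3, 0.6, 0.2],
    [0.2, -1.1, 0.3, 0.6],
    [0.4, 0.3, -0.9, 0.2],
    [0.2, 0.4, 0.3, -0.9],
]


def simulate_path(Q, start, t_end, rng):
    """Simulate a CTMC from state index `start` until time t_end.

    Returns a list of (time, state) pairs, beginning with (0.0, start).
    Assumes every state has a positive leaving rate.
    """
    path = [(0.0, start)]
    state, time = start, 0.0
    while True:
        rate = -Q[state][state]            # q_i = -q_ii
        time += rng.expovariate(rate)      # waiting time ~ Exp(q_i)
        if time >= t_end:
            return path
        others = [j for j in range(len(Q)) if j != state]
        weights = [Q[state][j] for j in others]   # proportional to q_ij
        state = rng.choices(others, weights=weights)[0]
        path.append((time, state))


def path_density(Q, path, t_end):
    """Density of a path given its starting state (initial probability excluded)."""
    density = 1.0
    for (t0, i), (t1, j) in zip(path, path[1:]):
        rate = -Q[i][i]
        # wait (t1 - t0) in state i, then jump to j with probability q_ij / q_i
        density *= rate * math.exp(-rate * (t1 - t0)) * (Q[i][j] / rate)
    last_time, last_state = path[-1]
    # no further change between the last jump and t_end
    density *= math.exp(Q[last_state][last_state] * (t_end - last_time))
    return density


if __name__ == "__main__":
    example = [(0.0, 0), (0.3, 2), (0.8, 1)]   # A, then G at 0.3, then C at 0.8
    print(path_density(Q_EXAMPLE, example, 1.0))
    print(simulate_path(Q_EXAMPLE, 0, 1.0, random.Random(1)))
```

For the example path the exponents add up to one (0.33 + 0.45 + 0.22), so the density conditional on starting at A is $0.6 \times 0.3 \times e^{-1}$.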

Examples

[Numerical matrices $P(0.1)$, $P(0.5)$, $P(1)$, and $P(10)$; entries not preserved]

Spectral Decomposition

The matrix $Q$ can be factored as $V \Lambda V^{-1}$ where $\Lambda$ is a diagonal matrix of the eigenvalues and $V$ is the matrix whose columns are the corresponding eigenvectors.
Every rate matrix $Q$ has an eigenvalue 0 with an eigenvector of all 1s, as the rows sum to 0 by construction.
Our example rate matrix $Q$ has eigenvalues $0$, $-1$, $-1.5$, and $-2$.
The probability transition matrix is of the form $P(t) = V e^{\Lambda t} V^{-1}$.
This means that each probability can be written as a linear combination of exponential functions of the product of the time $t$ and an eigenvalue: each entry of $P(t)$ has the form $\sum_i w_i e^{\lambda_i t}$.

Numerical Example

$Q = V \Lambda V^{-1}$ [numerical factorization not preserved]

Stationary Distribution

Well-behaved continuous-time Markov chains have a stationary distribution $\pi$. (For finite-state-space chains, irreducibility is sufficient.)
When the time $t$ is large enough, the probability $P_{ij}(t)$ will be close to $\pi_j$ for each $i$. (See $P(10)$ from earlier.)
The stationary distribution can be thought of as a long-run average: the proportion of time the chain spends in state $i$ converges to $\pi_i$.
The stationary distribution satisfies $\pi Q = 0$, with $\pi$ treated as a row vector. Also, $\pi P(t) = \pi$ for any time $t$.
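The convergence of $P(t)$ to the stationary distribution can be seen numerically. The sketch below computes $P(t) = e^{Qt}$ with a scaled Taylor series followed by repeated squaring, which is a swapped-in technique chosen to avoid external libraries; it is not the spectral decomposition described on the slide. The rate matrix is a hypothetical example (the slide's own matrix is not available), and reading $\pi$ off a row of $P(t)$ at large $t$ is a numerical shortcut, not a general-purpose solver.

```python
# Hypothetical rate matrix, rows and columns ordered A, C, G, T.
Q_EXAMPLE = [
    [-1.1, 0.3, 0.6, 0.2],
    [0.2, -1.1, 0.3, 0.6],
    [0.4, 0.3, -0.9, 0.2],
    [0.2, 0.4, 0.3, -0.9],
]


def mat_mul(A, B):
    """Multiply two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_exp(A, terms=20, squarings=10):
    """Approximate e^A: Taylor series of A / 2^s, then square s times."""
    n = len(A)
    scale = 2 ** squarings
    small = [[a / scale for a in row] for row in A]
    result = identity(n)
    term = identity(n)
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, small)]  # small^k / k!
        result = [[r + x for r, x in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def transition_matrix(Q, t):
    """P(t) = e^{Qt}."""
    return mat_exp([[q * t for q in row] for row in Q])


def stationary_from_long_run(Q, t_large=50.0):
    """Read pi off the first row of P(t) for large t (irreducible chains only)."""
    return transition_matrix(Q, t_large)[0]


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 10.0):
        print(t, [round(p, 4) for p in transition_matrix(Q_EXAMPLE, t)[0]])
    print("pi ~", [round(p, 4) for p in stationary_from_long_run(Q_EXAMPLE)])
```

In practice a library routine for the matrix exponential or an eigendecomposition would normally replace `mat_exp`; the hand-rolled version is here only to keep the example self-contained.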

Numerical Example

$\pi Q = 0$ [numerical values not preserved]

Usual Parameterization

The matrix $Q = \{q_{ij}\}$ is typically scaled and parameterized as

$$q_{ij} = r_{ij} \pi_j / \mu \quad \text{for } i \ne j, \qquad \text{where } \mu = \sum_i \pi_i \sum_{j \ne i} r_{ij} \pi_j,$$

which guarantees that $\pi$ will be the stationary distribution when $r_{ij} = r_{ji}$. With this scaling, there is one expected transition per unit time.

Time-reversibility

A continuous-time Markov chain is time-reversible if the probability of a sequence of events is the same going forward as it is going backwards.
The matrix $Q$ is the matrix for a time-reversible Markov chain when $\pi_i q_{ij} = \pi_j q_{ji}$ for all $i$ and $j$. That is, the overall rate of substitutions from $i$ to $j$ equals the overall rate of substitutions from $j$ to $i$ for every pair of states $i$ and $j$.
The matrix equivalent is $\Pi Q = Q^{\top} \Pi$ where $\Pi = \mathrm{diag}(\pi)$.

General Time-Reversible Model

The GTR model is the most general basic time-reversible continuous-time Markov model for nucleotide substitution. The model is typically parameterized with 8 free parameters where

$$q_{ij} = \begin{cases} r_{ij} \pi_j / \mu & \text{for } i \ne j \\ -\sum_{j \ne i} q_{ij} & \text{for } i = j \end{cases} \qquad \text{with } \mu = \sum_i \pi_i \sum_{j \ne i} r_{ij} \pi_j.$$

The stationary distribution $\pi$ has three free parameters, as $\pi$ sums to one.
The vector $r = (r_{AC}, r_{AG}, \ldots, r_{GT})$ is usually constrained to five degrees of freedom (either by setting $r_{GT} = 1$ or by constraining the sum).
Many other popular models are special cases. These models are often named by the initials of the authors and the year in which they were published.
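The GTR parameterization above translates directly into code. This is a minimal sketch under assumptions: the function name, the dictionary layout, and the numerical values of $\pi$ and $r$ are all hypothetical and chosen only to exercise the formulas for $q_{ij}$ and $\mu$.

```python
BASES = "ACGT"


def gtr_rate_matrix(pi, r):
    """Build a scaled GTR rate matrix.

    pi: dict mapping base -> stationary frequency (sums to one).
    r:  dict mapping unordered pairs such as "AC" -> symmetric rate r_ij.
    Returns Q as a dict of dicts with q_ij = r_ij * pi_j / mu for i != j.
    """
    def rate(a, b):
        return r[a + b] if a + b in r else r[b + a]   # r_ij = r_ji

    # mu = sum_i pi_i sum_{j != i} r_ij pi_j, so the expected rate is one
    mu = sum(pi[a] * rate(a, b) * pi[b]
             for a in BASES for b in BASES if a != b)

    Q = {a: {} for a in BASES}
    for a in BASES:
        for b in BASES:
            if a != b:
                Q[a][b] = rate(a, b) * pi[b] / mu
        Q[a][a] = -sum(Q[a][b] for b in BASES if b != a)
    return Q


if __name__ == "__main__":
    # Hypothetical parameter values for illustration only.
    pi = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
    r = {"AC": 1.0, "AG": 4.0, "AT": 0.5, "CG": 0.7, "CT": 3.0, "GT": 1.0}
    Q = gtr_rate_matrix(pi, r)
    for a in BASES:
        print(a, [round(Q[a][b], 4) for b in BASES])
    # Expected number of substitutions per unit time at stationarity
    print(sum(pi[a] * -Q[a][a] for a in BASES))
```

Setting all six rates equal with uniform $\pi$ gives the Jukes-Cantor model, which is one quick way to sanity-check such a builder.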

Other Common Models

Long Name        Short Name  π        r
Jukes-Cantor     JC69        uniform  $r_{AC} = r_{AG} = r_{AT} = r_{CG} = r_{CT} = r_{GT}$
Kimura 80        K80         uniform  $r_{AG} = r_{CT}$, $r_{AC} = r_{AT} = r_{CG} = r_{GT}$
Felsenstein 81   F81         free     $r_{AC} = r_{AG} = r_{AT} = r_{CG} = r_{CT} = r_{GT}$
Felsenstein 84   F84         free     $r_{AC} = r_{AT} = r_{CG} = r_{GT}$, $r_{AG} = (1 + \kappa/(\pi_A + \pi_G))\, r_{AC}$, $r_{CT} = (1 + \kappa/(\pi_C + \pi_T))\, r_{AC}$
Hasegawa et al.  HKY85       free     $r_{AC} = r_{AT} = r_{CG} = r_{GT}$, $r_{AG} = r_{CT} = \kappa\, r_{AC}$
Tamura-Nei 93    TN93        free     $r_{AC} = r_{AT} = r_{CG} = r_{GT}$, $r_{AG} = \kappa_1 r_{AC}$, $r_{CT} = \kappa_2 r_{AC}$

Transition Probabilities

There are closed form solutions to the probability transition matrices for each of the previous models except for GTR. All but GTR are special cases of Tamura-Nei.

Tamura-Nei Model

The rate matrix for TN93 (rows and columns ordered A, C, G, T, writing $\kappa_R = \kappa_1$ for purine transitions and $\kappa_Y = \kappa_2$ for pyrimidine transitions) is:

$$Q = \mu^{-1} \begin{pmatrix} -(\kappa_R \pi_G + \pi_Y) & \pi_C & \kappa_R \pi_G & \pi_T \\ \pi_A & -(\kappa_Y \pi_T + \pi_R) & \pi_G & \kappa_Y \pi_T \\ \kappa_R \pi_A & \pi_C & -(\kappa_R \pi_A + \pi_Y) & \pi_T \\ \pi_A & \kappa_Y \pi_C & \pi_G & -(\kappa_Y \pi_C + \pi_R) \end{pmatrix}$$

where $\pi_R = \pi_A + \pi_G$; $\pi_Y = \pi_C + \pi_T$; $\mu = 2(\kappa_R \pi_A \pi_G + \kappa_Y \pi_C \pi_T + \pi_R \pi_Y)$.

The transition probabilities for TN93 are

$$P(t) = \begin{pmatrix} \pi_A + \frac{\pi_A \pi_Y}{\pi_R} \beta_2 + \frac{\pi_G}{\pi_R} \beta_3 & \pi_C (1 - \beta_2) & \pi_G + \frac{\pi_G \pi_Y}{\pi_R} \beta_2 - \frac{\pi_G}{\pi_R} \beta_3 & \pi_T (1 - \beta_2) \\ \pi_A (1 - \beta_2) & \pi_C + \frac{\pi_C \pi_R}{\pi_Y} \beta_2 + \frac{\pi_T}{\pi_Y} \beta_4 & \pi_G (1 - \beta_2) & \pi_T + \frac{\pi_T \pi_R}{\pi_Y} \beta_2 - \frac{\pi_T}{\pi_Y} \beta_4 \\ \pi_A + \frac{\pi_A \pi_Y}{\pi_R} \beta_2 - \frac{\pi_A}{\pi_R} \beta_3 & \pi_C (1 - \beta_2) & \pi_G + \frac{\pi_G \pi_Y}{\pi_R} \beta_2 + \frac{\pi_A}{\pi_R} \beta_3 & \pi_T (1 - \beta_2) \\ \pi_A (1 - \beta_2) & \pi_C + \frac{\pi_C \pi_R}{\pi_Y} \beta_2 - \frac{\pi_C}{\pi_Y} \beta_4 & \pi_G (1 - \beta_2) & \pi_T + \frac{\pi_T \pi_R}{\pi_Y} \beta_2 + \frac{\pi_C}{\pi_Y} \beta_4 \end{pmatrix}$$

where $\beta_2 = \exp(-t/\mu)$; $\beta_3 = \exp(-(\pi_R \kappa_R + \pi_Y)\, t/\mu)$; $\beta_4 = \exp(-(\pi_Y \kappa_Y + \pi_R)\, t/\mu)$.
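The closed-form TN93 transition probabilities can be coded directly and checked against simple properties (rows summing to one, reduction to Jukes-Cantor). The sketch below is an illustration, not a reference implementation: the function name, argument layout, and parameter values are assumptions, with `kappa_r` and `kappa_y` playing the roles of the purine and pyrimidine transition multipliers.

```python
import math

BASES = "ACGT"


def tn93_transition_matrix(pi, kappa_r, kappa_y, t):
    """Closed-form P(t) under TN93, scaled to one expected substitution per unit time.

    pi: dict base -> stationary frequency; kappa_r multiplies A<->G rates,
    kappa_y multiplies C<->T rates. Returns a dict of dicts P[x][y].
    """
    p_r = pi["A"] + pi["G"]          # purine frequency
    p_y = pi["C"] + pi["T"]          # pyrimidine frequency
    mu = 2.0 * (kappa_r * pi["A"] * pi["G"]
                + kappa_y * pi["C"] * pi["T"] + p_r * p_y)
    b2 = math.exp(-t / mu)
    b3 = math.exp(-(p_r * kappa_r + p_y) * t / mu)
    b4 = math.exp(-(p_y * kappa_y + p_r) * t / mu)

    P = {x: {} for x in BASES}
    groups = ((("A", "G"), p_r, p_y, b3), (("C", "T"), p_y, p_r, b4))
    for group, own, other, beta in groups:
        for x in group:
            partner = group[1] if x == group[0] else group[0]
            for y in BASES:
                if y == x:          # no net change
                    P[x][y] = pi[y] + pi[y] * other / own * b2 + pi[partner] / own * beta
                elif y == partner:  # transition within purines or pyrimidines
                    P[x][y] = pi[y] + pi[y] * other / own * b2 - pi[y] / own * beta
                else:               # transversion
                    P[x][y] = pi[y] * (1.0 - b2)
    return P


if __name__ == "__main__":
    # Hypothetical parameter values for illustration only.
    pi = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
    P = tn93_transition_matrix(pi, 2.0, 3.0, 0.5)
    for x in BASES:
        print(x, [round(P[x][y], 4) for y in BASES])
```

With uniform $\pi$ and both multipliers equal to one, the diagonal reduces to the Jukes-Cantor value $\tfrac14 + \tfrac34 e^{-4t/3}$, which serves as a quick consistency check.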

Rate Variation Among Sites

A common extension to the standard CTMC models is to assume that there is rate variation among sites. At each site, the $Q$ matrix is multiplied by a site-specific rate.
The two most popular extensions are:
invariant sites: some sites have rate 0;
Gamma-distributed rates: rates are drawn from a mean 1 gamma distribution.
For computational tractability, the Gamma distribution is typically replaced by a mean 1 discrete distribution with four distinct rates based on quantiles of a Gamma distribution.

Other Extensions

There are many other model extensions in common use and under development.
It is common to partition sites (by gene, by codon position, by genomic location) and to use different models for each part.
The covarion model allows different lineages to have different rates at the same site. This is typically modeled with a hidden Markov model where the site can turn off.
There are models for amino acid substitution, models for codons, models for RNA pairs, models that incorporate protein structure information, and so on.
Current models still do not capture many of the important biological processes that affect the evolution of molecular sequences.

Distance Between Pairs of Taxa

In a two-taxon tree, the distance between two taxa can be estimated under any model by maximum likelihood.
If the distance is $t$ and at site $j$ one species has base A and the other has base C, the contribution to the likelihood at this site is

$$L_j(t) = \pi_A P_{AC}(t) = \pi_C P_{CA}(t)$$

for a time-reversible model. The overall likelihood is

$$L(t) = \prod_j L_j(t)$$

and the log-likelihood is

$$\ell(t) = \sum_j \log L_j(t) = \sum_j \left( \log \pi_{x[j]} + \log P_{x[j]y[j]}(t) \right)$$

where $x[j]$ and $y[j]$ are the bases of the two species at site $j$.

Distance Between Pairs of Taxa

For models with free $\pi$, it is common to estimate $\pi$ with observed base frequencies.
Other parameters are usually estimated by maximum likelihood.
The simplest models have closed form solutions; others require numerical optimization.
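As a concrete instance of the pairwise log-likelihood above, the following sketch estimates the distance between two sequences under the Jukes-Cantor model, where a closed-form answer exists and can be compared with a numerical maximization. The model choice, the golden-section optimizer, and the function names are assumptions made for illustration; the two sequences are the first 24 Cox I bases listed for Emu and Kiwi in the bird data.

```python
import math


def jc69_prob(same: bool, t: float) -> float:
    """JC69 transition probability over distance t (one expected change per unit time)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def log_likelihood(seq_x: str, seq_y: str, t: float) -> float:
    """l(t) = sum_j [ log pi_{x[j]} + log P_{x[j] y[j]}(t) ] with uniform pi."""
    return sum(math.log(0.25) + math.log(jc69_prob(x == y, t))
               for x, y in zip(seq_x, seq_y))


def ml_distance(seq_x: str, seq_y: str, lo=1e-6, hi=10.0, tol=1e-9) -> float:
    """Maximize l(t) over [lo, hi] by golden-section search (l is unimodal here)."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if log_likelihood(seq_x, seq_y, c) > log_likelihood(seq_x, seq_y, d):
            b = d        # maximum lies in [a, d]
        else:
            a = c        # maximum lies in [c, b]
    return (a + b) / 2.0


def jc69_closed_form(seq_x: str, seq_y: str) -> float:
    """Closed-form JC69 estimate: -(3/4) log(1 - 4p/3), p = proportion of differing sites."""
    p = sum(x != y for x, y in zip(seq_x, seq_y)) / len(seq_x)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    emu = "GTGACATTCATTACTCGATGATTT"
    kiwi = "GTGACCTTTACTACTCGATGACTC"
    print(ml_distance(emu, kiwi), jc69_closed_form(emu, kiwi))
```

The two estimates should agree to several decimal places; for richer models with no closed form, only the numerical route is available.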

Notation for the Alignment

An alignment of $m$ taxa and $n$ sites will have $mn$ nucleotide bases.
Let the observed base for the $i$th taxon and the $j$th site be $x_{ij}$.

Notation for the Tree

With a time-reversible model, the location of a root (where the CTMC begins at stationarity) does not affect the likelihood calculation. We can assume an unrooted tree without loss of generality.
An unrooted tree with $m$ taxa will have $m - 2$ internal nodes.
Number the nodes $i = 1, \ldots, 2m - 2$ with the first $m$ for leaf nodes and the last $m - 2$ for internal nodes.
For calculation purposes, we will denote node $\rho$ (which could be any node) as the root.
There are $2m - 3$ edges in the tree, numbered $e = 1, \ldots, 2m - 3$.
Relative to root node $\rho$, edge $e$ connects parent node $p(e)$ and child node $c(e)$, where $p(e)$ is closer to $\rho$ than $c(e)$.
Edge $e$ has length $t_e$.

Notation for Unobserved Data

The likelihood for a tree is computed by summing over all possible bases at the internal nodes for each of the $n$ sites.
For each site, there are $4^{m-2}$ possible allocations of bases at internal nodes, which we will index by $k$.
Internal node $i$ is set to nucleotide $b_{ik}$ at the $k$th allocation, $i = m + 1, \ldots, 2m - 2$.
Let $z(i, j, k)$ be the nucleotide at node $i$, site $j$, and allocation $k$:

$$z(i, j, k) = \begin{cases} x_{ij} & \text{if } i \le m \text{ (}i\text{ is a leaf node)} \\ b_{ik} & \text{if } i > m \text{ (}i\text{ is an internal node)} \end{cases}$$

Likelihood of a Tree

Let $P(t)$ be the $4 \times 4$ probability transition matrix over an edge of length $t$.
The likelihood of the tree is

$$\prod_j \sum_k \left( \pi_{z(\rho,j,k)} \prod_e P_{z(p(e),j,k)\, z(c(e),j,k)}(t_e) \right)$$

Notice that the sum is over the $4^{m-2}$ possible allocations. A naive calculation would not be tractable for large trees.

Felsenstein's Pruning Algorithm

Felsenstein's pruning algorithm is an example of dynamic programming.
By saving partial calculations, the time complexity of the likelihood evaluation grows linearly with the number of taxa, not exponentially.
For each site and node, the algorithm depends on calculating the probability of the data in the subtree rooted at that node for each possible base.
The algorithm begins at the leaves of the tree and recurses to the root.
The likelihood of the site is a weighted average of the conditional subtree probabilities at the root, weighted by the stationary distribution.

The Algorithm for One Site

Define $f_j(i, b)$ to be the probability of the data at site $j$ in the subtree rooted at node $i$ conditional on the nucleotide at this node being $b$.
For a leaf node,

$$f_j(i, b) = 1\{x_{ij} = b\}$$

For an internal node with children nodes indexed by $c$ attached by edges of length $t_c$,

$$f_j(i, b) = \prod_c \left( \sum_z P_{bz}(t_c)\, f_j(c, z) \right)$$

The likelihood at site $j$ is

$$L_j = \sum_b \pi_b f_j(\rho, b)$$

Example

Do an example with five taxa for one site. See chalk board for example.

[R output: transition matrices and the table of $f$ values for bases A, C, G, T at nodes 1 through 8; numerical values not preserved]
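The recursion for $f_j(i, b)$ can be sketched for a single site as follows. This is an illustrative sketch under stated assumptions: the nested-list tree representation and the function names are hypothetical, and the Jukes-Cantor model with uniform $\pi$ stands in for whatever substitution model is actually in use.

```python
import math

BASES = "ACGT"


def jc69_matrix(t):
    """JC69 transition probabilities P[b][z] over an edge of length t (assumed model)."""
    e = math.exp(-4.0 * t / 3.0)
    return {b: {z: (0.25 + 0.75 * e) if b == z else (0.25 - 0.25 * e)
                for z in BASES} for b in BASES}


def partial_likelihoods(node):
    """Return f(node, b) for each base b.

    A leaf is a one-letter string (the observed base); an internal node is a
    list of (child, edge_length) pairs. This layout is hypothetical.
    """
    if isinstance(node, str):
        # leaf: indicator that the observed base equals b
        return {b: 1.0 if b == node else 0.0 for b in BASES}
    f = {b: 1.0 for b in BASES}
    for child, length in node:
        child_f = partial_likelihoods(child)   # computed once per child
        P = jc69_matrix(length)
        for b in BASES:
            # product over children of sum_z P_bz(t_c) f(c, z)
            f[b] *= sum(P[b][z] * child_f[z] for z in BASES)
    return f


def site_likelihood(root):
    """L = sum_b pi_b f(root, b), with uniform pi under JC69."""
    f = partial_likelihoods(root)
    return sum(0.25 * f[b] for b in BASES)


if __name__ == "__main__":
    # Five taxa, one site; topology and edge lengths are made up.
    tree = [
        ([("A", 0.1), ("C", 0.2)], 0.05),
        ([("G", 0.15), ([("A", 0.1), ("A", 0.3)], 0.2)], 0.1),
    ]
    print(site_likelihood(tree))
```

Each node is visited once and does a fixed amount of work per child, which is where the linear growth in the number of taxa comes from. The full-alignment likelihood is the product of `site_likelihood` over sites, usually accumulated on the log scale.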

Maximum Likelihood Estimation for one Tree

For a single tree topology, the ML estimation requires optimization of branch lengths and of any parameters in the substitution model.
Numerical optimization methods are required even for simple models and small trees.

Tree Search

The search for the maximum likelihood tree conceptually requires obtaining the maximum likelihood for each possible tree topology and then picking the best of these.
For more than a dozen or so taxa, exhaustive search is infeasible.
Heuristic search algorithms typically define a neighborhood structure for possible topologies.
The search goes through neighbors and jumps to the first neighbor with a higher likelihood.
When all neighbors are inferior to the current tree, the search stops.
Much improvement has been made in recent years (RAxML and GARLI are two modern ML programs).

Bayesian Inference

In Bayesian inference, the posterior distribution is proportional to the product of the likelihood and the prior distribution.
For parameters $\theta$ and data $D$,

$$P\{\theta \mid D\} = \frac{P\{D \mid \theta\}\, P\{\theta\}}{P\{D\}}$$

The denominator is the marginal likelihood of the data, which is the integral of the likelihood against the prior distribution.

Bayesian Phylogenetics

For a phylogenetic problem, the parameter $\theta$ typically includes the tree topology, the edge lengths, and parameters for the substitution model: $\theta = (\tau, \nu, \phi)$.
Often we assume independence of these components: $P\{\theta\} = P\{\tau\}\, P\{\nu\}\, P\{\phi\}$.
In a typical phylogenetic problem, the marginal likelihood cannot be computed because

$$P\{D\} = \int_{\Theta} P\{D \mid \theta\}\, P\{\theta\}\, d\theta$$

is a sum of very many terms (one for each topology) where each term is a high-dimensional integral of a complicated function.

Phylogenetic Inference

We may be interested in the posterior distribution of the tree topology, $P\{\tau \mid D\}$.
When this posterior distribution is diffuse, we can summarize it by computing posterior probabilities of clades.
The posterior probability of a clade $C$ is the sum of the posterior probabilities of all tree topologies that contain it:

$$P\{C \mid D\} = \sum_{\tau : C \in \tau} P\{\tau \mid D\}$$

A consensus tree which includes as many clades with high posterior probability as possible is often used as a single tree summary of a distribution of the tree topology.

Sample-based Inference

Any aspect of a posterior distribution can be estimated from a sample drawn from the distribution.
For example, the sample proportion of trees with topology $\tau_0$ is an estimate of $P\{\tau_0 \mid D\}$.
Also, the sample mean of a transition/transversion parameter $\kappa$ is an estimate of the posterior mean $E[\kappa \mid D]$.
But how do we sample from a complicated posterior distribution?

Markov Chain Monte Carlo

Markov chain Monte Carlo (MCMC) is a mathematical method for obtaining dependent samples from a target distribution (such as a posterior distribution).
The idea is to construct a Markov chain whose state space is the parameter space $\Theta$ and whose stationary distribution matches the target distribution, say $P\{\theta \mid D\}$.
Simulating the Markov chain produces a sample $\theta_0, \theta_1, \ldots$ which, after discarding an initial burn-in portion, may be treated as a dependent sample from the target distribution.

Metropolis-Hastings

For notational convenience, let the target distribution be $\pi(\theta) = P\{\theta \mid D\}$.
The most common form of MCMC uses the Metropolis-Hastings algorithm, in which a proposal distribution $q$, which can depend on the most recently sampled $\theta_i$, generates a proposal $\theta^*$ which is accepted with some probability.
When accepted, $\theta_{i+1} = \theta^*$.
When rejected, $\theta_{i+1} = \theta_i$.
The proposal distribution $q$ is essentially arbitrary provided it can move around the entire space $\Theta$.
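The sample-based estimate of clade posterior probabilities described in this section amounts to counting. The sketch below assumes each sampled tree has already been reduced to the set of clades it contains, with each clade stored as a `frozenset` of taxon names; that representation, the function name, and the toy sample are all hypothetical.

```python
from collections import Counter


def clade_frequencies(sampled_trees):
    """Estimate P{C | D} for each clade C as the proportion of sampled trees containing it.

    sampled_trees: list of iterables of frozensets, one iterable per sampled tree.
    """
    counts = Counter()
    for clades in sampled_trees:
        counts.update(set(clades))      # count each clade at most once per tree
    n = len(sampled_trees)
    return {clade: count / n for clade, count in counts.items()}


if __name__ == "__main__":
    AB, BC, CD = frozenset("AB"), frozenset("BC"), frozenset("CD")
    ABC = frozenset("ABC")
    # Toy sample of four rooted trees on taxa A, B, C, D (made-up data).
    sample = [
        [AB, ABC],
        [AB, ABC],
        [AB, CD],
        [BC, ABC],
    ]
    for clade, prob in sorted(clade_frequencies(sample).items(),
                              key=lambda item: -item[1]):
        print("".join(sorted(clade)), prob)
```

Because MCMC output is a dependent sample, these proportions are still valid estimates, but their precision is lower than the raw sample size would suggest.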

Metropolis-Hastings Algorithm

The acceptance probability is

$$\min\left\{ 1,\ \frac{\pi(\theta^*)}{\pi(\theta)} \times \frac{q(\theta \mid \theta^*)}{q(\theta^* \mid \theta)} \times |J| \right\}$$

where $J$ is a Jacobian.
Notice that the target density appears only as a ratio. This means that it need only be known up to a normalizing constant, and we can simply evaluate $h(\theta) = P\{D \mid \theta\}\, P\{\theta\}$ since

$$\frac{\pi(\theta^*)}{\pi(\theta)} = \frac{P\{D \mid \theta^*\}\, P\{\theta^*\} / P\{D\}}{P\{D \mid \theta\}\, P\{\theta\} / P\{D\}} = \frac{h(\theta^*)}{h(\theta)}$$

Note that the proposal ratio $q(\theta \mid \theta^*) / q(\theta^* \mid \theta)$ can be tricky to compute.

MCMC Example

[Figure: Target Distribution]
[Figure: Initial Point]
[Figure: Proposal Distribution]
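A one-dimensional version of the algorithm makes the acceptance step concrete. The sketch below is a toy under stated assumptions: the target is an unnormalized standard normal density chosen for illustration, and the proposal is a symmetric normal random walk, so the proposal ratio and the Jacobian are both 1 and drop out of the acceptance probability. The function name and tuning values are hypothetical.

```python
import math
import random


def metropolis_hastings(log_h, start, n_samples, step, rng):
    """Random-walk Metropolis sampler.

    log_h: log of the unnormalized target h(theta).
    The proposal theta* = theta + Normal(0, step) is symmetric, so the
    acceptance probability reduces to min{1, h(theta*) / h(theta)}.
    """
    samples = []
    theta = start
    log_h_theta = log_h(theta)
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        log_h_prop = log_h(proposal)
        log_ratio = log_h_prop - log_h_theta
        # accept with probability min(1, exp(log_ratio))
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            theta, log_h_theta = proposal, log_h_prop
        samples.append(theta)            # a rejection repeats the current value
    return samples


if __name__ == "__main__":
    # Toy target: h(theta) = exp(-theta^2 / 2), known only up to a constant.
    draws = metropolis_hastings(lambda x: -0.5 * x * x, start=5.0,
                                n_samples=20000, step=1.0,
                                rng=random.Random(1))
    kept = draws[2000:]                  # discard burn-in
    mean = sum(kept) / len(kept)
    var = sum((x - mean) ** 2 for x in kept) / len(kept)
    print(mean, var)
```

Working on the log scale avoids underflow when the likelihood is a product over many sites. For tree proposals the proposal ratio is generally not 1 and has to be worked out move by move.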

First Proposal

[Figure: first proposal] Accept with probability 1.

Second Proposal

[Figure: second proposal] Accept with probability [value not preserved].

Third Proposal

[Figure: third proposal] Accept with probability [value not preserved].

Beginning of Sample

[Figure: sample so far]

Larger Sample

[Figure: larger MCMC sample]

Comparison to Target

[Figure: sample compared with the target distribution]

Subtree Pruning Regrafting

See example from board. More details will be posted in a separate document.

Rescaling a Tree

More details will be posted in a separate document.

Cautions

MCMC does not always converge.
Always run several chains with different random numbers and compare answers.
If the true tree has some very short internal edges, Bayesian inference can mislead.
Different likelihood models can lead to different results.

Bayesian Inference

Development of Bayesian methods has led to continual improvement in our ability to model and learn about molecular evolution.
Bayesian inference uses likelihood, but requires a prior distribution.
Bayesian inference is computationally intensive, but can be less so than ML plus bootstrapping.
Bayesian inference directly measures items of interest on an easily interpretable probability scale.
Some people dislike the requirement of specifying a prior distribution.


More information

Lecture 6 Phylogenetic Inference

Lecture 6 Phylogenetic Inference Lecture 6 Phylogenetic Inference From Darwin s notebook in 1837 Charles Darwin Willi Hennig From The Origin in 1859 Cladistics Phylogenetic inference Willi Hennig, Cladistics 1. Clade, Monophyletic group,

More information

A Bayesian Approach to Phylogenetics

A Bayesian Approach to Phylogenetics A Bayesian Approach to Phylogenetics Niklas Wahlberg Based largely on slides by Paul Lewis (www.eeb.uconn.edu) An Introduction to Bayesian Phylogenetics Bayesian inference in general Markov chain Monte

More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

More information

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Integrative Biology 200 PRINCIPLES OF PHYLOGENETICS Spring 2018 University of California, Berkeley Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;

More information

LECTURE 15 Markov chain Monte Carlo

LECTURE 15 Markov chain Monte Carlo LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte

More information

Mutation models I: basic nucleotide sequence mutation models

Mutation models I: basic nucleotide sequence mutation models Mutation models I: basic nucleotide sequence mutation models Peter Beerli September 3, 009 Mutations are irreversible changes in the DNA. This changes may be introduced by chance, by chemical agents, or

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Evolutionary Models. Evolutionary Models

Evolutionary Models. Evolutionary Models Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

More information

BINF6201/8201. Molecular phylogenetic methods

BINF6201/8201. Molecular phylogenetic methods BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

More information

Reconstruire le passé biologique modèles, méthodes, performances, limites

Reconstruire le passé biologique modèles, méthodes, performances, limites Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire

More information

Lecture 11 Friday, October 21, 2011

Lecture 11 Friday, October 21, 2011 Lecture 11 Friday, October 21, 2011 Phylogenetic tree (phylogeny) Darwin and classification: In the Origin, Darwin said that descent from a common ancestral species could explain why the Linnaean system

More information

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004, Tracing the Evolution of Numerical Phylogenetics: History, Philosophy, and Significance Adam W. Ferguson Phylogenetic Systematics 26 January 2009 Inferring Phylogenies Historical endeavor Darwin- 1837

More information

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington. Maximum Likelihood This presentation is based almost entirely on Peter G. Fosters - "The Idiot s Guide to the Zen of Likelihood in a Nutshell in Seven Days for Dummies, Unleashed. http://www.bioinf.org/molsys/data/idiots.pdf

More information

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression) Using phylogenetics to estimate species divergence times... More accurately... Basics and basic issues for Bayesian inference of divergence times (plus some digression) "A comparison of the structures

More information

Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4. Models of DNA and protein change. Likelihood methods Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36

More information

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary

More information

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

How to read and make phylogenetic trees Zuzana Starostová

How to read and make phylogenetic trees Zuzana Starostová How to read and make phylogenetic trees Zuzana Starostová How to make phylogenetic trees? Workflow: obtain DNA sequence quality check sequence alignment calculating genetic distances phylogeny estimation

More information

Week 5: Distance methods, DNA and protein models

Week 5: Distance methods, DNA and protein models Week 5: Distance methods, DNA and protein models Genome 570 February, 2016 Week 5: Distance methods, DNA and protein models p.1/69 A tree and the expected distances it predicts E A 0.08 0.05 0.06 0.03

More information

Probabilistic modeling and molecular phylogeny

Probabilistic modeling and molecular phylogeny Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU) What is a model? Mathematical

More information

What is Phylogenetics

What is Phylogenetics What is Phylogenetics Phylogenetics is the area of research concerned with finding the genetic connections and relationships between species. The basic idea is to compare specific characters (features)

More information

Phylogenetics: Building Phylogenetic Trees

Phylogenetics: Building Phylogenetic Trees 1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should

More information

Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University

Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University Phylogenetics: Parsimony and Likelihood COMP 571 - Spring 2016 Luay Nakhleh, Rice University The Problem Input: Multiple alignment of a set S of sequences Output: Tree T leaf-labeled with S Assumptions

More information

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov

More information

Inferring Complex DNA Substitution Processes on Phylogenies Using Uniformization and Data Augmentation

Inferring Complex DNA Substitution Processes on Phylogenies Using Uniformization and Data Augmentation Syst Biol 55(2):259 269, 2006 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 101080/10635150500541599 Inferring Complex DNA Substitution Processes on Phylogenies

More information

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A GAGATC 3:G A 6:C T Common Ancestor ACGATC 1:A G 2:C A Substitution = Mutation followed 5:T C by Fixation GAAATT 4:A C 1:G A AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon AAAATT GAAATT GAGCTC ACGACC

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods

More information

The Generalized Neighbor Joining method

The Generalized Neighbor Joining method The Generalized Neighbor Joining method Ruriko Yoshida Dept. of Mathematics Duke University Joint work with Dan Levy and Lior Pachter www.math.duke.edu/ ruriko data mining 1 Challenge We would like to

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

4/25/2009. Xi (Cici) Chen Texas Tech University. Ancient: folk taxonomy impulse to classify organisms Carl Linnaenus: binominal nomenclature (1735)

4/25/2009. Xi (Cici) Chen Texas Tech University. Ancient: folk taxonomy impulse to classify organisms Carl Linnaenus: binominal nomenclature (1735) Xi (Cici) Chen Texas Tech University Ancient: folk taxonomy impulse to classify y p y organisms Carl Linnaenus: binominal nomenclature (1735) 1 Darwin, On the origin of species The affinities of all the

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22 Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

An Introduction to Bayesian Inference of Phylogeny

An Introduction to Bayesian Inference of Phylogeny n Introduction to Bayesian Inference of Phylogeny John P. Huelsenbeck, Bruce Rannala 2, and John P. Masly Department of Biology, University of Rochester, Rochester, NY 427, U.S.. 2 Department of Medical

More information

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5. Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

More information

Theory of Evolution Charles Darwin

Theory of Evolution Charles Darwin Theory of Evolution Charles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (83-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Letter to the Editor. Department of Biology, Arizona State University

Letter to the Editor. Department of Biology, Arizona State University Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona

More information

Estimating Evolutionary Trees. Phylogenetic Methods

Estimating Evolutionary Trees. Phylogenetic Methods Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent

More information

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26 Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 4 (Models of DNA and

More information

Bayesian Models for Phylogenetic Trees

Bayesian Models for Phylogenetic Trees Bayesian Models for Phylogenetic Trees Clarence Leung* 1 1 McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada ABSTRACT Introduction: Inferring genetic ancestry of different species

More information

Phylogeny. November 7, 2017

Phylogeny. November 7, 2017 Phylogeny November 7, 2017 Phylogenetics Phylon = tribe/race, genetikos = relative to birth Phylogenetics: study of evolutionary relationships among organisms, sequences, or anything in between Related

More information

Evolutionary Analysis of Viral Genomes

Evolutionary Analysis of Viral Genomes University of Oxford, Department of Zoology Evolutionary Biology Group Department of Zoology University of Oxford South Parks Road Oxford OX1 3PS, U.K. Fax: +44 1865 271249 Evolutionary Analysis of Viral

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Lecture 3: Markov chains.

Lecture 3: Markov chains. 1 BIOINFORMATIK II PROBABILITY & STATISTICS Summer semester 2008 The University of Zürich and ETH Zürich Lecture 3: Markov chains. Prof. Andrew Barbour Dr. Nicolas Pétrélis Adapted from a course by Dr.

More information

Phylogenetic Inference using RevBayes

Phylogenetic Inference using RevBayes Phylogenetic Inference using RevBayes Model section using Bayes factors Sebastian Höhna 1 Overview This tutorial demonstrates some general principles of Bayesian model comparison, which is based on estimating

More information

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29):

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29): Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29): Statistical estimation of models of sequence evolution Phylogenetic inference using maximum likelihood:

More information

Practical Bioinformatics

Practical Bioinformatics 5/2/2017 Dictionaries d i c t i o n a r y = { A : T, T : A, G : C, C : G } d i c t i o n a r y [ G ] d i c t i o n a r y [ N ] = N d i c t i o n a r y. h a s k e y ( C ) Dictionaries g e n e t i c C o

More information

Lab 9: Maximum Likelihood and Modeltest

Lab 9: Maximum Likelihood and Modeltest Integrative Biology 200A University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS" Spring 2010 Updated by Nick Matzke Lab 9: Maximum Likelihood and Modeltest In this lab we re going to use PAUP*

More information

Bayesian inference & Markov chain Monte Carlo. Note 1: Many slides for this lecture were kindly provided by Paul Lewis and Mark Holder

Bayesian inference & Markov chain Monte Carlo. Note 1: Many slides for this lecture were kindly provided by Paul Lewis and Mark Holder Bayesian inference & Markov chain Monte Carlo Note 1: Many slides for this lecture were kindly provided by Paul Lewis and Mark Holder Note 2: Paul Lewis has written nice software for demonstrating Markov

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

Multiple Alignment. Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis

Multiple Alignment. Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Multiple Alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis gorm@cbs.dtu.dk Refresher: pairwise alignments 43.2% identity; Global alignment score: 374 10 20

More information

Stat 516, Homework 1

Stat 516, Homework 1 Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball

More information

CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics

CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics CS5263 Bioinformatics Guest Lecture Part II Phylogenetics Up to now we have focused on finding similarities, now we start focusing on differences (dissimilarities leading to distance measures). Identifying

More information

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057 Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number

More information

Phylogenetic Inference using RevBayes

Phylogenetic Inference using RevBayes Phylogenetic Inference using RevBayes Substitution Models Sebastian Höhna 1 Overview This tutorial demonstrates how to set up and perform analyses using common nucleotide substitution models. The substitution

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço jcarrico@fm.ul.pt Charles Darwin (1809-1882) Charles Darwin s tree of life in Notebook B, 1837-1838 Ernst Haeckel (1934-1919)

More information

Today s Lecture: HMMs

Today s Lecture: HMMs Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models

More information

Modern Evolutionary Classification. Section 18-2 pgs

Modern Evolutionary Classification. Section 18-2 pgs Modern Evolutionary Classification Section 18-2 pgs 451-455 Modern Evolutionary Classification In a sense, organisms determine who belongs to their species by choosing with whom they will mate. Taxonomic

More information

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Anthony Trubiano April 11th, 2018 1 Introduction Markov Chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from a probability

More information

Bayesian phylogenetics. the one true tree? Bayesian phylogenetics

Bayesian phylogenetics. the one true tree? Bayesian phylogenetics Bayesian phylogenetics the one true tree? the methods we ve learned so far try to get a single tree that best describes the data however, they admit that they don t search everywhere, and that it is difficult

More information

MCMC: Markov Chain Monte Carlo

MCMC: Markov Chain Monte Carlo I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

Mixture Models in Phylogenetic Inference. Mark Pagel and Andrew Meade Reading University.

Mixture Models in Phylogenetic Inference. Mark Pagel and Andrew Meade Reading University. Mixture Models in Phylogenetic Inference Mark Pagel and Andrew Meade Reading University m.pagel@rdg.ac.uk Mixture models in phylogenetic inference!some background statistics relevant to phylogenetic inference!mixture

More information

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise Bot 421/521 PHYLOGENETIC ANALYSIS I. Origins A. Hennig 1950 (German edition) Phylogenetic Systematics 1966 B. Zimmerman (Germany, 1930 s) C. Wagner (Michigan, 1920-2000) II. Characters and character states

More information

Inferring Molecular Phylogeny

Inferring Molecular Phylogeny Dr. Walter Salzburger he tree of life, ustav Klimt (1907) Inferring Molecular Phylogeny Inferring Molecular Phylogeny 55 Maximum Parsimony (MP): objections long branches I!! B D long branch attraction

More information

Nomenclature and classification

Nomenclature and classification Class entry quiz results year biology background major biology freshman college advanced environmental sophomore sciences college introductory landscape architecture junior highschool undeclared senior

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft

More information

Chapter 16: Reconstructing and Using Phylogenies

Chapter 16: Reconstructing and Using Phylogenies Chapter Review 1. Use the phylogenetic tree shown at the right to complete the following. a. Explain how many clades are indicated: Three: (1) chimpanzee/human, (2) chimpanzee/ human/gorilla, and (3)chimpanzee/human/

More information

Identifiability of the GTR+Γ substitution model (and other models) of DNA evolution

Identifiability of the GTR+Γ substitution model (and other models) of DNA evolution Identifiability of the GTR+Γ substitution model (and other models) of DNA evolution Elizabeth S. Allman Dept. of Mathematics and Statistics University of Alaska Fairbanks TM Current Challenges and Problems

More information

arxiv: v1 [q-bio.pe] 4 Sep 2013

arxiv: v1 [q-bio.pe] 4 Sep 2013 Version dated: September 5, 2013 Predicting ancestral states in a tree arxiv:1309.0926v1 [q-bio.pe] 4 Sep 2013 Predicting the ancestral character changes in a tree is typically easier than predicting the

More information