Phylogenetics and Darwin. An Introduction to Phylogenetics. Tree of Life. Darwin's Trees
An Introduction to Phylogenetics
Bret Larget
Departments of Botany and of Statistics
University of Wisconsin-Madison
February 4, 2008

Phylogenetics and Darwin

A phylogeny is a tree diagram that shows the evolutionary relationships among a group of species. The first phylogeny is due to Charles Darwin. In 1837, shortly after his famous five-year voyage as naturalist on the Beagle, Darwin sketched a tree diagram in one of his notebooks. This simple sketch is remarkably similar to modern diagrams of phylogenies. In addition, the sole figure in The Origin of Species is a phylogeny.

Darwin's Trees

[Figures: Darwin's 1837 sketch; figure from The Origin of Species]

Tree of Life

In The Origin of Species, Darwin describes a Tree of Life that represents the true evolutionary history of life.

"The affinities of all the beings of the same class have sometimes been represented by a great tree. I believe this simile largely speaks the truth. ... The limbs divided into great branches, and these into lesser and lesser branches, were themselves once, when the tree was small, budding twigs; and this connexion of the former and present buds by ramifying branches may well represent the classification of all extinct and living species in groups subordinate to groups. ... As buds give rise by growth to fresh buds, and these, if vigorous, branch out and overtop on all sides many a feebler branch, so by generation I believe it has been with the Tree of Life, which fills with its dead and broken branches the crust of the earth, and covers the surface with its ever branching and beautiful ramifications."
Early Phylogenetics

Shortly after the 1859 publication of The Origin of Species, many biologists came to accept the truth of a universal Tree of Life. Ernst Haeckel and many others created highly stylized trees that were based on expert opinion. A century passed before development of formal scientific methods for estimating phylogenies began.

Haeckel's Stylized Trees

[Figure: Haeckel's stylized trees]

Modern Phylogenetics

Phylogenies are usually estimated from aligned DNA sequence data.
Phylogenetics is the primary tool for systematics.
Phylogenetics is used for studying viruses such as HIV.
Phylogenetics has been used in court for forensic purposes.
Phylogenetics is being used increasingly in comparative genomics and the study of gene function.

Phylogenetics and Systematics

Phylogenetic methods, particularly for molecular sequence data, have become the primary tool for systematists to determine evolutionary relationships. These tools have been used to confirm expected relationships (for example, that chimpanzees are the closest living relatives of humans) and have also been key in revealing several more surprising findings, including: birds are descended from dinosaurs; polar bears form a monophyletic group within brown bears; the most closely related land mammal to whales is the hippopotamus.
Phylogenetic Tree of Whales

[Figure: phylogenetic tree of whales]

Phylogenetics and Forensics

Phylogenetic trees have been used in several instances in the courts to provide evidence about the likely transmission of HIV. Examples include: confirming that a nurse contracted HIV from a mishap with a broken glass blood-collection tube from an infected patient and not from an alternative source; providing evidence of deliberate infection in a criminal case; indicating that an infected friend was likely not the direct source of infection in a case.

Forensic Phylogenetic Tree

[Figure 2: neighbor-joining phylogram of HIV env sequences]

Figure 2. Neighbor-joining phylogram representing the reconstruction of the phylogenetic relationships between the env (C2-V5) sequences obtained from the index case (A31-44), the alleged recipient (B22-29), three local controls (LC45 and LC48; LC46 and LC47; and LC49 and LC50) and 48 sequences chosen from GenBank. Ten iterations of random sequence addition were used. Scale bar represents 10% genetic distance. Bootstrap values are shown at nodes with greater than 70% support.

DNA Data from a Sample of Birds

First 24 bases of 1558 from the Cox I gene.

Alligator   GTG AAC TTC CAC --- CGT TGA CTC ...
Emu         GTG ACA TTC ATT ACT CGA TGA TTT ...
Kiwi        GTG ACC TTT ACT ACT CGA TGA CTC ...
Ostrich     GTG ACC TTC ATT ACT CGA TGA CTT ...
Swan        GTG ACC TTC ATC AAC CGA TGA CTA ...
Goose       GTG ACC TTC ATC AAC CGA TGA CTA ...
Chicken     GTG ACC TTC ATC AAC CGA TGA TTA ...
Woodpecker  GTG ACC TTC ATC AAC CGA TGA TTA ...
Finch       ATG ACA TAC ATT AAC CGA TGA TTA ...
Ibis        GTG ACC TTC ATC AAC CGA TGA CTA ...
Stork       GTG ACC TTC ATT ACC CGA TGA CTA ...
Osprey      ATG ACA TTC ATC AAC CGA TGA CTA ...
Falcon      GTG ACC TTC ATC AAC CGA TGA CTA ...
Vulture     ATG ACA TTC ATC AAT CGA TGA CTA ...
Penguin     GTG ACC TTC ATT AAC CGA TGA CTA ...
An Estimated Phylogeny

[Figure: estimated phylogeny with tips Penguin, Vulture, Stork, Ibis, Woodpecker, Osprey, Finch, Falcon, Chicken, Goose, Swan, Ostrich, Kiwi, Emu, and Alligator]

Activity 1: Example Tree

[Figure: rooted tree on taxa A, B, C, D, E, F]

How many descendant taxa does the common ancestor of taxa A and C have?
Which taxon is sister to A?
Which taxa are more closely related, A and C or C and D?
Which taxa are more closely related, A and E or D and E?

Activity 2: Compare Trees

Which trees have the same tree topology?

[Figure: several trees on taxa A through F]

Activity 3: Unrooted Trees

Some methods estimate unrooted trees.

[Figure: unrooted tree on taxa A, B, C, D, E]

If C is the outgroup, what is the rooted tree topology?
If taxon C is the outgroup, which node is sister to B?
If taxon A is the outgroup, which node is sister to B?
How many rooted tree topologies are consistent with this unrooted tree topology?
How Many Trees?

[Table: # of Taxa, # Unrooted Trees, # Rooted Trees]

Formula for Counting Trees

The number of rooted tree topologies with $n$ taxa is

$$1 \times 3 \times \cdots \times (2n-3) \equiv (2n-3)!! \quad \text{for } n \ge 3.$$

There are more rooted trees with 51 species than the estimated number of hydrogen atoms in the universe. Biologists often estimate trees with more than 100 species.

Probabilistic Framework

"Essentially, all models are wrong, but some are useful." (George Box)

Commonly used models of molecular evolution treat sites as independent. These common models just need to describe the substitutions among four bases A, C, G, and T at a single site over time. The substitution process is modeled as a continuous-time Markov chain.

Markov Property

Use the notation $X(t)$ to represent the base at time $t$. $X(t) \in \{A, C, G, T\}$ for DNA.

Formal statement:

$$P\{X(s+t) = j \mid X(s) = i,\ X(u) = x(u) \text{ for } u < s\} = P\{X(s+t) = j \mid X(s) = i\}$$

Informal understanding: given the present, the past is independent of the future.

If the expression does not depend on the time $s$, the Markov process is called homogeneous.
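The counting formula for rooted trees given earlier on this page can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names are made up for illustration, and the unrooted count $(2n-5)!!$ is the standard companion formula, which the slides do not state explicitly.

```python
def odd_double_factorial(k: int) -> int:
    """Return 1 * 3 * 5 * ... * k for odd k; returns 1 when k < 1."""
    product = 1
    for factor in range(1, k + 1, 2):
        product *= factor
    return product


def count_rooted_trees(n_taxa: int) -> int:
    """Rooted binary topologies on n labelled taxa: (2n - 3)!!."""
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    return odd_double_factorial(2 * n_taxa - 3)


def count_unrooted_trees(n_taxa: int) -> int:
    """Unrooted binary topologies on n labelled taxa: (2n - 5)!! (assumed standard formula)."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    return odd_double_factorial(2 * n_taxa - 5)


if __name__ == "__main__":
    # Print taxa, unrooted count, rooted count for a few sizes.
    for n in (3, 4, 5, 10, 51):
        print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

Python integers have arbitrary precision, so the count for 51 taxa is computed exactly rather than as a floating-point approximation.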
Rate Matrix

A stationary, homogeneous, continuous-time, finite-state-space Markov chain is parameterized by a rate matrix where: off-diagonal rates are nonnegative; diagonal terms are negative row sums of off-diagonal elements; consequently, row sums are zero.

Example: $Q = \{q_{ij}\}$ [matrix]

Alarm Clock Interpretation

How to simulate a continuous-time Markov chain beginning in state $i$:
the time to the next transition is distributed $\mathrm{Exp}(q_i)$ where $q_i = -q_{ii}$;
the transition is to state $j$ with probability $q_{ij} / \sum_{k \ne i} q_{ik}$.

Path Probability Density Calculation

Example: Begin at A, change to G at time 0.3, change to C at time 0.8, and then no more changes before time $t = 1$.

$$P\{\text{path}\} = P\{\text{begin at A}\} \times \left(1.1\, e^{-(1.1)(0.3)} \times \frac{0.6}{1.1}\right) \times \left(0.9\, e^{-(0.9)(0.5)} \times \frac{0.3}{0.9}\right) \times \left(e^{-(1.1)(0.2)}\right)$$

Probability Transition Matrices

The transition matrix is $P(t) = e^{Qt}$ where

$$e^{A} = \sum_{k=0}^{\infty} \frac{A^k}{k!} = I + A + \frac{A^2}{2} + \frac{A^3}{6} + \cdots$$

A probability transition matrix has non-negative values and each row sums to one. Each row contains the probabilities from a probability distribution on the possible states of the Markov process.
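The alarm-clock simulation and the series definition of $P(t) = e^{Qt}$ can be sketched in plain Python. Everything named here is an assumption made for illustration: the rate matrix `Q` is hypothetical (the example matrix from the slides is not available), and `transition_matrix` uses a truncated Taylor series combined with repeated squaring, which is one simple way to evaluate the series and not necessarily how the slides computed their examples.

```python
import math
import random

STATES = "ACGT"

# Hypothetical rate matrix (NOT the slides' example); each row sums to zero.
Q = [
    [-1.0, 0.2, 0.6, 0.2],
    [0.3, -1.2, 0.3, 0.6],
    [0.6, 0.2, -1.0, 0.2],
    [0.3, 0.6, 0.3, -1.2],
]


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, squarings=10, terms=12):
    """Approximate P(t) = exp(Qt) by a Taylor series on Qt / 2**squarings,
    then square the result `squarings` times."""
    n = len(q)
    scale = t / (2 ** squarings)
    small = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        # term becomes small**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def simulate_path(q, start, t_end, rng):
    """Alarm-clock simulation: return [(time, state), ...] on [0, t_end)."""
    path = [(0.0, start)]
    t, i = 0.0, start
    while True:
        t += rng.expovariate(-q[i][i])  # waiting time ~ Exp(q_i), q_i = -q_ii
        if t >= t_end:
            return path
        weights = [q[i][j] if j != i else 0.0 for j in range(len(q))]
        i = rng.choices(range(len(q)), weights=weights)[0]
        path.append((t, i))


if __name__ == "__main__":
    for row in transition_matrix(Q, 0.5):
        print([round(x, 4) for x in row])
    print([(round(t, 3), STATES[s]) for t, s in simulate_path(Q, 0, 1.0, random.Random(1))])
```

For production work a library routine for the matrix exponential would normally be preferred; the hand-rolled version is used here only to keep the example self-contained.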
Examples

[Matrices: P(0.1), P(0.5), P(1), and P(10) for the example rate matrix]

Spectral Decomposition

The matrix $Q$ can be factored as $V \Lambda V^{-1}$ where $\Lambda$ is a diagonal matrix of the eigenvalues and $V$ is the matrix whose columns are the corresponding eigenvectors. All rate matrices $Q$ have an eigenvalue 0 with an eigenvector of all 1s, as the rows sum to 0 by construction. Our example rate matrix $Q$ has eigenvalues $0$, $-1$, $-1.5$, and $-2$. The probability transition matrix is of the form $P(t) = V e^{\Lambda t} V^{-1}$. This means that each probability can be written as a linear combination of exponential functions of the product of the time $t$ and an eigenvalue:

$$P(t) = \sum_i w_i e^{\lambda_i t}.$$

Numerical Example

[Matrices: numerical factorization $Q = V \Lambda V^{-1}$ for the example rate matrix]

Stationary Distribution

Well-behaved continuous-time Markov chains have a stationary distribution $\pi$. (For finite-state-space chains, irreducibility is sufficient.) When the time $t$ is large enough, the probability $P_{ij}(t)$ will be close to $\pi_j$ for each $i$. (See P(10) from earlier.) The stationary distribution can be thought of as a long-run average: the proportion of time the process spends in state $i$ converges to $\pi_i$. The stationary distribution satisfies $\pi^\top Q = 0^\top$. Also, $\pi^\top P(t) = \pi^\top$ for any time $t$.
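The condition $\pi^\top Q = 0^\top$ together with $\sum_i \pi_i = 1$ is a small linear system, so the stationary distribution can be found by elimination. The sketch below assumes a hypothetical rate matrix `Q` (not the one from the slides) and uses a hand-written Gauss-Jordan solver so that it runs without any third-party library; the function names are illustrative only.

```python
# Hypothetical rate matrix in state order A, C, G, T (NOT the slides' example).
Q = [
    [-1.0, 0.2, 0.6, 0.2],
    [0.3, -1.2, 0.3, 0.6],
    [0.6, 0.2, -1.0, 0.2],
    [0.3, 0.6, 0.3, -1.2],
]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def stationary_distribution(q):
    """Solve pi^T Q = 0 with sum(pi) = 1.

    The equations pi^T Q = 0 are the rows of Q^T; one of them is redundant
    because the rows of Q sum to zero, so it is replaced by the constraint
    that the probabilities sum to one.
    """
    n = len(q)
    a = [[q[j][i] for j in range(n)] for i in range(n)]  # transpose of Q
    a[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    return solve(a, b)


if __name__ == "__main__":
    print([round(p, 6) for p in stationary_distribution(Q)])
```

For this particular hypothetical matrix the solution works out by hand to $\pi = (0.3, 0.2, 0.3, 0.2)$, which can be confirmed by checking that each column of $Q$ weighted by $\pi$ sums to zero.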
Numerical Example

[Numerical check that $\pi^\top Q = 0^\top$ for the example rate matrix]

Usual Parameterization

The matrix $Q = \{q_{ij}\}$ is typically scaled and parameterized as

$$q_{ij} = r_{ij} \pi_j / \mu \quad \text{for } i \ne j, \qquad \text{where } \mu = \sum_i \pi_i \sum_{j \ne i} r_{ij} \pi_j,$$

which guarantees that $\pi$ will be the stationary distribution when $r_{ij} = r_{ji}$. With this scaling, there is one expected transition per unit time.

Time-reversibility

A continuous-time Markov chain is time-reversible if the probability of a sequence of events is the same going forward as it is going backwards. The matrix $Q$ is the matrix for a time-reversible Markov chain when $\pi_i q_{ij} = \pi_j q_{ji}$ for all $i$ and $j$. That is, the overall rate of substitutions from $i$ to $j$ equals the overall rate of substitutions from $j$ to $i$ for every pair of states $i$ and $j$. The matrix equivalent is $\Pi Q = Q^\top \Pi$ where $\Pi = \mathrm{diag}(\pi)$.

General Time-Reversible Model

The GTR model is the most general basic time-reversible continuous-time Markov model for nucleotide substitution. The model is typically parameterized with 8 free parameters where

$$q_{ij} = \begin{cases} r_{ij} \pi_j / \mu & \text{for } i \ne j \\ -\sum_{j \ne i} q_{ij} & \text{for } i = j \end{cases} \qquad \text{with } \mu = \sum_i \pi_i \sum_{j \ne i} r_{ij} \pi_j.$$

The stationary distribution $\pi$ has three free parameters, as $\pi$ sums to one. The vector $r = (r_{AC}, r_{AG}, \ldots, r_{GT})$ is usually constrained to five degrees of freedom (either by setting $r_{GT} = 1$ or by constraining the sum). Many other popular models are special cases. These models are often named by the initials of the authors and the year in which they were published.
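The GTR parameterization above translates directly into code. The following is a minimal sketch with hypothetical names (`gtr_rate_matrix`, and the example values of `r` and `pi` are made up); it builds $Q$ from the exchangeabilities $r_{ij}$ and frequencies $\pi$, and the example at the bottom checks row sums, detailed balance, and the one-expected-transition-per-unit-time scaling.

```python
BASES = "ACGT"


def gtr_rate_matrix(r, pi):
    """Build a GTR rate matrix as a dict keyed by (i, j) base pairs.

    r  : dict mapping the six unordered pairs 'AC', 'AG', 'AT', 'CG', 'CT', 'GT'
         to symmetric exchangeability parameters r_ij
    pi : dict mapping each base to its stationary frequency
    """
    def rate(i, j):
        return r["".join(sorted(i + j))]  # r_ij = r_ji

    # mu = sum_i pi_i sum_{j != i} r_ij pi_j
    mu = sum(pi[i] * sum(rate(i, j) * pi[j] for j in BASES if j != i)
             for i in BASES)
    q = {}
    for i in BASES:
        for j in BASES:
            if i != j:
                q[i, j] = rate(i, j) * pi[j] / mu
        q[i, i] = -sum(q[i, j] for j in BASES if j != i)
    return q


if __name__ == "__main__":
    # Hypothetical values: HKY85-style exchangeabilities with kappa = 2.
    r = {"AC": 1.0, "AG": 2.0, "AT": 1.0, "CG": 1.0, "CT": 2.0, "GT": 1.0}
    pi = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
    q = gtr_rate_matrix(r, pi)
    for i in BASES:
        print(i, [round(q[i, j], 4) for j in BASES])
    # Expected number of transitions per unit time at stationarity.
    print(sum(pi[i] * -q[i, i] for i in BASES))
```

Setting all six exchangeabilities equal and $\pi$ uniform reduces this to the Jukes-Cantor model, in which every off-diagonal rate is $1/3$.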
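Other Common Models

Long Name | Short Name | $\pi$ | $r$
Jukes-Cantor | JC69 | uniform | $r_{AC} = r_{AG} = r_{AT} = r_{CG} = r_{CT} = r_{GT}$
Kimura 80 | K80 | uniform | $r_{AG} = r_{CT}$; $r_{AC} = r_{AT} = r_{CG} = r_{GT}$
Felsenstein 81 | F81 | free | $r_{AC} = r_{AG} = r_{AT} = r_{CG} = r_{CT} = r_{GT}$
Felsenstein 84 | F84 | free | $r_{AC} = r_{AT} = r_{CG} = r_{GT}$; $r_{AG} = (1 + \kappa/(\pi_A + \pi_G))\, r_{AC}$; $r_{CT} = (1 + \kappa/(\pi_C + \pi_T))\, r_{AC}$
Hasegawa et al. | HKY85 | free | $r_{AC} = r_{AT} = r_{CG} = r_{GT}$; $r_{AG} = r_{CT} = \kappa\, r_{AC}$
Tamura-Nei 93 | TN93 | free | $r_{AC} = r_{AT} = r_{CG} = r_{GT}$; $r_{AG} = \kappa_1 r_{AC}$; $r_{CT} = \kappa_2 r_{AC}$

Transition Probabilities

There are closed form solutions to the probability transition matrices for each of the previous models except for GTR. All but GTR are special cases of Tamura-Nei.

Tamura-Nei Model

The rate matrix for TN93 (states in the order A, C, G, T) is:

$$Q = \mu^{-1} \begin{pmatrix} -(\kappa_R \pi_G + \pi_Y) & \pi_C & \kappa_R \pi_G & \pi_T \\ \pi_A & -(\kappa_Y \pi_T + \pi_R) & \pi_G & \kappa_Y \pi_T \\ \kappa_R \pi_A & \pi_C & -(\kappa_R \pi_A + \pi_Y) & \pi_T \\ \pi_A & \kappa_Y \pi_C & \pi_G & -(\kappa_Y \pi_C + \pi_R) \end{pmatrix}$$

where $\pi_R = \pi_A + \pi_G$; $\pi_Y = \pi_C + \pi_T$; $\mu = 2(\kappa_R \pi_A \pi_G + \kappa_Y \pi_C \pi_T + \pi_R \pi_Y)$. Here $\kappa_R$ and $\kappa_Y$ play the roles of $\kappa_1$ and $\kappa_2$ in the table.

The transition probabilities for TN93 are, row by row:

$$\begin{aligned}
P_{AA}(t) &= \pi_A + \frac{\pi_A \pi_Y}{\pi_R} \beta_2 + \frac{\pi_G}{\pi_R} \beta_3, & P_{AC}(t) &= \pi_C (1 - \beta_2), & P_{AG}(t) &= \pi_G + \frac{\pi_G \pi_Y}{\pi_R} \beta_2 - \frac{\pi_G}{\pi_R} \beta_3, & P_{AT}(t) &= \pi_T (1 - \beta_2) \\
P_{CA}(t) &= \pi_A (1 - \beta_2), & P_{CC}(t) &= \pi_C + \frac{\pi_C \pi_R}{\pi_Y} \beta_2 + \frac{\pi_T}{\pi_Y} \beta_4, & P_{CG}(t) &= \pi_G (1 - \beta_2), & P_{CT}(t) &= \pi_T + \frac{\pi_T \pi_R}{\pi_Y} \beta_2 - \frac{\pi_T}{\pi_Y} \beta_4 \\
P_{GA}(t) &= \pi_A + \frac{\pi_A \pi_Y}{\pi_R} \beta_2 - \frac{\pi_A}{\pi_R} \beta_3, & P_{GC}(t) &= \pi_C (1 - \beta_2), & P_{GG}(t) &= \pi_G + \frac{\pi_G \pi_Y}{\pi_R} \beta_2 + \frac{\pi_A}{\pi_R} \beta_3, & P_{GT}(t) &= \pi_T (1 - \beta_2) \\
P_{TA}(t) &= \pi_A (1 - \beta_2), & P_{TC}(t) &= \pi_C + \frac{\pi_C \pi_R}{\pi_Y} \beta_2 - \frac{\pi_C}{\pi_Y} \beta_4, & P_{TG}(t) &= \pi_G (1 - \beta_2), & P_{TT}(t) &= \pi_T + \frac{\pi_T \pi_R}{\pi_Y} \beta_2 + \frac{\pi_C}{\pi_Y} \beta_4
\end{aligned}$$

where $\beta_2 = \exp(-t/\mu)$; $\beta_3 = \exp(-(\pi_R \kappa_1 + \pi_Y)\, t/\mu)$; $\beta_4 = \exp(-(\pi_Y \kappa_2 + \pi_R)\, t/\mu)$.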
Rate Variation Among Sites

A common extension to the standard CTMC models is to assume that there is rate variation among sites. At each site, the $Q$ matrix is multiplied by a site-specific rate. The two most popular extensions are:
Invariant sites: some sites have rate 0.
Gamma-distributed rates: rates are drawn from a mean 1 gamma distribution.
For computational tractability, the Gamma distribution is typically replaced by a mean 1 discrete distribution with four distinct rates based on quantiles of a Gamma distribution.

Other Extensions

There are many other model extensions in common use and under development. It is common to partition sites (by gene, by codon position, by genomic location) and to use different models for each part. The covarion model allows different lineages to have different rates at the same site. This is typically modeled with a hidden Markov model where the site can turn off. There are models for amino acid substitution, models for codons, models for RNA pairs, models that incorporate protein structure information, and so on. Current models still do not capture many of the important biological processes that affect evolution of molecular sequences.

Distance Between Pairs of Taxa

In a two-taxon tree, the distance between two taxa can be estimated under any model by maximum likelihood. If the distance is $t$ and at site $j$ one species has base A and the other has base C, the contribution to the likelihood at this site is

$$L_j(t) = \pi_A P_{AC}(t) = \pi_C P_{CA}(t)$$

for a time-reversible model. The overall likelihood is

$$L(t) = \prod_j L_j(t)$$

and the log-likelihood is

$$\ell(t) = \sum_j \log L_j(t) = \sum_j \left( \log \pi_{x[j]} + \log P_{x[j]y[j]}(t) \right)$$

where $x[j]$ and $y[j]$ are the bases of the two taxa at site $j$.

For models with free $\pi$, it is common to estimate $\pi$ with observed base frequencies. Other parameters are usually estimated by maximum likelihood. The simplest models have closed form solutions; others require numerical optimization.
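As an illustration of a model with a closed form solution, the pairwise log-likelihood above can be written out for the Jukes-Cantor model. This sketch relies on two standard JC69 facts that the slides do not spell out: with one expected substitution per unit time, $P_{xx}(t) = \tfrac14 + \tfrac34 e^{-4t/3}$ and $P_{xy}(t) = \tfrac14 - \tfrac14 e^{-4t/3}$ for $x \ne y$, and the maximum likelihood distance is $\hat t = -\tfrac34 \log(1 - \tfrac43 \hat p)$, where $\hat p$ is the observed proportion of differing sites. The function names are hypothetical; the two sequences are the Swan and Chicken rows of the bird data with spaces removed.

```python
import math

SWAN = "GTGACCTTCATCAACCGATGACTA"
CHICKEN = "GTGACCTTCATCAACCGATGATTA"


def jc69_log_likelihood(t, x, y):
    """l(t) = sum_j (log pi_x[j] + log P_x[j]y[j](t)) under JC69 (pi uniform)."""
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    total = 0.0
    for a, b in zip(x, y):
        if a in "ACGT" and b in "ACGT":  # skip alignment gaps
            total += math.log(0.25) + math.log(p_same if a == b else p_diff)
    return total


def jc69_distance(x, y):
    """Closed-form maximum likelihood distance under JC69."""
    pairs = [(a, b) for a, b in zip(x, y) if a in "ACGT" and b in "ACGT"]
    p_hat = sum(a != b for a, b in pairs) / len(pairs)
    if p_hat >= 0.75:
        raise ValueError("sequences too divergent for a finite JC69 distance")
    return -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0)


if __name__ == "__main__":
    d = jc69_distance(SWAN, CHICKEN)
    print(d, jc69_log_likelihood(d, SWAN, CHICKEN))
```

With only 24 sites the estimate is very noisy; the point of the example is only that the closed-form distance sits at the peak of the log-likelihood curve.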
Notation for the Alignment

An alignment of $m$ taxa and $n$ sites will have $mn$ nucleotide bases. Let the observed base for the $i$th taxon and the $j$th site be $x_{ij}$.

Notation for the Tree

With a time-reversible model, the location of a root (where the CTMC begins at stationarity) does not affect the likelihood calculation. We can assume an unrooted tree without loss of generality. An unrooted tree with $m$ taxa will have $m - 2$ internal nodes. Number these nodes $i = 1, \ldots, 2m - 2$ with the first $m$ for leaf nodes and the last $m - 2$ for internal nodes. For calculation purposes, we will denote node $\rho$ (which could be any node) as the root. There are $2m - 3$ edges in the tree, numbered $e = 1, \ldots, 2m - 3$. Relative to root node $\rho$, edge $e$ connects parent node $p(e)$ and child node $c(e)$ where $p(e)$ is closer to $\rho$ than $c(e)$. Edge $e$ has length $t_e$.

Notation for Unobserved Data

The likelihood for a tree is computed by summing over all possible bases at the internal nodes for each of the $n$ sites. For each site, there are $4^{m-2}$ possible allocations of bases at internal nodes, which we will index by $k$. Internal node $i$ is set to nucleotide $b_{ik}$ at the $k$th allocation, $i = m + 1, \ldots, 2m - 2$. Let $z(i, j, k)$ be the nucleotide at node $i$, site $j$, and allocation $k$:

$$z(i, j, k) = \begin{cases} x_{ij} & \text{if } i \le m \text{ (} i \text{ is a leaf node)} \\ b_{ik} & \text{if } i > m \text{ (} i \text{ is an internal node)} \end{cases}$$

Likelihood of a Tree

Let $P(t)$ be the $4 \times 4$ probability transition matrix over an edge of length $t$. The likelihood of the tree is

$$\prod_j \sum_k \left( \pi_{z(\rho, j, k)} \prod_e P_{z(p(e), j, k)\, z(c(e), j, k)}(t_e) \right)$$

Notice that the sum is over the $4^{m-2}$ possible allocations. A naive calculation would not be tractable for large trees.
Felsenstein's Pruning Algorithm

Felsenstein's pruning algorithm is an example of dynamic programming. By saving partial calculations, the time complexity of the likelihood evaluation grows linearly with the number of taxa, not exponentially. For each site and node, the algorithm depends on calculating the probability in the subtree rooted at that node for each possible base. The algorithm begins at the leaves of the tree and recurses to the root. The likelihood of the site is a weighted average of the conditional subtree probabilities at the root, weighted by the stationary distribution.

The Algorithm for One Site

Define $f_j(i, b)$ to be the probability of the data at site $j$ in the subtree rooted at node $i$ conditional on the nucleotide at this node being $b$.

For a leaf node,

$$f_j(i, b) = 1\{x_{ij} = b\}$$

For an internal node with children nodes indexed by $c$ attached by edges of length $t_c$,

$$f_j(i, b) = \prod_c \left( \sum_z P_{bz}(t_c)\, f_j(c, z) \right)$$

The likelihood at site $j$ is

$$L_j = \sum_b \pi_b\, f_j(\rho, b)$$

Example

Do an example with five taxa for one site. See chalk board for example.

[R output: transition matrices P1 and P, and the table f of conditional subtree probabilities for nodes 1 through 8 with columns A, C, G, T]
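The recursion for $f_j(i, b)$ can be sketched for a single site as follows. This is an illustrative sketch rather than the example worked on the board: it assumes the Jukes-Cantor model for $P(t)$ (scaled to one expected substitution per unit time), represents a leaf as a one-letter string and an internal node as a list of `(child, edge_length)` pairs, and uses a made-up five-taxon tree with made-up branch lengths.

```python
import math

BASES = "ACGT"
PI = [0.25, 0.25, 0.25, 0.25]  # JC69 stationary distribution (assumed model)


def jc69_prob(b, z, t):
    """P_bz(t) under JC69 with one expected substitution per unit time."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if b == z else 0.25 - 0.25 * decay


def conditional(node):
    """Return [f(node, b) for b in BASES] for one site.

    A leaf is the observed base (a string); an internal node is a list of
    (child, edge_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if node == b else 0.0 for b in BASES]
    f = [1.0] * len(BASES)
    for child, t in node:
        f_child = conditional(child)
        for bi, b in enumerate(BASES):
            # sum over the child's possible bases z of P_bz(t_c) f(c, z)
            f[bi] *= sum(jc69_prob(b, z, t) * f_child[zi]
                         for zi, z in enumerate(BASES))
    return f


def site_likelihood(root):
    """L = sum_b pi_b f(root, b)."""
    return sum(p * fb for p, fb in zip(PI, conditional(root)))


if __name__ == "__main__":
    # Hypothetical five-taxon tree for one site: (((A, C), G), (T, A)).
    left = [([("A", 0.1), ("C", 0.2)], 0.05), ("G", 0.3)]
    right = [("T", 0.15), ("A", 0.25)]
    print(site_likelihood([(left, 0.1), (right, 0.2)]))
```

Because the model is time-reversible, sliding the root along an edge leaves the likelihood unchanged, which gives a simple sanity check on any implementation. A practical version would also cache $P(t)$ per edge and work on a log or rescaled basis to avoid underflow across many sites.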
Maximum Likelihood Estimation for One Tree

For a single tree topology, the ML estimation requires optimization of branch lengths and of any parameters in the substitution model. Numerical optimization methods are required even for simple models and small trees.

Tree Search

The search for the maximum likelihood tree conceptually requires obtaining the maximum likelihood for each possible tree topology and then picking the best of these. For more than a dozen or so taxa, exhaustive search is not feasible. Heuristic search algorithms typically define a neighborhood structure for possible topologies. The search goes through neighbors and jumps to the first neighbor with a higher likelihood. When all neighbors are inferior to the current tree, the search stops. Much improvement has been made in recent years (RAxML and GARLI are two modern ML programs).

Bayesian Inference

In Bayesian inference, the posterior distribution is proportional to the product of the likelihood and the prior distribution. For parameters $\theta$ and data $D$,

$$P\{\theta \mid D\} = \frac{P\{D \mid \theta\}\, P\{\theta\}}{P\{D\}}$$

The denominator is the marginal likelihood of the data, which is the integral of the likelihood against the prior distribution.

Bayesian Phylogenetics

For a phylogenetic problem, the parameter $\theta$ typically includes the tree topology, the edge lengths, and parameters for the substitution model: $\theta = (\tau, \nu, \phi)$. Often we assume independence of these components: $P\{\theta\} = P\{\tau\}\, P\{\nu\}\, P\{\phi\}$. In a typical phylogenetic problem, the marginal likelihood cannot be computed, as

$$P\{D\} = \int_\Theta P\{D \mid \theta\}\, P\{\theta\}\, d\theta$$

is a sum of very many terms (one for each topology) where each term is a high-dimensional integral of a complicated function.
Phylogenetic Inference

We may be interested in the posterior distribution of the tree topology, $P\{\tau \mid D\}$. When this posterior distribution is diffuse, we can summarize it by computing posterior distributions of clades. The posterior probability of a clade $C$ is the sum of the posterior probabilities of all tree topologies that contain it:

$$P\{C \mid D\} = \sum_{\tau : C \in \tau} P\{\tau \mid D\}$$

A consensus tree which includes as many clades with high posterior probability as possible is often used as a single tree summary of a distribution of the tree topology.

Sample-based Inference

Any aspect of a posterior distribution can be estimated from a sample drawn from the distribution. For example, the sample proportion of trees with topology $\tau_0$ is an estimate of $P\{\tau_0 \mid D\}$. Also, the sample mean of a transition/transversion parameter $\kappa$ is an estimate of the posterior mean $E[\kappa \mid D]$. But how do we sample from a complicated posterior distribution?

Markov Chain Monte Carlo

Markov chain Monte Carlo (MCMC) is a mathematical method for obtaining dependent samples from a target distribution (such as a posterior distribution). The idea is to construct a Markov chain whose state space is the parameter space $\Theta$ where the stationary distribution of the Markov chain matches the target distribution, say $P\{\theta \mid D\}$. Simulating the Markov chain produces a sample $\theta_0, \theta_1, \ldots$ which, after discarding an initial burn-in portion, may be treated as a dependent sample from the target distribution.

Metropolis-Hastings

For notational convenience, let the target distribution be $\pi(\theta) = P\{\theta \mid D\}$. The most common form of MCMC uses the Metropolis-Hastings algorithm, in which a proposal distribution $q$, which can depend on the most recently sampled $\theta_i$, generates a proposal $\theta^*$ which is accepted with some probability. When accepted, $\theta_{i+1} = \theta^*$. When rejected, $\theta_{i+1} = \theta_i$. The proposal distribution $q$ is essentially arbitrary provided it can move around the entire space $\Theta$.
Metropolis-Hastings Algorithm

The acceptance probability is

$$\min\left\{ 1,\ \frac{\pi(\theta^*)}{\pi(\theta)} \times \frac{q(\theta \mid \theta^*)}{q(\theta^* \mid \theta)} \times |J| \right\}$$

where $J$ is a Jacobian. Notice that the target density appears only as a ratio. This means that it need only be known up to a constant of proportionality, and we can simply evaluate $h(\theta) = P\{D \mid \theta\}\, P\{\theta\}$ since

$$\frac{\pi(\theta^*)}{\pi(\theta)} = \frac{P\{D \mid \theta^*\}\, P\{\theta^*\} / P\{D\}}{P\{D \mid \theta\}\, P\{\theta\} / P\{D\}} = \frac{h(\theta^*)}{h(\theta)}$$

Note that the proposal ratio $q(\theta \mid \theta^*) / q(\theta^* \mid \theta)$ can be tricky to compute.

MCMC Example

[Figures: Target Distribution; Initial Point; Proposal Distribution]
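The acceptance rule can be sketched for a one-dimensional target. Several simplifying assumptions are made here and none of them come from the slides: the unnormalized target $h(\theta)$ is taken to be a standard normal density without its constant, the proposal is a symmetric uniform random walk (so the proposal ratio and the Jacobian are both 1), and the function names, step size, seed, and burn-in length are arbitrary choices for illustration.

```python
import math
import random


def log_h(theta):
    """Log of an unnormalized target density; a standard normal is assumed here."""
    return -0.5 * theta * theta


def metropolis_hastings(log_target, start, n_steps, step_size, rng):
    """Random-walk Metropolis sampler.

    The uniform proposal is symmetric, q(theta | theta*) = q(theta* | theta),
    so the acceptance probability reduces to min(1, h(theta*) / h(theta)).
    """
    theta = start
    sample = []
    for _ in range(n_steps):
        proposal = theta + rng.uniform(-step_size, step_size)
        log_ratio = log_target(proposal) - log_target(theta)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            theta = proposal          # accepted: theta_{i+1} = theta*
        sample.append(theta)          # rejected: theta_{i+1} = theta_i
    return sample


if __name__ == "__main__":
    draws = metropolis_hastings(log_h, start=3.0, n_steps=30000,
                                step_size=2.0, rng=random.Random(42))
    kept = draws[2000:]  # discard an initial burn-in portion
    mean = sum(kept) / len(kept)
    var = sum((x - mean) ** 2 for x in kept) / len(kept)
    print(mean, var)
```

Working with the log of the target avoids underflow when $h$ is a product of many small likelihood terms, which is the usual situation in phylogenetics. For tree proposals the ratio $q(\theta \mid \theta^*) / q(\theta^* \mid \theta)$ is generally not 1 and has to be worked out for each move.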
[Figures: First Proposal (accept with probability 1); Second Proposal; Third Proposal; Beginning of Sample (sample so far)]
[Figures: Larger Sample; Comparison to Target]

Subtree Pruning Regrafting

See example from board. More details will be posted in a separate document.

Rescaling a Tree

More details will be posted in a separate document.
Cautions

MCMC does not always converge.
Should always run several chains with different random numbers and compare answers.
If the true tree has some very short internal edges, Bayesian inference can mislead.
Different likelihood models can lead to different results.

Bayesian Inference

Development of Bayesian methods has led to continual improvement in our ability to model and learn about molecular evolution.
Bayesian inference uses likelihood, but requires a prior distribution.
Bayesian inference is computationally intensive, but can be less so than ML plus bootstrapping.
Bayesian inference directly measures items of interest on an easily interpretable probability scale.
Some folks dislike the requirement of specifying a prior distribution.
More informationMutation models I: basic nucleotide sequence mutation models
Mutation models I: basic nucleotide sequence mutation models Peter Beerli September 3, 009 Mutations are irreversible changes in the DNA. This changes may be introduced by chance, by chemical agents, or