Mixture Models in Phylogenetic Inference. Mark Pagel and Andrew Meade Reading University.
|
|
- Walter Fowler
- 6 years ago
- Views:
Transcription
1 Mixture Models in Phylogenetic Inference Mark Pagel and Andrew Meade Reading University
2 Mixture models in phylogenetic inference!some background statistics relevant to phylogenetic inference!mixture models!a mixture model for pattern heterogeneity Q!performance and results!real data!a mixture model for heterotachy t!4-taxon simulation study!performance and results!application to real data!mixtures of topologies T
3 The growth of GenBank
4 Growth in the use of molecular phylogenies in scientific research Cumulative publications: molecular AND phylogen*24000 PCR made available Nobel Prize to Kerry Mullis 2 years 3 years 4 years Year projected 2007 source: Web of Science, April, Search terms molecular AND phylogen*. Line of best fit is a quadratic spline or quadratic polynomial
5 Mixture models in phylogenetic inference L(Q)! P(D Q) conventional likelihood model mixture model in Q P(D Q,T) = " i P(D i Q,T) P(D Q 1,Q 2, Q J,T) = " i #w j P(D i Q j,t) Let T = (S, t) mixture model in t P(D Q,t 1,t 2 t J ) = " i #w j P(D i Q,t j )
6 Overview: Phylogenetic Inference from gene sequences 1. Homogeneous Process Model: A C G T Sp 1 AACGTTGTCCCTT Sp 2 AACGTTCCTTGCT.. Sp n CCGGTTGCAAGCT A --- q ac q ag q at C --- q cg q ct G --- q gt =1 T popular modifications to basic homogeneous model a) rate-heterogeneity (Yang, 1994): apply homogeneous model but allow rates to vary over sites according to a discrete gamma. Equivalent to fitting k rate matrices to each site, where the matrices differ from each other only by a proportional scalar b) partitioning data: apply a different rate matrix to different sites, chosen beforehand by the investigator
7 Limitations to homogeneous, rate-heterogeneity and partitioning approaches Homogeneous model: not all sites may evolve according to the same model Gamma model: rate variation may not be gamma or sites may vary in some other way Partitioning data: presumes investigator knows with certainty which model best applies to each site
8 Pattern-Heterogeneity Model of Gene-Sequence Evolution Allow for different sites to evolve in quantitatively or qualitatively different ways without prior partitioning by the investigator. Method: fit more than one model of evolution to each site, summing the likelihood at each site over all models. Allow the data to determine the best model for each site. Motivation for model: Pattern-heterogeneity model will always equal or better the performance of homogeneous or gamma rate heterogeneity models and normally improve on partitioning models. Frequently yields substantial improvements (100 s of log-units) Applications Phylogenetic inference Detecting regions within genes that evolve differently Detecting differences among genes
9 Applications of pattern-heterogeneity model Single gene alignment species 1 species n pattern 1 pattern 2 pattern 3 Concatenated gene alignment species 1 species 2 gene 1 gene 2 gene 3 gene k... species n Supermatrix alignment species 1 species 2 species n-k species n
10 Detecting pattern-heterogeneity in simulated data Method: I) generate simulated gene-sequence data on a random tree according to two different models of sequence evolution, creating two genes with different patterns of substitutions 2) analyse simulated data using homogeneous, gamma rates and pattern-heterogeneity model Tree used in simulations rate matrices A C G T A C G T --- Data: two genes of length 1200 and 800 sites
11 Posterior likelihoods for data simulated from two qualitatively distinct rate matrices log-likelihood Homogeneous Gamma (4) Pattern-Homogeneity Iteration of Markov Chain A <--> C A <--> G A <--> T C <--> G C <--> T G <--> T Q Q
12 Pattern-heterogeneity model: Simulated and estimated values of the rate parameters 5 A C G T Estimated values of rate parameters A C G T --- Q2 Q Actual values of rate parameters
13 Detecting pattern-heterogeneity in simulated data 15 log(q1 likelihood) - log(q2 likelihood) gene 1 gene Site in simulated sequence
14 Detecting rate-heterogeneity in simulated data Method: I) generate simulated gene-sequence data on a random tree according to a gamma rate heterogeneity model (continuous-gamma, $=1.0) 2) analyse data using homogeneous model, gamma rates (4 categories) and patternheterogeneity model (4 rate matrices) Tree used in simulations rate matrices A C G T A C G T --- Data: a gene of length 2000 sites
15 Posterior likelihoods for data simulated using four gamma-distributed rate categories log-likelihood Q(4) Q(1) Gamma(4) Iteration of Markov Chain
16 Average estimated transition rates from pattern-heterogeneity model applied to simulated gamma rate heterogeneity data: input values shown
17 Detecting secondary structure in Mitochondrial ribosomal 12s sequences n=57 mammals
18 Detecting secondary structure using the pattern-heterogeneity model Stem and loop structure leads to predictions about the pattern of nucleotide substitutions stems: dominated by Watson-Crick base pairings. Therefore expect compensatory substitutions to maintain Watson-Crick pairings. Non-compensatory changes have lower fitness. This predicts that transitional changes will occur at a far higher rate than transversional changes. Often observe Tr/Tv ratio of in mitochondrial DNA loops: no a priori substitutional pattern expected
19 Detecting secondary structure; likelihoods of pattern-heterogeneity and Gamma models log-likelihood Gamma (4) Pattern-heterogeneity (4) E7 2E7 3E7 4E7 5E7 Iteration number
20 Mammal 12S data: analysis of rate matrices A <-> C A <-> G A <-> T C <-> G C <-> T G <-> T Tr/Tv Q Q Q Q Best fit to loop or stem Q1 Q2 Q3 Q4 Stem Purines A G Pyrimidines C T Loop
21 Application of pattern heterogeneity to n=159 datasets (using a reversible-jump algorithm) RJ-algorithm allows GTR rate matirices to be gained and lost using split and merge moves Q split Q i merge Q Q j Jumps to higher or lower dimensions governed by: likelihood ratio % prior ratio % proposal ratio % jacobian Mean no. of taxa =39.9±32, median = 29 (range 7-178) Mean alignment size = 1983±2545 nucleotides, median = 1140 (range ) Mean no. of genes = 2.7±4.5, median = 1, (range 1-37) Allow RJ algorithm to run to apparent convergence Improvement over conventional modelling How many Qs?
22 Distribution of number of rate matrices (RJ-algorithm) and empirical weight 1 mean = 2.2±1.4 median =2 correlation with no. of taxa, r 2 =0.15 no. of sites in alignment, = r 2 =0.27 no. of genes, r 2 =0.25 no. of taxa has highest partial correlation number of rate matrices Weight Position
23 Improvement in log-likelihood over a single Q model 100 mean improvement = 265±608 log-units 75 mean improvement = 379± 698 for n=111 datasets with > 1 Q (light green) Increase in log-likelihood over single Q model
24 Improvement in log-likelihood versus no. of rate matrices 4000 log-likelihood difference No Q's note: curve is a quadratic fit
25 Conclusions Mixture model can detect pattern-heterogeneity in simulated and real data and recover the parameters of the model of sequence evolution Can lead in general to large improvements in likelihood over single Q model Can improve topology and reduce long-branch attraction Can reduce the node density artefact Can be used to investigate gene evolution (such as secondary structure) or be applied to concatenated data sets of multiple genes.
26 Heterotachy mixture model in t Mixture model in branch lengths, t P(D Q,t 1,t 2 t J ) = " i #w j P(D i Q,t j ) A B C A B C A B C What does the model deliver?!variation in rates across the tree (heterotachy): a non-parametric covarion? basic covarion model (Tuffley and Steel, 1998) Q (off or slow) - s ij I s ij I Questions!does it work?!can we estimate sets of branch lengths?!are there a small number of sets in real data? Q= s ji I Q (on or fast)- s ji I
27 Does the branch-lengths model work? Variable evolutionary rates over time: Where Parsimony beats ML site pattern 1 site pattern 2 A C A C p r B q D B D Site-specific variation in rates of evolution, causes maximum likelihood but not parsimony to find the wrong tree. One-half of the sites vary their rates as in the left diagram, and the remaining sites vary according to the right diagram. Maximum likelihood tends to put A and C and B and D together, whereas parsimony tends to find the true tree (Kolaczkowski and Thornton, 2004)
28 Simulation!four taxon tree A site pattern 1 site pattern 2 C A C!two sets of site patterns p r!750 sites per pattern B q D B D!generate 5000 random data sets of a hard case: p=0.75, q=0.05, r={0,0.4}!analyse with MP, JC-MCMC, Covarion, BL-MCMC (blind) Do we get back the true tree? A B C D
29 Information ratio = xxyy/(xyxy + xyyx) A r C e.g., aacc/(acac+acca) B D 6 5 information ratio r
30 A C 1 Probability of True Tree versus r B r D p 50 scores in r Parsimony = p 50 Y 0.5 JC MCMC = Covarion = BL MCMC = R r BL MCMC: recovered two branch length sets of correct lengths
31 Probability of recovering true tree vs. information ratio: BL MCMC model 1.8 probability correct information ratio
32 Can we estimate sets of branch lengths? Simulated branch-length sets data* Two branch length sets Analysis Likelihood Variance Weight BLS BLS BLS BLS Three branch length sets Analysis Likelihood Variance Weights BLS BLS BLS BLS Four branch length sets Analysis Likelihood Variance Weights BLS BLS BLS BLS *Data simulated on a tree of about 60 tips sites. Likelihood is mean from Markov chain; Variance is of the likelihood.
33 A reversible-jump algorithm for heterotachy -- allow each branch to choose one or more length -- need mechanism for splitting and merging branches -- test on simulated data 0.2 branch length set branch length set 1
34 Data simulated to have two branch-length sets log-likelihoods returned by one and two branch-length sets models and RJ algorithm Y Y log-likelihood BLS 1 log-likelihood BLS 2 log-likelihood RJ +925 log-units over BLS extra branches +870 log-units 73±2.46 branches 94% of improvement in log-likelihood with 65 fewer parameters iteration
35 Time series plot of no. of extra branches inferred from RJ algorithm No. branches iteration
36 Eukaryote phylogeny of 48 species based on four genes (EF1-$, EF2- $, and two trna synthetases) One branch-length set (homogeneous model) log-likelihood= two branch-length set log-likelihood= log-units for 94 extra branches RJ-algorithm log-likelihood = log-units(96%) for 45± 1.4 extra branches
37 Summary Branch-lengths model seems to work Can estimate branch length sets RJ algorithm can identify in which branches the heterotachy is most pronounced Uses Improve phylogenetic inference (e.g., should reduce long branch attraction) Identify where in tree and for which sites in the alignment the heterotachy is occurring Sp 1 AACGTTGTCCCTT Sp 2 AACGTTCCTTGCT.. Sp n CCGGTTGCAAGCT
38 A mixture of topologies L(Q)! P(D Q) conventional likelihood model mixture model in Q (Pagel and Meade, 2004) P(D Q,T) = "P(D i Q,T) i P(D Q 1,Q 2, Q J,T) = "#w j P(D i Q j,t) i j mixture model in T P(D Q,T 1,T 2 T J ) = "#w j P(D i Q,T j ) i j
39 Some causes of conflicting phylogenetic signals -- gene trees vs species trees (coalescent events) -- convergent evolution (e.g., lysozyme in cows and monkeys) -- gene duplication -- lateral gene transfer gene-duplication lateral gene transfer
40 How does the multiple-topologies mixture model work? conventional (Bayesian) inference alignment T likelihood iteration Li=1=P(D T1) Li=2=P(D T2) Li=3=P(D T3) n multiple-topologies mixture model alignment.. Li=n=P(D Tn) k distinct Markov chains T1 T2..Tk combined likelihood iteration n Converged combined likelihood: Li=n=w1L1+w2L2+..wkLk combined likelihood allows tree to diverge if this increases L calculate posterior probabilities of each tree separately Li=1=w1L1+w2L2+..wkLk Li=2=w1L1+w2L2+..wkLk Li=3=w1L1+w2L2+..wkLk.. Li=n=w1L1+w2L2+..wkLk
41 Mixture model in T mixture model in T P(D Q,T 1,T 2 T J ) = "#w j P(D i Q,T j ) i j A B C A B C A C A C B B What does the model deliver?!variation in the topology!variation in rates across the tree: a non-parametric covarion? Questions!does it work?!can we estimate sets of topologies/branch lengths?!are there a small number of sets in real data?
42 Does the multiple-topology model work? Simulation of two-topologies A topology 1 topology 2 C A B!two four taxon trees!two topologies p B q r D p C q r D!generate 20,000 random alignments of 1000 sites sets. Vary the proportion of sites per topology!analyse with Bayesian MCMC using: one-tree model (conventional), two-tree model Can we detect two trees?
43 Converged Markov chains for one and two-topology models Data: 1000 sites simulated up two different trees (80/20 split) two-topology model log-likelihood one-topology model Iteration of Markov chain
44 Proportion of sites favouring topology 1 topology 2 Outcome one-topology model two-topology model A B C D A B C D A D A C A B B C B D C D A B A B C D C D
45 Performance of one and two-topology models given conflicting phylogenetic signal A B (1) C A (2) D C B D Posterior support for topology two-topology model Analysis one-tree model two-tree model topology 1 two-tree model topology one-topology model Proportion of sites favouring topology 1
46 significant differences detected with ~ 10% of sites Posterior support for topology Proportion of sites favouring topology 1 likelihood difference Proportion of sites favouring topology 1
47 Estimating the number of topologies Data: 1000 sites simulated up two different trees of 50 tips, 60/40 split Model Log-likelihood s.d W 1 W 2 W 3 One-topology Two topologies Three topologies
48 Conflicting signal for ecdysozoa/coelomata phylogenies Nematodes (e.g., C. elegans) Arthropods (e.g., Drosophila) Chordates (e.g., humans) Nematodes (e.g., C. elegans) Arthropods (e.g., Drosophila) Chordates (e.g., humans) ecdysozoa coelomata dataset: 13,946 BP alignment of fourteen genes from four eukaryotic species, Homologene (NCBI) database Number Name Length 1 Beta tubulin EEF1A HSP40-4B HSP HSP HSP70-9B HSP HSP90-1A s s s RNA-Bind-Motif TUBA TUBB. 900
49 one-topology model: likelihood = ±2.58 data from Homologene database (NCBI) posterior support for coelomata = 87 tree length 1.01±0.03
50 One-topology model versus two-topology results likelihood = ±2.58 likelihood = ±3.01 weight = 0.57 weight = 0.43 coelomata coelomata ecdysozoa tree length 1.01±0.03 tree length 0.93±0.13 tree length 1.33±0.3 note: two recent papers support ecdysozoa!log-l = 53.1 log units
51 Simulation of null hypothesis distribution of differences in likelihood n=1000 simulated data sets of 13,946 sites. Fit one and two topology model to each. Find difference in likelihoods. n=1000 < 53, p<0.001 log(l2)-log(l1)
52 Prokaryote phylogenetics: placement of hyperthermophiles Log likelihood =-199,345
53 Prokaryotes: two topologies showing that hyperthermophiles become derived (longer tree) w 1 =0.55 w 2 =0.45 Log likelihood =-194,022!L = 5323
54 Summary Multiple topologies model seems to work Can estimate more than one topology and identify conflicting signal directly Some questions How to test? How many distinct trees in real data?
55 Acknowledgement BBSRC and NERC (UK)
56 Phylogeny of the Ascomycota Fungi showing the evolution of lichen-formation lichen forming ambiguous not lichen forming
57 log-likelihoods for combined LSU/SSU nrdna data set: 54 Ascomycota species n=800 sites log-likelihood Q(2) Gamma(4) Q(4) iteration number
58 Site by site analysis fitting two independent rate matrices: LSU/SSU nrdna Ascomycota combined data set SSU/LSU divide log-likelihood Q1 - log-likelihood Q Note: data modelled with two independent rate matrices site in the alignment
59
60 Compensatory substitutions transitional changes transitional changes tranversinal changes low fitness Purines A G Pyrimidines C U
61 Detecting secondary structure: application of pattern-heterogeneity model to 12s mitochondrial DNA data on 57 mammals Tree here
62 Mixture models in phylogenetic inference!mixture models!a mixture model in T!simulation study!performance at estimating T s!application to ecdysozoa/coelomata and prokaryotes
63 Detecting Conflicting Phylogenetic Signals a mixture-model approach
64 Average estimated transition rates from gamma and pattern-heterogeneity models applied to simulated gamma rate heterogeneity data: input values shown A C G T A C G T --- average rate = Average Q1 Gamma1 Q2 Gamma2 Q3 Gamma3 Q4 Gamma4
65
66 Results for branch length sets RJ model for branch lengths RJ for each branch. How many branches? identifying where and when
67 Is the number of branch lengths sets small in real data? Two prokaryote data sets 1) rrna 751 sites 2) ribosomal + proteins ~ 6000 sites Conventional likelihood model Beta proteobacteria Gamma proteobacteria Alpha proteobacteria Epsilon proteobacteria Chlamydiae Spirochaete Cyanobacteria DeinococcusThermus Actinobacteria Firmicutes Fusobacteria Thermotoga Aquificae Nanoarchaeota Crenarchaeota Euryarchaeota Branch-lengths mixture model Beta Proteobacteria Gamma Proteobacteria Alpha Proteobacteria Epsilon Proteobacteria Chlamydiae Spirochaetes Actinobacteria Aquifex Thermotoga Fusobacterium Firmicutes DeinococcusThermus Cyanobacteria Nanoarchaeota Crenarchaeota Euryarchaeota
Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis
Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis 10 December 2012 - Corrections - Exercise 1 Non-vertebrate chordates generally possess 2 homologs, vertebrates 3 or more gene copies; a Drosophila
More informationPhylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University
Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary
More information"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky
MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally
More informationMETHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.
Chapter 12 (Strikberger) Molecular Phylogenies and Evolution METHODS FOR DETERMINING PHYLOGENY In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern
More informationPhylogenetics: Building Phylogenetic Trees
1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should
More informationConcepts and Methods in Molecular Divergence Time Estimation
Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History Overview 1. Why do we date trees? 2. The molecular clock 3. Local clocks
More informationAmira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut
Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological
More informationMolecular Evolution & Phylogenetics Traits, phylogenies, evolutionary models and divergence time between sequences
Molecular Evolution & Phylogenetics Traits, phylogenies, evolutionary models and divergence time between sequences Basic Bioinformatics Workshop, ILRI Addis Ababa, 12 December 2017 1 Learning Objectives
More informationMaximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018
Maximum Likelihood Tree Estimation Carrie Tribble IB 200 9 Feb 2018 Outline 1. Tree building process under maximum likelihood 2. Key differences between maximum likelihood and parsimony 3. Some fancy extras
More informationBINF6201/8201. Molecular phylogenetic methods
BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics
More informationPhylogenetics. BIOL 7711 Computational Bioscience
Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium
More informationUsing phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)
Using phylogenetics to estimate species divergence times... More accurately... Basics and basic issues for Bayesian inference of divergence times (plus some digression) "A comparison of the structures
More informationDr. Amira A. AL-Hosary
Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological
More informationInferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT
Inferring phylogeny Constructing phylogenetic trees Tõnu Margus Contents What is phylogeny? How/why it is possible to infer it? Representing evolutionary relationships on trees What type questions questions
More informationConstructing Evolutionary/Phylogenetic Trees
Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood
More informationLecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)
Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from
More informationPOPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics
POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the
More informationAnatomy of a species tree
Anatomy of a species tree T 1 Size of current and ancestral Populations (N) N Confidence in branches of species tree t/2n = 1 coalescent unit T 2 Branch lengths and divergence times of species & populations
More informationA Bayesian Approach to Phylogenetics
A Bayesian Approach to Phylogenetics Niklas Wahlberg Based largely on slides by Paul Lewis (www.eeb.uconn.edu) An Introduction to Bayesian Phylogenetics Bayesian inference in general Markov chain Monte
More informationA (short) introduction to phylogenetics
A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field
More informationAccounting for Phylogenetic Uncertainty in Comparative Studies: MCMC and MCMCMC Approaches. Mark Pagel Reading University.
Accounting for Phylogenetic Uncertainty in Comparative Studies: MCMC and MCMCMC Approaches Mark Pagel Reading University m.pagel@rdg.ac.uk Phylogeny of the Ascomycota Fungi showing the evolution of lichen-formation
More informationMaximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.
Maximum Likelihood This presentation is based almost entirely on Peter G. Fosters - "The Idiot s Guide to the Zen of Likelihood in a Nutshell in Seven Days for Dummies, Unleashed. http://www.bioinf.org/molsys/data/idiots.pdf
More informationPhylogenetic Assumptions
Substitution Models and the Phylogenetic Assumptions Vivek Jayaswal Lars S. Jermiin COMMONWEALTH OF AUSTRALIA Copyright htregulation WARNING This material has been reproduced and communicated to you by
More informationConstructing Evolutionary/Phylogenetic Trees
Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood
More informationPhylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University
Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction
More informationC3020 Molecular Evolution. Exercises #3: Phylogenetics
C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from
More informationGenomics and Bioinformatics
Genomics and Bioinformatics Associate Professor Bharat Patel Genomes in terms of earth s history: Earth s environment & cellular evolution Genomes in terms of natural relationships: (Technology driven
More informationThe Importance of Data Partitioning and the Utility of Bayes Factors in Bayesian Phylogenetics
Syst. Biol. 56(4):643 655, 2007 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150701546249 The Importance of Data Partitioning and the Utility
More informationEVOLUTIONARY DISTANCES
EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:
More informationBIOLOGICAL SCIENCE. Lecture Presentation by Cindy S. Malone, PhD, California State University Northridge. FIFTH EDITION Freeman Quillin Allison
BIOLOGICAL SCIENCE FIFTH EDITION Freeman Quillin Allison 1 Lecture Presentation by Cindy S. Malone, PhD, California State University Northridge Roadmap 1 Key themes to structure your thinking about Biology
More informationLecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).
1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff
More informationInferring Molecular Phylogeny
Dr. Walter Salzburger he tree of life, ustav Klimt (1907) Inferring Molecular Phylogeny Inferring Molecular Phylogeny 55 Maximum Parsimony (MP): objections long branches I!! B D long branch attraction
More informationToday s Lecture: HMMs
Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models
More informationMOLECULAR SYSTEMATICS: A SYNTHESIS OF THE COMMON METHODS AND THE STATE OF KNOWLEDGE
CELLULAR & MOLECULAR BIOLOGY LETTERS http://www.cmbl.org.pl Received: 16 August 2009 Volume 15 (2010) pp 311-341 Final form accepted: 01 March 2010 DOI: 10.2478/s11658-010-0010-8 Published online: 19 March
More information08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega
BLAST Multiple Sequence Alignments: Clustal Omega What does basic BLAST do (e.g. what is input sequence and how does BLAST look for matches?) Susan Parrish McDaniel College Multiple Sequence Alignments
More information1 ATGGGTCTC 2 ATGAGTCTC
We need an optimality criterion to choose a best estimate (tree) Other optimality criteria used to choose a best estimate (tree) Parsimony: begins with the assumption that the simplest hypothesis that
More informationMolecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016
Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,
More informationAssessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition
Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition David D. Pollock* and William J. Bruno* *Theoretical Biology and Biophysics, Los Alamos National
More informationEstimating Evolutionary Trees. Phylogenetic Methods
Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent
More informationConsensus Methods. * You are only responsible for the first two
Consensus Trees * consensus trees reconcile clades from different trees * consensus is a conservative estimate of phylogeny that emphasizes points of agreement * philosophy: agreement among data sets is
More informationPhylogenetic inference
Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types
More informationSome of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!
Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis
More informationIntraspecific gene genealogies: trees grafting into networks
Intraspecific gene genealogies: trees grafting into networks by David Posada & Keith A. Crandall Kessy Abarenkov Tartu, 2004 Article describes: Population genetics principles Intraspecific genetic variation
More informationCHAPTERS 24-25: Evidence for Evolution and Phylogeny
CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology
More informationPhylogenetic Tree Reconstruction
I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven
More informationToday's project. Test input data Six alignments (from six independent markers) of Curcuma species
DNA sequences II Analyses of multiple sequence data datasets, incongruence tests, gene trees vs. species tree reconstruction, networks, detection of hybrid species DNA sequences II Test of congruence of
More informationBiological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor
Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms
More informationRooting Major Cellular Radiations using Statistical Phylogenetics
Rooting Major Cellular Radiations using Statistical Phylogenetics Svetlana Cherlin Thesis submitted for the degree of Doctor of Philosophy School of Mathematics & Statistics Institute for Cell & Molecular
More information1. Can we use the CFN model for morphological traits?
1. Can we use the CFN model for morphological traits? 2. Can we use something like the GTR model for morphological traits? 3. Stochastic Dollo. 4. Continuous characters. Mk models k-state variants of the
More informationMolecular Evolution & Phylogenetics
Molecular Evolution & Phylogenetics Heuristics based on tree alterations, maximum likelihood, Bayesian methods, statistical confidence measures Jean-Baka Domelevo Entfellner Learning Objectives know basic
More informationPhylogenetic Inference using RevBayes
Phylogenetic Inference using RevBayes Model section using Bayes factors Sebastian Höhna 1 Overview This tutorial demonstrates some general principles of Bayesian model comparison, which is based on estimating
More informationPhylogenetic methods in molecular systematics
Phylogenetic methods in molecular systematics Niklas Wahlberg Stockholm University Acknowledgement Many of the slides in this lecture series modified from slides by others www.dbbm.fiocruz.br/james/lectures.html
More informationGraph Alignment and Biological Networks
Graph Alignment and Biological Networks Johannes Berg http://www.uni-koeln.de/ berg Institute for Theoretical Physics University of Cologne Germany p.1/12 Networks in molecular biology New large-scale
More informationJed Chou. April 13, 2015
of of CS598 AGB April 13, 2015 Overview of 1 2 3 4 5 Competing Approaches of Two competing approaches to species tree inference: Summary methods: estimate a tree on each gene alignment then combine gene
More informationSupplementary material to Whitney, K. D., B. Boussau, E. J. Baack, and T. Garland Jr. in press. Drift and genome complexity revisited. PLoS Genetics.
Supplementary material to Whitney, K. D., B. Boussau, E. J. Baack, and T. Garland Jr. in press. Drift and genome complexity revisited. PLoS Genetics. Tree topologies Two topologies were examined, one favoring
More informationMCMC: Markov Chain Monte Carlo
I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov
More informationChapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships
Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic
More informationASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support
ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support Siavash Mirarab University of California, San Diego Joint work with Tandy Warnow Erfan Sayyari
More informationPGA: A Program for Genome Annotation by Comparative Analysis of. Maximum Likelihood Phylogenies of Genes and Species
PGA: A Program for Genome Annotation by Comparative Analysis of Maximum Likelihood Phylogenies of Genes and Species Paulo Bandiera-Paiva 1 and Marcelo R.S. Briones 2 1 Departmento de Informática em Saúde
More informationReconstruire le passé biologique modèles, méthodes, performances, limites
Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire
More informationNJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees
NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana
More informationIdentifiability of the GTR+Γ substitution model (and other models) of DNA evolution
Identifiability of the GTR+Γ substitution model (and other models) of DNA evolution Elizabeth S. Allman Dept. of Mathematics and Statistics University of Alaska Fairbanks TM Current Challenges and Problems
More informationEvaluating the usefulness of alignment filtering methods to reduce the impact of errors on evolutionary inferences
Di Franco et al. BMC Evolutionary Biology (2019) 19:21 https://doi.org/10.1186/s12862-019-1350-2 RESEARCH ARTICLE Evaluating the usefulness of alignment filtering methods to reduce the impact of errors
More information8/23/2014. Phylogeny and the Tree of Life
Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major
More informationPhylogenies & Classifying species (AKA Cladistics & Taxonomy) What are phylogenies & cladograms? How do we read them? How do we estimate them?
Phylogenies & Classifying species (AKA Cladistics & Taxonomy) What are phylogenies & cladograms? How do we read them? How do we estimate them? Carolus Linneaus:Systema Naturae (1735) Swedish botanist &
More informationMarkov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree
Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree Nicolas Salamin Department of Ecology and Evolution University of Lausanne
More informationIntegrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft]
Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley K.W. Will Parsimony & Likelihood [draft] 1. Hennig and Parsimony: Hennig was not concerned with parsimony
More informationT R K V CCU CG A AAA GUC T R K V CCU CGG AAA GUC. T Q K V CCU C AG AAA GUC (Amino-acid
Lecture 11 Increasing Model Complexity I. Introduction. At this point, we ve increased the complexity of models of substitution considerably, but we re still left with the assumption that rates are uniform
More informationUoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)
- Phylogeny? - Systematics? The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogenetic systematics? Connection between phylogeny and classification. - Phylogenetic systematics informs the
More informationUnderstanding relationship between homologous sequences
Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective
More informationBio 1B Lecture Outline (please print and bring along) Fall, 2007
Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution
More informationHow should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe?
How should we go about modeling this? gorilla GAAGTCCTTGAGAAATAAACTGCACACACTGG orangutan GGACTCCTTGAGAAATAAACTGCACACACTGG Model parameters? Time Substitution rate Can we observe time or subst. rate? What
More informationBacterial Communities in Women with Bacterial Vaginosis: High Resolution Phylogenetic Analyses Reveal Relationships of Microbiota to Clinical Criteria
Bacterial Communities in Women with Bacterial Vaginosis: High Resolution Phylogenetic Analyses Reveal Relationships of Microbiota to Clinical Criteria Seminar presentation Pierre Barbera Supervised by:
More informationBiology 211 (2) Week 1 KEY!
Biology 211 (2) Week 1 KEY Chapter 1 KEY FIGURES: 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 VOCABULARY: Adaptation: a trait that increases the fitness Cells: a developed, system bound with a thin outer layer made of
More informationInferring Speciation Times under an Episodic Molecular Clock
Syst. Biol. 56(3):453 466, 2007 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150701420643 Inferring Speciation Times under an Episodic Molecular
More informationIntegrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley
Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;
More informationTree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny
More informationMolecular phylogeny How to infer phylogenetic trees using molecular sequences
Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues
More information"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley
"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley B.D. Mishler Feb. 1, 2011. Qualitative character evolution (cont.) - comparing
More informationLecture 4. Models of DNA and protein change. Likelihood methods
Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36
More informationMODELING EVOLUTION AT THE PROTEIN LEVEL USING AN ADJUSTABLE AMINO ACID FITNESS MODEL
MODELING EVOLUTION AT THE PROTEIN LEVEL USING AN ADJUSTABLE AMINO ACID FITNESS MODEL MATTHEW W. DIMMIC*, DAVID P. MINDELL RICHARD A. GOLDSTEIN* * Biophysics Research Division Department of Biology and
More informationPHYLOGENOMICS AND THE RECONSTRUCTION OF THE TREE OF LIFE
PHYLOGENOMICS AND THE RECONSTRUCTION OF THE TREE OF LIFE Frédéric Delsuc, Henner Brinkmann and Hervé Philippe Abstract As more complete genomes are sequenced, phylogenetic analysis is entering a new era
More informationMotivating the need for optimal sequence alignments...
1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use
More informationTHEORY. Based on sequence Length According to the length of sequence being compared it is of following two types
Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between
More informationMacroevolution Part I: Phylogenies
Macroevolution Part I: Phylogenies Taxonomy Classification originated with Carolus Linnaeus in the 18 th century. Based on structural (outward and inward) similarities Hierarchal scheme, the largest most
More informationGENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.
!! www.clutchprep.com CONCEPT: OVERVIEW OF EVOLUTION Evolution is a process through which variation in individuals makes it more likely for them to survive and reproduce There are principles to the theory
More informationMolecular phylogeny How to infer phylogenetic trees using molecular sequences
Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues
More informationLecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22
Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and
More informationBioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics
Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods
More informationLecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26
Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 4 (Models of DNA and
More informationMolecular evidence for multiple origins of Insectivora and for a new order of endemic African insectivore mammals
Project Report Molecular evidence for multiple origins of Insectivora and for a new order of endemic African insectivore mammals By Shubhra Gupta CBS 598 Phylogenetic Biology and Analysis Life Science
More informationMolecular Clocks. The Holy Grail. Rate Constancy? Protein Variability. Evidence for Rate Constancy in Hemoglobin. Given
Molecular Clocks Rose Hoberman The Holy Grail Fossil evidence is sparse and imprecise (or nonexistent) Predict divergence times by comparing molecular data Given a phylogenetic tree branch lengths (rt)
More informationRecovering evolutionary history by complex network modularity analysis
Recovering evolutionary history by complex network modularity analysis FESC Roberto F. S. Andrade, Suani T. R. de Pinho, Aristóteles Góes-Neto, Charbel N. El-Hani, Thierry P. Lobão, Gilberto C. Bomfim,
More informationCLASSIFICATION. Why Classify? 2/18/2013. History of Taxonomy Biodiversity: variety of organisms at all levels from populations to ecosystems.
Why Classify? Classification has been around ever since people paid attention to organisms. CLASSIFICATION One primeval system was based on harmful and non-harmful organisms. Life is easier when we organize
More informationSequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment
Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of
More informationAlgorithms in Bioinformatics
Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods
More information"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley
"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley B.D. Mishler Jan. 22, 2009. Trees I. Summary of previous lecture: Hennigian
More informationLecture 3: Markov chains.
1 BIOINFORMATIK II PROBABILITY & STATISTICS Summer semester 2008 The University of Zürich and ETH Zürich Lecture 3: Markov chains. Prof. Andrew Barbour Dr. Nicolas Pétrélis Adapted from a course by Dr.
More informationPitfalls of Heterogeneous Processes for Phylogenetic Reconstruction
Pitfalls of Heterogeneous Processes for Phylogenetic Reconstruction DANIEL ŠTEFANKOVIČ 1 AND ERIC VIGODA 2 1 Department of Computer Science, University of Rochester, Rochester, New York 14627, USA; and
More informationAdditive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.
Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then
More information