Mixture Models in Phylogenetic Inference. Mark Pagel and Andrew Meade Reading University.

Size: px
Start display at page:

Download "Mixture Models in Phylogenetic Inference. Mark Pagel and Andrew Meade Reading University."

Transcription

1 Mixture Models in Phylogenetic Inference Mark Pagel and Andrew Meade Reading University

2 Mixture models in phylogenetic inference!some background statistics relevant to phylogenetic inference!mixture models!a mixture model for pattern heterogeneity Q!performance and results!real data!a mixture model for heterotachy t!4-taxon simulation study!performance and results!application to real data!mixtures of topologies T

3 The growth of GenBank

4 Growth in the use of molecular phylogenies in scientific research Cumulative publications: molecular AND phylogen*24000 PCR made available Nobel Prize to Kerry Mullis 2 years 3 years 4 years Year projected 2007 source: Web of Science, April, Search terms molecular AND phylogen*. Line of best fit is a quadratic spline or quadratic polynomial

5 Mixture models in phylogenetic inference L(Q)! P(D Q) conventional likelihood model mixture model in Q P(D Q,T) = " i P(D i Q,T) P(D Q 1,Q 2, Q J,T) = " i #w j P(D i Q j,t) Let T = (S, t) mixture model in t P(D Q,t 1,t 2 t J ) = " i #w j P(D i Q,t j )

6 Overview: Phylogenetic Inference from gene sequences 1. Homogeneous Process Model: A C G T Sp 1 AACGTTGTCCCTT Sp 2 AACGTTCCTTGCT.. Sp n CCGGTTGCAAGCT A --- q ac q ag q at C --- q cg q ct G --- q gt =1 T popular modifications to basic homogeneous model a) rate-heterogeneity (Yang, 1994): apply homogeneous model but allow rates to vary over sites according to a discrete gamma. Equivalent to fitting k rate matrices to each site, where the matrices differ from each other only by a proportional scalar b) partitioning data: apply a different rate matrix to different sites, chosen beforehand by the investigator

7 Limitations to homogeneous, rate-heterogeneity and partitioning approaches Homogeneous model: not all sites may evolve according to the same model Gamma model: rate variation may not be gamma or sites may vary in some other way Partitioning data: presumes investigator knows with certainty which model best applies to each site

8 Pattern-Heterogeneity Model of Gene-Sequence Evolution Allow for different sites to evolve in quantitatively or qualitatively different ways without prior partitioning by the investigator. Method: fit more than one model of evolution to each site, summing the likelihood at each site over all models. Allow the data to determine the best model for each site. Motivation for model: Pattern-heterogeneity model will always equal or better the performance of homogeneous or gamma rate heterogeneity models and normally improve on partitioning models. Frequently yields substantial improvements (100 s of log-units) Applications Phylogenetic inference Detecting regions within genes that evolve differently Detecting differences among genes

9 Applications of pattern-heterogeneity model Single gene alignment species 1 species n pattern 1 pattern 2 pattern 3 Concatenated gene alignment species 1 species 2 gene 1 gene 2 gene 3 gene k... species n Supermatrix alignment species 1 species 2 species n-k species n

10 Detecting pattern-heterogeneity in simulated data Method: I) generate simulated gene-sequence data on a random tree according to two different models of sequence evolution, creating two genes with different patterns of substitutions 2) analyse simulated data using homogeneous, gamma rates and pattern-heterogeneity model Tree used in simulations rate matrices A C G T A C G T --- Data: two genes of length 1200 and 800 sites

11 Posterior likelihoods for data simulated from two qualitatively distinct rate matrices log-likelihood Homogeneous Gamma (4) Pattern-Homogeneity Iteration of Markov Chain A <--> C A <--> G A <--> T C <--> G C <--> T G <--> T Q Q

12 Pattern-heterogeneity model: Simulated and estimated values of the rate parameters 5 A C G T Estimated values of rate parameters A C G T --- Q2 Q Actual values of rate parameters

13 Detecting pattern-heterogeneity in simulated data 15 log(q1 likelihood) - log(q2 likelihood) gene 1 gene Site in simulated sequence

14 Detecting rate-heterogeneity in simulated data Method: I) generate simulated gene-sequence data on a random tree according to a gamma rate heterogeneity model (continuous-gamma, $=1.0) 2) analyse data using homogeneous model, gamma rates (4 categories) and patternheterogeneity model (4 rate matrices) Tree used in simulations rate matrices A C G T A C G T --- Data: a gene of length 2000 sites

15 Posterior likelihoods for data simulated using four gamma-distributed rate categories log-likelihood Q(4) Q(1) Gamma(4) Iteration of Markov Chain

16 Average estimated transition rates from pattern-heterogeneity model applied to simulated gamma rate heterogeneity data: input values shown

17 Detecting secondary structure in Mitochondrial ribosomal 12s sequences n=57 mammals

18 Detecting secondary structure using the pattern-heterogeneity model Stem and loop structure leads to predictions about the pattern of nucleotide substitutions stems: dominated by Watson-Crick base pairings. Therefore expect compensatory substitutions to maintain Watson-Crick pairings. Non-compensatory changes have lower fitness. This predicts that transitional changes will occur at a far higher rate than transversional changes. Often observe Tr/Tv ratio of in mitochondrial DNA loops: no a priori substitutional pattern expected

19 Detecting secondary structure; likelihoods of pattern-heterogeneity and Gamma models log-likelihood Gamma (4) Pattern-heterogeneity (4) E7 2E7 3E7 4E7 5E7 Iteration number

20 Mammal 12S data: analysis of rate matrices A <-> C A <-> G A <-> T C <-> G C <-> T G <-> T Tr/Tv Q Q Q Q Best fit to loop or stem Q1 Q2 Q3 Q4 Stem Purines A G Pyrimidines C T Loop

21 Application of pattern heterogeneity to n=159 datasets (using a reversible-jump algorithm) RJ-algorithm allows GTR rate matirices to be gained and lost using split and merge moves Q split Q i merge Q Q j Jumps to higher or lower dimensions governed by: likelihood ratio % prior ratio % proposal ratio % jacobian Mean no. of taxa =39.9±32, median = 29 (range 7-178) Mean alignment size = 1983±2545 nucleotides, median = 1140 (range ) Mean no. of genes = 2.7±4.5, median = 1, (range 1-37) Allow RJ algorithm to run to apparent convergence Improvement over conventional modelling How many Qs?

22 Distribution of number of rate matrices (RJ-algorithm) and empirical weight 1 mean = 2.2±1.4 median =2 correlation with no. of taxa, r 2 =0.15 no. of sites in alignment, = r 2 =0.27 no. of genes, r 2 =0.25 no. of taxa has highest partial correlation number of rate matrices Weight Position

23 Improvement in log-likelihood over a single Q model 100 mean improvement = 265±608 log-units 75 mean improvement = 379± 698 for n=111 datasets with > 1 Q (light green) Increase in log-likelihood over single Q model

24 Improvement in log-likelihood versus no. of rate matrices 4000 log-likelihood difference No Q's note: curve is a quadratic fit

25 Conclusions Mixture model can detect pattern-heterogeneity in simulated and real data and recover the parameters of the model of sequence evolution Can lead in general to large improvements in likelihood over single Q model Can improve topology and reduce long-branch attraction Can reduce the node density artefact Can be used to investigate gene evolution (such as secondary structure) or be applied to concatenated data sets of multiple genes.

26 Heterotachy mixture model in t Mixture model in branch lengths, t P(D Q,t 1,t 2 t J ) = " i #w j P(D i Q,t j ) A B C A B C A B C What does the model deliver?!variation in rates across the tree (heterotachy): a non-parametric covarion? basic covarion model (Tuffley and Steel, 1998) Q (off or slow) - s ij I s ij I Questions!does it work?!can we estimate sets of branch lengths?!are there a small number of sets in real data? Q= s ji I Q (on or fast)- s ji I

27 Does the branch-lengths model work? Variable evolutionary rates over time: Where Parsimony beats ML site pattern 1 site pattern 2 A C A C p r B q D B D Site-specific variation in rates of evolution, causes maximum likelihood but not parsimony to find the wrong tree. One-half of the sites vary their rates as in the left diagram, and the remaining sites vary according to the right diagram. Maximum likelihood tends to put A and C and B and D together, whereas parsimony tends to find the true tree (Kolaczkowski and Thornton, 2004)

28 Simulation!four taxon tree A site pattern 1 site pattern 2 C A C!two sets of site patterns p r!750 sites per pattern B q D B D!generate 5000 random data sets of a hard case: p=0.75, q=0.05, r={0,0.4}!analyse with MP, JC-MCMC, Covarion, BL-MCMC (blind) Do we get back the true tree? A B C D

29 Information ratio = xxyy/(xyxy + xyyx) A r C e.g., aacc/(acac+acca) B D 6 5 information ratio r

30 A C 1 Probability of True Tree versus r B r D p 50 scores in r Parsimony = p 50 Y 0.5 JC MCMC = Covarion = BL MCMC = R r BL MCMC: recovered two branch length sets of correct lengths

31 Probability of recovering true tree vs. information ratio: BL MCMC model 1.8 probability correct information ratio

32 Can we estimate sets of branch lengths? Simulated branch-length sets data* Two branch length sets Analysis Likelihood Variance Weight BLS BLS BLS BLS Three branch length sets Analysis Likelihood Variance Weights BLS BLS BLS BLS Four branch length sets Analysis Likelihood Variance Weights BLS BLS BLS BLS *Data simulated on a tree of about 60 tips sites. Likelihood is mean from Markov chain; Variance is of the likelihood.

33 A reversible-jump algorithm for heterotachy -- allow each branch to choose one or more length -- need mechanism for splitting and merging branches -- test on simulated data 0.2 branch length set branch length set 1

34 Data simulated to have two branch-length sets log-likelihoods returned by one and two branch-length sets models and RJ algorithm Y Y log-likelihood BLS 1 log-likelihood BLS 2 log-likelihood RJ +925 log-units over BLS extra branches +870 log-units 73±2.46 branches 94% of improvement in log-likelihood with 65 fewer parameters iteration

35 Time series plot of no. of extra branches inferred from RJ algorithm No. branches iteration

36 Eukaryote phylogeny of 48 species based on four genes (EF1-$, EF2- $, and two trna synthetases) One branch-length set (homogeneous model) log-likelihood= two branch-length set log-likelihood= log-units for 94 extra branches RJ-algorithm log-likelihood = log-units(96%) for 45± 1.4 extra branches

37 Summary Branch-lengths model seems to work Can estimate branch length sets RJ algorithm can identify in which branches the heterotachy is most pronounced Uses Improve phylogenetic inference (e.g., should reduce long branch attraction) Identify where in tree and for which sites in the alignment the heterotachy is occurring Sp 1 AACGTTGTCCCTT Sp 2 AACGTTCCTTGCT.. Sp n CCGGTTGCAAGCT

38 A mixture of topologies L(Q)! P(D Q) conventional likelihood model mixture model in Q (Pagel and Meade, 2004) P(D Q,T) = "P(D i Q,T) i P(D Q 1,Q 2, Q J,T) = "#w j P(D i Q j,t) i j mixture model in T P(D Q,T 1,T 2 T J ) = "#w j P(D i Q,T j ) i j

39 Some causes of conflicting phylogenetic signals -- gene trees vs species trees (coalescent events) -- convergent evolution (e.g., lysozyme in cows and monkeys) -- gene duplication -- lateral gene transfer gene-duplication lateral gene transfer

40 How does the multiple-topologies mixture model work? conventional (Bayesian) inference alignment T likelihood iteration Li=1=P(D T1) Li=2=P(D T2) Li=3=P(D T3) n multiple-topologies mixture model alignment.. Li=n=P(D Tn) k distinct Markov chains T1 T2..Tk combined likelihood iteration n Converged combined likelihood: Li=n=w1L1+w2L2+..wkLk combined likelihood allows tree to diverge if this increases L calculate posterior probabilities of each tree separately Li=1=w1L1+w2L2+..wkLk Li=2=w1L1+w2L2+..wkLk Li=3=w1L1+w2L2+..wkLk.. Li=n=w1L1+w2L2+..wkLk

41 Mixture model in T mixture model in T P(D Q,T 1,T 2 T J ) = "#w j P(D i Q,T j ) i j A B C A B C A C A C B B What does the model deliver?!variation in the topology!variation in rates across the tree: a non-parametric covarion? Questions!does it work?!can we estimate sets of topologies/branch lengths?!are there a small number of sets in real data?

42 Does the multiple-topology model work? Simulation of two-topologies A topology 1 topology 2 C A B!two four taxon trees!two topologies p B q r D p C q r D!generate 20,000 random alignments of 1000 sites sets. Vary the proportion of sites per topology!analyse with Bayesian MCMC using: one-tree model (conventional), two-tree model Can we detect two trees?

43 Converged Markov chains for one and two-topology models Data: 1000 sites simulated up two different trees (80/20 split) two-topology model log-likelihood one-topology model Iteration of Markov chain

44 Proportion of sites favouring topology 1 topology 2 Outcome one-topology model two-topology model A B C D A B C D A D A C A B B C B D C D A B A B C D C D

45 Performance of one and two-topology models given conflicting phylogenetic signal A B (1) C A (2) D C B D Posterior support for topology two-topology model Analysis one-tree model two-tree model topology 1 two-tree model topology one-topology model Proportion of sites favouring topology 1

46 significant differences detected with ~ 10% of sites Posterior support for topology Proportion of sites favouring topology 1 likelihood difference Proportion of sites favouring topology 1

47 Estimating the number of topologies Data: 1000 sites simulated up two different trees of 50 tips, 60/40 split Model Log-likelihood s.d W 1 W 2 W 3 One-topology Two topologies Three topologies

48 Conflicting signal for ecdysozoa/coelomata phylogenies Nematodes (e.g., C. elegans) Arthropods (e.g., Drosophila) Chordates (e.g., humans) Nematodes (e.g., C. elegans) Arthropods (e.g., Drosophila) Chordates (e.g., humans) ecdysozoa coelomata dataset: 13,946 BP alignment of fourteen genes from four eukaryotic species, Homologene (NCBI) database Number Name Length 1 Beta tubulin EEF1A HSP40-4B HSP HSP HSP70-9B HSP HSP90-1A s s s RNA-Bind-Motif TUBA TUBB. 900

49 one-topology model: likelihood = ±2.58 data from Homologene database (NCBI) posterior support for coelomata = 87 tree length 1.01±0.03

50 One-topology model versus two-topology results likelihood = ±2.58 likelihood = ±3.01 weight = 0.57 weight = 0.43 coelomata coelomata ecdysozoa tree length 1.01±0.03 tree length 0.93±0.13 tree length 1.33±0.3 note: two recent papers support ecdysozoa!log-l = 53.1 log units

51 Simulation of null hypothesis distribution of differences in likelihood n=1000 simulated data sets of 13,946 sites. Fit one and two topology model to each. Find difference in likelihoods. n=1000 < 53, p<0.001 log(l2)-log(l1)

52 Prokaryote phylogenetics: placement of hyperthermophiles Log likelihood =-199,345

53 Prokaryotes: two topologies showing that hyperthermophiles become derived (longer tree) w 1 =0.55 w 2 =0.45 Log likelihood =-194,022!L = 5323

54 Summary Multiple topologies model seems to work Can estimate more than one topology and identify conflicting signal directly Some questions How to test? How many distinct trees in real data?

55 Acknowledgement BBSRC and NERC (UK)

56 Phylogeny of the Ascomycota Fungi showing the evolution of lichen-formation lichen forming ambiguous not lichen forming

57 log-likelihoods for combined LSU/SSU nrdna data set: 54 Ascomycota species n=800 sites log-likelihood Q(2) Gamma(4) Q(4) iteration number

58 Site by site analysis fitting two independent rate matrices: LSU/SSU nrdna Ascomycota combined data set SSU/LSU divide log-likelihood Q1 - log-likelihood Q Note: data modelled with two independent rate matrices site in the alignment

59

60 Compensatory substitutions transitional changes transitional changes tranversinal changes low fitness Purines A G Pyrimidines C U

61 Detecting secondary structure: application of pattern-heterogeneity model to 12s mitochondrial DNA data on 57 mammals Tree here

62 Mixture models in phylogenetic inference!mixture models!a mixture model in T!simulation study!performance at estimating T s!application to ecdysozoa/coelomata and prokaryotes

63 Detecting Conflicting Phylogenetic Signals a mixture-model approach

64 Average estimated transition rates from gamma and pattern-heterogeneity models applied to simulated gamma rate heterogeneity data: input values shown A C G T A C G T --- average rate = Average Q1 Gamma1 Q2 Gamma2 Q3 Gamma3 Q4 Gamma4

65

66 Results for branch length sets RJ model for branch lengths RJ for each branch. How many branches? identifying where and when

67 Is the number of branch lengths sets small in real data? Two prokaryote data sets 1) rrna 751 sites 2) ribosomal + proteins ~ 6000 sites Conventional likelihood model Beta proteobacteria Gamma proteobacteria Alpha proteobacteria Epsilon proteobacteria Chlamydiae Spirochaete Cyanobacteria DeinococcusThermus Actinobacteria Firmicutes Fusobacteria Thermotoga Aquificae Nanoarchaeota Crenarchaeota Euryarchaeota Branch-lengths mixture model Beta Proteobacteria Gamma Proteobacteria Alpha Proteobacteria Epsilon Proteobacteria Chlamydiae Spirochaetes Actinobacteria Aquifex Thermotoga Fusobacterium Firmicutes DeinococcusThermus Cyanobacteria Nanoarchaeota Crenarchaeota Euryarchaeota

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis 10 December 2012 - Corrections - Exercise 1 Non-vertebrate chordates generally possess 2 homologs, vertebrates 3 or more gene copies; a Drosophila

More information

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary

More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Chapter 12 (Strikberger) Molecular Phylogenies and Evolution METHODS FOR DETERMINING PHYLOGENY In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern

More information

Phylogenetics: Building Phylogenetic Trees

Phylogenetics: Building Phylogenetic Trees 1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should

More information

Concepts and Methods in Molecular Divergence Time Estimation

Concepts and Methods in Molecular Divergence Time Estimation Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History Overview 1. Why do we date trees? 2. The molecular clock 3. Local clocks

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Molecular Evolution & Phylogenetics Traits, phylogenies, evolutionary models and divergence time between sequences

Molecular Evolution & Phylogenetics Traits, phylogenies, evolutionary models and divergence time between sequences Molecular Evolution & Phylogenetics Traits, phylogenies, evolutionary models and divergence time between sequences Basic Bioinformatics Workshop, ILRI Addis Ababa, 12 December 2017 1 Learning Objectives

More information

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018 Maximum Likelihood Tree Estimation Carrie Tribble IB 200 9 Feb 2018 Outline 1. Tree building process under maximum likelihood 2. Key differences between maximum likelihood and parsimony 3. Some fancy extras

More information

BINF6201/8201. Molecular phylogenetic methods

BINF6201/8201. Molecular phylogenetic methods BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

More information

Phylogenetics. BIOL 7711 Computational Bioscience

Phylogenetics. BIOL 7711 Computational Bioscience Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium

More information

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression) Using phylogenetics to estimate species divergence times... More accurately... Basics and basic issues for Bayesian inference of divergence times (plus some digression) "A comparison of the structures

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT Inferring phylogeny Constructing phylogenetic trees Tõnu Margus Contents What is phylogeny? How/why it is possible to infer it? Representing evolutionary relationships on trees What type questions questions

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Anatomy of a species tree

Anatomy of a species tree Anatomy of a species tree T 1 Size of current and ancestral Populations (N) N Confidence in branches of species tree t/2n = 1 coalescent unit T 2 Branch lengths and divergence times of species & populations

More information

A Bayesian Approach to Phylogenetics

A Bayesian Approach to Phylogenetics A Bayesian Approach to Phylogenetics Niklas Wahlberg Based largely on slides by Paul Lewis (www.eeb.uconn.edu) An Introduction to Bayesian Phylogenetics Bayesian inference in general Markov chain Monte

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

Accounting for Phylogenetic Uncertainty in Comparative Studies: MCMC and MCMCMC Approaches. Mark Pagel Reading University.

Accounting for Phylogenetic Uncertainty in Comparative Studies: MCMC and MCMCMC Approaches. Mark Pagel Reading University. Accounting for Phylogenetic Uncertainty in Comparative Studies: MCMC and MCMCMC Approaches Mark Pagel Reading University m.pagel@rdg.ac.uk Phylogeny of the Ascomycota Fungi showing the evolution of lichen-formation

More information

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington. Maximum Likelihood This presentation is based almost entirely on Peter G. Fosters - "The Idiot s Guide to the Zen of Likelihood in a Nutshell in Seven Days for Dummies, Unleashed. http://www.bioinf.org/molsys/data/idiots.pdf

More information

Phylogenetic Assumptions

Phylogenetic Assumptions Substitution Models and the Phylogenetic Assumptions Vivek Jayaswal Lars S. Jermiin COMMONWEALTH OF AUSTRALIA Copyright htregulation WARNING This material has been reproduced and communicated to you by

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Genomics and Bioinformatics

Genomics and Bioinformatics Genomics and Bioinformatics Associate Professor Bharat Patel Genomes in terms of earth s history: Earth s environment & cellular evolution Genomes in terms of natural relationships: (Technology driven

More information

The Importance of Data Partitioning and the Utility of Bayes Factors in Bayesian Phylogenetics

The Importance of Data Partitioning and the Utility of Bayes Factors in Bayesian Phylogenetics Syst. Biol. 56(4):643 655, 2007 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150701546249 The Importance of Data Partitioning and the Utility

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

BIOLOGICAL SCIENCE. Lecture Presentation by Cindy S. Malone, PhD, California State University Northridge. FIFTH EDITION Freeman Quillin Allison

BIOLOGICAL SCIENCE. Lecture Presentation by Cindy S. Malone, PhD, California State University Northridge. FIFTH EDITION Freeman Quillin Allison BIOLOGICAL SCIENCE FIFTH EDITION Freeman Quillin Allison 1 Lecture Presentation by Cindy S. Malone, PhD, California State University Northridge Roadmap 1 Key themes to structure your thinking about Biology

More information

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). 1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff

More information

Inferring Molecular Phylogeny

Inferring Molecular Phylogeny Dr. Walter Salzburger he tree of life, ustav Klimt (1907) Inferring Molecular Phylogeny Inferring Molecular Phylogeny 55 Maximum Parsimony (MP): objections long branches I!! B D long branch attraction

More information

Today s Lecture: HMMs

Today s Lecture: HMMs Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models

More information

MOLECULAR SYSTEMATICS: A SYNTHESIS OF THE COMMON METHODS AND THE STATE OF KNOWLEDGE

MOLECULAR SYSTEMATICS: A SYNTHESIS OF THE COMMON METHODS AND THE STATE OF KNOWLEDGE CELLULAR & MOLECULAR BIOLOGY LETTERS http://www.cmbl.org.pl Received: 16 August 2009 Volume 15 (2010) pp 311-341 Final form accepted: 01 March 2010 DOI: 10.2478/s11658-010-0010-8 Published online: 19 March

More information

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega BLAST Multiple Sequence Alignments: Clustal Omega What does basic BLAST do (e.g. what is input sequence and how does BLAST look for matches?) Susan Parrish McDaniel College Multiple Sequence Alignments

More information

1 ATGGGTCTC 2 ATGAGTCTC

1 ATGGGTCTC 2 ATGAGTCTC We need an optimality criterion to choose a best estimate (tree) Other optimality criteria used to choose a best estimate (tree) Parsimony: begins with the assumption that the simplest hypothesis that

More information

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016 Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,

More information

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition David D. Pollock* and William J. Bruno* *Theoretical Biology and Biophysics, Los Alamos National

More information

Estimating Evolutionary Trees. Phylogenetic Methods

Estimating Evolutionary Trees. Phylogenetic Methods Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent

More information

Consensus Methods. * You are only responsible for the first two

Consensus Methods. * You are only responsible for the first two Consensus Trees * consensus trees reconcile clades from different trees * consensus is a conservative estimate of phylogeny that emphasizes points of agreement * philosophy: agreement among data sets is

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

Intraspecific gene genealogies: trees grafting into networks

Intraspecific gene genealogies: trees grafting into networks Intraspecific gene genealogies: trees grafting into networks by David Posada & Keith A. Crandall Kessy Abarenkov Tartu, 2004 Article describes: Population genetics principles Intraspecific genetic variation

More information

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

CHAPTERS 24-25: Evidence for Evolution and Phylogeny CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Today's project. Test input data Six alignments (from six independent markers) of Curcuma species

Today's project. Test input data Six alignments (from six independent markers) of Curcuma species DNA sequences II Analyses of multiple sequence data datasets, incongruence tests, gene trees vs. species tree reconstruction, networks, detection of hybrid species DNA sequences II Test of congruence of

More information

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms

More information

Rooting Major Cellular Radiations using Statistical Phylogenetics

Rooting Major Cellular Radiations using Statistical Phylogenetics Rooting Major Cellular Radiations using Statistical Phylogenetics Svetlana Cherlin Thesis submitted for the degree of Doctor of Philosophy School of Mathematics & Statistics Institute for Cell & Molecular

More information

1. Can we use the CFN model for morphological traits?

1. Can we use the CFN model for morphological traits? 1. Can we use the CFN model for morphological traits? 2. Can we use something like the GTR model for morphological traits? 3. Stochastic Dollo. 4. Continuous characters. Mk models k-state variants of the

More information

Molecular Evolution & Phylogenetics

Molecular Evolution & Phylogenetics Molecular Evolution & Phylogenetics Heuristics based on tree alterations, maximum likelihood, Bayesian methods, statistical confidence measures Jean-Baka Domelevo Entfellner Learning Objectives know basic

More information

Phylogenetic Inference using RevBayes

Phylogenetic Inference using RevBayes Phylogenetic Inference using RevBayes Model section using Bayes factors Sebastian Höhna 1 Overview This tutorial demonstrates some general principles of Bayesian model comparison, which is based on estimating

More information

Phylogenetic methods in molecular systematics

Phylogenetic methods in molecular systematics Phylogenetic methods in molecular systematics Niklas Wahlberg Stockholm University Acknowledgement Many of the slides in this lecture series modified from slides by others www.dbbm.fiocruz.br/james/lectures.html

More information

Graph Alignment and Biological Networks

Graph Alignment and Biological Networks Graph Alignment and Biological Networks Johannes Berg http://www.uni-koeln.de/ berg Institute for Theoretical Physics University of Cologne Germany p.1/12 Networks in molecular biology New large-scale

More information

Jed Chou. April 13, 2015

Jed Chou. April 13, 2015 of of CS598 AGB April 13, 2015 Overview of 1 2 3 4 5 Competing Approaches of Two competing approaches to species tree inference: Summary methods: estimate a tree on each gene alignment then combine gene

More information

Supplementary material to Whitney, K. D., B. Boussau, E. J. Baack, and T. Garland Jr. in press. Drift and genome complexity revisited. PLoS Genetics.

Supplementary material to Whitney, K. D., B. Boussau, E. J. Baack, and T. Garland Jr. in press. Drift and genome complexity revisited. PLoS Genetics. Supplementary material to Whitney, K. D., B. Boussau, E. J. Baack, and T. Garland Jr. in press. Drift and genome complexity revisited. PLoS Genetics. Tree topologies Two topologies were examined, one favoring

More information

MCMC: Markov Chain Monte Carlo

MCMC: Markov Chain Monte Carlo I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov

More information

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic

More information

ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support

ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support Siavash Mirarab University of California, San Diego Joint work with Tandy Warnow Erfan Sayyari

More information

PGA: A Program for Genome Annotation by Comparative Analysis of. Maximum Likelihood Phylogenies of Genes and Species

PGA: A Program for Genome Annotation by Comparative Analysis of. Maximum Likelihood Phylogenies of Genes and Species PGA: A Program for Genome Annotation by Comparative Analysis of Maximum Likelihood Phylogenies of Genes and Species Paulo Bandiera-Paiva 1 and Marcelo R.S. Briones 2 1 Departmento de Informática em Saúde

More information

Reconstruire le passé biologique modèles, méthodes, performances, limites

Reconstruire le passé biologique modèles, méthodes, performances, limites Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire

More information

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana

More information

Identifiability of the GTR+Γ substitution model (and other models) of DNA evolution

Identifiability of the GTR+Γ substitution model (and other models) of DNA evolution Identifiability of the GTR+Γ substitution model (and other models) of DNA evolution Elizabeth S. Allman Dept. of Mathematics and Statistics University of Alaska Fairbanks TM Current Challenges and Problems

More information

Evaluating the usefulness of alignment filtering methods to reduce the impact of errors on evolutionary inferences

Evaluating the usefulness of alignment filtering methods to reduce the impact of errors on evolutionary inferences Di Franco et al. BMC Evolutionary Biology (2019) 19:21 https://doi.org/10.1186/s12862-019-1350-2 RESEARCH ARTICLE Evaluating the usefulness of alignment filtering methods to reduce the impact of errors

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

Phylogenies & Classifying species (AKA Cladistics & Taxonomy) What are phylogenies & cladograms? How do we read them? How do we estimate them?

Phylogenies & Classifying species (AKA Cladistics & Taxonomy) What are phylogenies & cladograms? How do we read them? How do we estimate them? Phylogenies & Classifying species (AKA Cladistics & Taxonomy) What are phylogenies & cladograms? How do we read them? How do we estimate them? Carolus Linneaus:Systema Naturae (1735) Swedish botanist &

More information

Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree

Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree Nicolas Salamin Department of Ecology and Evolution University of Lausanne

More information

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft]

Integrative Biology 200 PRINCIPLES OF PHYLOGENETICS Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft] Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley K.W. Will Parsimony & Likelihood [draft] 1. Hennig and Parsimony: Hennig was not concerned with parsimony

More information

T R K V CCU CG A AAA GUC T R K V CCU CGG AAA GUC. T Q K V CCU C AG AAA GUC (Amino-acid

T R K V CCU CG A AAA GUC T R K V CCU CGG AAA GUC. T Q K V CCU C AG AAA GUC (Amino-acid Lecture 11 Increasing Model Complexity I. Introduction. At this point, we ve increased the complexity of models of substitution considerably, but we re still left with the assumption that rates are uniform

More information

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogeny? - Systematics? The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogenetic systematics? Connection between phylogeny and classification. - Phylogenetic systematics informs the

More information

Understanding relationship between homologous sequences

Understanding relationship between homologous sequences Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe?

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe? How should we go about modeling this? gorilla GAAGTCCTTGAGAAATAAACTGCACACACTGG orangutan GGACTCCTTGAGAAATAAACTGCACACACTGG Model parameters? Time Substitution rate Can we observe time or subst. rate? What

More information

Bacterial Communities in Women with Bacterial Vaginosis: High Resolution Phylogenetic Analyses Reveal Relationships of Microbiota to Clinical Criteria

Bacterial Communities in Women with Bacterial Vaginosis: High Resolution Phylogenetic Analyses Reveal Relationships of Microbiota to Clinical Criteria Bacterial Communities in Women with Bacterial Vaginosis: High Resolution Phylogenetic Analyses Reveal Relationships of Microbiota to Clinical Criteria Seminar presentation Pierre Barbera Supervised by:

More information

Biology 211 (2) Week 1 KEY!

Biology 211 (2) Week 1 KEY! Biology 211 (2) Week 1 KEY Chapter 1 KEY FIGURES: 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 VOCABULARY: Adaptation: a trait that increases the fitness Cells: a developed, system bound with a thin outer layer made of

More information

Inferring Speciation Times under an Episodic Molecular Clock

Inferring Speciation Times under an Episodic Molecular Clock Syst. Biol. 56(3):453 466, 2007 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150701420643 Inferring Speciation Times under an Episodic Molecular

More information

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Integrative Biology 200 PRINCIPLES OF PHYLOGENETICS Spring 2018 University of California, Berkeley Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;

More information

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2011 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley B.D. Mishler Feb. 1, 2011. Qualitative character evolution (cont.) - comparing

More information

Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4. Models of DNA and protein change. Likelihood methods Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36

More information

MODELING EVOLUTION AT THE PROTEIN LEVEL USING AN ADJUSTABLE AMINO ACID FITNESS MODEL

MODELING EVOLUTION AT THE PROTEIN LEVEL USING AN ADJUSTABLE AMINO ACID FITNESS MODEL MODELING EVOLUTION AT THE PROTEIN LEVEL USING AN ADJUSTABLE AMINO ACID FITNESS MODEL MATTHEW W. DIMMIC*, DAVID P. MINDELL RICHARD A. GOLDSTEIN* * Biophysics Research Division Department of Biology and

More information

PHYLOGENOMICS AND THE RECONSTRUCTION OF THE TREE OF LIFE

PHYLOGENOMICS AND THE RECONSTRUCTION OF THE TREE OF LIFE PHYLOGENOMICS AND THE RECONSTRUCTION OF THE TREE OF LIFE Frédéric Delsuc, Henner Brinkmann and Hervé Philippe Abstract As more complete genomes are sequenced, phylogenetic analysis is entering a new era

More information

Motivating the need for optimal sequence alignments...

Motivating the need for optimal sequence alignments... 1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

Macroevolution Part I: Phylogenies

Macroevolution Part I: Phylogenies Macroevolution Part I: Phylogenies Taxonomy Classification originated with Carolus Linnaeus in the 18 th century. Based on structural (outward and inward) similarities Hierarchal scheme, the largest most

More information

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS. !! www.clutchprep.com CONCEPT: OVERVIEW OF EVOLUTION Evolution is a process through which variation in individuals makes it more likely for them to survive and reproduce There are principles to the theory

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22 Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and

More information

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

More information

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26 Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 4 (Models of DNA and

More information

Molecular evidence for multiple origins of Insectivora and for a new order of endemic African insectivore mammals

Molecular evidence for multiple origins of Insectivora and for a new order of endemic African insectivore mammals Project Report Molecular evidence for multiple origins of Insectivora and for a new order of endemic African insectivore mammals By Shubhra Gupta CBS 598 Phylogenetic Biology and Analysis Life Science

More information

Molecular Clocks. The Holy Grail. Rate Constancy? Protein Variability. Evidence for Rate Constancy in Hemoglobin. Given

Molecular Clocks. The Holy Grail. Rate Constancy? Protein Variability. Evidence for Rate Constancy in Hemoglobin. Given Molecular Clocks Rose Hoberman The Holy Grail Fossil evidence is sparse and imprecise (or nonexistent) Predict divergence times by comparing molecular data Given a phylogenetic tree branch lengths (rt)

More information

Recovering evolutionary history by complex network modularity analysis

Recovering evolutionary history by complex network modularity analysis Recovering evolutionary history by complex network modularity analysis FESC Roberto F. S. Andrade, Suani T. R. de Pinho, Aristóteles Góes-Neto, Charbel N. El-Hani, Thierry P. Lobão, Gilberto C. Bomfim,

More information

CLASSIFICATION. Why Classify? 2/18/2013. History of Taxonomy Biodiversity: variety of organisms at all levels from populations to ecosystems.

CLASSIFICATION. Why Classify? 2/18/2013. History of Taxonomy Biodiversity: variety of organisms at all levels from populations to ecosystems. Why Classify? Classification has been around ever since people paid attention to organisms. CLASSIFICATION One primeval system was based on harmful and non-harmful organisms. Life is easier when we organize

More information

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2009 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley B.D. Mishler Jan. 22, 2009. Trees I. Summary of previous lecture: Hennigian

More information

Lecture 3: Markov chains.

Lecture 3: Markov chains. 1 BIOINFORMATIK II PROBABILITY & STATISTICS Summer semester 2008 The University of Zürich and ETH Zürich Lecture 3: Markov chains. Prof. Andrew Barbour Dr. Nicolas Pétrélis Adapted from a course by Dr.

More information

Pitfalls of Heterogeneous Processes for Phylogenetic Reconstruction

Pitfalls of Heterogeneous Processes for Phylogenetic Reconstruction Pitfalls of Heterogeneous Processes for Phylogenetic Reconstruction DANIEL ŠTEFANKOVIČ 1 AND ERIC VIGODA 2 1 Department of Computer Science, University of Rochester, Rochester, New York 14627, USA; and

More information

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive. Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then

More information