Inferring phylogeny. Today's topics. Milestones of molecular evolution studies. Contributions to molecular evolution.


Today's topics

Inferring phylogeny
  Introduction
  Distance methods
  Parsimony method

Overview of phylogenetic inference
  Methodology
  Methods using distance data
    UPGMA, neighbor-joining, minimum evolution
  Methods using discrete data
    Maximum parsimony
    Maximum likelihood
    Bayesian inference

Milestones of molecular evolution studies
  [Figure: timeline from 1800 to 2000]
  Genetics: Gregor Mendel (1856-1863)
  Origin of species: Charles Darwin (1859)
  Emergence of cladistics: Willi Hennig et al.
  Discovery of DNA structure: James Watson & Francis Crick (1953)
  First comparison of amino acid sequences (insulin) among different species: Fred Sanger et al. (1955)
  First example of molecular systematics: Vince Sarich & Allan Wilson (1967), on human evolution
  Technical breakthroughs in the 1980s: PCR, DNA sequencing, etc.
  Technical breakthroughs in the 1990s: computer speed, methodological improvements
  The blooming of evo-devo studies

Contributions to molecular evolution
  The patterns of evolution
    Cladistics and phylogenetics: Willi Hennig (1913-1976)
  The mechanisms of molecular evolution
    The neutral theory of molecular evolution: Motoo Kimura (1924-1994); Nature (1968) 217: 624
  The models and methodological concerns
    The uses of parsimony and maximum likelihood methods in phylogenetics: Joseph Felsenstein (Genome Sciences and Biology, University of Washington, Seattle)

Reconstructing phylogeny
  The concepts of trees
  The data used to construct phylogenetic trees
    Morphological data
    Molecular data
  Homology of characters in data matrices
    Sequence alignment

The basics of evolutionary trees
  The tree shapes
  Are these two trees different?
  [Figure: two tree diagrams. Baum et al. (2005) Science 310: 979]

Most recent common ancestor (MRCA)
  Which is the MRCA of a mushroom and a sponge?
  Which is the MRCA of a mouse and a fern?
  [Figure: tree with labeled nodes, including b and d. Plant Molecular Evolution]

Types of data
  Characters and character types
    Quantitative and qualitative characters
    Binary and multistate characters
  Assumptions about character evolution
    Substitution models
    Step matrix

Distance methods
  UPGMA (Unweighted Pair Group Method using Arithmetic averages)
  Transformed distance method
  Neighbor-joining method
  Minimum evolution

Evolution of DNA sequences
  [Figure: a tree in which an ancestral sequence, TCAGCCGACTGT, accumulates single-site substitutions along its branches and gives rise to SpA, SpB, SpC and SpD]
  SpA  ACAGCCGATTGT
  SpB  TCAGCCGATTGT
  SpC  TCAGCCGACTGT
  SpD  TCAGACGACTGT

In distance methods, sequences are grouped by their similarity, so we first need to measure the differences among sequences.

Measuring genetic distance
  SpA  ACAGCCGATTGT
  SpB  TCAGCCGATTGT
  SpC  TCAGCCGACTGT
  SpD  TCAGACGACTGT
  How different are the two sequences we have?

The ways of measuring nucleotide substitution
  Simplest way: Hamming distance = n/N x 100%, where n is the number of sites that differ and N is the number of sites compared.
  SpA  ACAGCCGATTGT
  SpB  TCAGCCGATTGT
  SpC  TCAGCCGACTGT
  SpD  TCAGACGACTGT
  For example, SpA and SpC differ at 2 of 12 sites (2/12 = 16.7%), and SpC and SpD differ at 1 of 12 sites (1/12 = 8.3%).
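The Hamming-distance calculation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `p_distance` and the dictionary `SEQS` are hypothetical names introduced here, and the sequences are assumed to be already aligned to equal length.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ (n / N)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    n_diff = sum(a != b for a, b in zip(seq_a, seq_b))
    return n_diff / len(seq_a)


# The four example sequences from the slides.
SEQS = {
    "SpA": "ACAGCCGATTGT",
    "SpB": "TCAGCCGATTGT",
    "SpC": "TCAGCCGACTGT",
    "SpD": "TCAGACGACTGT",
}

# Print the full pairwise table as percentages.
names = sorted(SEQS)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a}-{b}: {100 * p_distance(SEQS[a], SEQS[b]):.1f}%")
```

The resulting pairwise table is exactly the kind of distance table that UPGMA or neighbor-joining takes as input.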

Measuring differences between sequences
  The common ways
    Direct counting
    The probability that a given nucleotide site changes from i to j

Nucleotide substitution models
  Jukes & Cantor (1969)'s one-parameter model
  Kimura (1980)'s two-parameter model
  F81 (Felsenstein, 1981)
  HKY85 (Hasegawa et al., 1985)
  Generalized time-reversible model (GTR)

The probability of nucleotide substitution
  SpC  TCAGCCGACTGT
  SpD  TCAGACGACTGT
  To find the probability that the two sequences differ at a site (p), we can first calculate the probability that they are the same (I(t)).
  Let's start with the Jukes-Cantor model.
  [Figure: Jukes-Cantor model, in which every substitution among A, C, G and T occurs at the same rate $\alpha$]

  We can use another estimator, K, the number of substitutions per site since the time of divergence between the two sequences.
  [Figure: sequences Y and Z diverge from ancestor X, each branch accumulating $3\alpha t$ substitutions per site]
  Therefore, $K = 2 \times 3\alpha t = 6\alpha t$.
  Under the J-C model, $8\alpha t = -\ln\left(1 - \frac{4}{3}p\right)$, so
  $K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$
  where p = the observed proportion of nucleotides that differ between the two sequences.

Example of calculation
  If there are 80 transitions and 20 transversions between two sequences (length = 1000 bp), the number of substitutions per site, K, can be estimated as follows.
  Under the J-C model:
  $K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right) = 0.10732$ (p = 0.1)
  Under the Kimura 2P model:
  $K = \frac{1}{2}\ln\left(\frac{1}{1 - 2P - Q}\right) + \frac{1}{4}\ln\left(\frac{1}{1 - 2Q}\right) = 0.10943$ (P = 0.08, Q = 0.02)
  where P = the proportion of transitions and Q = the proportion of transversions; p = P + Q is the proportion of all substitutions.

Overview of distance methods
  1. Transform the data matrix into a distance table
  2. Use this new data matrix to evaluate/construct the tree
  (Page & Holmes 1998)

UPGMA
  OTU   A       B       C
  B     d_AB
  C     d_AC    d_BC
  D     d_AD    d_BD    d_CD

  After A and B are joined:

  OTU   (AB)      C
  C     d_(AB)C
  D     d_(AB)D   d_CD

  where $d_{(AB)C} = (d_{AC} + d_{BC})/2$ and $d_{(AB)D} = (d_{AD} + d_{BD})/2$
  (Graur & Li 2000)
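Returning to the worked distance example (80 transitions and 20 transversions), the two estimates can be checked numerically. The sketch below takes the sequence length as 1000 bp, which is what P = 0.08 and Q = 0.02 imply; the function names `jc69_distance` and `k2p_distance` are hypothetical names chosen for this illustration.

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor estimate: K = -(3/4) ln(1 - (4/3) p)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(P: float, Q: float) -> float:
    """Kimura two-parameter estimate from transition (P) and transversion (Q) proportions."""
    return 0.5 * math.log(1.0 / (1.0 - 2.0 * P - Q)) + 0.25 * math.log(1.0 / (1.0 - 2.0 * Q))


# Assumed inputs: 80 transitions and 20 transversions over 1000 aligned sites.
length = 1000
P = 80 / length   # proportion of transitions
Q = 20 / length   # proportion of transversions
p = P + Q         # proportion of all differences

print(f"J-C   K = {jc69_distance(p):.4f}")    # about 0.107
print(f"K2P   K = {k2p_distance(P, Q):.4f}")  # about 0.109
```

Both corrected distances exceed the raw proportion p = 0.1, because the models account for multiple substitutions at the same site.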

Example
  [Figures: a worked UPGMA example in four panels, ending with the UPGMA tree. Felsenstein (2004)]

True tree
  SP1: AAACGCGCTTAGATCGCCTAACGTACAAGC
  SP2: TATAAAGGGTAGAAAACCTAAGGATCAACG
  SP3: AATCGCGGGTATTTCGCCTAACGTACATCG
  N1:  AATCGCGCTTAGATCGCCTAACGTACAACG
  N2:  AATCGCGGGTAGATCGCCTAACGTACAACG
  [Figure: the true tree joining SP1, SP2 and SP3 through interior nodes N1 and N2, with branch lengths 3, 10, 2 and 3; a table of pairwise distances among SP1, SP2 and SP3 (values 15, 8, 13); and a reconstructed tree for SP2, SP1 and SP3 with branch lengths 7, 4, 3 and 4]

Neighbor-joining (NJ) method
  Saitou & Nei (1987); Studier & Keppler (1988)
  NJ does not assume a molecular clock and approximates the minimum evolution method.
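One round of neighbor-joining can be sketched as follows. NJ computes for each tip u_i = (sum of its distances to all other tips) / (n - 2), joins the pair with the smallest D_ij - u_i - u_j, and gives the new node the distance (D_ik + D_jk - D_ij) / 2 to every remaining tip k. The function names and the small four-taxon distance table are assumptions made for illustration; they are not from the lecture.

```python
def nj_pick_pair(D):
    """Return the pair (i, j) that minimizes D[i][j] - u_i - u_j."""
    tips = sorted(D)
    n = len(tips)
    # u_i: sum of distances from tip i to all other tips, divided by (n - 2)
    u = {i: sum(D[i][k] for k in tips if k != i) / (n - 2) for i in tips}
    candidates = [
        (D[i][j] - u[i] - u[j], i, j)
        for a, i in enumerate(tips)
        for j in tips[a + 1:]
    ]
    _, i, j = min(candidates)  # ties are broken alphabetically
    return i, j


def nj_join(D, i, j):
    """Replace tips i and j by a new node (i,j) and return the reduced table."""
    new = f"({i},{j})"
    rest = [k for k in D if k not in (i, j)]
    reduced = {k: {m: D[k][m] for m in rest if m != k} for k in rest}
    reduced[new] = {}
    for k in rest:
        d = (D[i][k] + D[j][k] - D[i][j]) / 2
        reduced[new][k] = d
        reduced[k][new] = d
    return reduced, new


# Hypothetical additive distances from the tree ((A:1,B:4):1,(C:2,D:3)).
D = {
    "A": {"B": 5, "C": 4, "D": 5},
    "B": {"A": 5, "C": 7, "D": 8},
    "C": {"A": 4, "B": 7, "D": 5},
    "D": {"A": 5, "B": 8, "C": 5},
}

i, j = nj_pick_pair(D)
reduced, node = nj_join(D, i, j)
print(i, j, reduced[node])
```

In this example the smallest raw distance is A-C (4), which is the pair UPGMA would join first, whereas the NJ criterion selects the true neighbors A and B because it corrects for the long branch leading to B.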

NJ procedure
  1. For each tip, compute $u_i = \sum_{j: j \neq i} \frac{D_{ij}}{n - 2}$
  2. Choose the i and j for which $D_{ij} - u_i - u_j$ is smallest
  3. Join i and j, and compute the distance from the new node (ij) to every other node k: $D_{(ij),k} = \frac{D_{ik} + D_{jk} - D_{ij}}{2}$
  4. Delete tips i and j and make a new table with the new node (ij)
  5. Continue until the tree is resolved

Limitations of distance methods
  All nucleotide sites change independently.
  When evolutionary rates vary from site to site, the data set needs to be corrected.
  The substitution rate is constant over time and in different lineages.
  The base composition is at equilibrium.
  The conditional probabilities of nucleotide substitutions are the same for all sites and do not change over time.

Discrete methods
  Maximum parsimony
  Maximum likelihood
  Bayesian inference
  Others

Parsimony methods
  The goal is to find the most parsimonious tree.
  The criterion is the number of character-state changes, i.e. the evolutionary steps.
  First, we have to know how to evaluate a given tree.

A simple data matrix with discrete characters
  SpA: TCAGACGATTGTCAGACCATTG
  SpB: TCAGTCGACTGTCAAACCATTG
  SpC: TCGGTCAATTGTCAAACGATTG
  SpD: TCGGTCAATTGTCAAACGATTG
  [Figure: the three possible trees for SpA, SpB, SpC and SpD]

For position no. 3 (A, A, G, G in SpA, SpB, SpC, SpD; an A to G change)
  On the tree that groups SpA with SpB: character change (step) = 1
  On each of the other two trees: character change (step) = 2
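The step counts for a given tree can be computed with Fitch's algorithm for unordered characters. The sketch below is a minimal illustration, not the lecture's own code: trees are written as nested tuples (a hypothetical representation), rooted arbitrarily because the parsimony length of an unordered character does not depend on where the root is placed.

```python
def fitch(tree, states):
    """Return (possible state set, number of steps) for one character.

    tree   : a tip name, or a 2-tuple of subtrees
    states : dict mapping tip name -> character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:                       # children agree: no extra step
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, alignment):
    """Sum the Fitch step counts over all sites of an alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for pos in range(n_sites):
        column = {name: seq[pos] for name, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


ALIGNMENT = {
    "SpA": "TCAGACGATTGTCAGACCATTG",
    "SpB": "TCAGTCGACTGTCAAACCATTG",
    "SpC": "TCGGTCAATTGTCAAACGATTG",
    "SpD": "TCGGTCAATTGTCAAACGATTG",
}

# The three possible unrooted topologies for four taxa.
TREES = {
    "AB|CD": (("SpA", "SpB"), ("SpC", "SpD")),
    "AC|BD": (("SpA", "SpC"), ("SpB", "SpD")),
    "AD|BC": (("SpA", "SpD"), ("SpB", "SpC")),
}

site3 = {name: seq[2] for name, seq in ALIGNMENT.items()}
for label, tree in TREES.items():
    print(label, "site 3:", fitch(tree, site3)[1], "total:", tree_length(tree, ALIGNMENT))
```

For this data matrix the tree grouping SpA with SpB needs 1 step at position 3 and 6 steps in total, while each of the other two trees needs 2 steps at position 3 and 9 in total.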

A simple data matrix with discrete characters
  SpA: TCAGACGATTGTCAGACCATTG
  SpB: TCAGTCGACTGTCAAACCATTG
  SpC: TCGGTCAATTGTCAAACGATTG
  SpD: TCGGTCAATTGTCAAACGATTG

For position no. 7 (G, G, A, A in SpA, SpB, SpC, SpD; a G to A change)
  On the tree that groups SpA with SpB: character change (step) = 1
  On each of the other two trees: character change (step) = 2

For the whole data matrix
  On the tree that groups SpA with SpB: character change (step) = 6
  On each of the other two trees: character change (step) = 9

Consensus trees
  Finding the underlying information among trees
  (Page & Holmes 1998)

So, how many trees out there are we dealing with?
  [Figure: the possible trees for 3 taxa and for 4 taxa]

Tree searching
  The number of unrooted trees: $N_{unrooted} = \frac{(2n-5)!}{2^{n-3}(n-3)!}$
  The number of rooted trees: $N_{rooted} = \frac{(2n-3)!}{2^{n-2}(n-2)!}$
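The two tree-counting formulas are easy to evaluate with exact integer arithmetic. In this sketch the function names `n_unrooted` and `n_rooted` are hypothetical, and the timing estimate assumes a rate of 10^6 trees evaluated per second.

```python
from math import factorial


def n_unrooted(n: int) -> int:
    """Number of unrooted bifurcating trees for n >= 3 taxa: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def n_rooted(n: int) -> int:
    """Number of rooted bifurcating trees for n >= 2 taxa: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (3, 4, 5, 10, 20, 25):
    print(f"{n:>2} taxa: {n_unrooted(n):,} unrooted, {n_rooted(n):,} rooted")

# Rough time for an exhaustive search at an assumed 10**6 trees per second.
seconds = n_unrooted(25) / 10 ** 6
print(f"25 taxa: about {seconds / (60 * 60 * 24 * 365):.2e} years")
```

Rooting an unrooted tree of n taxa is equivalent to adding one more tip, so the number of rooted trees for n taxa equals the number of unrooted trees for n + 1 taxa.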

Taxon number   All possible unrooted trees
 3             1
 4             3
 5             15
 6             105
 7             945
 8             10,395
 9             135,135
10             2,027,025
11             34,459,425
12             654,729,075
13             13,749,310,575
14             316,234,143,225
15             7,905,853,580,625
16             213,458,046,676,875
17             6,190,283,353,629,375
18             191,898,783,962,510,625
19             6,332,659,870,762,850,625
20             221,643,095,476,699,771,875
21             8,200,794,532,637,891,559,375
22             319,830,986,772,877,770,815,625
23             13,113,070,457,687,988,603,440,625
24             563,862,029,680,583,509,947,946,875
25             25,373,791,335,626,257,947,657,609,375
...            > 10^29, etc.

Note
  Say a computer can evaluate 10^6 trees per second. If we want to evaluate all of the trees for 25 taxa, taking their number as roughly 10^29, we will need
  x = 10^29 / 10^6 / 60 / 60 / 24 / 365 = 3.17 x 10^15 years.
  We have to find some way to approximate the true tree.

Ways to solve the time problem
  Skip unnecessary calculations
  Change the way trees are searched

The procedure can be simplified
  SpA: TCAGACGATTGTCAGACCATTG
  SpB: TCGGTCGACTGTCAGACCATTG
  SpC: TCAGTCGATTGTCA-ACGATTG
  SpD: TCAGTCGATTGTCA-ACGATTG
  SpE: TCAGTCGATCGTCA-ACGATTG
  [Figure: sites marked as parsimony-uninformative or parsimony-informative]

Searching for optimal trees
  Exhaustive search
  Branch-and-bound method
  Heuristic approach
    Stepwise addition (closest or random)
    Star decomposition
    Branch swapping

Exhaustive search
  [Figure: Page & Holmes (1998) Molecular evolution]
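The distinction between parsimony-informative and parsimony-uninformative sites in the five-taxon matrix above can be made explicit in code. A variable site is informative only if at least two character states each occur in at least two taxa; otherwise every tree requires the same number of steps and the site can be skipped. This sketch assumes, as a simplification not stated in the slides, that gaps ("-") are treated as missing data; the function name is hypothetical.

```python
from collections import Counter


def classify_sites(alignment, missing="-?"):
    """Return (informative, uninformative) 1-based positions of variable sites."""
    seqs = list(alignment.values())
    informative, uninformative = [], []
    for pos in range(len(seqs[0])):
        # Count states at this site, ignoring gaps/missing data (an assumption).
        counts = Counter(s[pos] for s in seqs if s[pos] not in missing)
        if len(counts) < 2:
            continue  # constant site: not variable at all
        shared_states = sum(1 for c in counts.values() if c >= 2)
        if shared_states >= 2:
            informative.append(pos + 1)
        else:
            uninformative.append(pos + 1)
    return informative, uninformative


MATRIX = {
    "SpA": "TCAGACGATTGTCAGACCATTG",
    "SpB": "TCGGTCGACTGTCAGACCATTG",
    "SpC": "TCAGTCGATTGTCA-ACGATTG",
    "SpD": "TCAGTCGATTGTCA-ACGATTG",
    "SpE": "TCAGTCGATCGTCA-ACGATTG",
}

informative, uninformative = classify_sites(MATRIX)
print("parsimony-informative sites:  ", informative)
print("parsimony-uninformative sites:", uninformative)
```

Restricting the tree-length calculation to the informative sites gives the same ranking of trees while evaluating far fewer columns.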

Distribution of tree length
  Frequency distribution of lengths of 1000000 random trees:
  mean=6528.095481 sd=65.005662 g1=-0.322317 g2=0.159848
  (lowest bin boundary: 6114.00000)

  Upper bin boundary   Number of trees
  6146.30000           1
  6178.60000           3
  6210.90000           24
  6243.20000           84
  6275.50000           302
  6307.80000           1071
  6340.10000           3493
  6372.40000           9300
  6404.70000           22649
  6437.00000           48576
  6469.30000           94873
  6501.60000           144323
  6533.90000           187000
  6566.20000           199355
  6598.50000           153445
  6630.80000           88764
  6663.10000           36489
  6695.40000           8865
  6727.70000           1266
  6760.00000           117

  Data of bird ovomucoids

Tree-island profile

  Island   Size   First tree   Last tree   Score   First replicate   Times hit
   1        101     1471         1571      3890      3                 1
   2         69     4961         5029      3890     10                 1
   3          1    14236        14236      3890     30                 1
   4          5    14237        14241      3890     31                 1
   5       1236        1         1236      3891      1                 2
   6         96     8816         8911      3891     20                 1
   7        256     9524         9779      3891     23                 2
   8        960    14242        15201      3891     32                 2
   9        624    15202        15825      3891     33                 1
  10       3085    15870        18954      3891     37                 1
  11       1488     1911         3398      3892      5                 1
  12        626     3448         4073      3892      7                 2
  13        211     4074         4284      3892      8                 1
  14         96     5935         6030      3892     12                 1
  15        156     7211         7366      3892     14                 2
  16        236     7367         7602      3892     15                 2
  17       1053     7667         8719      3892     18                 1
  18        291     9780        10070      3892     24                 1
  19        576    10071        10646      3892     25                 1
  20         96    18987        19082      3892     39                 1
  21        676     4285         4960      3893      9                 1
  22        905     5030         5934      3893     11                 1
  23         96     8720         8815      3893     19                 1
  24        612     8912         9523      3893     21                 1
  25         32    18955        18986      3893     38                 1
  26         18    19083        19100      3893     40                 1
  27        339     1572         1910      3894      4                 2
  28         64     7603         7666      3894     17                 1
  29       3589    10647        14235      3894     26                 1
  30        234     1237         1470      3895      2                 1
  31         49     3399         3447      3895      6                 1
  32         44    15826        15869      3895     35                 1
  33       1180     6031         7210      3896     13                 1

  Results of a heuristic search, with random addition from random starting points
  Data of bird ovomucoids

Branch and bound searching
  A shortest Hamiltonian path problem
  Hendy & Penny (1982) first used this method for inferring phylogenies.
  Hendy, M.D. & D. Penny (1982) Mathem. Biosci. 60: 133-142.

An example - shortest Hamiltonian path (SHP)
  For n cities, there will be n-1 cities to come next, so there will be n! possible solutions.

  Point   x       y
   1      0.537   0.061
   2      0.274   0.222
   3      0.016   0.837
   4      0.871   0.400
   5      0.399   0.740
   6      0.815   0.531
   7      0.587   0.946
   8      0.992   0.733
   9      0.268   0.481
  10      0.895   0.068

  Random route: total length = 5.4342
  Adding in a greedy manner: length = 2.8027
  Optimal solution: length = 2.7812
  [Figures: Felsenstein (2004) Inferring phylogeny]

Branch-and-bound search
  [Figure: Page & Holmes (1998) Molecular evolution]

Heuristic searching for the best tree
  This method can greatly improve the speed of the search, but it still cannot escape the constraints of the NP-hardness proof.
  [Figure: Felsenstein (2004) Inferring phylogeny]
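The branch-and-bound idea behind the shortest Hamiltonian path example can be sketched as a depth-first search that abandons any partial route already at least as long as the best complete route found so far. This is a minimal illustration with hypothetical function names, using the ten coordinates as they appear in the transcript; no route lengths are asserted here because the transcribed coordinates may not match the original figure exactly.

```python
import math

POINTS = [
    (0.537, 0.061), (0.274, 0.222), (0.016, 0.837), (0.871, 0.400), (0.399, 0.740),
    (0.815, 0.531), (0.587, 0.946), (0.992, 0.733), (0.268, 0.481), (0.895, 0.068),
]


def greedy_path(pts, start=0):
    """Greedy route: always move to the nearest unvisited point."""
    order, length = [start], 0.0
    unvisited = set(range(len(pts))) - {start}
    while unvisited:
        nxt = min(unvisited, key=lambda k: math.dist(pts[order[-1]], pts[k]))
        length += math.dist(pts[order[-1]], pts[nxt])
        order.append(nxt)
        unvisited.remove(nxt)
    return length, order


def branch_and_bound(pts):
    """Exact shortest Hamiltonian path, pruning routes that cannot beat the best."""
    n = len(pts)
    best = {"length": math.inf, "order": None}

    def extend(order, used, length):
        if length >= best["length"]:      # bound: this branch cannot improve
            return
        if len(order) == n:               # complete route, and shorter than best
            best["length"], best["order"] = length, order[:]
            return
        for nxt in range(n):
            if nxt not in used:
                step = math.dist(pts[order[-1]], pts[nxt])
                order.append(nxt)
                used.add(nxt)
                extend(order, used, length + step)
                order.pop()
                used.remove(nxt)

    for start in range(n):
        extend([start], {start}, 0.0)
    return best["length"], best["order"]
```

A call such as `branch_and_bound(POINTS[:7])` returns the optimal length and visiting order for the first seven points. In tree searching, the same bound is applied to partial trees: once the length of a partly built tree exceeds that of the best complete tree found so far, none of its completions needs to be examined.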

Heuristic approach - Branch swapping: NNIs
  Nearest-neighbor interchanges (NNI)
  In a tree with n tips, there will be n-3 interior branches. In all, 2(n-3) neighbors will be examined for each tree.
  [Figure: Swofford et al. (1996) Molecular Systematics]

How greedy should we be?
  Should we do NNI for each of the best trees, or stop at some point, or should we start from scratch more often?

The space of all 15 possible unrooted trees
  [Figure: graph connecting the 15 possible unrooted trees. Felsenstein (2004) Inferring phylogeny]
  An NNI search moves through this graph.

Heuristic approach - Branch swapping: TBR
  Tree bisection and reconnection (TBR)
  When a tree is bisected into subtrees of $n_1$ and $n_2$ species, there are $(2n_1 - 3)(2n_2 - 3)$ possible ways to reconnect the two subtrees.
  [Figure: Swofford et al. (1996) Molecular Systematics]

Rearrangement: browsing tree space
  Random starting point strategy, to avoid being trapped in a tree island (Maddison 1991)
  Parsimony ratchet
    The parsimony ratchet uses a re-weighted subset of characters as a starting point to browse the tree space, and tries to find as many tree islands as possible by searching adjacent trees. (Nixon 1999)
  Maddison, D.R. (1991) Syst. Zool. 40: 315-328.
  Nixon, K.C. (1999) Cladistics 15: 407-414.

Parsimony ratchet
  Described by Nixon (1999), and available in various programs (NONA, WINCLADA, PAUPRat, etc.).
  Nixon, K.C. (1999) Cladistics 15: 407-414.

Parsimony ratchet procedure
  1. Generate a starting Wagner tree
  2. Randomly select a subset of characters (5-25%) and add 1 to their weights
  3. Perform TBR branch swapping, keeping 1 tree
  4. Reset the weights to their original values
  5. Perform branch swapping, keeping the best tree
  6. Return to step 2 and repeat the iteration (steps 2-6) 50-200 times

Notes on parsimony ratchet
  It seems to improve the effectiveness of tree searching.
  It can be used with any objective function based on character data: compatibility, distance matrix, and likelihoods.
  The strategy can be modified.
  Nixon, K.C. (1999) Cladistics 15: 407-414.

Weighted parsimony method
  Incorporating simple nucleotide models into MP analysis
  Transition vs. transversion scores

        A     C     G     T
  A     0     2.5   1     2.5
  C     2.5   0     2.5   1
  G     1     2.5   0     2.5
  T     2.5   1     2.5   0

The Sankoff algorithm applied to the tree
  [Figure: a tree with tips C, A, C, A, G. Each node carries the minimum cost of the subtree above it for each of the states A, C, G and T: interior nodes (2.5, 2.5, 3.5, 3.5), (1, 5, 1, 5) and (3.5, 3.5, 3.5, 4.5), and the bottom node (6, 6, 7, 8). The tree length is $S = \min_i S_0(i)$. Modified from Felsenstein (2004) Inferring phylogeny]
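The Sankoff algorithm with the transition/transversion cost matrix above can be sketched as a short recursion. The tree shape used here, ((C,A),(C,(A,G))), is a reconstruction: it is the arrangement of the five tips that reproduces the node values listed for the figure, and it should be read as an assumption rather than as the original drawing. The function names are hypothetical.

```python
import math

STATES = "ACGT"
PURINES = {"A", "G"}


def cost(a: str, b: str) -> float:
    """Step-matrix cost: 0 for no change, 1 for a transition, 2.5 for a transversion."""
    if a == b:
        return 0.0
    is_transition = (a in PURINES) == (b in PURINES)
    return 1.0 if is_transition else 2.5


def sankoff(tree):
    """Return the minimum subtree cost for each state in STATES at this node.

    tree: a tip state such as "A", or a 2-tuple of subtrees.
    """
    if isinstance(tree, str):
        # A tip costs nothing for its observed state and is impossible otherwise.
        return [0.0 if s == tree else math.inf for s in STATES]
    left, right = sankoff(tree[0]), sankoff(tree[1])
    return [
        min(cost(s, t) + left[j] for j, t in enumerate(STATES))
        + min(cost(s, t) + right[j] for j, t in enumerate(STATES))
        for s in STATES
    ]


# Assumed topology for the five tips C, A, C, A, G.
TREE = (("C", "A"), ("C", ("A", "G")))

root_costs = sankoff(TREE)
print(dict(zip(STATES, root_costs)))
print("tree length S =", min(root_costs))
```

With equal costs for all changes this recursion gives the same tree length as Fitch counting; the step matrix is what lets transversions count more than transitions.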

Evaluating trees
  How do we know the tree we got is right?
  The evaluation
    Character properties (CI, RI)
      Examining how clean the data are on a given tree
    Confidence level of trees
      Comparisons between obtained trees
        Partition distance
        Kishino-Hasegawa test
        Distance test
        Likelihood ratio test
      Bootstrap/jackknife (internal support)
      Bremer support (for parsimony)

Properties of characters: consistency index (CI)
  The CI of each character is:
  character $CI = m_i / s_i$
  The tree CI for all n characters on a specific tree is:
  tree $CI = \frac{\sum_{i=1}^{n} w_i m_i}{\sum_{i=1}^{n} w_i s_i}$
  $m_i$ = the minimum conceivable number of steps for character i
  $s_i$ = the observed number of steps for character i on the tree
  $w_i$ = the weight of character i

Properties of characters: retention index (RI)
  The RI of each character is:
  character $RI = \frac{M_i - s_i}{M_i - m_i}$
  $m_i$ = the minimum conceivable number of steps for character i
  $M_i$ = the maximum conceivable number of steps for character i
  $s_i$ = the observed number of steps for character i on the tree

Example: calculating CI & RI

            Character
            1   2   3   4
  Taxon1    0   0   0   0
  Taxon2    0   0   0   1
  Taxon3    1   1   0   1
  Taxon4    1   1   1   1
  Taxon5    1   1   0   1

  Tree 1 (tip order and state of character 1): Taxon1 0, Taxon2 0, Taxon3 1, Taxon4 1, Taxon5 1
    Character 1: CI = 1/1 = 1; RI = (2-1)/(2-1) = 1
    Tree CI = (1+1+1+1)/(1+1+1+1) = 1
  Tree 2 (tip order and state of character 1): Taxon1 0, Taxon4 1, Taxon3 1, Taxon2 0, Taxon5 1
    Character 1: CI = 1/2 = 0.5; RI = (2-2)/(2-1) = 0
    Tree CI = (1+1+1+1)/(2+2+1+1) = 0.67
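The CI and RI formulas reduce to a few lines of arithmetic once the step counts are known. In this sketch the function names are hypothetical, and the step counts (m, s, M) are entered by hand from the worked example rather than computed from a tree.

```python
def character_ci(m: float, s: float) -> float:
    """Consistency index of one character: minimum steps / observed steps."""
    return m / s


def character_ri(M: float, m: float, s: float) -> float:
    """Retention index of one character: (M - s) / (M - m)."""
    return (M - s) / (M - m)


def tree_ci(m_list, s_list, weights=None):
    """Ensemble CI over all characters, optionally weighted."""
    if weights is None:
        weights = [1] * len(m_list)
    numerator = sum(w * m for w, m in zip(weights, m_list))
    denominator = sum(w * s for w, s in zip(weights, s_list))
    return numerator / denominator


# Character 1 of the example: m = 1, M = 2; s = 1 on Tree 1 and s = 2 on Tree 2.
print("Tree 1:", character_ci(1, 1), character_ri(2, 1, 1))
print("Tree 2:", character_ci(1, 2), character_ri(2, 1, 2))

# All four characters: observed steps 1,1,1,1 on Tree 1 and 2,2,1,1 on Tree 2.
print("Tree 1 CI:", tree_ci([1, 1, 1, 1], [1, 1, 1, 1]))
print("Tree 2 CI:", round(tree_ci([1, 1, 1, 1], [2, 2, 1, 1]), 2))
```

A character with CI = 1 shows no homoplasy on the tree, while RI = 0 means the character retains none of its potential grouping information on that tree.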

Bootstrap method
  [Figure: a data matrix of taxa A-G by characters 1, 2, 3, 4, 5 ... N. N characters are resampled with replacement (for example characters 7, 33, 29, 72, 7, 86, 45 ... and 78, 16, 98, 21, 2, 47, 108 ...) to build each replicate]
  M replicates; M = 100, 500, 1000, or any desired number
  Make one tree for each replicate
  Make a consensus tree based on majority rule; the numbers (e.g. 99, 86) indicate the % of replicates that support the grouping. Usually only branches with >50% support are shown.
  Diagram explaining the bootstrapping procedure used in phylogenetic analysis. Page & Holmes (1998) Molecular evolution

Jackknife method
  [Figure: a data matrix of taxa A-G by characters 1, 2, 3, 4, 5 ... N, from which a subset of characters (for example 29, 72, 7, 86, 45 ... and 98, 21, 2, 47, 108 ...) is kept for each replicate]
  Take out X% of the characters; X = 30, 50, or any number
  M replicates; M = 100, 500, 1000, or any desired number
  Make one tree for each replicate from the remaining characters
  Make a consensus tree based on majority rule; the numbers (e.g. 99, 86) indicate the % of replicates that support the grouping. Usually only branches with >50% support are shown.
  Diagram explaining the jackknife procedure used in phylogenetic analysis.

Primate mtDNA phylogeny
  MP criterion, bootstrap with 1000 replicates
  [Figure: tree of taxa 1-7]

  1234567    Freq       %
  ...**      1000.00    100.0%
  ...****    999.00     99.9%
  ...***     945.00     94.5%
  .**...     497.33     49.7%
  ..*****    484.83     48.5%

  Gorilla grouped with Homo/Pan or with the other primates

Bootstrapping
  Pseudo-resampling.
  An approximate measure of the repeatability and accuracy of the data.
  It will not correct inconsistency caused by the reconstruction method.
  The tree constructed in every replicate still depends on the method you choose, and is therefore subject to the associated problems.

Decay/support index
  Also called Bremer support (Bremer 1988, 1994; Donoghue et al. 1992)
  An index of support calculated as the difference in tree length between the shortest trees that contain a specific group and the shortest trees that lack it.
  Constraint trees can be generated by AutoDecay.
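The resampling step of the bootstrap and jackknife procedures described above can be sketched as follows. Only the construction of pseudo-replicate data matrices is shown; building a tree from each replicate and summarizing the trees in a majority-rule consensus are left out. The function names and the small alignment are hypothetical, and the random seed is fixed only to make the run repeatable.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample N columns with replacement; return (replicate, chosen column indices)."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    replicate = {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}
    return replicate, cols


def jackknife_replicate(alignment, fraction_removed, rng):
    """Delete a fraction of the columns at random; return (replicate, kept column indices)."""
    n_sites = len(next(iter(alignment.values())))
    n_keep = n_sites - round(n_sites * fraction_removed)
    cols = sorted(rng.sample(range(n_sites), n_keep))
    replicate = {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}
    return replicate, cols


ALN = {
    "A": "TCAGACGATTGTCAGACCATTG",
    "B": "TCAGTCGACTGTCAAACCATTG",
    "C": "TCGGTCAATTGTCAAACGATTG",
    "D": "TCGGTCAATTGTCAAACGATTG",
}

rng = random.Random(42)   # fixed seed, for repeatability only
M = 100                   # number of replicates
boot_reps = [bootstrap_replicate(ALN, rng)[0] for _ in range(M)]
jack_rep, kept = jackknife_replicate(ALN, 0.5, rng)
print(len(boot_reps), "bootstrap replicates of", len(boot_reps[0]["A"]), "sites each")
print("jackknife replicate keeps", len(kept), "sites")
```

Because whole columns are resampled, every taxon receives the same set of sites in a replicate, so the homology of characters across taxa is preserved.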

Method of calculation
  1. Obtain the most parsimonious tree(s) (MP)
     Generate a strict consensus tree
  2. Obtain all trees one step longer (MP + 1)
     Generate a strict consensus tree
     If a branch is no longer supported, decay index = 1
  3. Obtain all trees two steps longer (MP + 2)
     Generate a strict consensus tree
     If a branch is no longer supported, decay index = 2
  4. Obtain all trees three steps longer (MP + 3)
     Generate a strict consensus tree
     If a branch is no longer supported, decay index = 3

Bremer support (decay index)
  The number of extra steps it takes to collapse a group
  Example taxa:
    Haliperp HPE232926
    Arabaren AAU43231
    Arabsuec ASU43227
    Arabthal ATU43224
    Arabsuec ASU43226
    Arabthal ATH232900
    Capsrube CRU232912
    Capsrube CRU232913
  [Figure: 1 MP tree, 110 steps]
  [Figures: strict consensus of 3 trees <= 111 steps]
  [Figure: strict consensus of 5 trees <= 117 steps]
  [Figure: tree for the same taxa with values on the branches: 10, 13, 6, 13, 1, 2, 15, 26]

  Cannot be larger than the branch length
  No direct connection to branch length otherwise
  Quantification of support in a parsimony framework

Methodological concerns
  Effects of sampling
  Performance of phylogenetic methods under computer simulation

Sampling problem
  [Figure: 0 changes vs. 3 changes; this makes No. 2 a fast-evolving taxon. (Page & Holmes, 1998)]

Long branch attraction
  Joe Felsenstein (1978, Syst. Zool. 27: 401-410)
  [Figure: Page & Holmes (1998) Molecular evolution]

Simulation analysis
  [Figures: UPGMA and parsimony compared under an equal rate of evolution and under an unequal rate of evolution. Page & Holmes (1998) Molecular evolution]

[Figure: performance of Parsimony, Weighted parsimony, UPGMA, Lake's invariants, Maximum likelihood (Jukes-Cantor), Maximum likelihood (Kimura) and Weighted least squares. Swofford et al. (1996) Molecular Systematics]

References of phylogenetics
  Graur, D. and W.-H. Li. 2000. Fundamentals of Molecular Evolution, 2nd ed. Sinauer Assoc., Sunderland, MA, USA.
  Hall, B. G. 2004. Phylogenetic trees made easy: a how-to manual, 2nd ed. Sinauer Assoc., Sunderland, MA, USA.
  Hillis, D. M., C. Moritz, and B. K. Mable (eds). 1996. Molecular systematics. Sinauer Assoc., Sunderland, MA, USA.
  Page, R. D. M., and E. C. Holmes. 1998. Molecular evolution: a phylogenetic approach. Blackwell Science Ltd, Oxford, United Kingdom.
  Yang, Z. H. 2006. Computational molecular evolution. Oxford University Press.