Dr. Walter Salzburger
[Image: The Tree of Life, Gustav Klimt (1907)]

Inferring Molecular Phylogeny

1. Molecular Markers
Immunological comparisons
- Nuttall & Uhlenhuth (early 20th century): blood relationships between species
- cross-reaction between sera and antisera
- strategy: the degree of similarity reflects the strength of the evolutionary relationship

[Figure: Avise (1994)]
Protein electrophoresis
- developed by Hunter & Markert (1957)
- non-denatured proteins with different net charges migrate at different rates through starch or acrylamide gels
- histochemical stains specific for the enzymes under assay
- zymograms are interpretable in terms of Mendelian genotypes
[Figure: Avise (1994)]

Restriction endonucleases
- Linn & Arber*, Meselson & Yuan (1968): discovery of restriction endonucleases (enzymes), i.e., precise scalpels that cut double-stranded DNA at specific motifs
*Nobel Prize in Physiology or Medicine 1978
Recombinant DNA technology
Restriction digestion
- restriction digestion profiles
- phylogenetic, population genetic markers
- presence/absence matrix
- used for: mitochondria, plastids, whole genomes, after PCR amplification, etc.

Restriction fragment length polymorphism
Amplified fragment length polymorphism
DNA-DNA hybridization
- relies on the double-stranded nature of DNA
- and on the fact that complementary strands are held together by hydrogen bonds
- when DNA is heated, it melts into single strands; when it is cooled, it re-associates
- data: thermal elution profiles

[Figure: Avise (1994)]
Dissociation curves
- homoduplex hybridization (intraspecific)
- heteroduplex hybridization (interspecific)
- genetic distance ~ T_homo - T_hetero
[Figure: dissociation curves for homoduplex ostrich and heteroduplex ostrich-rhea; Avise (1994)]

[Figure: Sibley & Ahlquist (1980s); Sibley & Ahlquist (1990)]
DNA Sequencing
- Walter Gilbert and Fred Sanger develop techniques for DNA sequencing (1977)
Walter Gilbert (1932-) and Fred Sanger (1918-): Nobel Prize in Chemistry 1980

Polymerase Chain Reaction (PCR)
- In the 1980s, Kary B. Mullis invents the PCR and helps to develop it further
Kary B. Mullis (1944-): Nobel Prize in Chemistry 1993
2. Inferring Phylogenies

Documents of Evolutionary History
[Figure: DNA sequences diverging across space and through time]
Documents of Evolutionary History
[Figure: DNA sequences changing through time]

[Figure: raw DNA sequences -> alignment* -> aligned DNA sequences, with gaps inserted]
*Alignment: inferring homology at the DNA sequence level
[Figure: aligned DNA sequences -> phylogeny reconstruction -> molecular phylogeny]

2.1. Sequence Alignment
Homology
- similarity between characters due to shared ancestry
- two nucleotides in different sequences are homologous if (and only if) both sequences acquired that state from their common ancestor
[Figure: homologous character vs. homoplasious character]

Alignment
- a set of homologous sequences in which every nucleotide position is homologous
- to align: to infer homology at the sequence level
[Figure: four sequences (HO1, 1, K1, R1) before and after alignment; a gap is inserted into two of them]
Pairwise alignment I: gaps
[Figure: dot plot of Sequence1 against Sequence2 with alternative alignments 1 and 2; the diagonal is offset where a gap (---) has to be inserted into one sequence]

Pairwise alignment II: sequences that differ
[Figure: dot plot of Sequence1 against Sequence2; the diagonal is interrupted where the two sequences differ]
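A dot plot like the ones on these two slides can be sketched in a few lines. This is only an illustration: the two sequences below are hypothetical (they are not the sequences shown on the slides), and `dot_plot` is a name chosen here for convenience.

```python
def dot_plot(seq1: str, seq2: str) -> list[str]:
    """Return one text row per base of seq2; '*' marks a match with seq1."""
    return ["".join("*" if a == b else "." for a in seq1) for b in seq2]


# Hypothetical sequences: seq2 is seq1 with the three bases "GGC" deleted.
seq1 = "ACGTGGCTTA"
seq2 = "ACGTTTA"

rows = dot_plot(seq1, seq2)
print("  " + seq1)
for base, row in zip(seq2, rows):
    print(base + " " + row)
```

In the printed grid the main diagonal runs through the first four positions and then continues three columns to the right, which is the signature of a three-base gap.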
Pairwise alignment III: the cost of an alignment
cost = s + wg
- s: number of substitutions
- g: total length of gaps
- w: gap penalty
- w = 1: a gap is as expensive as a substitution
- w = 2: a gap is twice as expensive as a substitution

Pairwise alignment IV: evaluating alternatives
[Figure: dot plot of Sequence1 against Sequence2 with two alternative alignments, 1 and 2]
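The cost function cost = s + wg is simple enough to check by hand, but a short sketch makes the effect of the gap penalty explicit. The function names and the aligned pair below are hypothetical; the (s, g) values passed to `alignment_cost` at the end are the ones used for alignments 1 and 2 in the "evaluating alternatives" slides.

```python
def count_events(aligned1: str, aligned2: str) -> tuple[int, int]:
    """Return (s, g): number of substitutions and total gap length."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    s = g = 0
    for a, b in zip(aligned1, aligned2):
        if a == "-" or b == "-":
            g += 1  # every gap position adds to the total gap length
        elif a != b:
            s += 1  # two aligned nucleotides that differ
    return s, g


def alignment_cost(s: int, g: int, w: float) -> float:
    """cost = s + w * g, with w the gap penalty."""
    return s + w * g


# Hypothetical aligned pair: one gap position and one substitution.
s, g = count_events("ACGTTA", "AC-TCA")
print(s, g, alignment_cost(s, g, w=2))

# (s, g) values from the slides: alignment 1 = (2, 1), alignment 2 = (0, 2).
for w in (1, 3):
    print(w, alignment_cost(2, 1, w), alignment_cost(0, 2, w))
```

With w = 1 the gapped alignment 2 is cheaper; with w = 3 alignment 1 is cheaper, so the choice of w decides which alignment is preferred.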
Pairwise alignment IV: evaluating alternatives
cost = s + wg
- alignment 1 (s = 2, g = 1): w = 1: 2 + 1 x 1 = 3; w = 3: 2 + 3 x 1 = 5
- alignment 2 (s = 0, g = 2): w = 1: 0 + 1 x 2 = 2; w = 3: 0 + 3 x 2 = 6
- with w = 1 alignment 2 is cheaper; with w = 3 alignment 1 is cheaper

BLAST: Basic Local Alignment Search Tool
- http://www.ncbi.nlm.nih.gov/BLAST/
- finds regions of local similarity between nucleotide or protein sequences and calculates the statistical significance of matches
- uses databases in GenBank*
- see official NCBI handbook, chapter 16 (on course web page)
*nucleotide and protein database of the National Center for Biotechnology Information
Multiple alignment
- sum-of-pairs: minimizing the costs of all pairwise alignments (e.g., computer program ClustalW)
- tree alignment: uses phylogenetic information
[Figure: star alignment (all sequences are equally related) vs. tree alignment (phylogenetic relationships between sequences are taken into account)]

Protein alignments: PAM and BLOSUM matrices

BLOSUM62
               C   S   T   P   A   G   N   D   E   Q   H   R   K   M   I   L   V   F   Y   W
Cysteine    C   9
Hydrophilic S  -1   4
            T  -1   1   5
            P  -3  -1  -1   7
            A   0   1   0  -1   4
            G  -3   0  -2  -2   0   6
Acid-amide  N  -3   1   0  -2  -2   0   6
            D  -3   0  -1  -1  -2  -1   1   6
            E  -4   0  -1  -1  -1  -2   0   2   5
            Q  -3   0  -1  -1  -1  -2   0   0   2   5
Basic       H  -3  -1  -2  -2  -2  -2   1  -1   0   0   8
            R  -3  -1  -2  -2  -1  -2   0  -2   0   1   0   5
            K  -3   0  -1  -1  -1  -2   0  -1   1   1  -1   2   5
Hydrophobic M  -1  -1  -2  -2  -1  -3  -2  -3  -2   0  -2  -1  -1   5
            I  -1  -2  -3  -3  -1  -4  -3  -3  -3  -3  -3  -3  -3   1   4
            L  -1  -2  -3  -3  -1  -4  -3  -4  -3  -2  -3  -2  -2   2   2   4
            V  -1  -2  -2  -2   0  -3  -3  -3  -2  -2  -3  -3  -2   1   3   1   4
Aromatic    F  -2  -2  -4  -4  -2  -3  -3  -3  -3  -3  -1  -3  -3   0   0   0  -1   6
            Y  -2  -2  -3  -3  -2  -3  -2  -3  -2  -1   2  -2  -2  -1  -1  -2  -1   3   7
            W  -2  -3  -4  -4  -3  -2  -4  -4  -3  -2  -2  -3  -3  -1  -2  -1  -3   1   2  11

PAM: Point Accepted Mutation
BLOSUM: BLOcks SUbstitution Matrix
2.2. Phylogenetic Methods
- Distance Methods: UPGMA, Neighbor joining, Minimum Evolution
- Maximum Parsimony
- Maximum Likelihood: ML, Bayesian Inference

genetic distance
         Seq A  Seq B  Seq C
Seq B      3      -      -
Seq C      5      4      -
Seq D      5      4      2

[Figure: nucleotide sequences of Seq A to Seq D at sites 1 to 7]

Discrete character vs. distance method
[Figure: the same four sequences analysed twice: as a genetic distance matrix giving a tree with branch lengths, and as discrete characters with the substitutions at sites 1 to 7 mapped onto the branches]
Cluster methods: step-by-step approach
[Figure: starting tree 1 -> add next sequence -> place next sequence (Round 1) -> starting tree 2 -> add next sequence -> place next sequence (Round 2)]

Optimality criterion: choose among all possible trees
[Figure: alternative trees compared under the criterion]
Optimality criterion: too many trees problem...

number of taxa   number of trees (unrooted)   number of trees (rooted)
      2                        1                          1
      3                        1                          3
      4                        3                         15
      5                       15                        105
      6                      105                        945
      7                      945                      10395
      8                    10395                     135135
      9                   135135                    2027025
     10                  2027025                   34459425

Tree building method by type of data

Type of data   Clustering algorithm       Optimality criterion
Distances      UPGMA, Neighbor joining    Minimum Evolution
Nucleotides                               Maximum Parsimony, Maximum Likelihood
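The tree counts in the table above can be reproduced with a short sketch. It assumes the standard closed forms for fully resolved trees, (2n-5)!! unrooted and (2n-3)!! rooted trees for n taxa; the slide does not state these formulas, but they match every row of its table. The function names are chosen here for illustration.

```python
def odd_double_factorial(m: int) -> int:
    """Product 1 * 3 * 5 * ... * m for odd m; equals 1 when m <= 1."""
    result = 1
    for k in range(3, m + 1, 2):
        result *= k
    return result


def unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted bifurcating trees, (2n - 5)!!."""
    return odd_double_factorial(2 * n_taxa - 5)


def rooted_trees(n_taxa: int) -> int:
    """Number of rooted bifurcating trees, (2n - 3)!!."""
    return odd_double_factorial(2 * n_taxa - 3)


for n in range(2, 11):
    print(n, unrooted_trees(n), rooted_trees(n))
```

Each additional taxon multiplies the count by the next odd number, which is why exhaustive evaluation of all trees becomes impractical so quickly.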
UPGMA: unweighted pair group method with arithmetic means

distance matrix
      A    B    C    D
A     -
B     2    -
C     6    6    -
D    10   10   10    -

[Figure: distances can be derived from two aligned sequences or from a character matrix]

Character   Taxon A   Taxon B
pigment        1         0
fins           1         1
eyes           1         1
teeth          0         1

UPGMA: unweighted pair group method with arithmetic means
[Figure: ultrametric tree built from the distance matrix above, drawn against a scale from 5 to 0; A and B join at height 1, C joins at height 3, D joins at height 5]
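The clustering steps behind the ultrametric tree can be sketched as follows. This is a minimal UPGMA illustration, not a reference implementation: the taxon labels A to D are assumed, the function name and data layout are chosen here for convenience, and ties between equally close pairs are broken arbitrarily.

```python
from itertools import combinations


def upgma(labels, dist):
    """Return the merges as (cluster_x, cluster_y, node_height) tuples.

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    """
    size = {(name,): 1 for name in labels}  # cluster -> number of taxa
    d = {frozenset({(a,), (b,)}): dist[frozenset({a, b})]
         for a, b in combinations(labels, 2)}
    merges = []
    while len(size) > 1:
        pair = min(d, key=d.get)            # closest pair of clusters
        x, y = sorted(pair)
        new = tuple(sorted(x + y))
        merges.append((x, y, d[pair] / 2))  # node height = half the distance
        for other in size:
            if other in (x, y):
                continue
            dx = d.pop(frozenset({x, other}))
            dy = d.pop(frozenset({y, other}))
            # Size-weighted mean, i.e. the unweighted mean over all taxon pairs
            d[frozenset({new, other})] = (size[x] * dx + size[y] * dy) / (size[x] + size[y])
        del d[pair]
        size[new] = size.pop(x) + size.pop(y)
    return merges


labels = ["A", "B", "C", "D"]  # assumed taxon labels
raw = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
       ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
dist = {frozenset(k): v for k, v in raw.items()}

for x, y, height in upgma(labels, dist):
    print(x, y, height)
```

For this matrix the merges occur at heights 1, 3 and 5, matching the scale of the ultrametric tree.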
Neighbor joining (NJ)

distance matrix
      A    B    C    D
A     -
B     6    -
C     7    3    -
D    14   10    9    -

[Figure: additive tree with branch lengths fitted to the distance matrix]

Minimum evolution (ME)
$L = \sum_{i=1}^{2n-3} e_i$
- L: tree length
- 2n-3: total number of branches in an (unrooted) tree of n sequences
- e_i: individual branch length
- The minimum evolution tree is the one that minimizes L
Minimum evolution (ME): example

pairwise distances between hominoid sequences (above the diagonal: observed; below the diagonal: calculated from the tree)

             Human   Chimp   Gorilla   Orang-utan   Gibbon
Human          -       79       92        144         162
Chimp         79       -        95        154         169
Gorilla       92      102       -         150         169
Orang-utan   144      154      150         -          169
Gibbon       163      173      169        169          -

Minimum evolution (ME): example
[Figure: tree with branch lengths: Human 34.5, Chimpanzee 44.5, Gorilla 49, Orang-utan 75, Gibbon 94; internal branches 8.5 (Human + Chimpanzee) and 26 (Human + Chimpanzee + Gorilla)]
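The tree length L and the calculated distances for the hominoid example can be checked with a small sketch. The internal node names (n1, n2, n3) are hypothetical, and the assignment of 75 to Orang-utan and 94 to Gibbon is inferred from the calculated distances rather than read directly off the figure.

```python
# Branches of the example tree as (node, node, length); n1..n3 are
# hypothetical names for the internal nodes.
edges = [
    ("Human", "n1", 34.5),
    ("Chimp", "n1", 44.5),
    ("n1", "n2", 8.5),
    ("Gorilla", "n2", 49.0),
    ("n2", "n3", 26.0),
    ("Orang-utan", "n3", 75.0),
    ("Gibbon", "n3", 94.0),
]


def tree_length(edges):
    """L = sum of all individual branch lengths e_i."""
    return sum(length for _, _, length in edges)


def path_distance(edges, start, goal):
    """Sum of branch lengths along the unique path between two nodes."""
    adjacency = {}
    for a, b, length in edges:
        adjacency.setdefault(a, []).append((b, length))
        adjacency.setdefault(b, []).append((a, length))
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, total = stack.pop()
        if node == goal:
            return total
        for neighbour, length in adjacency[node]:
            if neighbour != parent:
                stack.append((neighbour, node, total + length))
    raise ValueError("no path between " + start + " and " + goal)


print(len(edges), tree_length(edges))
print(path_distance(edges, "Chimp", "Gorilla"))
print(path_distance(edges, "Human", "Gibbon"))
```

With n = 5 taxa there are 2n-3 = 7 branches, and the path lengths reproduce the calculated half of the distance table (for example 102 for Chimp-Gorilla against an observed 95).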
Maximum Parsimony (MP)
- Maximum parsimony principle: preference for the least complex explanation for an observation
- in phylogenetics: choosing the tree that requires the fewest evolutionary changes ("most parsimonious tree")
- i.e., the tree that requires the fewest mutational steps
- i.e., the tree with the shortest tree length

Maximum Parsimony (MP)
$L = \sum_{i=1}^{k} l_i$
- L: tree length
- k: number of nucleotide sites
- l_i: tree length for an individual site
- The most parsimonious tree is the one that minimizes L
Maximum Parsimony (MP)
[Figure: alignment of four taxa at sites 1 to 5]

Maximum Parsimony (MP)
[Figure: the same alignment; site 1 mapped onto the alternative trees requires 1 change on one tree and 2 changes on each of the other two]
Maximum Parsimony (MP)
[Figure: the same alignment; changes required at each of sites 1 to 5 when mapped onto a tree]

Maximum Parsimony (MP)

Tree          Site 1   Site 2   Site 3   Site 4   Site 5   Total
((,),(,))        1        1        2        1        0       5
((,),(,))        2        2        1        1        0       6
((,),(,))        2        2        2        1        0       7

The first tree, with a total of 5 changes, is the most parsimonious.
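Counting the changes per site can be sketched with Fitch's algorithm, a standard way to obtain the minimum number of changes on a given tree (the slides do not name the counting method). Everything in the data below is an assumption: the taxon labels A to D, the order of the three topologies, and the alignment itself, which is a hypothetical one constructed so that its per-site counts match the table above.

```python
def fitch(tree, states):
    """Return (possible_states, changes) for one site on a binary tree.

    tree is a taxon name or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed at this node
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Return (per-site lengths l_i, total tree length L)."""
    n_sites = len(next(iter(alignment.values())))
    per_site = [
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    ]
    return per_site, sum(per_site)


# Hypothetical alignment (not the one on the slides).
alignment = {"A": "ACATT", "B": "ACGTT", "C": "GTATT", "D": "GTGCT"}
# Assumed order of the three possible four-taxon topologies.
trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

for tree in trees:
    print(tree, *parsimony_length(tree, alignment))
```

The unrooted four-taxon trees are written here as rooted pairs for convenience; the Fitch count does not depend on where the root is placed.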