Multiple Sequence Alignment Sequences > YOR020c mstllksaksivplmdrvlvqrikaqaktasglylpe knveklnqaevvavgpgftdangnkvvpqvkvgdqvl ipqfggstiklgnddevilfrdaeilakiakd > crassa mattvrsvksliplldrvlvqrvkaeaktasgiflpe ssvkdlneakvlavgpgaldkdgkrlpmgvnagdrvl ipqyggspvkvgeeeytlfrdseilakiae > nidulans msllrnvknlaplldrvlvqrvkpeaktasgiflpes svkeqneakvlavgpgavdrngqripmgvaagdrvlv pqfggsplkigeeeyhlfrdseilakine > pombe (fission yeast) matklksaksivplldrilvqrikadtktasgiflpe ksveklsegrvisvgkggynkegklaqpsvavgdrvl lpayggsnikvgeeeyslyrdhellaiike > alpina masritkfsktivpmmdrvlvqrikpqqktasgiyip ekaqealnegyvvavgkglttqegkvvpselaegdkv llppyggsvvkvdneelilfreseilakiq > cohnii matgiakrftplldrvlvqrlkpeaktasglflpesa akapnyatvlavgpggrtrdgdilpmnvkvgdkvvvp eyggmtlkfedeefqvfrdadimgilne > melanogaster maaaikkiipmldriliqraealtktkggivlpekav gkvlegtvlavgpgtrnastgnhipigvkegdrvllp efggtkvnlegdqkelflfresdilakle > sapiens agqafrkflplfdrvlversaaetvtkggimlpeksq gkvlqatvvavgsgskgkggeiqpvsvkvgdkvllpe yggtkvvlddkdyflfrdgxilgky > stearothermophilus vlkplgdrvvievieteektasgivlpdtakekpqeg rvvavgkgrvldsgervapevevgdriifskyagtev kydgkeylilresdilavig > tuberculosis makvnikpledkilvqaneaetttasglvipdtakek pqegtvvavgpgrwdedgekripldvaegdtviysky ggteikyngeeylilsardvlavvsk > musculus (house mouse) magqafrkflllfdrvlversaaetvtkggimlpeks qgkvlqatvvavgsggkgksgeiepvsvkvgdkvllp eyggtkvvlddkdyflfrdsdilgkyvn 1
Multiple Sequence Alignment (MSA) Why MSA? Proteins are often related to a larger group (i.e., a family) of proteins Multiple sequence alignment is more sensitive than pairwise alignment for detecting homologs MSAs can elucidate conserved residues, motifs, or other functional regions in a protein MSA is critical for phylogenetic analysis Selection of sequences Multiple sequence alignment of sequences Tree building Tree evaluation 2
Pairwise Alignment 3-sequence Alignment AGA AGT TCC A G A T C C A G T 0 0 0 0 0 5 0 0 0 0 10 4 0 5 4 6 3
Sequences > YOR020c mstllksaksivplmdrvlvqrikaqaktasglylpe knveklnqaevvavgpgftdangnkvvpqvkvgdqvl ipqfggstiklgnddevilfrdaeilakiakd > crassa mattvrsvksliplldrvlvqrvkaeaktasgiflpe ssvkdlneakvlavgpgaldkdgkrlpmgvnagdrvl ipqyggspvkvgeeeytlfrdseilakiae > nidulans msllrnvknlaplldrvlvqrvkpeaktasgiflpes svkeqneakvlavgpgavdrngqripmgvaagdrvlv pqfggsplkigeeeyhlfrdseilakine > pombe (fission yeast) matklksaksivplldrilvqrikadtktasgiflpe ksveklsegrvisvgkggynkegklaqpsvavgdrvl lpayggsnikvgeeeyslyrdhellaiike > alpina masritkfsktivpmmdrvlvqrikpqqktasgiyip ekaqealnegyvvavgkglttqegkvvpselaegdkv llppyggsvvkvdneelilfreseilakiq > cohnii matgiakrftplldrvlvqrlkpeaktasglflpesa akapnyatvlavgpggrtrdgdilpmnvkvgdkvvvp eyggmtlkfedeefqvfrdadimgilne > melanogaster maaaikkiipmldriliqraealtktkggivlpekav gkvlegtvlavgpgtrnastgnhipigvkegdrvllp efggtkvnlegdqkelflfresdilakle > sapiens agqafrkflplfdrvlversaaetvtkggimlpeksq gkvlqatvvavgsgskgkggeiqpvsvkvgdkvllpe yggtkvvlddkdyflfrdgxilgky > stearothermophilus vlkplgdrvvievieteektasgivlpdtakekpqeg rvvavgkgrvldsgervapevevgdriifskyagtev kydgkeylilresdilavig > tuberculosis makvnikpledkilvqaneaetttasglvipdtakek pqegtvvavgpgrwdedgekripldvaegdtviysky ggteikyngeeylilsardvlavvsk > musculus (house mouse) magqafrkflllfdrvlversaaetvtkggimlpeks qgkvlqatvvavgsggkgksgeiepvsvkvgdkvllp eyggtkvvlddkdyflfrdsdilgkyvn Multiple Sequence Alignment 4
Pairwise Alignment Scores Schizoscchrmycs 49 46 78 45 55 54 44 38 37 42 52 41 40 43 46 44 41 39 43 43 48 45 45 40 40 38 39 42 53 55 41 41 40 40 43 46 40 43 38 39 61 43 34 36 45 49 42 36 49 37 32 93 59 38 32 Guide Tree 5
Constructing a Guide Tree Unweighted pair group method with arithmetic mean (UPGMA) Neighbor joining (NJ) Unweighted Pair Group Method with Arithmetic mean (UPGMA) Assume each organism is its own group Repeat the following step Merge together the two closest groups 6
Unweighted Pair Group Method with Arithmetic mean (UPGMA) Unweighted Pair Group Method with Arithmetic mean (UPGMA) 7
Unweighted Pair Group Method with Arithmetic mean (UPGMA) Unweighted Pair Group Method with Arithmetic mean (UPGMA) 8
Unweighted Pair Group Method with Arithmetic mean (UPGMA) Unweighted Pair Group Method with Arithmetic mean (UPGMA) 9
Unweighted Pair Group Method with Arithmetic mean (UPGMA) Unweighted Pair Group Method with Arithmetic mean (UPGMA) 10
Unweighted Pair Group Method with Arithmetic mean (UPGMA) Unweighted Pair Group Method with Arithmetic mean (UPGMA) 11
Unweighted Pair Group Method with Arithmetic mean (UPGMA) Guide Tree 12
Neighbor Joining (NJ) Generate full tree with starlike structure Repeat the following step Connect two closest groups (i.e., neighbors) through a single node Neighbor Joining 13
Neighbor Joining Neighbor Joining 14
Neighbor Joining Neighbor Joining 15
Neighbor Joining Neighbor Joining 16
Neighbor Joining Neighbor Joining 17
Neighbor Joining Multiple Sequence Alignment 18
Multiple Sequence Alignment Multiple Sequence Alignment 19
What can phylogeny do for you? Why do we care about evolution and the evolutionary history of organisms? OR How do we benefit from phylogeny? AND How is bioinformatics related to any of this? What are the goals of phylogeny? All life forms share a common origin and are part of the Tree of Life 1) Deduce correct trees of life for all species 2) Infer or estimate divergence times How can we use phylogenetic analyses? 20
Revolutionalizing the Tree of Life Carl Woese: rrna IDs Archaea as separate branch of Tree of Life Discovering new life forms 21
Developing effective snakebite antivenins Identifying emergent diseases 22
Protecting ecosystems from invasive species Eurasian water milfoil Purple loosetrife Caulerpa taxifolia Common phylogenetic tree terminology Branches or Lineages A B Terminal Nodes operational taxanomic units (OTUs) Ancestral Node or ROOT of the Tree Internal Nodes hypothetical taxanomic units (HTUs) C D E Represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny 23
Phylogenetic trees can be drawn many ways Clade: group with a single common ancestor and its descendents A B C D E A-B-C clade B-C clade D-E clade 24
Phylogenetic trees can be rooted or unrooted A B B C Rooted C D D Unrooted A Specifies evolutionary path Root node is most recent common ancestor of all TUs; specifies time flow Shows degree of kinship Doesn t make assumptions or require knowledge of common ancestor Phylogenetic trees can be scaled or unscaled C B C B D A D A Unscaled Branch length not proportional to number of changes/distance Scaled Branch length proportional to number of changes/distance Cladogram Phylogram 25
Phylogenetic trees diagram evolutionary relationships B C A D No meaning to the spacing between the taxa, or to the order in which they appear from top to bottom. E 1) No scale (cladograms) 2) Proportional to genetic distance (phylograms) 3) Proportional to time (ultrametric trees) Rotating clades: same meanings B C A D E A B C = E D 26
Interpreting phylogenetic trees Is the frog more closely related to the fish or the human? How are phylogenetic trees built? Traditionally: use homologous structures Caveats: - Closely related organisms don t always look similar - Similar looking organisms not always closely related - How do you decide importance of traits? 27
Structural analogy can result from convergent evolution Classification based on traits can be tricky cell number organelles 28
Molecular phylogenetic trees Large molecular data sets: Bioinformatics! Molecular clock vs. punctuated equilibrium Eliminates analogy and trait selection issues Result: great improvement on classical phylogenies Caveat: Gene divergence may not correlate with species divergence Molecular phylogenies can be constructed using different elements Nuclear genes Mitochondrial DNA Genome structure Reasonably well conserved, present in common ancestors Usually integrate analyses of multiple different genes 29
Molecular comparisons vs. body plans = = Which species are the closest living relatives of modern humans? Gorillas Chimpanzees Bonobos Orangutans Humans Humans Chimpanzees Bonobos Gorillas Orangutans 15-30 MYA 0 14 MYA 0 Pre-molecular view Great apes (chimpanzees, gorillas and orangutans) formed a clade separate from humans. MitoDNA, most nuclear genes, and DNA hybridization Bonobos and chimpanzees are related more closely to humans than either are to gorillas. 30
What is the closest living relative of whales? Phylogenetic trees are hypotheses How do you construct phylogenetic trees? What computational strategies are used? How do you test the robustness of hypotheses? 31