[Diagram: course topics]
  Pairwise sequence alignment (global and local)
  Multiple sequence alignment (global and local)
  Substitution matrices
  Database searching (BLAST)
  Sequence statistics
  Evolutionary tree reconstruction
  Gene finding
  Protein structure prediction
  RNA structure prediction
  Computational genomics

Phylogeny
  Phylogeny: an evolutionary tree; a hypothesis concerning the evolutionary history of a group of taxa and their ancestors.
  Taxon: a unit of classification (strain, species, individual, gene).
  Contemporary taxa = leaves of the tree.
  Ancestral taxa = internal nodes of the tree.
  Taxa are also called OTUs (operational taxonomic units).

Properties of Trees
  Leaf nodes: contemporary taxa
  Internal nodes: ancestral taxa
  Topology: relationships between species
  Branch lengths: degree of change
  [Figure: Ernst Haeckel (1834-1919)]

Properties of Trees
  If the mutation rate is constant in all lineages (molecular clock hypothesis), the branch lengths are proportional to time.
  Trees represent the order of branching only.
  [Figure: tree diagrams with lettered leaves]
Which of the following is different?
  [Figure: tree diagrams with lettered leaves]

Rooted vs unrooted trees
  Root: common ancestor.
  Unrooted trees give no information about the order of speciation events.
  [Figure: rooted and unrooted trees; leaf labels include Carp, Trout, Zebrafish, Salmon, Human, Mouse, Chicken]

Various types of trees you will see
  Which is different?
  [Figure: tree diagrams with lettered leaves]

Gene Trees vs Species Trees
  Gene sequences can be used to
    infer the history of speciation
    infer the history of gene families
  Caveat emptor: the history of the gene may not be the same as the history of the organism, for example because of
    gene duplications
    horizontal gene transfer
  [Figure: close-up view of divergence. Modified from Hennig, W. (1966) Phylogenetic Systematics]
Why phylogeny reconstruction? Some applications

Species relationships
  [Figure: tree with leaves Fly, Amphioxus, Hagfish, Lamprey, Shark, Bony fish, Mammals. Credit: Narayanan, CMU]

Dating events
  [Figure: insulin growth factor receptors; time of duplication]

Gene family history
  [Figure: insulin receptors in Amphioxus, Hagfish, Lamprey, Bony fish, Mammals. Credit: Narayanan, CMU]
  Key: CE-Worm, DM-Fly, -Mosquito, MP-Amphioxus, OM-Trout, PO-Flounder, DR-Zebrafish, XL-Frog, RN-Rat, MM-Mouse, HS-Human

Character evolution

Biogeography
Forensics

Comparing ecology and evolution: Which insects are eating which tropical shrubs?
  [Figure: host plants Rubiaceae (Psychotria), Moraceae (Ficus), Euphorbiaceae (Macaranga). Credit: George Weiblen, U. Minnesota]

Which moths are eating which tropical shrubs?
  [Figure: patterns of herbivore association (feeding / not feeding) shown against the host plant phylogeny; labels: Talanga sexpunctalis (Crambidae), Macaranga, Oiketicus sp. (Psychidae), Psychotria. Credit: George Weiblen, U. Minnesota]

Plant phylogeny vs. faunal similarity
  Is there phylogenetic structure in faunal similarity?
  Clustering of host plants based on faunal similarity corresponds poorly with host plant phylogeny.
  Nonetheless, some host plant clades have very similar caterpillar faunas (e.g. Macaranga and Psychotria).
  [Figure: host plant phylogeny next to a faunal similarity phenogram. Credit: George Weiblen, U. Minnesota]

How to root an unrooted tree
  If the mutation rate is constant in all lineages, the distance from root to leaf is the same for all leaves. We can obtain a rooted tree algorithmically.
  Otherwise, use an outgroup: a taxon that is distantly related to all other leaf taxa.
  [Figure: tree rooted with an outgroup. Source: American Scientist]
Phylogeny reconstruction
  Given data (observations of contemporary taxa), reconstruct the evolutionary history.

Data for phylogeny reconstruction
  Sources: morphology, behavior, biochemistry, molecular and sequence data
  Character data: shared characteristics
  Distance data: differences between species

Character data and distance data
  [Figure: trees relating Bees, Moths, Ants and Centipedes, built from character data and from distance data; primitive character: wingless. Source: American Scientist]
Other examples
  Niche (e.g., what finches eat)
  Biochemistry: serum:anti-serum reactions
  Behavior: firefly flashing patterns
  [Figure source: American Scientist]

Multiple Sequence Alignment
  [Figure: alignment of four class 2 globins from Casuarina glauca]
  [Figure: alignment of albumin in four species (human, rabbit, pig, chicken)]

Questions trees can address
  Which taxa are most closely related?
  What is the ancestral state?
  Where is rapid change occurring?
  When did lineages diverge?

Evolutionary Tree Reconstruction
  Given observations (similarities and differences between k species), find the best hypothesis (tree) of their evolutionary history.
  Criteria for evaluating which tree best fits the data:
    Maximum parsimony (character data)
    Minimum evolution (distance data)
    Maximum likelihood (character data)

Maximum Parsimony: nature is thrifty
  The best tree requires the fewest mutations, e.g., jaws were only invented once.
  [Figure: tree of hagfish, lampreys, sharks, fish and terrestrial animals, annotated with the characters backbone, skulls, jaws, bony skeletons and tetrapody]
Maximum Parsimony
  Parsimony score: the minimum number of mutations needed to explain the data.
  Assumptions:
    Selection dominates -> few changes
    No multiple substitutions
    Sites are independent

Problem: Not all characters are parsimonious
  [Figure: alternative trees for Duck, Platypus and Bat, annotated with the characters wings and placenta]

If the mutation rate is high, sequence data is not parsimonious
  [Figure: the true tree, with several successive substitutions, next to the most parsimonious, but false, tree]

Given a tree topology
  Associate characters with the leaves of the tree
  Find the optimal labeling of internal nodes
  Count mutations
  [Figure: example tree with characters (1), (2), (3)]
[Figure: worked example for characters (1), (2), (3), with binary character states at the leaves (000, 010, 100, 010, 001) and the internal nodes labeled step by step]
Parsimony score: 4
Note: there can be more than one most parsimonious tree.
[Figure: alternative labelings of the same example (100, 010, 010, 101 and 100, 010, 001)]

Given a tree topology
  Associate characters with the leaves of the tree
  Find the optimal labeling of internal nodes
  Count mutations
To find the optimal tree, we need to consider all topologies. How many are there?
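
The three scoring steps above (attach characters to leaves, label internal nodes, count mutations) can be sketched in code. The slides do not name an algorithm for the labeling step; the sketch below assumes Fitch's algorithm, a standard choice when every change costs one mutation. The tree encoding (nested pairs of leaf names) and the names fitch_score and parsimony_score are illustrative assumptions, not taken from the slides.

```python
def fitch_score(tree, states):
    """Score one character on a rooted binary tree (Fitch's algorithm, assumed).

    tree   : a leaf name (str) or a pair (left, right) of subtrees
    states : dict mapping leaf name -> observed state for this character
    Returns (candidate states for this node, mutations needed below it).
    """
    if isinstance(tree, str):                  # leaf: its state is observed
        return {states[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], states)
    right_set, right_cost = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:                                 # children can agree: no extra mutation
        return common, left_cost + right_cost
    # children disagree: keep both options and count one mutation
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, characters):
    """Sum the per-character scores; sites are treated as independent."""
    return sum(fitch_score(tree, site)[1] for site in characters)


# Hypothetical data: four taxa a-d, two characters.
tree = (("a", "b"), ("c", "d"))
site1 = {"a": "A", "b": "A", "c": "G", "d": "G"}
site2 = {"a": "A", "b": "G", "c": "A", "d": "G"}
print(parsimony_score(tree, [site1, site2]))
```

With unit costs the score does not depend on where the tree is rooted, so an unrooted topology can be scored by rooting it on any edge.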
How many unrooted trees with k leaves?
  E(k): number of edges in an unrooted tree with k leaves
  T(k): number of unrooted trees for k taxa

    k    E(k)   T(k)
    3    3      1
    4    5      3
    5    7      15

  \[ E(k) = E(k-1) + 2 \]
  \[ T(k) = \prod_{i=3}^{k-1} (2i - 3) = \frac{(2k-5)!}{2^{k-3}\,(k-3)!} \]

  [Figure: the unrooted trees for three, four and five taxa]

The number of trees gets big fast

    Number of leaves    Number of unrooted binary trees
    3                   1
    4                   3
    5                   15
    6                   105
    10                  2,027,025
    20                  2.2 x 10^20
    50                  2.8 x 10^74
    500                 1 x 10^1074

How do you find the optimal tree?
  1. Exhaustive search (<12 taxa)
     (Phylogeny reconstruction is NP-complete.)

    Method              Result    Time    Typical k
    Exhaustive search   Optimal   T(k)    12

  2. Branch-and-bound (<18 taxa)
     Note that the parsimony score is non-decreasing as you add edges, so a partial tree that already scores worse than the best complete tree found so far cannot lead to a better one.

     Best = infinity, L = {the tree on 3 leaves}
     BB(L):
       For each tree t in L:
         If t has k leaves:
           If Score(t) < Best: Best = Score(t)
         Else if Score(t) > Best:
           skip t                          // Bound
         Else:
           NewL = empty set                // Branch
           For every edge e in t:
             t' = t plus a new leaf attached to e
             NewL = NewL U {t'}
           BB(NewL)
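
The counting argument above can be checked numerically. A new leaf can be attached to any of the E(k) edges of a k-leaf tree, which gives T(k+1) = T(k) * E(k) and hence the product formula. The function names below are illustrative assumptions; the sketch compares the product form with the closed form.

```python
from math import factorial


def num_edges(k):
    """E(k): edges in an unrooted binary tree with k leaves, from E(3) = 3, E(k) = E(k-1) + 2."""
    edges = 3
    for _ in range(3, k):
        edges += 2
    return edges


def num_unrooted_trees(k):
    """T(k) as the product of (2i - 3) for i = 3 .. k-1 (empty product = 1 when k = 3)."""
    count = 1
    for i in range(3, k):
        count *= 2 * i - 3
    return count


def num_unrooted_trees_closed(k):
    """T(k) from the closed form (2k-5)! / (2^(k-3) * (k-3)!)."""
    return factorial(2 * k - 5) // (2 ** (k - 3) * factorial(k - 3))


for k in (3, 4, 5, 6, 10):
    print(k, num_edges(k), num_unrooted_trees(k), num_unrooted_trees_closed(k))
```

Python integers have arbitrary precision, so the same functions also work for k = 500, although the result has more than a thousand digits.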
How do you find the optimal tree?

    Method              Result    Time    Typical k
    Exhaustive search   Optimal   T(k)    12
    Branch and bound    Optimal   T(k)    18

How do you find a pretty good tree?
  3. Heuristic search
     Search for optimal trees by finding good trees and then rearranging them in the hope of finding an even better tree.

Heuristic search
  [Figure: tree space, showing starting trees, a suboptimal island of trees and the global optimum]

Branch swapping
  Nearest-neighbour interchange (NNI)
  Subtree pruning and regrafting (SPR)
  Tree-bisection reconnection (TBR)
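
A minimal sketch of heuristic search by branch swapping, assuming the simplest possible setting: a four-taxon tree with one internal edge, where a nearest-neighbour interchange swaps one subtree across that edge. The function names, the greedy acceptance rule and the toy scores are illustrative assumptions; a real search would apply NNI at every internal edge and score trees by parsimony.

```python
def nni_neighbours(a, b, c, d):
    """The two rearrangements of ((a, b), (c, d)) reachable by one NNI on its internal edge."""
    return [((a, c), (b, d)), ((a, d), (b, c))]


def hill_climb(start, neighbours, score):
    """Greedy search: move to the first better neighbour until no neighbour improves the score."""
    current, current_score = start, score(start)
    improved = True
    while improved:
        improved = False
        for candidate in neighbours(current):
            candidate_score = score(candidate)
            if candidate_score < current_score:
                current, current_score = candidate, candidate_score
                improved = True
                break
    return current, current_score


def split(tree):
    """Order-independent key for a four-taxon tree given as a pair of leaf pairs."""
    return frozenset(frozenset(pair) for pair in tree)


# Made-up scores for the three possible four-taxon topologies (lower is better).
toy_scores = {
    split((("a", "b"), ("c", "d"))): 3,
    split((("a", "c"), ("b", "d"))): 2,
    split((("a", "d"), ("b", "c"))): 4,
}

best_tree, best_score = hill_climb(
    (("a", "b"), ("c", "d")),
    lambda t: nni_neighbours(t[0][0], t[0][1], t[1][0], t[1][1]),
    lambda t: toy_scores[split(t)],
)
print(best_tree, best_score)
```

Because the search only accepts improvements, it stops at the first tree with no better neighbour. On larger trees that stopping point may be a suboptimal island rather than the global optimum, which is why several starting trees are typically tried.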
How do you find the optimal tree?

    Method              Result       Time          Typical k
    Exhaustive search   Optimal      T(k)          12
    Branch and bound    Optimal      T(k)          18
    Heuristic search    Suboptimal   You choose