G4120: Introduction to Computational Biology

1 ICB Fall 2004 G4120: Introduction to Computational Biology Oliver Jovanovic, Ph.D. Columbia University Department of Microbiology Copyright 2004 Oliver Jovanovic, All Rights Reserved.

2 Multiple Sequence Alignment (MSA) A multiple sequence alignment is an alignment of a set of sequences with structurally similar and evolutionarily homologous residues aligned in columns. In an ideal alignment, columns of aligned amino acid residues would have similar locations in the 3D structure of a protein and would diverge from a common ancestral residue. In theory, an unambiguously correct evolutionary alignment exists, but it can be difficult to infer and computationally intensive to calculate. Where structural data are lacking or limited, as is generally the case, it is not possible to unambiguously identify structurally similar positions. Thus, defining a single unambiguous ideal alignment can be very difficult.

3 Alignment Algorithms Dynamic Programming vs. Heuristic Alignment Using dynamic programming algorithms (such as Smith-Waterman or Needleman-Wunsch) to perform an optimal alignment of more than a few sequences is computationally intensive, and generally impractical for large sets of sequences or lengthy sequences. As a result, most commonly used multiple sequence alignment algorithms take a heuristic approach. One common heuristic approach is progressive alignment, in which the problem is broken down into a series of pairwise alignments. The details of how to choose the initial pair to align, how to score alignments, how to align subsequent sequences, and whether subfamilies of alignments should be created can all vary. MSA (Dynamic) This algorithm uses a technique that reduces the complexity of dynamic programming when applied to multiple sequences, and can give an optimal alignment for not more than ten short ( a.a.) protein sequences in a reasonable amount of time. For alignments with more or longer sequences, a heuristic approach is more practical. Feng-Doolittle (Heuristic) One of the first progressive alignment algorithms. It does not take advantage of profiles, which can increase the accuracy of the alignment. ClustalW (Heuristic) A profile based progressive alignment algorithm which uses a number of heuristics to rapidly generate multiple sequence alignments, including phylogeny and scalable gap penalties.
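
The pairwise dynamic programming step that progressive methods build on can be sketched as follows. This is a minimal Needleman-Wunsch global alignment, assuming a simple match/mismatch score and a linear gap penalty; the scoring values and the function name `needleman_wunsch` are illustrative assumptions, not the parameters used by MSA, Feng-Doolittle or ClustalW.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences; returns (score, aligned_a, aligned_b).

    The scoring scheme (match, mismatch, linear gap) is an illustrative
    assumption; real tools use substitution matrices and affine gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

The table has (n+1) x (m+1) cells for two sequences; extending the same idea to k sequences needs a k-dimensional table, which is why exact multiple alignment quickly becomes impractical.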

4 Sequence Definitions Identity The extent to which two sequences are invariant. Similarity The extent to which sequences are related, based on sequence identity and/or conservation. Conservation Changes in an amino acid sequence that preserve the biochemical properties of the original residue. This is measured in most sequence comparison algorithms by substitution matrices in which scores for each position are derived from observations of the frequencies of substitutions in blocks of local alignments in related proteins.
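
As a rough illustration of the difference between identity and similarity, the sketch below scores two already-aligned sequences. The conservative residue groups in `CONSERVATIVE_GROUPS` are a simplified assumption made for this example; real comparisons score conservation with a substitution matrix. Excluding gapped columns from the denominator is also just one convention among several.

```python
# Illustrative groups of biochemically similar residues (an assumption for
# this sketch, not a published substitution matrix).
CONSERVATIVE_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ"]


def _is_conservative(x, y):
    return any(x in group and y in group for group in CONSERVATIVE_GROUPS)


def percent_identity_and_similarity(aligned_a, aligned_b):
    """Return (% identity, % similarity) over ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            identical += 1
            similar += 1
        elif _is_conservative(x, y):
            similar += 1
    if compared == 0:
        return 0.0, 0.0
    return 100.0 * identical / compared, 100.0 * similar / compared


print(percent_identity_and_similarity("MKV", "MRV"))
```

Here K and R differ, so identity drops, but both are basic residues, so the pair still counts toward similarity.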

5 Alignment with Text Monospaced or Fixed Width vs. Variable Width Fonts Each character in a monospaced (or fixed width) font takes up the same amount of horizontal space, like early typewriter fonts, allowing multiple sequence alignments to properly align. Variable width fonts can throw off multiple sequence alignments. Fixed Width fonts in OS X: Andale Mono, Courier, Courier New, Monaco, V100 Fixed Width Font Alignment (Courier):... m s h N q f q f i G n L t r D M A s R G v N K V I L V G n L G q D M A v R G I N K V I L V G R L G k D Variable Width Font Alignment (Times):... m s h N q f q f i G n L t r D M A s R G v N K V I L V G n L G q D M A v R G I N K V I L V G R L G k D

6 Displaying Sequence Data Displaying Information Take care with your choice of fixed or variable width fonts. Use fonts carefully and consistently. Avoid overuse or arbitrary use of fonts. Use black or dark text against a white or very light background (no more than 20% color) to maximize comprehension. Avoid text that blends with a background, and be cautious in using light text on a dark background. Use shading, case, bold, italic or color when appropriate, to add emphasis, contrast, or draw attention to a feature. Avoid displays where everything blends together or lacks contrast. Align items to each other to establish a visual connection. Related items should be grouped in close proximity. Avoid simply placing items arbitrarily. Use color logically and aesthetically. Avoid the overuse of color. References: The Mac is Not a Typewriter and The Non-Designer's Design Book by Robin Williams; The Visual Display of Quantitative Information by Edward R. Tufte; Type & Layout by Colin Wheildon.

7 Alignment with Excel [Figure: Excel spreadsheet showing a multiple sequence alignment of protein sequences labeled RK2, E. coli, ColIb-P9, R64, pip71a and pip231a, with conserved residues in upper case and % identity and % similarity columns.] Can use with any font, as Excel allows you to manually adjust the alignment.

8 ClustalW and ClustalX ClustalW ClustalW first generates a pairwise distance matrix for all the sequences by pairwise dynamic programming alignment. It then estimates evolutionary distance from similarity scores and constructs a guide tree using the neighbor-joining distance matrix method. Dynamic programming is then used to align the most closely related pairs of sequences. A sequence profile is constructed from these alignments, and the remaining sequences are progressively aligned to each other in order of decreasing similarity by profile-profile, profile-sequence or sequence-sequence alignment, until a complete multiple sequence alignment has been generated. ClustalW automatically chooses the optimal scoring matrix for protein alignments based on whether the sequences are close or distant neighbors in the tree. Thus it might use BLOSUM62 (optimal for close relationships) for close neighbors, and BLOSUM45 (optimal for distant relationships) for distant neighbors. ClustalW also allows for scalable gap penalties in protein profile alignments. A gap opening next to a highly conserved residue can be more heavily penalized than a gap opening next to an unconserved residue, for example. ClustalX This is a version of ClustalW with a graphical user interface, which is more intuitive to use, though the formatting requirements for input files need to be followed closely. It can display multiple sequence alignments onscreen, or output them as PostScript, which can automatically be converted to PDF format by OS X 10.3.

9 Alignment with ClustalX [Figure: CLUSTAL X (1.82) multiple sequence alignment output, file tadafasta.ps, page 1 of 2, showing twelve aligned protein sequences (V_fisch1, V_fisch2, V_vulnII1_6, Y_pes, Y_ent, A_act, H_aph, P_mul, H_duc, A_pleur, V_vulnI8_11, V_vulnI6_11) with a conservation line of "*", ":" and "." marks above each block, residue counts at the end of each row, and a ruler below.]

10 Mutation Rate r = K/(2T), where r = rate of substitution, K = number of substitutions per site, and T = divergence time. That is, rate of substitution = number of substitutions per site / (2 x divergence time); the factor of 2 arises because substitutions accumulate along both lineages after they diverge. When substitutions are common, a particular site may have undergone multiple changes. Thus, alignments between sequences with many differences will underestimate the true number of substitutions that have occurred. The true number of substitutions can be estimated by K = -(3/4) ln[1 - (4/3)p] (the Jukes-Cantor correction), where p is the fraction of nucleotides that differ between the two sequences.
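
The two formulas on this slide can be worked through in a few lines. This is a minimal sketch; the function names are hypothetical, and the example divergence time of 80 million years is an arbitrary illustrative value, not data from the course.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of aligned nucleotide sites that differ (gaps are skipped)."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def corrected_substitutions(p):
    """K = -(3/4) ln[1 - (4/3) p]; only defined for 0 <= p < 0.75."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)


def substitution_rate(k, divergence_time):
    """r = K / (2T), in substitutions per site per unit of time."""
    return k / (2.0 * divergence_time)


p = p_distance("ACGTACGTAC", "ACGTACGTAT")  # 1 difference in 10 sites
k = corrected_substitutions(p)               # slightly larger than p
r = substitution_rate(k, 80e6)               # assumed T = 80 million years
print(p, k, r)
```

For small p the correction is minor (K is close to p), but it grows quickly as p approaches 0.75, the value expected for two unrelated random sequences.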

11 Constraints on Mutations Transitions vs. Transversions Transitions (exchanging one purine (A or G) for another purine (G or A), or one pyrimidine (C, T or U) for another pyrimidine (U, T or C)) are three times as common as transversions (purine for pyrimidine or vice versa). Functional Constraints Functional constraints in coding or regulatory regions also impact the rate of change. Divergence Among Human, Mouse, Rabbit and Cow Globin Genes by Region (substitutions/site/$10^9$ years): Noncoding 3.33; Coding 1.58; 5' untranslated 1.86; 3' untranslated 3.00.
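
The transition/transversion distinction above can be checked mechanically. The sketch below classifies each differing site between two aligned nucleotide sequences; the function names are hypothetical and gaps or ambiguity codes are simply skipped.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def classify_substitution(x, y):
    """Return 'transition', 'transversion', or None if the bases are equal."""
    x, y = x.upper(), y.upper()
    if x == y:
        return None
    same_class = ({x, y} <= PURINES) or ({x, y} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def count_ts_tv(seq_a, seq_b):
    """Count (transitions, transversions) between two aligned sequences."""
    valid = PURINES | PYRIMIDINES
    transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in valid or y not in valid:
            continue  # skip gaps and ambiguity codes
        kind = classify_substitution(x, y)
        if kind == "transition":
            transitions += 1
        elif kind == "transversion":
            transversions += 1
    return transitions, transversions


print(count_ts_tv("AACGT", "GATGA"))
```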

12 Synonymous vs. Nonsynonymous Substitutions Nondegenerate Sites Codon positions where any nucleotide mutation would cause a change in the amino acid (a nonsynonymous substitution). Example: Phenylalanine first and second codon positions (UUU). Twofold Degenerate Sites Codon positions where one nucleotide mutation would not cause a change in the amino acid (but the two other possible mutations would). Example: Aspartic acid third codon position (GAU, GAC). Fourfold Degenerate Sites Codon positions where no nucleotide mutation would cause a change in the amino acid (every substitution is synonymous). Example: Glycine third codon position (GGG, GGA, GGU, GGC). Divergence Among Human and Rabbit Globin Genes by Sites (substitutions/site/$10^9$ years): Nondegenerate 0.56; Twofold Degenerate 1.67; Fourfold Degenerate 2.35; vs. Noncoding 3.33; Coding 1.58.
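
The site classes defined above can be derived directly from the standard genetic code. The following sketch builds the standard codon table and counts, for a given codon position, how many of the three possible point mutations leave the amino acid unchanged. The function name `site_degeneracy` is hypothetical, and the "threefold" label (needed for cases such as the third position of isoleucine codons) is an addition that the slide does not mention.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered T, C, A, G at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

LABELS = {0: "nondegenerate", 1: "twofold", 2: "threefold", 3: "fourfold"}


def site_degeneracy(codon, position):
    """Classify one codon position (0, 1 or 2) by its degeneracy."""
    codon = codon.upper().replace("U", "T")
    original = CODON_TABLE[codon]
    synonymous = 0
    for base in BASES:
        if base == codon[position]:
            continue
        mutant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[mutant] == original:
            synonymous += 1
    return LABELS[synonymous]


for codon, pos in [("UUU", 0), ("GAU", 2), ("GGG", 2)]:
    print(codon, pos + 1, site_degeneracy(codon, pos))
```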

13 Variations in Evolutionary Rates Variations in Substitution Rates Substitution rates do not appear to be constant even within the genomes of closely related species, and they also vary from species to species. Relative Rate Test It is possible to estimate the overall rate of substitution in different lineages without knowing the exact divergence time by determining a relative rate of substitution. To compare the relative rate of substitution in species A and species B, one designates a less related species, C, as an outgroup. This allows you to estimate the amount of divergence that has taken place in species A and B since they last shared a common ancestor. Relative Rates of Synonymous Substitutions in Genes: Mammalian Fibrinopeptide 4; Mammalian Hemoglobin 1; Mammalian Cytochrome c 0.2; Influenza NS 1,000,000. Relative Rates of Synonymous Substitution in Genomes: Human genome 1; Plant genome 1; Mouse genome 2; Rat genome 2; Human mitochondria 10; Plant chloroplast 0.3.
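
The relative rate test can be written out as simple arithmetic on three pairwise distances. This is a sketch under the usual assumption that distances are additive along the tree; the function name and the example distances are made up for illustration.

```python
def relative_rate_test(d_ab, d_ac, d_bc):
    """Split the A-B distance into the part accumulated on each lineage.

    With C as the outgroup, the divergence along each lineage since the
    last common ancestor of A and B is:
        d_A = (d_AB + d_AC - d_BC) / 2
        d_B = (d_AB + d_BC - d_AC) / 2
    """
    d_a = (d_ab + d_ac - d_bc) / 2.0
    d_b = (d_ab + d_bc - d_ac) / 2.0
    return d_a, d_b


# Hypothetical distances in substitutions per site.
d_a, d_b = relative_rate_test(d_ab=0.3, d_ac=0.5, d_bc=0.4)
print(d_a, d_b)
```

In this made-up example lineage A has accumulated about twice as many substitutions as lineage B over the same period, so the two lineages have not evolved at the same rate, and no divergence time was needed to see it.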

14 A Brief History of Phylogeny 1735 Taxonomy, Karl von Linné; 1750 Phenetic Taxonomy, Michel Adanson; 1859 Evolution, Charles Darwin; 1866 Phylogeny, Ernst Haeckel; 1950 Cladistic Taxonomy, Willi Hennig. "[in time] we shall have very fairly true genealogical trees of each great kingdom of nature" Charles Darwin, 1857, letter to T. H. Huxley. "The History of the Germ is an epitome of the History of the Descent, or, in other words: that Ontogeny is a recapitulation of Phylogeny" Ernst Haeckel, 1897, The Evolution of Man. "The universal phylogenetic tree not only spans all extant life, but its root and earliest branchings represent stages in the evolutionary process before modern cell types had come into being. The evolution of the cell is an interplay between vertically derived and horizontally acquired variation. Primitive cellular entities were necessarily simpler and more modular in design than are modern cells. Consequently, horizontal gene transfer early on was pervasive, dominating the evolutionary dynamic. The root of the universal phylogenetic tree represents the first stage in cellular evolution when the evolving cell became sufficiently integrated and stable to the erosive effects of horizontal gene transfer that true organismal lineages could exist." Carl Woese, 2000, Interpreting the Universal Phylogenetic Tree

15 Taxonomy Taxonomy is the classification of organisms into an ordered system that indicates natural relationships. Karl von Linné, a.k.a. Caroli Linnaei or Linnaeus ( ), invented modern taxonomy by developing a hierarchy of taxa (kingdom, class, order, genus, and species) and a system of binomial nomenclature based on genus and a characteristic feature of a species.

16 Phylogeny Phylogeny is the sequence of events involved in the evolutionary development of a species or taxonomic group. Ernst Haeckel ( ) was originally trained as a physician, but devoted himself to the study of evolution after reading Darwin's Origin of Species. He was made famous by his own phrase, "ontogeny recapitulates phylogeny". He coined the term phylogeny and created the first phylogenetic trees.

17 Phenetic vs. Cladistic Approaches Phylogenetic Reconstruction Phylogenetic reconstruction attempts to estimate the phylogeny for some data. Any collection of sequences will share some ancestral relationship, and the data within the sequences contains information that can be used to reconstruct or infer these ancestral relationships. A phylogenetic tree is a branching structure which illustrates the relationships between the sequences. Nearly any approach currently used in phylogenetic reconstruction has adherents and detractors. At the moment, there is some disagreement as to best practices and principles for phylogenetic analysis. Phenetic Approach Phenetic taxonomy was invented in 1750 by Michel Adanson. In the phenetic approach, a tree is constructed by considering the phenotypic similarities of the species without trying to understand the evolutionary pathways of the species, and thus may or may not be the correct phylogeny. Trees constructed by this method are called phenograms or dendrograms. Cladistic Approach Cladistic taxonomy was invented by the German entomologist Willi Hennig in 1950. It involves the rigorous application of the concept of evolution to taxonomy. Taxa are defined by what distinctive features their members have, not what features they share with others. In the cladistic approach, a phylogenetic tree is reconstructed by considering the various possible pathways of evolution and choosing from amongst these the best possible tree, that is, the tree that involves the fewest changes, and thus the least amount of convergent evolution. Trees reconstructed by this method are called cladograms.

18 Phylogenetic Trees Rooted Trees In a rooted tree, a single node is designated as a common ancestor, and a unique path leads from it through evolutionary time to all other nodes. It thus provides information about the common ancestry of sequences and the direction of evolution, and is the most common type of tree used to study evolutionary relationships. Rooted Tree with Scaled Branches Unrooted Trees Unrooted trees specify only the relationship between nodes, and nothing about the direction in which evolution occurred. A root can be assigned to an unrooted tree through the use of an outgroup, for example a species that unambiguously previously separated from the other species being compared (e.g. baboon, when comparing humans and gorillas). Source: Krane & Raymer, Fundamental Concepts of Bioinformatics, NCBI

19 Tree Topology Operational Taxonomic Unit (OTU) This corresponds to the terminal nodes of a phylogenetic tree (also known as leaves, tips or external nodes). They represent the genes, organisms, families, species or populations, as appropriate, for which you have data. Internal Node This corresponds to points within a phylogenetic tree where interior branches meet (also known as vertices). These represent inferred ancestors. Outgroup An OTU or taxon included for the purpose of rooting a tree.

20 Rooted Tree Reconstruction The number of possible unrooted trees for n OTUs is one step behind the rooted count, that is, it equals the number of possible rooted trees for n - 1 OTUs (e.g. 5 species or OTUs give 15 unrooted trees), which is still an enormous number with many species or OTUs. The number of possible trees for n OTUs is given by $(2n-3)!/(2^{n-2}(n-2)!)$ for bifurcating rooted trees and $(2n-5)!/(2^{n-3}(n-3)!)$ for bifurcating unrooted trees (Brian Golding, Reconstructing Phylogenies).
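
To get a feel for how quickly these counts grow, the two formulas can be evaluated directly. This is a small sketch with hypothetical function names; it uses exact integer arithmetic.

```python
from math import factorial


def rooted_tree_count(n):
    """Number of bifurcating rooted trees: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 OTUs")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_tree_count(n):
    """Number of bifurcating unrooted trees: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        raise ValueError("need at least 3 OTUs")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 5, 10, 20):
    print(n, rooted_tree_count(n), unrooted_tree_count(n))
```

With only 10 OTUs there are already more than two million unrooted trees, which is why exhaustive evaluation of every tree is feasible only for small data sets.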

21 Phylogenetic Terminology Homologs Genes with a common ancestral sequence. They may have been separated by speciation (orthologs) or duplication (paralogs). Orthologs Homologous genes in different species that arose from a common ancestor. They tend to have similar structure and function. Paralogs Similar genes within a single species that are the result of a gene duplication. They tend to have different but related functions. Xenologs Genes acquired by horizontal transfer between species, typically mediated by a plasmid, transposable element, or virus. Synapomorphy Having characters that are both derived from a common ancestor and uniquely shared by a group. This is essential to clearly establishing a phylogeny. Having only derived or shared characters is not sufficient to establish a phylogeny. Homoplasies Convergences of a particular character at a particular site. These typically pose the most difficulty in attempting to reconstruct the ancestral phylogenetic tree.

22 Phylogenetic Tree Terminology Monophyletic A group descended from a single common ancestor that contains only and all descendants from that ancestor. Paraphyletic A group descended from a single common ancestor that does not contain all the descendants from that ancestor. Polyphyletic A group whose members are not descended from a single common ancestor. Gene Tree A phylogenetic tree based on divergence observed within a single homologous gene in different species. It may accurately represent the evolutionary history of that gene, but not necessarily of the species. Species trees are best based on the comparison of numerous genes. Bootstrapping A method for checking the robustness of a given phylogenetic tree by checking whether every portion of the alignment equally supports the structure of the tree. Newick Tree Format A common text file format for representing simple phylogenetic trees in a set of nested parentheses, e.g. (B,(A,C,E),D); or with branch lengths included, (B:6.0,(A:5.0,C:7.0,E:4.0):4.0,D:10.0);
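
To make the Newick format concrete, here is a minimal recursive parser for the two example strings on this slide. It is a sketch that assumes plain unquoted labels and no comments; the dictionary layout (`name`, `length`, `children`) is a choice made for this example, not part of the format.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick tree must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume '('
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError("unexpected character at %d" % pos)
        # Optional label (empty for unnamed internal nodes).
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos].strip()
        # Optional branch length after ':'.
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if text[pos] != ";":
        raise ValueError("unbalanced parentheses")
    return root


def leaf_names(node):
    """List the OTU names at the tips, left to right."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


tree = parse_newick("(B:6.0,(A:5.0,C:7.0,E:4.0):4.0,D:10.0);")
print(leaf_names(tree))
```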

23 Distance Matrix Methods Distance Method Distance based methods attempt to construct trees based on measures of distance between OTUs (e.g. genes or species). In contrast, character based methods evaluate particular features (e.g. DNA sequence, amino acid sequence, number of legs, etc.). Unweighted-Pair-Group Method with Arithmetic Mean (UPGMA) A clustering algorithm which constructs a distance matrix, then clusters together the least distant pair of Operational Taxonomic Units (OTUs), followed by successively more distant OTUs. At each step of the algorithm, the number of OTUs declines by one, replaced by a joint OTU, from which subsequent distances from other OTUs are calculated, until the algorithm finishes by clustering the last pair of OTUs. This method assumes that the rate of evolutionary change between all branches of the tree is the same, which is generally not a valid assumption. In nature, examples of rates of evolution varying between taxa are common. As a result, corrections to this assumption are often used with this approach. Neighbor Joining Method This attempts to correct for the assumption made by UPGMA that the same rate of evolutionary change applies to all branches of the tree. It is otherwise similar to UPGMA, but generally gives better results. It yields an unrooted tree. Fitch and Margoliash This method attempts to find an optimal tree of minimal distance. It yields an unrooted tree.
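
The UPGMA clustering loop described above can be sketched in a few dozen lines. This is a minimal illustration, assuming a symmetric distance matrix given as a list of lists; the output format, a nested tuple `(left, right, height)` where height is half the distance between the joined clusters, is a choice made for this example.

```python
def upgma(names, matrix):
    """Cluster OTUs by UPGMA; returns a nested (left, right, height) tuple."""
    n = len(names)
    trees = {i: names[i] for i in range(n)}   # active cluster id -> subtree
    sizes = {i: 1 for i in range(n)}          # active cluster id -> OTU count
    dist = {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n

    while len(trees) > 1:
        # Pick the least distant pair of active clusters.
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        new_size = sizes[i] + sizes[j]
        # Distance from the joint OTU to every other cluster is the mean of
        # the member distances, weighted by cluster size.
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            dist[(k, next_id)] = (sizes[i] * d_ik + sizes[j] * d_jk) / new_size
        # Replace the two clusters with the joint OTU.
        dist = {pair: d for pair, d in dist.items()
                if i not in pair and j not in pair}
        trees[next_id] = (trees.pop(i), trees.pop(j), d_ij / 2.0)
        sizes[next_id] = new_size
        del sizes[i], sizes[j]
        next_id += 1

    return next(iter(trees.values()))


names = ["A", "B", "C"]
matrix = [[0, 2, 6],
          [2, 0, 6],
          [6, 6, 0]]
print(upgma(names, matrix))
```

Because every joined pair is placed at equal height below its node, the result is always ultrametric, which is exactly the equal-rates assumption the slide warns about.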

24 Maximum Parsimony Methods Maximum Parsimony The maximum parsimony method involves evaluating as many trees as possible, giving each a score that is used to choose between different trees. The highest scoring, or most parsimonious tree is the one with the minimum number of evolutionary changes. A number of different methods can be used to calculate scoring. Fitch Parsimony For a particular tree, traverse from the leaves toward the root of the tree. At each internal node, determine the set of possible states (i.e. nucleotides). Then, traverse the tree from the root towards the leaves, picking ancestral states for each internal node to minimize the number of changes required. The Fitch algorithm assumes position independence, and that any state is equally likely to change to any other state. Variations which weight the costs of changes differently exist. Dollo Parsimony Assumes that derived states are irreversible, that is, a derived character state cannot be lost and then regained. Hence, the state can evolve and be lost many times throughout evolution, but cannot be inferred to have evolved twice. The tree with maximum parsimony is the one in which derived characters have been lost the fewest number of times. This method has been used with restriction fragment length polymorphism (RFLP) data, since restriction sites are difficult to gain, but easy to lose. It may be more useful when dealing with non-sequence data, for example, complex phenotypes, which are unlikely to have evolved more than once. Source: Brian Golding, Reconstructing Phylogenies
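
The first (leaves-to-root) pass of Fitch parsimony, which yields the minimum number of changes for one character on one tree, can be sketched as follows. The tree is assumed to be a rooted bifurcating tree written as nested 2-tuples of leaf names; this representation and the function name are assumptions for the example, and the second (root-to-leaves) pass that assigns ancestral states is omitted.

```python
def fitch_changes(tree, states):
    """Return (minimum changes, possible root states) for one character.

    tree   -- nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each leaf name to its observed state
    """
    changes = 0

    def state_set(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}          # a leaf has one observed state
        left, right = (state_set(child) for child in node)
        common = left & right
        if common:
            return common                  # children agree: no change needed
        changes += 1                       # children disagree: one change
        return left | right

    root_states = state_set(tree)
    return changes, root_states


def parsimony_score(tree, alignment):
    """Sum Fitch changes over all columns, treating sites as independent."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_changes(tree, {name: seq[i] for name, seq in alignment.items()})[0]
        for i in range(length)
    )


tree = (("A", "B"), ("C", "D"))
print(fitch_changes(tree, {"A": "G", "B": "G", "C": "T", "D": "G"}))
```

Scoring each candidate tree this way and keeping the one with the fewest total changes is the comparison the maximum parsimony method relies on.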

25 The Principle of Parsimony Occam's Razor "Pluralitas non est ponenda sine necessitate" (Do not increase the number of entities required to explain anything beyond what is strictly necessary) William of Occam (or Ockham) ( ) [Figure captions: "Requires fewer changes than its neighbor"; "These two trees are equally parsimonious"]

26 Other Methods Maximum Likelihood The method of maximum likelihood attempts to reconstruct a phylogeny using an explicit model of evolution. It specifies values for the likelihood of a given trait evolving within a lineage, and chooses the most likely tree, given these values. It attempts to predict the most likely interior nodes given the OTUs, then the most likely tree. Theoretically, this may be the most powerful method available. For a given model of evolution, no other method will perform as well nor provide you with as much information about the tree. Unfortunately, this is computationally difficult to do and hence, the model of evolution must be a simple one. Even with simple models of evolutionary change the computational task is enormous and this is the slowest of all methods. Compatibility Compatibility methods recode data involving multi-state characters to include knowledge of the ancestral states of characters, and from this determine what changes are compatible. Compatibility methods are more accurate when there are slow rates of evolutionary change. Both compatibility and parsimony assume that homoplasies will be rare. Source: Brian Golding, Reconstructing Phylogenies

27 Rules of Thumb for Phylogeny Use more than one method. Each one will provide a phylogenetic history biased by that method's assumptions. Bootstrap or jackknife your data to test the quality of your tree. When bootstrapping, use at least several hundred iterations of resampling and tree generation. Run your analysis with different subsets of taxa to see if the trees thus generated are congruent. Dropping a single OTU should not dramatically change your tree. Treat long branches with caution. They tend to attract each other. Beware of non-orthologous genes, horizontal gene transfers, or recombinant sequences. Standard phylogenetic methods do not handle them well. When using outgroups, consider including more than one outgroup taxon, and choose outgroup species that are evenly spaced on the tree. Including intermediate taxa can help resolve even the relationship of a few taxa. When the number of substitutions per site is unusually high or low, distance methods may perform better than parsimony methods. If you expect homoplasies to be scattered at random throughout the sequence data, then a parsimony method will perform best. If homoplasies are expected to be concentrated in a few characters, whose identities are known in advance, then compatibility will perform better than parsimony.
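
The resampling step behind the bootstrap advice above can be sketched as follows: each replicate draws alignment columns with replacement until it is as long as the original alignment. Tree building on each replicate is left out; the function names and the use of a seeded `random.Random` for reproducibility are assumptions made for this example.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement.

    alignment -- dict mapping sequence name to an aligned sequence
    rng       -- a random.Random instance (seed it for reproducibility)
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment, iterations=500, seed=0):
    """Yield `iterations` pseudo-alignments for tree building."""
    rng = random.Random(seed)
    for _ in range(iterations):
        yield bootstrap_replicate(alignment, rng)


alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "ATGTTC"}
for replicate in bootstrap_replicates(alignment, iterations=3):
    print(replicate)
```

A tree would then be built from each replicate with the same method as the original, and the support for a branch reported as the fraction of replicate trees that contain it.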

28 Phylogenetic Software Packages PHYLIP The leading free package for phylogeny. It includes programs to carry out parsimony, distance matrix methods, maximum likelihood, and other methods on a variety of types of data, including DNA and RNA sequences, protein sequences, restriction sites, 0/1 discrete character data, gene frequencies, continuous characters and distance matrices. Although it is free, it can be complex to use. Phylodendron A free web-based tree drawing program with a simple user interface and many output options. ClustalX A free multiple sequence alignment program that includes the ability to create phylogenetic trees based on the Neighbor Joining Method. PAUP The leading commercial package for phylogeny. It includes parsimony, distance matrix, invariants, and maximum likelihood methods and many indices and statistical tests. MacClade A commercial package for interactive analysis of evolution of a variety of character types, including discrete characters and molecular sequence. It works well with PAUP.


More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

How to read and make phylogenetic trees Zuzana Starostová

How to read and make phylogenetic trees Zuzana Starostová How to read and make phylogenetic trees Zuzana Starostová How to make phylogenetic trees? Workflow: obtain DNA sequence quality check sequence alignment calculating genetic distances phylogeny estimation

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Phylogenies & Classifying species (AKA Cladistics & Taxonomy) What are phylogenies & cladograms? How do we read them? How do we estimate them?

Phylogenies & Classifying species (AKA Cladistics & Taxonomy) What are phylogenies & cladograms? How do we read them? How do we estimate them? Phylogenies & Classifying species (AKA Cladistics & Taxonomy) What are phylogenies & cladograms? How do we read them? How do we estimate them? Carolus Linneaus:Systema Naturae (1735) Swedish botanist &

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods

More information

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5. Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Phylogeny: the evolutionary history of a species

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels

More information

Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and

More information

How should we organize the diversity of animal life?

How should we organize the diversity of animal life? How should we organize the diversity of animal life? The difference between Taxonomy Linneaus, and Cladistics Darwin What are phylogenies? How do we read them? How do we estimate them? Classification (Taxonomy)

More information

Phylogeny: building the tree of life

Phylogeny: building the tree of life Phylogeny: building the tree of life Dr. Fayyaz ul Amir Afsar Minhas Department of Computer and Information Sciences Pakistan Institute of Engineering & Applied Sciences PO Nilore, Islamabad, Pakistan

More information

Phylogenetic analyses. Kirsi Kostamo

Phylogenetic analyses. Kirsi Kostamo Phylogenetic analyses Kirsi Kostamo The aim: To construct a visual representation (a tree) to describe the assumed evolution occurring between and among different groups (individuals, populations, species,

More information

PHYLOGENY AND SYSTEMATICS

PHYLOGENY AND SYSTEMATICS AP BIOLOGY EVOLUTION/HEREDITY UNIT Unit 1 Part 11 Chapter 26 Activity #15 NAME DATE PERIOD PHYLOGENY AND SYSTEMATICS PHYLOGENY Evolutionary history of species or group of related species SYSTEMATICS Study

More information

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science Phylogeny and Evolution Gina Cannarozzi ETH Zurich Institute of Computational Science History Aristotle (384-322 BC) classified animals. He found that dolphins do not belong to the fish but to the mammals.

More information

Phylogenetic trees 07/10/13

Phylogenetic trees 07/10/13 Phylogenetic trees 07/10/13 A tree is the only figure to occur in On the Origin of Species by Charles Darwin. It is a graphical representation of the evolutionary relationships among entities that share

More information

PHYLOGENY & THE TREE OF LIFE

PHYLOGENY & THE TREE OF LIFE PHYLOGENY & THE TREE OF LIFE PREFACE In this powerpoint we learn how biologists distinguish and categorize the millions of species on earth. Early we looked at the process of evolution here we look at

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016 Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

The practice of naming and classifying organisms is called taxonomy.

The practice of naming and classifying organisms is called taxonomy. Chapter 18 Key Idea: Biologists use taxonomic systems to organize their knowledge of organisms. These systems attempt to provide consistent ways to name and categorize organisms. The practice of naming

More information

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogeny? - Systematics? The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogenetic systematics? Connection between phylogeny and classification. - Phylogenetic systematics informs the

More information

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço jcarrico@fm.ul.pt Charles Darwin (1809-1882) Charles Darwin s tree of life in Notebook B, 1837-1838 Ernst Haeckel (1934-1919)

More information

--Therefore, congruence among all postulated homologies provides a test of any single character in question [the central epistemological advance].

--Therefore, congruence among all postulated homologies provides a test of any single character in question [the central epistemological advance]. Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2008 University of California, Berkeley B.D. Mishler Jan. 29, 2008. The Hennig Principle: Homology, Synapomorphy, Rooting issues The fundamental

More information

Reconstructing the history of lineages

Reconstructing the history of lineages Reconstructing the history of lineages Class outline Systematics Phylogenetic systematics Phylogenetic trees and maps Class outline Definitions Systematics Phylogenetic systematics/cladistics Systematics

More information

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise Bot 421/521 PHYLOGENETIC ANALYSIS I. Origins A. Hennig 1950 (German edition) Phylogenetic Systematics 1966 B. Zimmerman (Germany, 1930 s) C. Wagner (Michigan, 1920-2000) II. Characters and character states

More information

Evolutionary Tree Analysis. Overview

Evolutionary Tree Analysis. Overview CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT Inferring phylogeny Constructing phylogenetic trees Tõnu Margus Contents What is phylogeny? How/why it is possible to infer it? Representing evolutionary relationships on trees What type questions questions

More information

Classification and Phylogeny

Classification and Phylogeny Classification and Phylogeny The diversity of life is great. To communicate about it, there must be a scheme for organization. There are many species that would be difficult to organize without a scheme

More information

Multiple Sequence Alignment. Sequences

Multiple Sequence Alignment. Sequences Multiple Sequence Alignment Sequences > YOR020c mstllksaksivplmdrvlvqrikaqaktasglylpe knveklnqaevvavgpgftdangnkvvpqvkvgdqvl ipqfggstiklgnddevilfrdaeilakiakd > crassa mattvrsvksliplldrvlvqrvkaeaktasgiflpe

More information

C.DARWIN ( )

C.DARWIN ( ) C.DARWIN (1809-1882) LAMARCK Each evolutionary lineage has evolved, transforming itself, from a ancestor appeared by spontaneous generation DARWIN All organisms are historically interconnected. Their relationships

More information

Classification and Phylogeny

Classification and Phylogeny Classification and Phylogeny The diversity it of life is great. To communicate about it, there must be a scheme for organization. There are many species that would be difficult to organize without a scheme

More information

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004, Tracing the Evolution of Numerical Phylogenetics: History, Philosophy, and Significance Adam W. Ferguson Phylogenetic Systematics 26 January 2009 Inferring Phylogenies Historical endeavor Darwin- 1837

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9 Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

More information

Chapter 26 Phylogeny and the Tree of Life

Chapter 26 Phylogeny and the Tree of Life Chapter 26 Phylogeny and the Tree of Life Chapter focus Shifting from the process of how evolution works to the pattern evolution produces over time. Phylogeny Phylon = tribe, geny = genesis or origin

More information

Chapter 26: Phylogeny and the Tree of Life

Chapter 26: Phylogeny and the Tree of Life Chapter 26: Phylogeny and the Tree of Life 1. Key Concepts Pertaining to Phylogeny 2. Determining Phylogenies 3. Evolutionary History Revealed in Genomes 1. Key Concepts Pertaining to Phylogeny PHYLOGENY

More information

Lecture 11 Friday, October 21, 2011

Lecture 11 Friday, October 21, 2011 Lecture 11 Friday, October 21, 2011 Phylogenetic tree (phylogeny) Darwin and classification: In the Origin, Darwin said that descent from a common ancestral species could explain why the Linnaean system

More information

ELE4120 Bioinformatics Tutorial 8

ELE4120 Bioinformatics Tutorial 8 ELE4120 ioinformatics Tutorial 8 ontent lassifying Organisms Systematics and Speciation Taxonomy and phylogenetics Phenetics versus cladistics Phylogenetic trees iological classification Goal: To develop

More information

Many of the slides that I ll use have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Many of the slides that I ll use have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Many of the slides that I ll use have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses Anatomy of a tree outgroup: an early branching relative of the interest groups sister taxa: taxa derived from the same recent ancestor polytomy: >2 taxa emerge from a node Anatomy of a tree clade is group

More information

Chapter 26 Phylogeny and the Tree of Life

Chapter 26 Phylogeny and the Tree of Life Chapter 26 Phylogeny and the Tree of Life Biologists estimate that there are about 5 to 100 million species of organisms living on Earth today. Evidence from morphological, biochemical, and gene sequence

More information

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018 CONCEPT OF SEQUENCE COMPARISON Natapol Pornputtapong 18 January 2018 SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of

More information

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D 7.91 Lecture #5 Database Searching & Molecular Phylogenetics Michael Yaffe B C D B C D (((,B)C)D) Outline Distance Matrix Methods Neighbor-Joining Method and Related Neighbor Methods Maximum Likelihood

More information

Phylogeny 9/8/2014. Evolutionary Relationships. Data Supporting Phylogeny. Chapter 26

Phylogeny 9/8/2014. Evolutionary Relationships. Data Supporting Phylogeny. Chapter 26 Phylogeny Chapter 26 Taxonomy Taxonomy: ordered division of organisms into categories based on a set of characteristics used to assess similarities and differences Carolus Linnaeus developed binomial nomenclature,

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Chapter 12 (Strikberger) Molecular Phylogenies and Evolution METHODS FOR DETERMINING PHYLOGENY In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern

More information

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

More information

Phylogeny and the Tree of Life

Phylogeny and the Tree of Life Chapter 26 Phylogeny and the Tree of Life PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

More information

Introduction to Biosystematics - Zool 575

Introduction to Biosystematics - Zool 575 Introduction to Biosystematics Lecture 10 - Introduction to Phylogenetics 1. Pre Lamarck, Pre Darwin Classification without phylogeny 2. Lamarck & Darwin to Hennig (et al.) Classification with phylogeny

More information

Classification, Phylogeny yand Evolutionary History

Classification, Phylogeny yand Evolutionary History Classification, Phylogeny yand Evolutionary History The diversity of life is great. To communicate about it, there must be a scheme for organization. There are many species that would be difficult to organize

More information

Microbial Taxonomy and the Evolution of Diversity

Microbial Taxonomy and the Evolution of Diversity 19 Microbial Taxonomy and the Evolution of Diversity Copyright McGraw-Hill Global Education Holdings, LLC. Permission required for reproduction or display. 1 Taxonomy Introduction to Microbial Taxonomy

More information

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell

More information

Lecture 6 Phylogenetic Inference

Lecture 6 Phylogenetic Inference Lecture 6 Phylogenetic Inference From Darwin s notebook in 1837 Charles Darwin Willi Hennig From The Origin in 1859 Cladistics Phylogenetic inference Willi Hennig, Cladistics 1. Clade, Monophyletic group,

More information

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic

More information

Introduction to Bioinformatics Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Dr. rer. nat. Gong Jing Cancer Research Center Medicine School of Shandong University 2012.11.09 1 Chapter 4 Phylogenetic Tree 2 Phylogeny Evidence from morphological ( 形态学的 ), biochemical, and gene sequence

More information

Lecture V Phylogeny and Systematics Dr. Kopeny

Lecture V Phylogeny and Systematics Dr. Kopeny Delivered 1/30 and 2/1 Lecture V Phylogeny and Systematics Dr. Kopeny Lecture V How to Determine Evolutionary Relationships: Concepts in Phylogeny and Systematics Textbook Reading: pp 425-433, 435-437

More information

Microbial Diversity and Assessment (II) Spring, 2007 Guangyi Wang, Ph.D. POST103B

Microbial Diversity and Assessment (II) Spring, 2007 Guangyi Wang, Ph.D. POST103B Microbial Diversity and Assessment (II) Spring, 007 Guangyi Wang, Ph.D. POST03B guangyi@hawaii.edu http://www.soest.hawaii.edu/marinefungi/ocn403webpage.htm General introduction and overview Taxonomy [Greek

More information

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

More information

Introduction to characters and parsimony analysis

Introduction to characters and parsimony analysis Introduction to characters and parsimony analysis Genetic Relationships Genetic relationships exist between individuals within populations These include ancestordescendent relationships and more indirect

More information

Effects of Gap Open and Gap Extension Penalties

Effects of Gap Open and Gap Extension Penalties Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

More information

A Phylogenetic Network Construction due to Constrained Recombination

A Phylogenetic Network Construction due to Constrained Recombination A Phylogenetic Network Construction due to Constrained Recombination Mohd. Abdul Hai Zahid Research Scholar Research Supervisors: Dr. R.C. Joshi Dr. Ankush Mittal Department of Electronics and Computer

More information

Chapter 19. Microbial Taxonomy

Chapter 19. Microbial Taxonomy Chapter 19 Microbial Taxonomy 12-17-2008 Taxonomy science of biological classification consists of three separate but interrelated parts classification arrangement of organisms into groups (taxa; s.,taxon)

More information

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand

More information

CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics

CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics CS5263 Bioinformatics Guest Lecture Part II Phylogenetics Up to now we have focused on finding similarities, now we start focusing on differences (dissimilarities leading to distance measures). Identifying

More information

Unit 9: Evolution Guided Reading Questions (80 pts total)

Unit 9: Evolution Guided Reading Questions (80 pts total) Name: AP Biology Biology, Campbell and Reece, 7th Edition Adapted from chapter reading guides originally created by Lynn Miriello Unit 9: Evolution Guided Reading Questions (80 pts total) Chapter 22 Descent

More information

Michael Yaffe Lecture #4 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Michael Yaffe Lecture #4 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D 7.91 Lecture #4 Database Searching & Molecular Phylogenetics Michael Yaffe A B C D A B C D (((A,B)C)D) Outline FASTA, Blast searching, Smith-Waterman Psi-Blast Review of enomic DNA structure Substitution

More information

Chapter 27: Evolutionary Genetics

Chapter 27: Evolutionary Genetics Chapter 27: Evolutionary Genetics Student Learning Objectives Upon completion of this chapter you should be able to: 1. Understand what the term species means to biology. 2. Recognize the various patterns

More information

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying

More information
