Phylogeny: The Backbone of Biology. ARB Workshop, 14/15.07.2004, CEH Oxford.


Information

Who are we:
- Dr. Frank Oliver Glöckner
- Dr. Jörg Peplies
Max Planck Institute for Marine Microbiology, Microbial Genomics Group, Bremen, Germany
Contact: arb@mpi-bremen.de
Mailing list: arb_users@yahoogroups.com

ARB Workshop, 14/15.07.2004, CEH Oxford

Where can you find additional information:
- www.arb-home.de
- ftp.mpi-bremen.de/molecol_p/arb (all files needed to install ARB can be found in the CEH_Oxford folder)

Phylogeny: The Backbone of Biology

Why?
- To track back the origin of organisms
- To unravel evolutionary relationships
- To sort and classify organisms

How?
- Botany and Zoology: morphology, fossils
- Microbiology: molecular markers

Molecular Markers

Zuckerkandl and Pauling 1965: use macromolecules as molecular clocks
- DNA/RNA
- Proteins

Problem: species phylogeny vs. gene phylogeny
- Lateral gene transfer
- Genome plasticity/patchwork
- Orthologous/paralogous genes

Universal Tree

[Figure: universal tree. Doolittle, Science 1999, 284:2124-2128]

Homology Definition

- Homology: two sequences are homologous when they evolved from a common ancestor sequence
- Homology cannot be quantified! Sequences are either homologous or not!
- Orthologous genes: direct common ancestor
- Paralogous genes: originate from a gene duplication

Orthologs/Paralogs

[Figure: a gene duplication followed by speciation events, showing how orthologs and paralogs are distributed across species]

Phylogenetic Markers

- 16S rRNA
- 23S rRNA
- Elongation factors (EF-Tu, EF-G)
- ATP-Synthase
- Reg
- Hsp60
- RNA-Polymerase
- Gyrase
Housekeeping genes: transcription, translation

rRNA as Phylogenetic Marker

Advantages
- Functional constancy
- Ubiquitous distribution
- Large size (information content)
- Conserved and highly variable structural elements
- No lateral gene transfer

Drawbacks
- No continuous sequence change
- Multiple genes/operons
- Different species with identical 16S rRNAs
- One base change needs nearly one million years

Steps in Phylogenetic Analysis

- Sequence determination
- Alignment
- Data analysis
- Phylogenetic reconstruction

Sequence Determination

Automatic sequencers
- ABI Prism 377 (gel)
- ABI Prism 3100 (16 capillary)
- ABI Prism 3700 (96 capillary)
- MegaBACE 500 (48 capillary)
- MegaBACE 1000 (96 capillary)

The European Database on Ribosomal RNA

www.psb.ugent.be/rRNA/
Maintenance: Department of Biochemistry, University Antwerpen
Services
- SSU, LSU sequences, annotations
- Alignments, secondary structures, variability maps
- WWW interface for sequence retrieval
- Software for alignment and tree reconstruction
Content (aligned sequences): Release September 00, 0,85 SSU sequences, ,400 LSU sequences

RDP-II: Ribosomal Database Project

http://rdp.cme.msu.edu
Maintenance: Center for Microbial Ecology, Michigan State University
Services
- SSU, LSU sequences, annotations
- Alignments
- Phylogenetic trees
- Analysis services via WWW server
Content (aligned sequences): RDP Preview Release from 05/05/004, 97,8 SSU sequences, 7 LSU sequences

ARB: Software Environment for Sequence Data

www.arb-home.de
Maintenance: Department for Microbiology, Technical University Munich
Services
- SSU, LSU sequences, annotations
- Alignments, phylogenetic trees
- Probe design, probe match
- Software suite ARB
Content (aligned sequences): Prerelease July 04, 59,609 SSU sequences, 698 LSU sequences

Alignment

Problem: variable regions.
Align the sequences in such a way that homologous bases stand one below the other in a column.

The 16S Secondary Structure

[Figure: ARB_Edit screenshot]
[Figure: the 70S ribosome. The 30S subunit contains the 16S rRNA and proteins; the 50S subunit contains the 5S rRNA, the 23S rRNA and proteins]
[Figure: Escherichia coli 16S rRNA primary and secondary structure]

Secondary Structures

[Figure: secondary structure information for homologous helices in Escherichia coli, Mycoplasma hyopneumoniae and Streptococcus oralis]
[Figure: SSU secondary structure]

Data Analysis: Information Content

Marker             Size (E. coli)   Information (bits)   Similarity
16S rRNA           1542 nt          3084                 >67%
23S rRNA           2904 nt          5808                 >67%
EF-Tu              394 aa           1706                 >60%
ATPase β subunit   460 aa           1992                 >6%

[Table: conserved positions, variable positions, information (variable) and information (real) for each marker; the values are not recoverable from this transcript]
Ludwig and Klenk, Bergey's

Information Content

[Table: number and percentage of alignment positions by number of different characters per position, for 16S rRNA, 23S rRNA, EF-Tu and ATPase β subunit; the values are not recoverable from this transcript]

Models of Evolution

Models of substitution rates between bases
- Transition: A>G, G>A, C>T, T>C
- Transversion: A>C, A>T, G>C, G>T and reverse
- Amino acids: PAM and BLOSUM matrices
Base frequencies
Models of among-site substitution rate heterogeneity
- Weighting particular sites according to relative mutation frequencies (position variability)
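The transition/transversion distinction above can be illustrated with a short Python sketch. It is only an illustration, not part of the workshop material: the function names are hypothetical, and it assumes two already aligned nucleotide sequences in which gaps and ambiguity codes are simply skipped.

```python
# Purines (A, G) and pyrimidines (C, T); U is treated as T.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = set("ACGT")


def classify_substitution(x, y):
    """Return 'transition', 'transversion', or None if the bases are identical."""
    x = x.upper().replace("U", "T")
    y = y.upper().replace("U", "T")
    if x == y:
        return None
    # A change within one chemical class is a transition, otherwise a transversion.
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_substitutions(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    counts = {"transition": 0, "transversion": 0}
    for x, y in zip(seq1, seq2):
        x = x.upper().replace("U", "T")
        y = y.upper().replace("U", "T")
        if x not in VALID or y not in VALID:
            continue  # skip gaps and ambiguity codes
        kind = classify_substitution(x, y)
        if kind is not None:
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    print(count_substitutions("ACGT", "GCTT"))
```

In the usage line, A>G counts as a transition and G>T as a transversion; the two unchanged positions are ignored.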

Models of Evolution

Jukes-Cantor model
- All substitution types and base frequencies are presumed equal
- Time reversible
Kimura 2 parameter model
- Transitions are more likely than transversions
- Equal base frequencies
- Time reversible

Substitution Models

Treeing methods
- Maximum Parsimony: fixed cost matrices
- Distance Matrix and Maximum Likelihood: general model of sequence evolution
Not addressed
- Lineage-specific substitutions
- Different rates of evolution between lineages

Models of Evolution: General Matrix

$$
Q = \begin{pmatrix}
-(a_1+a_2+a_3) & a_4 & a_7 & a_{10} \\
a_1 & -(a_4+a_5+a_6) & a_8 & a_{11} \\
a_2 & a_5 & -(a_7+a_8+a_9) & a_{12} \\
a_3 & a_6 & a_9 & -(a_{10}+a_{11}+a_{12})
\end{pmatrix}
$$

a = relative rate between the different substitutions x frequency of target base
Swofford, Book (Hillis), 1996, p. 4

Models of Evolution

[Figure: rate matrices for the General Time Reversible (GTR) model, the Jukes-Cantor model and Kimura's 2 parameter model (K2P)]
Swofford, Book (Hillis), 1996, p. 4 and p. 44

Treeing Methods: Classification

"Inferring a phylogeny is really an estimation procedure; we are making a best estimate of an evolutionary history based on incomplete information" (Swofford, 1990)

Distance-based
- Compute pairwise distances and use them to derive the tree
Character-based
- Work directly on each character of the data
- Derive trees that optimize the distribution of the actual data pattern for each character
- Maximum Parsimony, Maximum Likelihood
Algorithm-based
- Generate a tree according to a series of steps (e.g. neighbor joining)
Criterion-based
- Evaluation of alternative trees according to some optimization function
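As a rough illustration of the Jukes-Cantor and Kimura 2 parameter models named above, the following Python sketch builds their rate matrices and checks time reversibility. The function names, the rate symbols `alpha` (transitions) and `beta` (transversions), and the row convention `q[x][y]` (rate of change from base x to base y) are assumptions made for this example only.

```python
BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_rate_matrix(alpha, beta):
    """Kimura 2 parameter rate matrix: transitions at rate alpha, transversions at rate beta."""
    q = {}
    for x in BASES:
        row = {}
        for y in BASES:
            if x != y:
                row[y] = alpha if (x, y) in TRANSITIONS else beta
        # Diagonal entry makes each row sum to zero.
        row[x] = -sum(row.values())
        q[x] = row
    return q


def jc_rate_matrix(mu):
    """Jukes-Cantor rate matrix: every substitution type has the same rate mu."""
    return k2p_rate_matrix(mu, mu)


def is_time_reversible(q, freqs, tol=1e-12):
    """Check detailed balance: freq[x] * q[x][y] == freq[y] * q[y][x] for all pairs."""
    return all(
        abs(freqs[x] * q[x][y] - freqs[y] * q[y][x]) <= tol
        for x in BASES
        for y in BASES
    )


if __name__ == "__main__":
    equal = {b: 0.25 for b in BASES}
    q = k2p_rate_matrix(alpha=2.0, beta=1.0)
    print(q["A"])
    print(is_time_reversible(q, equal))
```

Both models assume equal base frequencies, so the reversibility check is run with a frequency of 0.25 for every base.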

The Most Common Methods for Tree Reconstruction

Distance Matrix
- Calculation of distance matrices by pairwise (binary) comparison of the aligned sequences
- UPGMA or Neighbor Joining
Maximum Parsimony
- Preservation is more likely than change
- Search for topologies that minimize the total tree length, assuming a minimum number of base changes
Maximum Likelihood
- Searches for the evolutionary model, including the tree itself, that has the highest likelihood of producing the observed data
- Models: transition/transversion; base frequencies; positional variability

Definitions

[Figure: radial tree and dendrogram with labels: peripheral branch, internal branch, central branch, terminal nodes/tips, links/edges, internal nodes]
Unrooted tree: the location of the common ancestor is not specified

Distance Matrix: Ultrametric Data and Non-Ultrametric Data

UPGMA: Unweighted Pair Group Method with Arithmetic Mean
[Figure: example trees for ultrametric and non-ultrametric data]

Distance Matrix: Additive Trees

Example of additive trees: Fitch-Margoliash algorithm
- Calculate the matrix
- Find the most closely related pair of sequences and link it by an internal node
- Link the next related sequence with an internal node
- Calculate the branch lengths

Distance matrix:

     A    B    C
A    -   22   39
B   22    -   41
C   39   41    -

[Figure: three-taxon tree with branches a (to A), b (to B) and c (to C)]

A to B: a + b = 22 (1)
A to C: a + c = 39 (2)
B to C: b + c = 41 (3)
Subtract (3) from (2): a - b = 39 - 41 = -2 (4)
Add (1) and (4): 2a = 20, a = 10
From (1) and (2): b = 12, c = 29
Mount, Book, Bioinformatics 2001, p. 57
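The branch length calculation for three taxa can be written as a small Python sketch. It solves the three equations a + b = d(A,B), a + c = d(A,C) and b + c = d(B,C) directly; the function name is hypothetical, and the distances 22, 39 and 41 are the values of the worked example as reconstructed from the slide.

```python
def three_point_branch_lengths(d_ab, d_ac, d_bc):
    """Solve a + b = d_ab, a + c = d_ac, b + c = d_bc for the branch lengths a, b, c."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


if __name__ == "__main__":
    a, b, c = three_point_branch_lengths(22, 39, 41)
    print(a, b, c)
```

With more than three sequences, the Fitch-Margoliash procedure applies this three-point step repeatedly, treating the remaining sequences as one averaged group; that extension is not shown here.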

Principle of Neighbor Joining (Saitou and Nei, 1987)

The fully resolved tree is obtained by decomposing a fully unresolved star tree: branches are successively inserted between a pair of closest neighbors and the remaining terminals in the tree.
[Figure: star decomposition of a star tree with taxa A to H]

Distance Matrix: Additive Trees

Finding a tree that fits the matrix
- Find the optimal values for the branching pattern and the branch lengths
- NJ will find the correct tree if the distances are additive
- Problem: non-additive distances caused by superimposed changes
[Figure: observed distance vs. real distance]

Dealing with Non-Additive Distances

- Instead of using raw dissimilarity, correct the distances based on the expected number of hidden changes
- For some models (JC, K2P, F84) simple distance equations exist
- For others one must use ML
- Outcome: additivity is not restored!! An optimality criterion is needed
- Most widely used: least-squares criterion (e.g. Fitch-Margoliash), which can lead to negative branch lengths
- Minimum Evolution (PAUP)

Distance Matrix: Correct for Multiple Changes

[Figure: distance correction for multiple changes. Jukes and Cantor, 1969]

Pros and Cons (distance methods)

- Very fast
- Only one tree is derived
- Topology and branch lengths are calculated
- Accounts for false identities
- Works with different models of evolution
- Discards the primary character data
- Different sequences can yield the same matrix
"A distance method would reconstruct the true tree if all genetic divergence events were accurately recorded in the sequence" (Swofford, 1996)

Maximum Parsimony

MP is an optimality criterion that appeals to the principle: the simplest explanation of the data is the best.
- Model of evolution: preservation is more likely than change
- Character-based method
- Evaluates trees
- Selects trees that minimize the total tree length
- Needs a set of outgroup taxa
- Calculations are done from the terminal nodes towards the (arbitrary) root
- Implicit model of evolution, no additional model needed
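The correction for multiple changes under the Jukes-Cantor model has a simple closed form, d = -3/4 ln(1 - 4p/3), where p is the observed fraction of differing sites. The Python sketch below applies it to two aligned sequences; the function name and the handling of gaps (positions with anything other than A, C, G or T are skipped) are assumptions of this example.

```python
import math


def jukes_cantor_distance(seq1, seq2):
    """Jukes-Cantor corrected distance between two aligned nucleotide sequences."""
    pairs = [
        (x, y)
        for x, y in zip(seq1.upper(), seq2.upper())
        if x in "ACGT" and y in "ACGT"  # skip gaps and ambiguity codes
    ]
    if not pairs:
        raise ValueError("no comparable positions")
    # Observed (raw) dissimilarity.
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf  # saturation: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    print(jukes_cantor_distance("AAAAAAAAAA", "AAAAAAAAAG"))
```

The corrected distance is always at least as large as the raw dissimilarity, because it adds an estimate of the hidden (superimposed) changes.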

Maximum Parsimony: Evaluation of Trees

- The alignment is checked for informative positions
- To be informative, a site must have the same sequence character in at least two taxa (e.g. sites 1, 2, 3 and 5)
- And it must favor one topology over another (only site 5)
- Only the informative sites are analyzed
[Figure: the three possible unrooted topologies for four sequences S1 to S4, with the number of mutations each topology requires]

Pros and Cons (Maximum Parsimony)

- Works directly on the data
- Works fine on data with strong similarity
- Relatively fast
- Does not need a model of evolution
- Calculates only topologies
- Performs weakly on distantly related data
- Prone to false identities (multiple changes): long branch attraction
- Can produce many trees with the same parsimony score

Maximum Likelihood

ML evaluates a hypothesis about evolutionary history in terms of the probability that a proposed model of the evolutionary process and the hypothesized history would give rise to the observed data.
- Character-based method
- Concrete model of evolution needed
- Assumes that nucleotide sites evolve independently
- The likelihood for each site is calculated separately and combined into a total value for a tree
- Looks for the tree with the highest likelihood (L = maximal)

Maximum Likelihood

The likelihood of the full tree is the product of the likelihoods at each site:

$$L = L(1) \times L(2) \times \dots \times L(N) = \prod_{j=1}^{N} L(j)$$

Because the probability of any single observation is an extremely small number, the likelihoods are normally handled as logarithms:

$$\ln L = \ln L(1) + \ln L(2) + \dots + \ln L(N) = \sum_{j=1}^{N} \ln L(j)$$

Maximum Likelihood

- For every internal node all four nucleotides are allowed: 4 x 4 = 16 probabilities
- Each probability is the product of the probability of the base in (6) and the transition/transversion probabilities
- E.g. the probability of the base = 0.25, or the average frequency of the base in the sequences (depends on the model); a transversion = 10^-6 and a transition = 2 x 10^-6
- Likelihood = 0.25 x 2 x 10^-6 x 10^-6 = 5 x 10^-13
Swofford, Book (Hillis), 1996, p. 4
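The reason for working with logarithms can be shown in a few lines of Python. This is a minimal sketch with hypothetical function names: it assumes the per-site likelihoods have already been computed and uses the value 5 x 10^-13 (0.25 x 2 x 10^-6 x 10^-6) as a stand-in for every site.

```python
import math


def tree_likelihood(site_likelihoods):
    """Naive product of the per-site likelihoods (underflows for long alignments)."""
    product = 1.0
    for value in site_likelihoods:
        product *= value
    return product


def tree_log_likelihood(site_likelihoods):
    """Sum of the logarithms of the per-site likelihoods."""
    return sum(math.log(value) for value in site_likelihoods)


if __name__ == "__main__":
    site = 0.25 * 2e-6 * 1e-6      # one path through the tree, about 5e-13
    sites = [site] * 1000          # pretend all 1000 sites look the same
    print(tree_likelihood(sites))      # underflows to 0.0 in double precision
    print(tree_log_likelihood(sites))  # a large negative but usable number
```

Comparing trees by their log-likelihoods gives the same ranking as comparing the raw products, since the logarithm is monotonic.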

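For the parsimony step described earlier, finding the informative positions of an alignment can be sketched in Python as follows. The sketch uses the common textbook criterion, which is an assumption here rather than something stated on the slides: a site is parsimony-informative when at least two different characters each occur in at least two taxa. Function names and the toy alignment are hypothetical.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two different bases each occur in at least two sequences."""
    counts = Counter(c for c in column.upper() if c in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_sites(alignment):
    """Return the 1-based positions of the parsimony-informative columns."""
    return [
        i + 1
        for i, column in enumerate(zip(*alignment))
        if is_parsimony_informative("".join(column))
    ]


if __name__ == "__main__":
    toy_alignment = [
        "AAGAG",  # S1
        "AGCCG",  # S2
        "AGATA",  # S3
        "AGAGA",  # S4
    ]
    print(informative_sites(toy_alignment))
```

In this toy alignment only the last column groups the sequences two against two, so it is the only site that favors one of the three possible four-taxon topologies.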
Pros and Cons (Maximum Likelihood)

- Works directly on the data
- Performs well also on distantly related data
- Includes models of evolution
- The whole tree is under evaluation: topologies and branch lengths are optimized
- Currently regarded as the best method
- Computationally intense: the number of sequences is limited

MP vs. ML

[Figure: comparison of Maximum Parsimony and Maximum Likelihood. Swofford, Book (Hillis), 1996, p. 49]

Searching for Optimal Trees

Exact algorithms
- Exhaustive search
- Branch-and-bound methods
Heuristic approaches
- Stepwise addition
- Star decomposition
- Branch swapping

How Many Trees Do We Have to Evaluate?

Places to add another taxon:
- Two taxa = 1
- Three taxa = 3
- Four taxa = 5
- Five taxa = 7
[Figure: the 15 possible unrooted trees for 5 taxa]

Exhaustive Topologies

Number of unrooted, bifurcating trees:

No of sequences   3   4   5    6     7     8        9         10          50
No of trees       1   3   15   105   945   10,395   135,135   2,027,025   2.8 x 10^74

$$B(T) = \prod_{i=3}^{T} (2i - 5)$$

The root is just another taxon, so for rooted trees:

No of sequences   2   3   4
No of trees       1   3   15

Swofford, Book (Hillis), 1996, p. 479
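The tree counts above follow from the product formula B(T) = (2 x 3 - 5) x (2 x 4 - 5) x ... x (2T - 5), and can be reproduced with a short Python sketch. The function names are hypothetical; the rooted count uses the observation that the root behaves like one additional taxon.

```python
def num_insertion_places(n_taxa):
    """Number of branches of an unrooted tree where the next taxon can be attached."""
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    return 2 * n_taxa - 3


def num_unrooted_trees(n_taxa):
    """Number of unrooted, bifurcating trees: product of (2i - 5) for i = 3..n_taxa."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for i in range(3, n_taxa + 1):
        count *= 2 * i - 5
    return count


def num_rooted_trees(n_taxa):
    """The root is just another taxon, so count unrooted trees for n_taxa + 1."""
    return num_unrooted_trees(n_taxa + 1)


if __name__ == "__main__":
    for n in (3, 4, 5, 6, 7, 8, 9, 10, 50):
        print(n, num_unrooted_trees(n))
```

Python integers have arbitrary precision, so the value for 50 taxa is computed exactly rather than as a floating-point approximation.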

Exact Algorithms

Exhaustive (feasible only for roughly a dozen taxa or fewer)
- All trees are evaluated
Branch-and-bound (up to roughly 20 taxa)
- Construct a random tree with all sequences and evaluate its value L under the chosen optimality criterion, according to the reconstruction method and model used
- This is the initial upper bound L
- Start to reconstruct trees from 3 to X taxa by stepwise addition of taxa
- Evaluate each tree: if the score exceeds L there is no need to go further along this path; if the score is below L, proceed
- If the score at the end of the path is less than L, take this as the new upper bound
[Figure: search tree for branch-and-bound. Swofford, Book (Hillis), 1996, p. 480]

Heuristic Approaches: Global vs. Local Optimum

Heuristic tree searches generally operate by hill-climbing methods
- Start with an initial tree
- Optimize (rearrange) it under the chosen optimality criterion
- If we find no way for further improvement, stop
Problem: there is no way of knowing whether we reached the global or merely a local optimum
[Figure: global vs. local optimum]

Heuristics: Stepwise Addition

Stepwise addition
- Start with three sequences
- Add the next taxon, evaluate the tree, do rearrangements
- Save the one with the best score, add the next taxon
Addition order
- In the order of the data in the alignment
- Use a distance algorithm to decide the order, e.g. closest taxon addition
- Add the taxon that makes the optimal (e.g. shortest) tree
- Random taxon addition order
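The local-optimum problem of hill climbing can be demonstrated on a toy example in Python. This sketch does not search tree space: it assumes a hypothetical one-dimensional "landscape" of scores (lower is better, as for tree length) in which each position has at most two neighbours, standing in for the trees reachable by one rearrangement.

```python
def hill_climb(scores, start):
    """Greedy search: move to the better neighbour until no neighbour improves the score."""
    pos = start
    while True:
        neighbours = [i for i in (pos - 1, pos + 1) if 0 <= i < len(scores)]
        if not neighbours:
            return pos
        best = min(neighbours, key=lambda i: scores[i])
        if scores[best] >= scores[pos]:
            return pos  # no improvement possible: local (maybe global) optimum
        pos = best


def best_of_starts(scores, starts):
    """Run hill climbing from several starting points and keep the best end point."""
    ends = [hill_climb(scores, s) for s in starts]
    return min(ends, key=lambda i: scores[i])


if __name__ == "__main__":
    landscape = [5, 4, 6, 7, 3, 1, 2]
    print(hill_climb(landscape, 0))   # stops at index 1 (score 4), a local optimum
    print(hill_climb(landscape, 3))   # reaches index 5 (score 1), the global optimum
    print(best_of_starts(landscape, [0, 3]))
```

Starting from several different points plays the same role as repeating a heuristic tree search with different (for example random) taxon addition orders: it does not guarantee the global optimum, but it lowers the risk of reporting a poor local one.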

Heuristics: Branch Swapping

- Nearest Neighbor Interchange (NNI)
- Subtree pruning and regrafting (SPR)
- Tree bisection and reconnection (TBR)
- The hope is to find a better tree by disturbing (rearranging) the tree to overcome local optima
- Problem: if the tree is on a plateau and the global optimum is several steps away, we might still not reach it

Heuristics: NNI

[Figure: nearest neighbor interchange. Felsenstein, Book, 2004, p. 9]

Heuristics: SPR

[Figure: subtree pruning and regrafting. Felsenstein, Book, 2004, p. 4]

Heuristics: TBR

[Figure: tree bisection and reconnection. Felsenstein, Book, 2004, p. 4]

Confidence Tests: Bootstrapping

Bootstrapping
- Resampling technique for tree evaluation
- New data sets are created from the original data set by sampling columns of characters at random with replacement
- Each site can be sampled again with the same probability as any of the other sites
- Problem: some positions can be over-represented, some sites are missing
- At least 100, better 1,000 trees should be calculated
- Remember: high bootstrap values can make a wrong phylogeny look good!!
[Figure: example multiple sequence alignment with conservation marks and a consensus line]
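The resampling step of bootstrapping can be sketched in Python as follows. Only the creation of the pseudo-replicate alignments is shown; building a tree from each replicate and summarizing the results is left out. The function names, the fixed random seed and the tiny alignment are assumptions of this example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Draw as many columns as the alignment has, at random and with replacement."""
    n_sites = len(alignment[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same column indices are applied to every sequence, so columns stay intact.
    return ["".join(seq[c] for c in columns) for seq in alignment]


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Create n_replicates resampled alignments from the original alignment."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    alignment = ["ACGT", "AGGT", "TCGA"]
    for replicate in bootstrap_replicates(alignment, n_replicates=3, seed=1):
        print(replicate)
```

Because sampling is done with replacement, some original columns appear several times in a replicate and others not at all, which is exactly the over- and under-representation mentioned above.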

Bootstrapping

[Figure: tree with bootstrap values]

Why Do Trees Differ?

- Information content
- Sequencing errors
- Alignment: homology of characters
- Non-additive data (false identities)
- Different and simplified models of evolution
- Independence of data
- Lineage- and/or position-specific rate of evolution
- Data selection: only subsets of organisms and positions
- Calculation heuristics: small number of evaluated trees, strong dependence on the order of the input data
- Local or global optimum?

Consensus Trees

[Figure: consensus trees]

Practical Implications: Filters

Filters
- Remove or weight down individual alignment columns while treeing
- Keep a balance between data loss and gain of accuracy
- E.g. 50% conservation filter: columns in the alignment are only considered for tree reconstruction when at least 50% of the sequences show the same residue
Position variability
- The position variability for every column is calculated and shown as numbers 1-9 and characters A-Z
- 1 means highly variable
- Z means extremely conserved (never seen)

ARB Filter

[Figure: ARB filter screenshot]

Practical Implications: Outgroup

- Choose as many sequences for the outgroup as possible
- They should not be too distantly related to the group of interest
[Figure: ARB Phylo screenshot]

Practical Implications: Data

- Always use the largest dataset available; if necessary, remove sequences after the calculation
- Compare different algorithms
- Reject problematic data
- Never reconstruct trees or filters on partial sequence data
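A conservation filter of the kind described above can be sketched in a few lines of Python. This is not ARB's implementation: the function names are hypothetical, and the treatment of gaps (gap characters never count as a residue, but the sequences carrying them still count in the denominator) is an assumption of this example.

```python
from collections import Counter

GAP_CHARS = "-."


def conserved_columns(alignment, min_fraction=0.5):
    """Return 0-based indices of columns where one residue reaches min_fraction of the sequences."""
    n_seqs = len(alignment)
    keep = []
    for i, column in enumerate(zip(*alignment)):
        residues = Counter(c.upper() for c in column if c not in GAP_CHARS)
        top = max(residues.values(), default=0)  # count of the most frequent residue
        if top / n_seqs >= min_fraction:
            keep.append(i)
    return keep


def apply_filter(alignment, keep):
    """Reduce every sequence to the columns selected by the filter."""
    return ["".join(seq[i] for i in keep) for seq in alignment]


if __name__ == "__main__":
    alignment = ["ACGT", "ACCA", "AGTC", "ATAG"]
    keep = conserved_columns(alignment, min_fraction=0.5)
    print(keep)
    print(apply_filter(alignment, keep))
```

Raising `min_fraction` removes more variable columns, which reduces noise but also discards data, the balance mentioned in the filter slide.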

ARB Internal Architecture: The Concept of ARB

[Figure: ARB architecture. The database and its database management are connected by request/update to the probe functions (Probe_Design, Probe_Match), sequence alignment and phylogenetic reconstructions; the PT-Server returns possible probes, matching sequences and the next relative]

PT-Server

- Not delivered with ARB
- A different format of your database for faster performance of sequence search functions within ARB
- It is only used to search the next relative for the automatic aligner and for Probe_Design/Probe_Match
- Creating/updating takes a long time and a lot of memory
- Once it has been created, searching is very fast

Do not overdo it