Introduction to MEGA


1 Introduction to MEGA
Thomas Randall, PhD
Download at:
Manual at:

2 Use of phylogenetic analysis software tools
Bioinformatics software for biologists in the genomics era. Sudhir Kumar and Joel Dudley, Bioinformatics 23:
Fig 1(B) Relative impacts of evolutionary analysis software packages over the last 10 years. Only non-commercial software packages available on-line (without fee) are included, except for two available for a nominal fee (shown with dashed line). Data for both panels were obtained from the Web of Science (February 2007 edition). For panel B, the numbers of new citations were generated using the Cited References facility with the search arguments for author name, cited work and citation year kindly provided by Joe Felsenstein for MEGA, PAUP (paup.csit.fsu.edu), PHYLIP (evolution.genetics.washington.edu/phylip.html), MrBayes (mrbayes.csit.fsu.edu), Puzzle, PhyML (atgc.lirmm.fr/phyml) and PAML (abacus.gene.ucl.ac.uk/software/paml.html).

3 MEGA contains all elements necessary for building a tree
- Import and editing of sequences and chromatograms
- ClustalW for alignment
- Various options for constructing a phylogeny
- Several options for assessing statistical significance
- Tree viewing function

4 Basic steps to build a phylogeny
1. Import and align sequences
2. Select tree building option
3. Select distance matrix
4. Choose type of bootstrapping
5. Manipulate tree with tree viewer

5 Phylogeny options in MEGA4
- UPGMA (distance method)
- Neighbor joining (distance method)
- Minimum evolution (distance method)
- Maximum parsimony
General rule: build the tree with two independent methodologies for confirmation. In MEGA, that means one distance method plus parsimony.
Maximum parsimony is less effective for more distantly related sequences because of homoplasy (multiple substitutions at the same site can accumulate over time).
WARNING: Phylogenetics has a long history of heated arguments about the relative merits of different methods; researchers in the field seem preadapted for ideological warfare (Huelsenbeck et al., Syst. Biol. 51: 673).

6 UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
UPGMA employs a sequential clustering algorithm in which pairwise distances between sequences are computed and the phylogenetic tree is built in a stepwise manner. We first identify, from among all the sequences, the two that are most similar to each other and treat them as a new single cluster. Distances from this cluster to the remaining sequences are then averaged, the next most similar pair is identified, and so on until one cluster remains.
Assumes equal evolutionary rates in all lineages (a molecular clock).
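
The clustering loop described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not MEGA's implementation: the function name `upgma`, the nested-tuple tree representation, and the toy distances are all assumptions made for the example, and branch lengths are not computed.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    while len(sizes) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance to the new cluster is the size-weighted average,
        # which equals the arithmetic mean over all leaf pairs.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))

# Hypothetical toy distances for four taxa.
toy = {("A", "B"): 2, ("C", "D"): 4, ("A", "C"): 10,
       ("A", "D"): 10, ("B", "C"): 10, ("B", "D"): 10}
tree = upgma(["A", "B", "C", "D"], {frozenset(k): v for k, v in toy.items()})
print(tree)  # (('A', 'B'), ('C', 'D'))
```

Because every join uses the smallest current distance, the result is only trustworthy when the clock assumption roughly holds.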

7 Neighbor-Joining
An algorithm for constructing phylogenetic trees using distance data. Once distances between all pairs of sequences have been determined, the neighbor-joining algorithm joins a pair of sequences into a new node, recomputes distances to that node, then looks for the next pair to join until all sequences are fit into a tree. The pair joined at each step is the one that most reduces the total branch length of the tree, which is not always the pair with the smallest distance.
Related stepwise clustering algorithms differ in how they use the distances (UPGMA, for example, simply joins the closest pair).
Examples of implementations: ClustalW, neighbor (PHYLIP).
Difference from UPGMA (also a stepwise distance-clustering method): neighbor joining does not assume a constant evolutionary rate in all lineages.
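
As a hedged illustration of how neighbor joining chooses which pair to join, the sketch below computes the standard selection criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k) and returns the pair with the smallest Q. It covers only the pair-selection step, not the full algorithm; the function name and the five-taxon distances are made up for the example.

```python
from itertools import combinations

def nj_pick_pair(names, dist):
    """Return the pair of taxa that neighbor joining would join first.

    dist maps frozenset({a, b}) -> distance between a and b.
    """
    n = len(names)
    # Sum of distances from each taxon to all the others.
    total = {i: sum(dist[frozenset((i, k))] for k in names if k != i)
             for i in names}

    def q(pair):
        i, j = pair
        return (n - 2) * dist[frozenset(pair)] - total[i] - total[j]

    return min(combinations(names, 2), key=q)

# Hypothetical distances among five taxa.
pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
         ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
         ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
dist = {frozenset(k): v for k, v in pairs.items()}
taxa = ["a", "b", "c", "d", "e"]

closest = min(combinations(taxa, 2), key=lambda p: dist[frozenset(p)])
print(closest)                   # ('d', 'e'): the smallest raw distance
print(nj_pick_pair(taxa, dist))  # ('a', 'b'): the pair NJ joins first
```

In this toy matrix the closest pair (d, e) is not the pair that is joined, which is the practical difference from UPGMA when rates vary among lineages.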

8 Minimum evolution
All possible trees are produced, and the tree with the smallest total branch length is chosen as the best tree. Branch length is proportional to the distance between sequences.
Maximum Parsimony
Selects, from among all possible phylogenetic trees, the tree requiring the fewest substitutions as the most likely to be the true phylogenetic tree. Usefulness declines with increasing evolutionary distance.

9 Informative sites in parsimony
(Slide from Johns Hopkins University, Fall 2003, Phylogenetics & Computational Genomics, Lecture #7)

OTU   Sites
      1  2  3  4  5  6  7  8  9  10
1     T  C  A  G  A  T  C  T  A  G
2     T  T  A  G  A  A  C  T  A  G
3     T  T  C  G  A  T  C  G  A  G
4     T  T  C  T  A  A  G  G  A  C

- Invariant sites are not used in parsimony (they yield no information on character state changes).
- Informative sites (at least two different kinds of residues, each present at least two times) are used by parsimony because they discriminate between topologies, i.e. different topologies require different numbers of changes between residues.
- Singleton sites cannot be used to discriminate between topologies (they require 1 change for all topologies).
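
The three site categories on this slide can be checked mechanically. The sketch below applies the slide's definitions to its four OTUs; the function name `classify_site` is an assumption, and the label "singleton" is used here for every variable site that fails the informative test, which matches the slide for four taxa.

```python
from collections import Counter

def classify_site(column):
    """Classify one alignment column for parsimony."""
    counts = Counter(column)
    if len(counts) == 1:
        return "invariant"
    # Informative: at least two residue types, each present at least twice.
    if sum(1 for c in counts.values() if c >= 2) >= 2:
        return "informative"
    # Variable, but every topology needs the same number of changes.
    return "singleton"

otus = ["TCAGATCTAG",   # OTU 1
        "TTAGAACTAG",   # OTU 2
        "TTCGATCGAG",   # OTU 3
        "TTCTAAGGAC"]   # OTU 4

labels = [classify_site(col) for col in zip(*otus)]
informative = [i + 1 for i, lab in enumerate(labels) if lab == "informative"]
print(informative)  # [3, 6, 8]
```

Only sites 3, 6 and 8 of this example would influence the choice among topologies.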

10 Maximum parsimony (MP) options
Exhaustive
Not an option here. All possible trees are searched; in practice this takes too much time, so various shortcuts (branch and bound, heuristic) have been developed.
Branch and bound *
A method of searching through tree space to find optimal trees. It does not examine every tree: trees with a total length longer than those already examined are not considered, reducing the complexity of the search. Guaranteed to find all MP trees. Becomes time consuming if more than 20 sequences are considered.
Heuristic
An approximate search, still using a branch and bound approach but making more assumptions. More useful for larger trees, but with no guarantee of finding the MP tree with the shortest length.
CNI (Close-Neighbor-Interchange)
In any method, examining all possible topologies is very time consuming. This algorithm reduces the time spent searching by first producing a temporary tree, and then examining all of the topologies that differ from this temporary tree by a topological distance of dt = 2 and 4. If this is repeated many times, and all the topologies previously examined are avoided, one can usually obtain the tree being sought.

11 Statistical tests of significance
Bootstrapping *
A method of estimating confidence levels of inferred relationships. The bootstrap proceeds by resampling the characters (columns) of the original data matrix with replacement. It is analogous to cutting the data matrix into individual columns and throwing the characters into a hat. A character is drawn at random from the hat and becomes the first character of the new data matrix. The character is then returned to the hat, the hat is shaken, and another character is drawn. This is repeated until the new pseudoreplicate is the same size as the original. Some characters will be sampled more than once and some will not be sampled at all. The whole process is repeated many times (say, 100-1,000), and a phylogeny is reconstructed each time. After the bootstrap procedure is finished, a majority-rule consensus tree is constructed from the optimal tree from each bootstrap sample. The bootstrap support for any internal branch is the number of pseudoreplicates (usually given as a percentage) in which it was recovered.
Interior Branch Test
Similar to bootstrapping, but unwieldy with a large number of taxa. A t-test, computed using the bootstrap procedure, is constructed from the interior branch length and its standard error; it is available only for NJ and Minimum Evolution trees. MEGA shows the confidence probability in the Tree Explorer; if this value is greater than 95% for a given branch, the inferred length of that branch is considered significantly positive.
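
The "characters in a hat" procedure amounts to sampling alignment columns with replacement. A minimal sketch follows, assuming the alignment is held as a dict of equal-length strings; the function name and the three short sequences are hypothetical, and the tree-building step that would follow each pseudoreplicate is left out.

```python
import random

def bootstrap_alignment(seqs, rng):
    """Return one pseudoreplicate: columns drawn with replacement.

    seqs: dict mapping taxon name -> aligned sequence (equal lengths).
    rng:  a random.Random instance, so runs can be made repeatable.
    """
    n_sites = len(next(iter(seqs.values())))
    # Draw as many column indices as there are sites; repeats are allowed.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(s[c] for c in cols) for name, s in seqs.items()}

alignment = {"taxon1": "ACGTAC", "taxon2": "ACGAAC", "taxon3": "TCGAAT"}
rng = random.Random(42)
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(len(replicates), len(replicates[0]["taxon1"]))
```

Each pseudoreplicate keeps whole columns intact, so the same site is resampled in every taxon at once.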

12 Other phylogeny software
- PHYLIP
- MrBayes: Bayesian Inference of Phylogeny
- TREE-PUZZLE 5.2: Maximum likelihood analysis
MEGA4 cannot do either maximum likelihood analysis or Bayesian inference. These methods are more sophisticated and computationally intensive, and can be more accurate for distantly related sequences.

13 Distance
Distance is a phylogenetic method that considers the additive differences between either nucleotides or amino acids along the entire length of sequence. A distance measurement is made considering each type of substitution (either transversion or transition) weighted differently, depending on the distance algorithm and weighting matrix used. As distances are re-computed for all possible pairs of sequences during each step of the assembly, this can be computationally intensive.

Homo_sapie AAGCTTCACC GGCGCAGTCA TTCTCATAAT CGCCCACGGG CTTACATCCT
Pan        AAGCTTCACC GGCGCAATTA TCCTCATAAT CGCCCACGGA CTTACATCCT
Gorilla    AAGCTTCACC GGCGCAGTTG TTCTTATAAT TGCCCACGGA CTTACATCAT
Pongo      AAGCTTCACC GGCGCAACCA CCCTCATGAT TGCCCATGGA CTCACATCCT
Hylobates  AAGCTTTACA GGTGCAACCG TCCTCATAAT CGCCCACGGA CTAACCTCTT
Macaca_fus AAGCTTTTCC GGCGCAACCA TCCTTATGAT CGCTCACGGA CTCACCTCTT
M_mulatta  AAGCTTTTCT GGCGCAACCA TCCTCATGAT TGCTCACGGA CTCACCTCTT
M_fascicul AAGCTTCTCC GGCGCAACCA CCCTTATAAT CGCCCACGGG CTCACCTCTT
M_sylvanus AAGCTTCTCC GGTGCAACTA TCCTTATAGT TGCCCATGGA CTCACCTCTT
Saimiri_sc saagcttcac CGGCGCAATG ATCCTAATAA TCGCTCACGG GTTTACTTCG
Tarsius_sy aaagtttcat TGGAGCCACC ACTCTTATAA TTGCCCATGG CCTCACCTCC
Lemur_catt AAGCTTCATA GGAGCAACCA TTCTAATAAT CGCACATGGC CTTACATCAT

[12 x 12 matrix listing all pairwise differences among the taxa above; the values are not preserved in this transcript]
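
To make "a matrix listing all pairwise differences" concrete, here is a small sketch that counts differing sites for the first three sequences of the alignment above and prints the proportion of differences (the p-distance). The function names are assumptions, and this is plain counting with no substitution-model correction.

```python
from itertools import combinations

def n_differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def p_distance(a, b):
    """Proportion of sites that differ."""
    return n_differences(a, b) / len(a)

# First three rows of the slide's alignment, with the spaces removed.
seqs = {
    "Homo_sapie": "AAGCTTCACCGGCGCAGTCATTCTCATAATCGCCCACGGGCTTACATCCT",
    "Pan":        "AAGCTTCACCGGCGCAATTATCCTCATAATCGCCCACGGACTTACATCCT",
    "Gorilla":    "AAGCTTCACCGGCGCAGTTGTTCTTATAATTGCCCACGGACTTACATCAT",
}

for a, b in combinations(seqs, 2):
    print(a, b, n_differences(seqs[a], seqs[b]), round(p_distance(seqs[a], seqs[b]), 3))
```

For Homo_sapie versus Pan this gives 4 differences over 50 sites, a p-distance of 0.08.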

14 DNA Distance matrices
Jukes-Cantor distance
In the Jukes-Cantor model, the rate of nucleotide substitution is the same for all pairs of the four nucleotides A, T, C, and G.
Many more models exist, with increasing complexity.
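
Under the Jukes-Cantor model the observed proportion of differing sites p is converted to an estimated number of substitutions per site with d = -(3/4) ln(1 - 4p/3). The sketch below is a direct transcription of that standard formula; the function name is an assumption.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor distance from the proportion of differing sites p."""
    if not 0 <= p < 0.75:
        # At p >= 0.75 the logarithm is undefined (sequences are saturated).
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)

for p in (0.01, 0.08, 0.30, 0.60):
    print(p, round(jukes_cantor(p), 4))
```

The correction is negligible for small p and grows quickly as p approaches 0.75, which is why heavily diverged sequences give unreliable distances.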

15 Distance matrices in MEGA
Kimura 2-parameter distance
Kimura's two-parameter model corrects for different substitution rates between transitions (purine to purine or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or the reverse).
Tamura-Nei distance
The Tamura-Nei model (1993) corrects for multiple hits, taking into account the differences in substitution rate between nucleotides and the inequality of nucleotide frequencies. It distinguishes between transitional substitution rates between purines and transitional substitution rates between pyrimidines, with transversions treated as a separate class. It also assumes equality of substitution rates among sites (see related gamma model).
Also available: number of differences, Tamura 3-parameter, LogDet
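
A short sketch of the Kimura 2-parameter calculation may help here. It counts transitions and transversions between two aligned sequences and applies the standard formula d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. The function names are assumptions, and the code assumes gap-free sequences containing only A, C, G and T.

```python
import math

PURINES = {"A", "G"}

def count_ts_tv(a, b):
    """Count transitions and transversions between two aligned sequences."""
    ts = tv = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        # Same chemical class (both purines or both pyrimidines): transition.
        if (x in PURINES) == (y in PURINES):
            ts += 1
        else:
            tv += 1
    return ts, tv

def kimura_2p(a, b):
    """Kimura 2-parameter distance between two aligned sequences."""
    ts, tv = count_ts_tv(a, b)
    P, Q = ts / len(a), tv / len(a)
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

homo = "AAGCTTCACCGGCGCAGTCATTCTCATAATCGCCCACGGGCTTACATCCT"
pan = "AAGCTTCACCGGCGCAATTATCCTCATAATCGCCCACGGACTTACATCCT"
print(count_ts_tv(homo, pan))          # (4, 0)
print(round(kimura_2p(homo, pan), 4))
```

The ratio of the two counts is the transition/transversion ratio R = s/v that later slides ask you to check; here all four differences are transitions, so R cannot be estimated from this pair alone.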

16 Which DNA distance matrix is appropriate?
- When the Jukes-Cantor * estimate of the number of nucleotide substitutions per site (d) between different sequences is about 0.05 or less (d < 0.05), use the Jukes-Cantor distance whether there is a transition/transversion bias or not, and whether the substitution rate (λ) varies with nucleotide site or not. In this case, the Kimura distance or the gamma distance gives essentially the same value as the Jukes-Cantor distance. One may also use the p-distance for constructing a topology.
- When 0.05 < d < 0.3, use the Jukes-Cantor distance unless the transition/transversion ratio (R) is high, say R > 5. When this ratio is high and the number of nucleotides examined is large (more than 10,000), use the Kimura distance or the gamma distances for Kimura's 2-parameter model.
- When 0.3 < d < 1 and there is evidence that λ varies extensively with site, use gamma distances. In general, one may choose different gamma distances, estimating the gamma parameter a from data.
- When 0.3 < d < 1 and the frequencies of the four nucleotides (A, T, C, G) deviate substantially from equality but there is no strong transition/transversion bias, use the Tajima-Nei distance.
- When there are strong transition/transversion and G+C content biases, use the Tamura or Tamura-Nei distance.
- When d > 1 for many pairs of sequences, the phylogenetic tree estimated is not reliable for a number of reasons (e.g., large standard errors of d's and sequence alignment errors). We therefore suggest that these sets of data should not be used.
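
The guidelines on this slide can be written down as a small decision helper. This is only one possible encoding: the function name, the argument names, the order in which the rules are tested, the reuse of R > 5 as the meaning of "strong transition/transversion bias", and the handling of the exact boundary values 0.3 and 1 are all assumptions that the slide does not settle.

```python
def suggest_distance(d, R=1.0, n_sites=0, rate_varies=False,
                     unequal_freqs=False, gc_bias=False):
    """Suggest a DNA distance following the slide's rules of thumb.

    d: Jukes-Cantor estimate of substitutions per site.
    R: transition/transversion ratio.
    n_sites: number of nucleotides examined.
    """
    strong_ts_tv = R > 5  # assumption: same threshold as the slide's "say R > 5"
    if d > 1:
        return "unreliable: do not use these data"
    if d <= 0.05:
        return "Jukes-Cantor"
    if strong_ts_tv and gc_bias:
        return "Tamura or Tamura-Nei"
    if d < 0.3:
        if strong_ts_tv and n_sites > 10_000:
            return "Kimura 2-parameter (or its gamma distance)"
        return "Jukes-Cantor"
    if rate_varies:
        return "gamma distance"
    if unequal_freqs and not strong_ts_tv:
        return "Tajima-Nei"
    return "no specific recommendation on the slide"

print(suggest_distance(0.03))                      # Jukes-Cantor
print(suggest_distance(0.2, R=8, n_sites=20_000))  # Kimura 2-parameter (or its gamma distance)
print(suggest_distance(0.5, rate_varies=True))     # gamma distance
```

Treat the output as a starting point for the manual check described on the slide, not as a substitute for it.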

17 Protein Distance matrices in MEGA
p-distance
This distance is the proportion (p) of amino acid sites at which the two sequences to be compared are different. It is obtained by dividing the number of amino acid differences by the total number of sites compared. It does not make any correction for multiple substitutions at the same site or differences in evolutionary rates among sites.
Poisson correction
The Poisson correction distance assumes equality of substitution rates among sites and equal amino acid frequencies while correcting for multiple substitutions at the same site.
Equal Input Model (Amino acids)
In real data, frequencies usually vary among different kinds of amino acids. In this case, the correction based on the equal input model gives a better estimate of the number of amino acid substitutions than the Poisson correction distance. Note that this assumes an equality of substitution rates among sites and the homogeneity of substitution patterns between lineages.
PAM & JTT *
The PAM and JTT distances correct for multiple substitutions based on a model of amino acid substitution described as substitution-rate matrices.
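
The first two protein distances are simple enough to compute by hand. The sketch below implements the p-distance as defined on this slide and the standard Poisson correction d = -ln(1 - p); the function names and the two short peptide sequences are hypothetical.

```python
import math

def protein_p_distance(a, b):
    """Proportion of amino acid sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def poisson_correction(p):
    """Poisson correction distance from the p-distance."""
    return -math.log(1 - p)

# Hypothetical aligned peptides of length 8 that differ at two sites.
seq_a = "MKTAYIAK"
seq_b = "MKSAYLAK"

p = protein_p_distance(seq_a, seq_b)
print(p)                                # 0.25
print(round(poisson_correction(p), 4))
```

The corrected value is always at least as large as p, because it allows for sites that changed more than once.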

18 ModelTest
Performs a likelihood analysis on your data to determine the most appropriate DNA substitution matrix.
WARNING: only for advanced users; also requires PAUP for input.

19 FindModel
Web-based version of ModelTest. Input is a concatenated FASTA file.

20 Result: FindModel output
(Numeric AIC and lnL values are not preserved in this transcript.)
MODEL CONSIDERED:
JC : Jukes-Cantor (model 1) AIC1 = lnL =
JC+G : Jukes-Cantor plus Gamma (model 3) AIC3 = lnL =
F81 : Felsenstein 1981 (model 5) AIC5 = lnL =
F81+G : Felsenstein 1981 plus Gamma (model 7) AIC7 = lnL =
K80 : Kimura 2-parameter (model 9) AIC9 = lnL =
K80+G : Kimura 2-parameter plus Gamma (model 11) AIC11 = lnL =
HKY : Hasegawa-Kishino-Yano (model 13) AIC13 = lnL =
HKY+G : Hasegawa-Kishino-Yano plus Gamma (model 15) AIC15 = lnL =
TrN : Tamura-Nei (model 21) AIC21 = lnL =
TrN+G : Tamura-Nei plus Gamma (model 23) AIC23 = lnL =
GTR : General Time Reversible (model 53) AIC53 = lnL =
GTR+G : General Time Reversible plus Gamma (model 55) AIC55 = lnL =
AIC-SELECTED MODEL: HKY : Hasegawa-Kishino-Yano (model 13) lnL = AIC =

AIC = Akaike Information Criterion
lnL = log of the maximum likelihood
AICi = -2 ln Li + 2 ki, where ki is the number of free parameters in model i
The model favored is the one with the lowest AIC.
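
A brief sketch of AIC-based selection follows. The log-likelihoods and parameter counts are invented for illustration because the slide's numbers were lost; only the formula AIC = -2 lnL + 2k and the "lowest AIC wins" rule come from the standard definition.

```python
def aic(ln_likelihood, n_params):
    """Akaike Information Criterion: -2 lnL + 2k."""
    return -2 * ln_likelihood + 2 * n_params

# Hypothetical (lnL, number of free substitution parameters) per model.
candidates = {
    "JC":  (-1050.0, 0),
    "K80": (-1020.0, 1),
    "HKY": (-1000.0, 4),
    "GTR": (-998.5, 8),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores)
print(best)  # HKY
```

In this made-up example GTR fits slightly better than HKY, but the gain in likelihood does not pay for its extra parameters, so HKY has the lowest AIC.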

21 DNA Substitution models in FindModel
Reduced set:
JC : Jukes-Cantor (model 1)
JC+G : Jukes-Cantor plus Gamma (model 3)
F81 : Felsenstein 1981 (model 5)
F81+G : Felsenstein 1981 plus Gamma (model 7)
K80 : Kimura 2-parameter (model 9)
K80+G : Kimura 2-parameter plus Gamma (model 11)
HKY : Hasegawa-Kishino-Yano (model 13)
HKY+G : Hasegawa-Kishino-Yano plus Gamma (model 15)
TrN : Tamura-Nei (model 21)
TrN+G : Tamura-Nei plus Gamma (model 23)
GTR : General Time Reversible (model 53)
GTR+G : General Time Reversible plus Gamma (model 55)
Red indicates models available in MEGA (the color coding is not preserved in this transcript).
If a model in black is suggested, use the one immediately below it in the list.
If GTR is suggested, use LogDet.

22 Parallelized ClustalW vs. non-parallelized ClustalW

23 Many MSA algorithms PLOS Comp. Biol. 3: e123

24 Alternative alignment tools
FACT: in published comparisons between alignment tools, ClustalW usually comes out close to the bottom.
- T-Coffee: better, more computationally intensive
- MUSCLE: better, less intensive than T-Coffee
- PROMALS: designed to optimize alignment for distantly related sequences
Outputs from the above need to be put in an appropriate format (.aln, .phy, .nex).

25 Displaying extensions on a PC
My Computer > Tools > Folder Options > View > uncheck Hide Extensions
Also: Control Panel > Folder Options > View > uncheck Hide Extensions

26 Test data sets
Nature 442: 37
Science 320: 499

27 Computing d
1) Compute the Jukes-Cantor distance and examine the distance matrix. If d < 0.05, stop and use the Jukes-Cantor substitution model.
2) If 0.05 < d < 0.3, check R as well: use the Kimura 2-parameter option for computing d, change the Substitutions to Include option from "d: transitions + transversions" to "R = s/v", and calculate.
3) Choose the model based on the guide in slide 16 (Which DNA distance matrix is appropriate?).

28 Analysis Preferences: Setting up an analysis User defined options

29 Analysis Preferences (Distance Computation)
Substitution Model - In this set of options, you choose the various attributes of the substitution models.
- Model: Here you select a stochastic model for estimating evolutionary distance by clicking on the ellipsis to the right of the currently selected model (click on the lime square to select this row first). This reveals a menu containing many different distance methods and models.
- Substitutions to Include: Depending on the distance model or method selected, the evolutionary distance can be teased into two or more components. By clicking on the drop-down button (first click on the lime square to select this row), you are given a list of components relevant to the chosen model.
- Transition/Transversion Ratio: This option is visible if the chosen model requires you to provide a value for the Transition/Transversion ratio (R).
- Pattern among Lineages: This option becomes available if the selected model has formulas that allow the relaxation of the assumption of homogeneity of substitution patterns among lineages.
- Rates among Sites: This option becomes available if the selected distance model has formulas that allow rate variation among sites. If you choose gamma-distributed rates, the Gamma parameter option becomes visible.

30 Treatment of gaps
Gaps often are inserted during the alignment of homologous regions of sequences and represent deletions or insertions (indels). They introduce some complications in distance estimation. Furthermore, sites with missing information sometimes result from experimental difficulties; they present the same alignment problems as gaps. In the following discussion, both of these situations are treated in the same way.
In MEGA, there are two ways to treat gaps.
Complete-Deletion *
Delete all of these sites from the data analysis. This option is generally desirable because different regions of DNA or amino acid sequences evolve under different evolutionary forces.
Pairwise-Deletion
Relevant if the number of nucleotides involved in a gap is small and if the gaps are distributed more or less randomly. In that case it may be possible to compute a distance for each pair of sequences, ignoring only those gaps that are involved in the comparison.
[The slide's table comparing the two options on three example sequences is not preserved in this transcript.]
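
The effect of the two gap treatments can be illustrated with a small sketch. The three sequences are hypothetical stand-ins (the slide's own example table was lost), and the function name is an assumption. For every pair the code reports (differences, sites compared) under each option.

```python
from itertools import combinations

def pairwise_counts(seqs, complete_deletion):
    """Return {(a, b): (differences, sites compared)} for every pair.

    complete_deletion=True drops any column with a gap in ANY sequence;
    False drops, per pair, only columns with a gap in that pair.
    """
    names = list(seqs)
    n_sites = len(seqs[names[0]])
    gap_free = [i for i in range(n_sites)
                if all(seqs[n][i] != "-" for n in names)]
    result = {}
    for a, b in combinations(names, 2):
        if complete_deletion:
            sites = gap_free
        else:
            sites = [i for i in range(n_sites)
                     if seqs[a][i] != "-" and seqs[b][i] != "-"]
        diffs = sum(seqs[a][i] != seqs[b][i] for i in sites)
        result[(a, b)] = (diffs, len(sites))
    return result

# Hypothetical aligned sequences with one gap each in seq1 and seq2.
example = {"seq1": "ACGT-ACGTA",
           "seq2": "ACGTTAC-TA",
           "seq3": "ACCTTACGTT"}

print(pairwise_counts(example, complete_deletion=True))
print(pairwise_counts(example, complete_deletion=False))
```

With complete deletion every pair is compared over the same 8 gap-free sites; with pairwise deletion seq1 versus seq3 and seq2 versus seq3 each use 9 sites, so the same 2 differences give a slightly smaller proportion.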

31 Uniform Rates vs. Gamma distribution
Ignore this option, as MEGA has no way to estimate a, the shape parameter of the gamma distribution, from your data.
A gamma distribution reflects that substitution rates differ among sites along the sequence. When a is 1 or less, rate variation among sites is very high; as a approaches infinity, all sites evolve at the same rate.

32 Tree Explorer
Save tree as .emf file (for ppt or Word)
[Tree figure; taxa shown: northnigeria, turkey Turkey2005, swan Czech2006, mallard Bavaria2006, swan Mongolia2005, swan Astrakhan2005, turkey Suzdalka2005, swan Iran2006, mallard Italy2005]
Save tree as .nwk file (for opening in other tree viewers)
((((northnigeria: ,((turkey_turkey2005: ,swan_czech2006: )0.96: ,(mallard_bavaria2006: , swan_mongolia2005: )0.27: )0.27: )0.57: ,swan_astrakhan2005: )0.49: , turkey_suzdalka2005: )0.37: ,swan_iran2006: ,mallard_italy2005: );
Save tree as .mts file (for opening in MEGA)

33 Comparison of methods on one dataset
[Four tree figures; only the panel captions are recoverable]
- MEGA NJ bootstrapping: <1 min, laptop, 1G RAM
- MrBayes, 1,000,000 generations: 1.5 hrs, cluster
- PHYLIP dnapars bootstrapping: 30 min, laptop, 1G RAM
- Tree-Puzzle maximum likelihood, 10,000 steps: <1 min, laptop, 1G RAM
Dataset from Nature 442: 37, Multiple introductions of H5N1 in Nigeria

34 Tree Explorer
Condensed Trees
When several interior branches of a phylogenetic tree have low statistical support values (PC, the confidence probability from the interior branch test, or PB, the bootstrap value), it often is useful to produce a multifurcating tree by assuming that all such interior branches have a branch length equal to 0. We call this multifurcating tree a condensed tree. In MEGA, condensed trees can be produced for any level of PC or PB value. For example, if there are several branches with PC or PB values of less than 50%, a condensed tree at the 50% PC or PB level is a multifurcating tree in which the lengths of all those branches are reduced to 0.
Consensus Tree
The MP method often produces many equally parsimonious trees. Choosing this command produces a composite tree that is a consensus among all such trees: either a strict consensus, in which all conflicting branching patterns among the trees are resolved by making those nodes multifurcating, or a Majority-Rule consensus, in which conflicting branching patterns are resolved by selecting the pattern seen in more than 50% of the trees.
Importing trees from other phylogenetic tools
These work: outtree files from PHYLIP; .dnd and .phb files from ClustalW; TreePuzzle; MrBayes (.con file needs a little processing).
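
The majority-rule idea can be sketched by counting how often each grouping of taxa appears across a set of trees and keeping those seen in more than 50% of them. Representing a tree as a set of clades (each clade a frozenset of taxon names) is an assumption made for this example; it says nothing about how MEGA stores trees, and the three small trees are hypothetical.

```python
from collections import Counter

def majority_rule_clades(trees, threshold=0.5):
    """Return {clade: frequency} for clades found in more than
    `threshold` of the input trees.

    trees: list of sets, each containing frozensets of taxon names.
    """
    counts = Counter(clade for tree in trees for clade in tree)
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}

AB, CD, CE = frozenset("AB"), frozenset("CD"), frozenset("CE")
trees = [{AB, CD}, {AB, CD}, {AB, CE}]  # three hypothetical trees

for clade, freq in majority_rule_clades(trees).items():
    print(sorted(clade), round(freq, 2))
```

Raising `threshold` to a value just below 1.0 keeps only the groupings present in every tree, which corresponds to the strict consensus described above.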

35 MEGA4 Caption
The View Caption function gives a publication-quality summary of the analysis and suggested references for publication.

36 About authors
- Gene Duplication and Gene Substitution in Evolution. Masatoshi Nei. Nature 221: 40
- Evolution by the Birth-and-Death Process in Multigene Families of the Vertebrate Immune System. Nei, M., et al. Proc. Natl. Acad. Sci. USA 94: 7799
- MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Sudhir Kumar, K Tamura, and M Nei. Briefings in Bioinformatics 5:
- The Neighbor-joining Method: A New Method for Reconstructing Phylogenetic Trees. Naruya Saitou and Masatoshi Nei. Mol. Biol. Evol. 4: 406
Much of the material in this handout is derived from: Molecular Evolution and Phylogenetics (2000), M Nei and S Kumar, Oxford Univ. Press, New York


More information

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22 Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and

More information

Effects of Gap Open and Gap Extension Penalties

Effects of Gap Open and Gap Extension Penalties Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See
