Quartet Inference from SNP Data Under the Coalescent Model

Size: px
Start display at page:

Download "Quartet Inference from SNP Data Under the Coalescent Model"

Transcription

1 Bioinformatics Advance Access published August 7, 2014 Quartet Inference from SNP Data Under the Coalescent Model Julia Chifman 1 and Laura Kubatko 2,3 1 Department of Cancer Biology, Wake Forest School of Medicine, Winston-Salem, NC, Department of Statistics, The Ohio State University, Columbus, OH Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH Associate Editor: Prof. David Posada ABSTRACT Motivation: Increasing attention has been devoted to estimation of species-level phylogenetic relationships under the coalescent model. However, existing methods either use summary statistics (gene trees) to carry out estimation, ignoring an important source of variability in the estimates, or involve computationally-intensive Bayesian Markov chain Monte Carlo algorithms that don t scale well to whole-genome data sets. Results: We develop a method to infer relationships among quartets of taxa under the coalescent model using techniques from algebraic statistics. Uncertainty in the estimated relationships is quantified using the nonparametric bootstrap. The performance of our method is assessed with simulated data. We then describe how our method could be used for species tree inference in larger taxon samples, and demonstrate its utility using data sets for Sistrurus rattlesnakes and for soybeans. Availability and Implementation: The method to infer the phylogenetic relationship among quartets is implemented in the software SVDquartets, available at lkubatko/software/svdquartets. Contact: Laura Kubatko, lkubatko@stat.osu.edu 1 INTRODUCTION With recent advances in DNA sequencing technology, it is now common to have available alignments from multiple genes for inference of an overall species-level phylogeny. While this species tree is generally the object that we seek to estimate, it is widely known that each individual gene has its own phylogeny, called a gene tree, which may not agree with the species tree. Many possible causes of this gene incongruence are known, including horizontal gene transfer, gene duplication and loss, hybridization, and incomplete lineage sorting (ILS) (Maddison, 1997). Of these, the best studied is incomplete lineage sorting, which is commonly modeled by the coalescent process (Kingman, 1982a,b; Liu et al., 2009a). Much recent effort has been devoted to the development of methods to estimate species-level phylogenies from multi-locus data under the coalescent model (Liu and Pearl, 2007; Liu et al., to whom correspondence should be addressed 2009b; Kubatko et al., 2009; Heled and Drummond, 2010; Than and Nakhleh, 2009; Bryant et al., 2012). Here we consider this basic problem, although our approach to the problem differs from previous approaches in several important ways. Previous approaches can be divided into two groups (Liu et al., 2009a): summary-statistics approaches and sequence-based approaches. Summary-statistics approaches first estimate a gene tree independently for each gene, and then treat the estimated gene trees as data for a second stage of analysis to estimate the species tree. The most popular approaches in this category are Maximum Tree (Liu et al., 2009b) (also implemented in the program STEM (Kubatko et al., 2009)), STAR (Liu et al., 2009c), STEAC (Liu et al., 2009c), MP-EST (Liu et al., 2010), and Minimize Deep Coalescences (MDC; as implemented in the program PhyloNet (Than and Nakhleh, 2009)). These methods are computationally efficient for large data sets, but generally ignore variability in the estimated gene trees and thus potentially lose accuracy. The second group of methods uses the full data for estimation of the species tree via a Bayesian framework for inference. The three most common methods in this group, BEST (Liu and Pearl, 2007), *BEAST (Heled and Drummond, 2010), and SNAPP (Bryant et al., 2012), all seek to estimate the posterior distribution for the species tree using Markov chain Monte Carlo (MCMC), but differ in some details of the implementation. These methods become time-consuming when the number of loci is large, and assessment of convergence of the MCMC can be difficult. Our proposed method is distinct from both classes of existing approaches in that it uses the full data directly, but does not utilize a Bayesian framework. It is thus computationally efficient while incorporating all sources of variability (both mutational variance and coalescent variance (cf. Huang et al. (2010))) in the estimation process. The theory underlying our method applies to unlinked Single Nucleotide Polymorphism (SNP) data, for which each site is assumed to have its own genealogy drawn from the coalescent model; however, we use simulation to show that the method also performs well for multi-locus sequence data. To describe our proposed method, we first begin with a brief overview of the coalescent model in the context of species-level phylogenetics. We use simulation to assess the performance of the method for both simulated and empirical data. We conclude with a short discussion of how the proposed method can be scaled up to larger taxon sets Downloaded from by guest on July 19, 2015 The Author (2014). Published by Oxford University Press. All rights reserved. For Permissions, please journals.permissions@oup.com 1

2 Downloaded from by guest on July 19, 2015

3 Quartet Inference from SNPs We want to infer which of the three possible splits on quartets is the true split. One way to assess this would be to consider the Flat L1 L 2 ( ˆP ) matrix for each of the three possible splits, and measure which of the three is closest to a rank 10 matrix. To do this, we need a method to measure distances between matrices. Our choice of a distance, described below, is modeled after the approach of Eriksson (2005), who considered the problem of tree estimation from a flattening matrix obtained from the probability distribution of site patterns at the tips of a gene tree. His overall approach to estimation of the phylogeny differed from ours, however, in that he used splits of varying sizes (rather than just splits of quartets of taxa) to develop a clustering algorithm to obtain the phylogenetic estimate. We provide the details of our approach below. Let a ij be the (i, j) th entry of an m n matrix A. The Frobenius norm of a matrix A is m n A F = a 2 ij. i=1 j=1 An important property of the Frobenius norm is its characterization using the singular values of A, that is A F = p σi 2, where σ 1 σ 2 σ p 0 are the singular values of A and p = min{m, n}. The well-known low-rank approximation theorem (Eckart-Young theorem), implies that the distance from a matrix A to the nearest rank k matrix in the Frobenius norm is p A B F = σi 2. min rank(b)=k i=1 i=k+1 See Section 2.4 in Golub and Van Loan (2013) for more information about singular value decomposition. We apply this well-known result to our species tree estimation problem by defining the SVD score for a split L 1 L 2 to be SVD(L 1 L 2):= 16 ˆσ i 2, (2) i=11 where ˆσ i are the singular values of Flat L1 L 2 ( ˆP ), for i {11,...,16}. Our proposal for inferring the true species-level relationship within a sample of four taxa is thus the following. For each of the three possible splits, construct the matrix Flat L1 L 2 ( ˆP ) and compute SVD(L 1 L 2). The split with the smallest score is taken to be the true split. To quantify uncertainty in the inferred split, we implement a nonparametric bootstrap procedure as follows. For a data set consisting of M aligned sites, we re-sampled the columns of the data matrix with replacement M times to generate a new bootstrapped data matrix, and the SVD scores of the three splits are computed for this bootstrapped data matrix. This procedure is repeated B times, and the proportion of bootstrapped data sets that support each of the three possible splits provides a measure of support for that split. 2.2 Simulation Study We first use simulated data to assess the ability of SVD(L 1 L 2) to correctly identify the valid split under a variety of conditions. Before describing the simulation procedure, we first point out that while much of the currently available methodology for inferring species trees assumes that multi-locus data (e.g., aligned DNA sequences from many independent loci) are available for inference, our method is actually designed for unlinked sites, for example, for a sample of unlinked SNPs. This is because in computing the probability distribution of site patterns at the tips of the species tree, we integrate over the probability distribution of gene trees under the coalescent model, with the implicit assumption that sequence data evolve along these gene trees. Thus each site pattern is viewed as an independent draw from the distribution f(x 1 = i, X 2 = j, X 3 = k, X 4 = l S) = f(x1 = i, X2 = j, X3 = G k, X 4 = l G)f(G S)dG, where S represents the species tree (topology and speciation times) and G represents a gene tree (both topology and divergence times). True multi-locus data, however, consist of an aligned portion of the DNA that is believed to share a single underlying gene tree, and thus all sequence data within a locus are believed to have evolved from a common gene genealogy. We wish to examine the performance of our method for both unlinked SNP data and for multi-locus data, and we thus consider simulated data of two types: unlinked SNP data (e.g., each site has its own underlying gene tree) and multi-locus data (a sequence of length l is simulated from a shared underlying gene tree). Our simulation consists of the following steps: 1. Generate a sample of g gene trees from the model species tree ((1:x,2:x):x,(3:x,4:x):x), where x is the length of each branch, under the coalescent model using the program COAL (Degnan and Salter, 2005). 2. Generate sequence data of length n on each gene tree under a specified substitution model using the program Seq-Gen (Rambaut and Grassly, 1997). 3. Construct the flattening matrix for each of the three possible splits, and compute SVD(L 1 L 2) for each. 4. Repeat the above procedure (Steps 1 3) 1, 000 times and record SVD(L 1 L 2) k,k =1, 2,...,1, 000, for each split. For each of the 1, 000 data sets, generate B bootstrapped data sets and record SVD(L 1 L 2) k,b,k = 1, 2,...,1, 000; b = 1, 2,...,B for each split. Given the above simulation algorithm, there are several choices to be made at each step. In step (1), we must select the lengths of the branches, x, in the model species tree. We considered branches of length 0.5, 1.0, and 2.0 coalescent units. A branch of length 0.5 coalescent units is very short, and corresponds to a case in which there will be widespread incomplete lineage sorting, making species tree inference difficult. A branch of length 2.0 coalescent units is longer and will result in much lower rates of incomplete lineage sorting, resulting in an easier species tree inference problem. In Step (2), we need to choose the gene length, n. In simulating unlinked SNP data, we used g =5, 000 and n =1(corresponding to 5,000 unlinked SNPs) and for the multi-locus setting, we considered g = 10 and n = 500 (corresponding to 10 genes, each of length 500 sites). Further, step (2) requires choice of Downloaded from by guest on July 19,

4 Downloaded from by guest on July 19, 2015

5 Downloaded from by guest on July 19, 2015

6 Chifman and Kubatko Table 1. Time information for the simulation study with results shown in Figure 6. All results represent the average time in seconds (over 1,000 replicates) to carry out the computation of three SVD scores for the simulated data sets, and were obtained using the UNIX time command. Branch Number of Real User System Lengths Sites Time Time Time 0.5 1, , , , , , under the GTR+I+Γ model, however, bootstrap support values for the true split are sometimes lower, with the worst results occurring when the branch lengths are short. Overall, however, the bootstrap appears to give a reliable measure of support for the true split, particularly when the model assumptions are satisfied. Figure 6 examines the performance of the method for unlinked SNP data with varying numbers of sites. In particular, unlinked SNP data sets were generated with either 1,000, 5,000, or 10,000 total sites under model species trees with branch lengths of 0.5, 1.0, or 2.0 coalescent units. These results demonstrate that the method performs well even for smaller sample sizes. However, it is clear that as the sample size becomes larger, the separation between the scores for the valid and non-valid splits increases. This is to be expected, because the matrix Flat L1 L 2 ( ˆP ) will better approximate Flat L1 L 2 (P ) for larger sample sizes. Table 1 gives timing results for the simulations carried out in Figure 6. Because the main work undertaken by the method involves counting the number of site patterns in order to build the Flat L1 L 2 ( ˆP ) matrix, the time should be approximately linear in the number of unique site patterns in the data, which is related to both the total number of sites in the data matrix and the overall scale of time represented by the phylogeny. The results in Table 1 demonstrate that the time is less than linear in the total number of site patterns, as expected, and that the computations can be carried out very rapidly (e.g., the computation of three SVD scores for data matrices of 10,000 sites takes less than 0.1 seconds). 4.2 Potential Use for Species Tree Inference These results make it clear that the SVD score is a highly accurate means of inferring the correct, unrooted phylogenetic tree among a set of four taxa. We note that the SVD score is extremely easy to compute. It requires only counting the site patterns and constructing the matrix Flat L1 L 2 ( ˆP ). Computing singular values of a matrix is a standard calculation that any mathematical or statistical software package can easily implement. Our software, SVDquartets, carries out both steps using a PHYLIP-formatted input file. Given the efficiency with which computations can be carried out in the four-taxon setting, this method is a good candidate for estimation of species trees for larger taxon sets. We propose that the method could be used in the following way. For a data set with T taxa, form all samples of 4 taxa, or sample sets of 4 taxa if T is large. For each sample of four taxa, infer the valid split using the SVD score. Using the collection of inferred valid splits, construct a species tree estimate using a quartet assembly method. Substantial previous work and software exist for the problem of quartet assembly (see, e.g., Strimmer and von Haeseler (1996); Strimmer et al. (1997); Snir and Rao (2012)). We give the results of using this method for inferring a tree consisting of several North American rattlesnake species and for inferring a tree from SNP data for several soybean species below. This method has tremendous potential to improve the set of tools available for species tree inference. Unlike summary statistics methods, which are known to be quick but fail to model variability in individual gene tree estimates, this method uses the sequence data directly, thus incorporating all sources of variability. The other existing methods based on sequence data (BEST, *BEAST, and SNAPP) all rely on Bayesian MCMC methods, and thus require long computing times and the difficult problem of assessing convergence. Our method can be carried out rapidly, and is easily parallelizable, as each quartet can be analyzed on a separate processor. Our method can handle both unlinked SNP and multilocus data, again providing an advantage over existing sequencebased methods, which can handle either SNP (SNAPP) or multilocus (BEST and *BEAST) data. Bootstrapping can be easily implemented to provide a means of quantifying support for the estimated phylogeny. However, there are several issues with this method that will need to be examined in future work. First, the number of quartets to be sampled needs to be specified in cases where the number of taxa is too large to examine all possible quartets. This number should necessarily increase with increasingly large taxon samples, but we have not yet rigorously examined how to select this. In addition, it is worth pointing out that the method estimates the topology only. In some studies, other parameters associated with the evolutionary process, such as branch lengths and effective population sizes, will also be of interest. One possibility is that the tree topology could first be estimated with this method, and then fixed in a subsequent MCMC analysis with either *BEAST or SNAPP, thus greatly reducing the complexity of that analysis. Finally, we have not yet conducted a thorough simulation study of the inferential accuracy of this method for full species tree inference, which will be the topic of future work. 4.3 Application to Rattlesnake Data The results of the analysis of the rattlesnake data set are shown in Figure 7, with bootstrap support values above 50% indicated on the appropriate nodes. In the case of the lineage tree (Figure 7(a)), the method identifies the two major species S. catenatus and S. miliarius with high bootstrap support, and additionally groups the subspecies S. c. catenatus as monophyletic. In the species tree in Figure 7(b), we again see that the method correctly identifies the two species with high bootstrap support, and is able to differentiate subspecies S. c. catenatus from a clade containing the other two subspecies within this group. Within species S. miliarius, there is not strong support for the subspecies relationships. These results are consistent with the earlier analyses of Kubatko et al. (2011), in which strong support for the delimitation of S. c. catenatus as a distinct species was found using several methods of coalescent-based species tree inference. Those analyses also found Downloaded from by guest on July 19,

7 Downloaded from by guest on July 19, 2015

8 Chifman and Kubatko '()$ '*)$ W12 W07 W07 W14 W10 W08 W12 W15 ##$!!$ C01 C35 C12 C33 W14 W10 C35!!"!#$!!"!#$ Fig. 8. Results of the analysis of the soybean data. (a) Tree estimated by SVDquartets with bootstrap support values. (b) Maximum clade credibility tree estimated using SNAPP. and Associate Editor David Posada for helpful comments on earlier versions of this manuscript. FUNDING The authors are supported in part by National Science Foundation award DMS and in part by NIH Cancer Biology Training Grant T32-CA (J.C.). REFERENCES Allman, E. S. and Rhodes, J. A. (2008) Phylogenetic ideals and varieties for the general Markov model. Adv. in Appl. Math., 40 (2). arxiv:math.ag/ Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N. and RoyChoudhury, A. (2012) Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Mol. Biol. Evol., 29 (8), Chifman, J. and Kubatko, L. S. (2014) Identifiability of the unrooted species tree topology under the coalescent model with time-reversible substitution processes, submitted; available at Degnan, J and Salter, L. (2005) Gene tree distributions under the coalescent process. Evolution 59(1), DeGeorgio, M. and Degnan, J. (2010) Fast and consistent estimation of species trees using supermatrix rooted triples. Mol. Biol. Evol. 27(3), Eriksson, N. (2005) Tree construction using singular value decompsition, in L. Pachter and B. Sturmfels, editors, Algebraic Statistics for Computational Biology, chapter 19, pgs Cambridge University Press, Cambridge, UK. Golub, G. H. and Van Loan, C. F. (2013), Matrix Computations (4th Edition), Johns Hopkins University Press. Heled, J. and Drummond, A. (2010) Bayesian inference of species trees from multilocus data. Mol. Biol. Evol., 27, Huang, H., He, Q., Kubatko, L., and Knowles, L. (2010) Sources of error for species-tree estimation: Impact of mutational and coalescent effects on accuracy and implications for choosing among different methods. Syst. Biol., 59(5), Jukes, T. and Cantor, C. (1969) Evolution of Protein Molecules. New York: Academic Press, pp Kingman,J.F.C. (1982) The coalescent. Stoch. Proc. Appl., 13, Kingman, J. F. C. (1982) Exchangeability and the evolution of large populations. Pp in G. Koch and F. Spizzichino, eds. C33 Exchangeability in probability and statistics. North-Holland, Amsterdam. Kubatko, L. S., Carstens, B. C., and Knowles, L. L. (2009) STEM: Species tree estimation using maximum likelihood for gene trees under the coalescent. Bioinformatics, 25, Kubatko, L. S., Gibbs, H. L., and Bloomquist, E. W. (2011) Inferring specieslevel phylogenies and taxonomic distinctiveness using multilocus data in Sistrurus rattlesnakes. Syst. Biol., 60, Lam, H. M., et al. (2010) Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection, Nat. Genet., 42, Lee, T. H., Guo, H., Wang, X., Kim, C., and Paterson, A. H. (2014) SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data. BMC Genomics, 15(1). Liu, L. and Pearl, D. (2007). Species trees from gene trees: reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions. Syst. Biol., 56, Liu, L., Yu, L., Kubatko, L., Pearl, D., and Edwards, S. (2009) Coalescent methods for estimating phylogenetic trees. Mol. Phylogenet. Evol. 52, Liu, L., Yu L., and Pearl D. K. (2009) Maximum tree: a consistent estimator of the species tree. J. Math. Biol., 60, Liu, L., Yu, L., Pearl, D. and Edwards, S. (2009) Estimating species phylogenies using coalescence times among sequences. Syst. Biol., 58(5), Liu, L., Yu, L., and Edwards, S. (2010) A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol. Biol., 10, 302. Maddison,W. (1997) Gene trees in species trees. Syst. Biol., 46, Rambaut, A. and Grassly, N. C. (1997) Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13, Rannala, B. and Yang, Z. (2003) Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics 164, Snir, S. and S. Rao. (2012) Quartet MaxCut: A fast algorithm for amalgamating quartet trees. Mol. Phylogen. Evol. 62:,1-8. Strimmer, K., and von Haeseler, A. (1996) Quartet puzzling: A quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13, Strimmer, K., Goldman, N., and von Haeseler, A. (1997) Bayesian probabilities and quartet puzzling. Mol. Biol. Evol. 14, Tavare, S. (1986) Some probabilistic and statistical problems in the analysis of DNA sequences, Lectures on Mathematics in the Life Sciences (American Mathematical Society) 17, Than, C. and Nakhleh, L. (2009) Species tree inference by minimizing deep coalescences. PLoS Computational Biology, 5(9), e W08 C12 W15 C01 Downloaded from by guest on July 19,

Jed Chou. April 13, 2015

Jed Chou. April 13, 2015 of of CS598 AGB April 13, 2015 Overview of 1 2 3 4 5 Competing Approaches of Two competing approaches to species tree inference: Summary methods: estimate a tree on each gene alignment then combine gene

More information

Taming the Beast Workshop

Taming the Beast Workshop Workshop and Chi Zhang June 28, 2016 1 / 19 Species tree Species tree the phylogeny representing the relationships among a group of species Figure adapted from [Rogers and Gibbs, 2014] Gene tree the phylogeny

More information

Species Tree Inference using SVDquartets

Species Tree Inference using SVDquartets Species Tree Inference using SVDquartets Laura Kubatko and Dave Swofford May 19, 2015 Laura Kubatko SVDquartets May 19, 2015 1 / 11 SVDquartets In this tutorial, we ll discuss several different data types:

More information

first (i.e., weaker) sense of the term, using a variety of algorithmic approaches. For example, some methods (e.g., *BEAST 20) co-estimate gene trees

first (i.e., weaker) sense of the term, using a variety of algorithmic approaches. For example, some methods (e.g., *BEAST 20) co-estimate gene trees Concatenation Analyses in the Presence of Incomplete Lineage Sorting May 22, 2015 Tree of Life Tandy Warnow Warnow T. Concatenation Analyses in the Presence of Incomplete Lineage Sorting.. 2015 May 22.

More information

Anatomy of a species tree

Anatomy of a species tree Anatomy of a species tree T 1 Size of current and ancestral Populations (N) N Confidence in branches of species tree t/2n = 1 coalescent unit T 2 Branch lengths and divergence times of species & populations

More information

WenEtAl-biorxiv 2017/12/21 10:55 page 2 #2

WenEtAl-biorxiv 2017/12/21 10:55 page 2 #2 WenEtAl-biorxiv 0// 0: page # Inferring Phylogenetic Networks Using PhyloNet Dingqiao Wen, Yun Yu, Jiafan Zhu, Luay Nakhleh,, Computer Science, Rice University, Houston, TX, USA; BioSciences, Rice University,

More information

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization) STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization) Laura Salter Kubatko Departments of Statistics and Evolution, Ecology, and Organismal Biology The Ohio State University kubatko.2@osu.edu

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Today's project. Test input data Six alignments (from six independent markers) of Curcuma species

Today's project. Test input data Six alignments (from six independent markers) of Curcuma species DNA sequences II Analyses of multiple sequence data datasets, incongruence tests, gene trees vs. species tree reconstruction, networks, detection of hybrid species DNA sequences II Test of congruence of

More information

Efficient Bayesian species tree inference under the multi-species coalescent

Efficient Bayesian species tree inference under the multi-species coalescent Efficient Bayesian species tree inference under the multi-species coalescent arxiv:1512.03843v1 [q-bio.pe] 11 Dec 2015 Bruce Rannala 1 and Ziheng Yang 2 1 Department of Evolution & Ecology, University

More information

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression) Using phylogenetics to estimate species divergence times... More accurately... Basics and basic issues for Bayesian inference of divergence times (plus some digression) "A comparison of the structures

More information

Efficient Bayesian Species Tree Inference under the Multispecies Coalescent

Efficient Bayesian Species Tree Inference under the Multispecies Coalescent Syst. Biol. 66(5):823 842, 2017 The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com

More information

To link to this article: DOI: / URL:

To link to this article: DOI: / URL: This article was downloaded by:[ohio State University Libraries] [Ohio State University Libraries] On: 22 February 2007 Access Details: [subscription number 731699053] Publisher: Taylor & Francis Informa

More information

Estimating Evolutionary Trees. Phylogenetic Methods

Estimating Evolutionary Trees. Phylogenetic Methods Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent

More information

Inferring Species Trees Directly from Biallelic Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis. Research article.

Inferring Species Trees Directly from Biallelic Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis. Research article. Inferring Species Trees Directly from Biallelic Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis David Bryant,*,1 Remco Bouckaert, 2 Joseph Felsenstein, 3 Noah A. Rosenberg, 4 and Arindam

More information

In comparisons of genomic sequences from multiple species, Challenges in Species Tree Estimation Under the Multispecies Coalescent Model REVIEW

In comparisons of genomic sequences from multiple species, Challenges in Species Tree Estimation Under the Multispecies Coalescent Model REVIEW REVIEW Challenges in Species Tree Estimation Under the Multispecies Coalescent Model Bo Xu* and Ziheng Yang*,,1 *Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China and Department

More information

Coalescent Histories on Phylogenetic Networks and Detection of Hybridization Despite Incomplete Lineage Sorting

Coalescent Histories on Phylogenetic Networks and Detection of Hybridization Despite Incomplete Lineage Sorting Syst. Biol. 60(2):138 149, 2011 c The Author(s) 2011. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com

More information

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana

More information

Concepts and Methods in Molecular Divergence Time Estimation

Concepts and Methods in Molecular Divergence Time Estimation Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History Overview 1. Why do we date trees? 2. The molecular clock 3. Local clocks

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Workshop III: Evolutionary Genomics

Workshop III: Evolutionary Genomics Identifying Species Trees from Gene Trees Elizabeth S. Allman University of Alaska IPAM Los Angeles, CA November 17, 2011 Workshop III: Evolutionary Genomics Collaborators The work in today s talk is joint

More information

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Supplementary Note S2 Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Phylogenetic trees reconstructed by a variety of methods from either single-copy orthologous loci (Class

More information

Bayesian inference of species trees from multilocus data

Bayesian inference of species trees from multilocus data MBE Advance Access published November 20, 2009 Bayesian inference of species trees from multilocus data Joseph Heled 1 Alexei J Drummond 1,2,3 jheled@gmail.com alexei@cs.auckland.ac.nz 1 Department of

More information

The Generalized Neighbor Joining method

The Generalized Neighbor Joining method The Generalized Neighbor Joining method Ruriko Yoshida Dept. of Mathematics Duke University Joint work with Dan Levy and Lior Pachter www.math.duke.edu/ ruriko data mining 1 Challenge We would like to

More information

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies 1 What is phylogeny? Essay written for the course in Markov Chains 2004 Torbjörn Karfunkel Phylogeny is the evolutionary development

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support

ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support Siavash Mirarab University of California, San Diego Joint work with Tandy Warnow Erfan Sayyari

More information

Phylogenetic Geometry

Phylogenetic Geometry Phylogenetic Geometry Ruth Davidson University of Illinois Urbana-Champaign Department of Mathematics Mathematics and Statistics Seminar Washington State University-Vancouver September 26, 2016 Phylogenies

More information

Understanding How Stochasticity Impacts Reconstructions of Recent Species Divergent History. Huateng Huang

Understanding How Stochasticity Impacts Reconstructions of Recent Species Divergent History. Huateng Huang Understanding How Stochasticity Impacts Reconstructions of Recent Species Divergent History by Huateng Huang A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Posterior predictive checks of coalescent models: P2C2M, an R package

Posterior predictive checks of coalescent models: P2C2M, an R package Molecular Ecology Resources (2016) 16, 193 205 doi: 10.1111/1755-0998.12435 Posterior predictive checks of coalescent models: P2C2M, an R package MICHAEL GRUENSTAEUDL,* NOAH M. REID, GREGORY L. WHEELER*

More information

JML: testing hybridization from species trees

JML: testing hybridization from species trees Molecular Ecology Resources (2012) 12, 179 184 doi: 10.1111/j.1755-0998.2011.03065.x JML: testing hybridization from species trees SIMON JOLY Institut de recherche en biologie végétale, Université de Montréal

More information

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1-2010 Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci Elchanan Mossel University of

More information

Introduction to Phylogenetic Analysis

Introduction to Phylogenetic Analysis Introduction to Phylogenetic Analysis Tuesday 24 Wednesday 25 July, 2012 School of Biological Sciences Overview This free workshop provides an introduction to phylogenetic analysis, with a focus on the

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Phylogenetics. BIOL 7711 Computational Bioscience

Phylogenetics. BIOL 7711 Computational Bioscience Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium

More information

Chapter 7: Models of discrete character evolution

Chapter 7: Models of discrete character evolution Chapter 7: Models of discrete character evolution pdf version R markdown to recreate analyses Biological motivation: Limblessness as a discrete trait Squamates, the clade that includes all living species

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Upcoming challenges in phylogenomics. Siavash Mirarab University of California, San Diego

Upcoming challenges in phylogenomics. Siavash Mirarab University of California, San Diego Upcoming challenges in phylogenomics Siavash Mirarab University of California, San Diego Gene tree discordance The species tree gene1000 Causes of gene tree discordance include: Incomplete Lineage Sorting

More information

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29):

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29): Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29): Statistical estimation of models of sequence evolution Phylogenetic inference using maximum likelihood:

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Phylogenetic Networks, Trees, and Clusters

Phylogenetic Networks, Trees, and Clusters Phylogenetic Networks, Trees, and Clusters Luay Nakhleh 1 and Li-San Wang 2 1 Department of Computer Science Rice University Houston, TX 77005, USA nakhleh@cs.rice.edu 2 Department of Biology University

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

PhyloNet. Yun Yu. Department of Computer Science Bioinformatics Group Rice University

PhyloNet. Yun Yu. Department of Computer Science Bioinformatics Group Rice University PhyloNet Yun Yu Department of Computer Science Bioinformatics Group Rice University yy9@rice.edu Symposium And Software School 2016 The University Of Texas At Austin Installation System requirement: Java

More information

arxiv: v1 [q-bio.pe] 27 Oct 2011

arxiv: v1 [q-bio.pe] 27 Oct 2011 INVARIANT BASED QUARTET PUZZLING JOE RUSINKO AND BRIAN HIPP arxiv:1110.6194v1 [q-bio.pe] 27 Oct 2011 Abstract. Traditional Quartet Puzzling algorithms use maximum likelihood methods to reconstruct quartet

More information

Methods to reconstruct phylogene1c networks accoun1ng for ILS

Methods to reconstruct phylogene1c networks accoun1ng for ILS Methods to reconstruct phylogene1c networks accoun1ng for ILS Céline Scornavacca some slides have been kindly provided by Fabio Pardi ISE-M, Equipe Phylogénie & Evolu1on Moléculaires Montpellier, France

More information

Bayesian Models for Phylogenetic Trees

Bayesian Models for Phylogenetic Trees Bayesian Models for Phylogenetic Trees Clarence Leung* 1 1 McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada ABSTRACT Introduction: Inferring genetic ancestry of different species

More information

Intraspecific gene genealogies: trees grafting into networks

Intraspecific gene genealogies: trees grafting into networks Intraspecific gene genealogies: trees grafting into networks by David Posada & Keith A. Crandall Kessy Abarenkov Tartu, 2004 Article describes: Population genetics principles Intraspecific genetic variation

More information

Introduction to Phylogenetic Analysis

Introduction to Phylogenetic Analysis Introduction to Phylogenetic Analysis Tuesday 23 Wednesday 24 July, 2013 School of Biological Sciences Overview This workshop will provide an introduction to phylogenetic analysis, leading to a focus on

More information

The problem Lineage model Examples. The lineage model

The problem Lineage model Examples. The lineage model The lineage model A Bayesian approach to inferring community structure and evolutionary history from whole-genome metagenomic data Jack O Brien Bowdoin College with Daniel Falush and Xavier Didelot Cambridge,

More information

Introduction to Algebraic Statistics

Introduction to Algebraic Statistics Introduction to Algebraic Statistics Seth Sullivant North Carolina State University January 5, 2017 Seth Sullivant (NCSU) Algebraic Statistics January 5, 2017 1 / 28 What is Algebraic Statistics? Observation

More information

SpeciesNetwork Tutorial

SpeciesNetwork Tutorial SpeciesNetwork Tutorial Inferring Species Networks from Multilocus Data Chi Zhang and Huw A. Ogilvie E-mail: zhangchi@ivpp.ac.cn January 21, 2018 Introduction This tutorial covers SpeciesNetwork, a fully

More information

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information # Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either

More information

Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting

Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting arxiv:1509.06075v3 [q-bio.pe] 12 Feb 2016 Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting Claudia Solís-Lemus 1 and Cécile Ané 1,2 1 Department of Statistics,

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

Letter to the Editor. Department of Biology, Arizona State University

Letter to the Editor. Department of Biology, Arizona State University Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona

More information

Inferring Speciation Times under an Episodic Molecular Clock

Inferring Speciation Times under an Episodic Molecular Clock Syst. Biol. 56(3):453 466, 2007 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150701420643 Inferring Speciation Times under an Episodic Molecular

More information

PGA: A Program for Genome Annotation by Comparative Analysis of. Maximum Likelihood Phylogenies of Genes and Species

PGA: A Program for Genome Annotation by Comparative Analysis of. Maximum Likelihood Phylogenies of Genes and Species PGA: A Program for Genome Annotation by Comparative Analysis of Maximum Likelihood Phylogenies of Genes and Species Paulo Bandiera-Paiva 1 and Marcelo R.S. Briones 2 1 Departmento de Informática em Saúde

More information

Maximum Likelihood Inference of Reticulate Evolutionary Histories

Maximum Likelihood Inference of Reticulate Evolutionary Histories Maximum Likelihood Inference of Reticulate Evolutionary Histories Luay Nakhleh Department of Computer Science Rice University The 2015 Phylogenomics Symposium and Software School The University of Michigan,

More information

Estimating Species Phylogeny from Gene-Tree Probabilities Despite Incomplete Lineage Sorting: An Example from Melanoplus Grasshoppers

Estimating Species Phylogeny from Gene-Tree Probabilities Despite Incomplete Lineage Sorting: An Example from Melanoplus Grasshoppers Syst. Biol. 56(3):400-411, 2007 Copyright Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150701405560 Estimating Species Phylogeny from Gene-Tree Probabilities

More information

Fast coalescent-based branch support using local quartet frequencies

Fast coalescent-based branch support using local quartet frequencies Fast coalescent-based branch support using local quartet frequencies Molecular Biology and Evolution (2016) 33 (7): 1654 68 Erfan Sayyari, Siavash Mirarab University of California, San Diego (ECE) anzee

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition David D. Pollock* and William J. Bruno* *Theoretical Biology and Biophysics, Los Alamos National

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods

More information

DNA-based species delimitation

DNA-based species delimitation DNA-based species delimitation Phylogenetic species concept based on tree topologies Ø How to set species boundaries? Ø Automatic species delimitation? druhů? DNA barcoding Species boundaries recognized

More information

Metric learning for phylogenetic invariants

Metric learning for phylogenetic invariants Metric learning for phylogenetic invariants Nicholas Eriksson nke@stanford.edu Department of Statistics, Stanford University, Stanford, CA 94305-4065 Yuan Yao yuany@math.stanford.edu Department of Mathematics,

More information

Molecular Evolution & Phylogenetics

Molecular Evolution & Phylogenetics Molecular Evolution & Phylogenetics Heuristics based on tree alterations, maximum likelihood, Bayesian methods, statistical confidence measures Jean-Baka Domelevo Entfellner Learning Objectives know basic

More information

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9 Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2009 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley B.D. Mishler Jan. 22, 2009. Trees I. Summary of previous lecture: Hennigian

More information

Inferring Species-Level Phylogenies and Taxonomic Distinctiveness Using Multilocus Data In Sistrurus Rattlesnakes

Inferring Species-Level Phylogenies and Taxonomic Distinctiveness Using Multilocus Data In Sistrurus Rattlesnakes Systematic Biology Advance Access published March 9, 2011 c The Author(s) 2011. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions,

More information

Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree

Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree Nicolas Salamin Department of Ecology and Evolution University of Lausanne

More information

Properties of Consensus Methods for Inferring Species Trees from Gene Trees

Properties of Consensus Methods for Inferring Species Trees from Gene Trees Syst. Biol. 58(1):35 54, 2009 Copyright c Society of Systematic Biologists DOI:10.1093/sysbio/syp008 Properties of Consensus Methods for Inferring Species Trees from Gene Trees JAMES H. DEGNAN 1,4,,MICHAEL

More information

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive. Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then

More information

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X

More information

BINF6201/8201. Molecular phylogenetic methods

BINF6201/8201. Molecular phylogenetic methods BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

More information

19 Tree Construction using Singular Value Decomposition

19 Tree Construction using Singular Value Decomposition 19 Tree Construction using Singular Value Decomposition Nicholas Eriksson We present a new, statistically consistent algorithm for phylogenetic tree construction that uses the algeraic theory of statistical

More information

Lecture 6 Phylogenetic Inference

Lecture 6 Phylogenetic Inference Lecture 6 Phylogenetic Inference From Darwin s notebook in 1837 Charles Darwin Willi Hennig From The Origin in 1859 Cladistics Phylogenetic inference Willi Hennig, Cladistics 1. Clade, Monophyletic group,

More information

The Phylogenetic Handbook

The Phylogenetic Handbook The Phylogenetic Handbook A Practical Approach to DNA and Protein Phylogeny Edited by Marco Salemi University of California, Irvine and Katholieke Universiteit Leuven, Belgium and Anne-Mieke Vandamme Rega

More information

An Investigation of Phylogenetic Likelihood Methods

An Investigation of Phylogenetic Likelihood Methods An Investigation of Phylogenetic Likelihood Methods Tiffani L. Williams and Bernard M.E. Moret Department of Computer Science University of New Mexico Albuquerque, NM 87131-1386 Email: tlw,moret @cs.unm.edu

More information

Evolutionary Tree Analysis. Overview

Evolutionary Tree Analysis. Overview CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based

More information

7. Tests for selection

7. Tests for selection Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info

More information

One-minute responses. Nice class{no complaints. Your explanations of ML were very clear. The phylogenetics portion made more sense to me today.

One-minute responses. Nice class{no complaints. Your explanations of ML were very clear. The phylogenetics portion made more sense to me today. One-minute responses Nice class{no complaints. Your explanations of ML were very clear. The phylogenetics portion made more sense to me today. The pace/material covered for likelihoods was more dicult

More information

Molecular Evolution, course # Final Exam, May 3, 2006

Molecular Evolution, course # Final Exam, May 3, 2006 Molecular Evolution, course #27615 Final Exam, May 3, 2006 This exam includes a total of 12 problems on 7 pages (including this cover page). The maximum number of points obtainable is 150, and at least

More information

Identifiability of the GTR+Γ substitution model (and other models) of DNA evolution

Identifiability of the GTR+Γ substitution model (and other models) of DNA evolution Identifiability of the GTR+Γ substitution model (and other models) of DNA evolution Elizabeth S. Allman Dept. of Mathematics and Statistics University of Alaska Fairbanks TM Current Challenges and Problems

More information

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles John Novembre and Montgomery Slatkin Supplementary Methods To

More information

arxiv: v2 [q-bio.pe] 4 Feb 2016

arxiv: v2 [q-bio.pe] 4 Feb 2016 Annals of the Institute of Statistical Mathematics manuscript No. (will be inserted by the editor) Distributions of topological tree metrics between a species tree and a gene tree Jing Xi Jin Xie Ruriko

More information

Using algebraic geometry for phylogenetic reconstruction

Using algebraic geometry for phylogenetic reconstruction Using algebraic geometry for phylogenetic reconstruction Marta Casanellas i Rius (joint work with Jesús Fernández-Sánchez) Departament de Matemàtica Aplicada I Universitat Politècnica de Catalunya IMA

More information

Analytic Solutions for Three Taxon ML MC Trees with Variable Rates Across Sites

Analytic Solutions for Three Taxon ML MC Trees with Variable Rates Across Sites Analytic Solutions for Three Taxon ML MC Trees with Variable Rates Across Sites Benny Chor Michael Hendy David Penny Abstract We consider the problem of finding the maximum likelihood rooted tree under

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES

CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES INTRODUCTION CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES This worksheet complements the Click and Learn developed in conjunction with the 2011 Holiday Lectures on Science, Bones, Stones, and Genes:

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

C.DARWIN ( )

C.DARWIN ( ) C.DARWIN (1809-1882) LAMARCK Each evolutionary lineage has evolved, transforming itself, from a ancestor appeared by spontaneous generation DARWIN All organisms are historically interconnected. Their relationships

More information

Diffusion Models in Population Genetics

Diffusion Models in Population Genetics Diffusion Models in Population Genetics Laura Kubatko kubatko.2@osu.edu MBI Workshop on Spatially-varying stochastic differential equations, with application to the biological sciences July 10, 2015 Laura

More information

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels

More information

Consensus Methods. * You are only responsible for the first two

Consensus Methods. * You are only responsible for the first two Consensus Trees * consensus trees reconcile clades from different trees * consensus is a conservative estimate of phylogeny that emphasizes points of agreement * philosophy: agreement among data sets is

More information

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei"

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS Masatoshi Nei" Abstract: Phylogenetic trees: Recent advances in statistical methods for phylogenetic reconstruction and genetic diversity analysis were

More information

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

More information

Evolutionary Models. Evolutionary Models

Evolutionary Models. Evolutionary Models Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

More information