Fast coalescent-based branch support using local quartet frequencies
1 Fast coalescent-based branch support using local quartet frequencies. Molecular Biology and Evolution (2016) 33 (7). Erfan Sayyari, Siavash Mirarab, University of California, San Diego (ECE)
2 Phylogenomics [Figure: alignments of gene 1, gene 2, …, gene 999, gene 1000 across species such as Orangutan and Chimpanzee] "Gene" here refers to a portion of the genome (not a functional gene).
5 Gene tree discordance [Figure: the species tree alongside gene trees for gene 1 … gene 1000; a gene tree can disagree with the species tree] Causes of gene tree discordance include: Incomplete Lineage Sorting (ILS); duplication and loss; Horizontal Gene Transfer (HGT).
8 Incomplete Lineage Sorting (ILS): a random process related to the coalescence of alleles across various populations [Figure: tracing alleles through generations]. Omnipresent: possible for every tree. Likely for short branches or large population sizes.
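As a rough illustration of why ILS is likely on short branches, the sketch below simulates two lineages entering a species-tree branch of length d (in coalescent units) and estimates how often they fail to coalesce before the branch ends; such failures are what make ILS possible. It assumes the textbook two-lineage coalescent (exponential waiting time with rate 1 per coalescent unit), and the function name is hypothetical.

```python
import math
import random


def p_no_coalescence(d, trials=100_000, seed=1):
    """Monte Carlo estimate of the chance that two lineages entering a branch
    of length d (coalescent units) are still separate at the branch's end.

    Assumes the standard two-lineage coalescent: the waiting time to
    coalescence is exponential with rate 1 per coalescent unit.
    """
    rng = random.Random(seed)
    survived = sum(1 for _ in range(trials) if rng.expovariate(1.0) > d)
    return survived / trials


for d in (0.1, 0.5, 1.0, 2.0):
    print(f"d={d}: simulated={p_no_coalescence(d):.3f}, exp(-d)={math.exp(-d):.3f}")
```

Failing to coalesce is necessary but not sufficient for a discordant gene tree. Because branch lengths here are generations divided by population size, a large population shrinks d in the same way a short branch does.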
11 MSC and Identifiability: A statistical model called the multi-species coalescent (MSC) can generate ILS. Any species tree defines a unique distribution on the set of all possible gene trees. In principle, the species tree can therefore be identified from the gene tree distribution, despite high discordance. However, likelihood calculation is not feasible.
15 Unrooted quartets under the MSC model: For a quartet (4 species), the unrooted species tree topology has at least 1/3 probability in gene trees (Allman et al., 2010). Species topology probability: $\theta_1 = 1 - \frac{2}{3}e^{-d}$, where $d$ is the internal branch length; each alternative topology has $\theta_2 = \theta_3 = \frac{1}{3}e^{-d}$. Example: $d = 0.8$ gives $\theta_1 = 70\%$, $\theta_2 = \theta_3 = 15\%$. The most frequent gene tree = the most likely species tree. Shorter branches → more discordance → a harder species tree reconstruction problem.
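A direct transcription of the quartet formula, handy for checking the $d = 0.8$ example; the function name is ours, not from any tool.

```python
import math


def quartet_probabilities(d):
    """Unrooted quartet gene tree probabilities under the MSC.

    d: internal branch length of the species quartet, in coalescent units.
    Returns (theta1, theta2, theta3): theta1 for the species-tree topology,
    theta2 = theta3 for the two alternatives.
    """
    if d < 0:
        raise ValueError("branch length must be non-negative")
    theta1 = 1.0 - (2.0 / 3.0) * math.exp(-d)
    alt = (1.0 - theta1) / 2.0  # the two alternatives share the remainder equally
    return theta1, alt, alt


print([round(t, 2) for t in quartet_probabilities(0.8)])  # [0.7, 0.15, 0.15]
```

At $d = 0$ all three topologies have probability 1/3, which is why very short branches make the species topology hard to distinguish.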
18 Species tree inference for >4 species: For >4 species, the species tree topology can be different from the most likely gene tree (called the anomaly zone) (Degnan, 2013). [Figure: example species tree including Rhesus] One approach: 1. Break gene trees into $\binom{n}{4}$ quartets of species. 2. Find the dominant tree for all quartets of taxa. 3. Combine quartet trees. Some tools take this route (e.g., BUCKy-p [Larget et al., 2010]). ASTRAL: weight all $3\binom{n}{4}$ quartet topologies by their frequency in gene trees and find the optimal species tree using dynamic programming.
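Steps 1 and 2 can be sketched by brute force. The sketch assumes each gene tree is already given as its set of internal-edge bipartitions (one side of each split), which sidesteps Newick parsing; all names are hypothetical. Enumerating every quartet costs O(n⁴m), so this is for illustration only and is not how ASTRAL computes its weights.

```python
from collections import Counter
from itertools import combinations


def quartet_topology(splits, taxa, quartet):
    """Return the unrooted topology a gene tree induces on `quartet`.

    splits: iterable of frozensets, each holding one side of an internal split.
    taxa: frozenset of all leaves. Result is a frozenset of two pairs
    (e.g. {{a,b},{c,d}} for ab|cd), or None if the quartet is unresolved.
    """
    a, b, c, d = quartet
    for pair, other in (({a, b}, {c, d}), ({a, c}, {b, d}), ({a, d}, {b, c})):
        for side in splits:
            rest = taxa - side
            if (pair <= side and other <= rest) or (pair <= rest and other <= side):
                return frozenset({frozenset(pair), frozenset(other)})
    return None


def quartet_frequencies(gene_trees, taxa):
    """Count, for every 4-taxon subset, how often each topology appears."""
    freqs = {}
    for q in combinations(sorted(taxa), 4):
        counts = Counter()
        for splits in gene_trees:
            topo = quartet_topology(splits, taxa, q)
            if topo is not None:
                counts[topo] += 1
        freqs[q] = counts
    return freqs


taxa = frozenset("ABCDE")
S = frozenset
gene_trees = [
    [S("AB"), S("DE")],  # ((A,B),C,(D,E))
    [S("AB"), S("DE")],
    [S("AC"), S("DE")],  # ((A,C),B,(D,E))
]
freqs = quartet_frequencies(gene_trees, taxa)
dominant, count = freqs[("A", "B", "C", "D")].most_common(1)[0]
print(sorted(sorted(p) for p in dominant), count)
```

The dominant topology for {A, B, C, D} is AB|CD, seen in two of the three gene trees.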
19 ASTRAL used by biologists: Plants: Wickett et al., 2014, PNAS; Birds: Prum et al., 2015, Nature; ASTRAL-I: [Mirarab et al., 2014, Bioinformatics]; Xenoturbella: Cannon et al., 2016, Nature; Xenoturbella: Rouse et al., 2016, Nature; Flatworms: Laumer et al., 2015, eLife; Shrews: Giarla et al., 2015, Syst. Bio.; Frogs: Yuan et al., 2016, Syst. Bio.; Tomatoes: Pease et al., 2016, PLoS Bio.; ASTRAL-II: [Mirarab and Warnow, 2015, Bioinformatics]; Angiosperms: Huang et al., 2016, MBE; Worms: Andrade et al., 2015, MBE
21 Going beyond the topology [Sayyari and Mirarab, Molecular Biology & Evolution, 2016]. Branch length (BL): ASTRAL did not estimate branch lengths; we added branch length estimation in coalescent units (#generations/population size), only for internal branches. Branch support: how reliable is a branch? ASTRAL relied on bootstrapping; we added a native Bayesian support.
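23 Branch Length [Sayyari and Mirarab, MBE, 2016]: simply a function of the level of discordance, $\theta_1 = 1 - \frac{2}{3}e^{-d}$ (e.g., $d = 0.8$ gives $\theta_1 = 70\%$, $\theta_2 = \theta_3 = 15\%$). A single quartet (n=4): reverse the discordance formula to get the ML estimate $\hat{d} = -\ln\left(\frac{3}{2}(1 - \hat{\theta}_1)\right)$. Example: $m_1 = 132$, $m_2 = 32$, $m_3 = 36$ ($\hat{\theta}_1 = 66\%$, $\hat{\theta}_2 = 16\%$, $\hat{\theta}_3 = 18\%$) gives $\hat{d} = 0.67$.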
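A minimal sketch of the single-quartet estimator: it takes the three topology counts and inverts the formula. Returning 0 when $\hat{\theta}_1 \le 1/3$ and infinity when no discordance is observed are our own choices for the boundary cases, not necessarily what ASTRAL does; the function name is hypothetical.

```python
import math


def ml_branch_length(m1, m2, m3):
    """ML internal branch length (coalescent units) for a single quartet.

    m1: gene trees matching the species-tree topology; m2, m3: the alternatives.
    Inverts theta1 = 1 - (2/3) * exp(-d) at theta1 = m1 / m.
    """
    m = m1 + m2 + m3
    theta1 = m1 / m
    if theta1 <= 1.0 / 3.0:
        return 0.0  # boundary case clamped to zero (our assumption)
    if theta1 == 1.0:
        return math.inf  # no discordance observed: estimate is unbounded
    return -math.log(1.5 * (1.0 - theta1))


print(round(ml_branch_length(132, 32, 36), 2))  # 0.67
```

Because the model gives both alternatives the same probability, only $m_1$ and $m_2 + m_3$ matter for this estimate.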
25 Branch length for n>4: simply average all quartet frequencies around that branch [Figure: an internal branch with surrounding taxa a to h], then invert $\theta_1 = 1 - \frac{2}{3}e^{-d}$. Justified given some assumptions. Can be done efficiently in Θ(n²m) for all branches, for n species and m genes.
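A naive sketch of averaging around one branch: it enumerates every quartet formed by taking one taxon from each of the four clusters around the branch, averages the fraction of gene trees showing the branch's topology, and inverts the formula. It is deliberately simple and is not the Θ(n²m) all-branches algorithm; the callback `theta1_of` is a hypothetical stand-in for precomputed quartet frequencies.

```python
import math
from itertools import product


def branch_length_around(clusters, theta1_of):
    """Estimate one internal branch length by averaging quartet frequencies.

    clusters: four lists of taxa (C1, C2, C3, C4); the branch separates
        C1 + C2 from C3 + C4.
    theta1_of: hypothetical callback; theta1_of(a, b, c, d) returns the fraction
        of gene trees showing ab|cd, for a in C1, b in C2, c in C3, d in C4.
    """
    quartets = list(product(*clusters))  # one taxon per cluster: n1*n2*n3*n4 quartets
    mean_theta1 = sum(theta1_of(*q) for q in quartets) / len(quartets)
    if mean_theta1 <= 1.0 / 3.0:
        return 0.0  # boundary case clamped to zero (our assumption)
    if mean_theta1 >= 1.0:
        return math.inf  # no discordance around the branch
    return -math.log(1.5 * (1.0 - mean_theta1))


clusters = [["a"], ["b", "c"], ["d"], ["e", "f"]]
print(round(branch_length_around(clusters, lambda a, b, c, d: 0.66), 2))  # 0.67
```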
27 Branch length accuracy [Figure: estimated vs. true branch length (log scale), for true gene trees and for estimated gene trees with low, moderate, medium, and high gene tree error] With true gene trees, ASTRAL correctly estimates BL. With error-prone estimated gene trees, ASTRAL underestimates BL.
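One simple way to see why gene tree error pushes branch lengths down: under a toy error model (our assumption, not taken from the paper) in which a fraction e of gene trees is replaced by a uniformly random quartet topology, the observed $\theta_1$ moves toward 1/3. Inverting the formula then gives $d' = -\ln((1 - e)e^{-d} + e)$, which is below $d$ for any $e > 0$ and $d > 0$. The function name is hypothetical.

```python
import math


def apparent_branch_length(d, error_rate):
    """Branch length recovered from quartet frequencies when a fraction
    `error_rate` of gene trees is replaced by a uniformly random topology.

    Toy error model (an assumption): observed theta1' = (1 - e) * theta1 + e / 3,
    with theta1 = 1 - (2/3) * exp(-d). Inverting gives
    d' = -ln((1 - e) * exp(-d) + e), which never exceeds d.
    """
    theta1 = 1.0 - (2.0 / 3.0) * math.exp(-d)
    observed = (1.0 - error_rate) * theta1 + error_rate / 3.0
    return -math.log(1.5 * (1.0 - observed))


for e in (0.0, 0.1, 0.3):
    print(e, round(apparent_branch_length(1.0, e), 3))
```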
28 Branch support (common practice): multi-locus bootstrapping (MLBS). Slow: requires bootstrapping all genes (e.g., 100m ML trees for 100 replicates of m genes). Inaccurate and hard to interpret [Mirarab et al., Syst. Biol., 2014; Bayzid et al., PLoS One, 2015]. [Figure: percentage of correct branches, from Mirarab et al., Syst. Biol., 2014]
30 Branch support idea: n=4. Recall that quartet frequencies follow a multinomial distribution; e.g., $m = 200$ gene trees split as $m_1 = 80$, $m_2 = 63$, $m_3 = 57$ among three topologies with probabilities $\theta_1$, $\theta_2$, $\theta_3$. $P$(topology seen in $m_1/m$ gene trees is the species tree) $= P(\theta_1 > 1/3) = P$(a 3-sided coin tossed $m$ times is biased towards the side that shows up $m_1$ times). This can be solved analytically.
31 Posterior: with a Yule-process prior, the prior turns out to be conjugate, so the posterior is fast to calculate. It depends on the frequency of not just the first topology, but also the frequencies of the second and third topologies.
32 Conjugate prior: all three topologies have equal prior probability, $\Pr(\theta_1 > \frac{1}{3}) = \Pr(\theta_2 > \frac{1}{3}) = \Pr(\theta_3 > \frac{1}{3}) = \frac{1}{3}$. The species tree is generated through a birth-only (Yule) process with rate λ, which turns out to be the conjugate prior. Default: λ = 0.5 (uniformly distributed branch lengths).
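The n = 4 posterior can also be approximated numerically. The sketch below is not the paper's closed-form solution: it assumes, for illustration, that each topology is the species tree with prior 1/3 and that, given it, θ is uniform on (1/3, 1). It then integrates the multinomial likelihood $\theta^{m_i}\left(\frac{1-\theta}{2}\right)^{m - m_i}$ with the midpoint rule in log space. All three counts enter the result. Function names are hypothetical.

```python
import math


def _log_marginal(x, n, grid):
    """log of the integral over theta in (1/3, 1) of
    theta^x * ((1 - theta) / 2)^(n - x), via the midpoint rule in log space."""
    lo, hi = 1.0 / 3.0, 1.0
    h = (hi - lo) / grid
    logs = []
    for i in range(grid):
        t = lo + (i + 0.5) * h  # midpoint of the i-th sub-interval
        logs.append(x * math.log(t) + (n - x) * math.log((1.0 - t) / 2.0))
    top = max(logs)  # log-sum-exp to avoid underflow for large counts
    return top + math.log(h * sum(math.exp(v - top) for v in logs))


def local_posterior(m1, m2, m3, grid=20_000):
    """Posterior that topology 1 is the species quartet, given counts m1, m2, m3.

    Assumed prior (illustrative): each topology is the species tree with
    probability 1/3, and its theta is uniform on (1/3, 1). The multinomial
    coefficient and the constant prior density cancel in the ratio.
    """
    n = m1 + m2 + m3
    logs = [_log_marginal(x, n, grid) for x in (m1, m2, m3)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    return weights[0] / sum(weights)


print(round(local_posterior(80, 63, 57), 3))
print(round(local_posterior(160, 126, 114), 3))  # same proportions, twice the genes
```

Under these assumptions the 80/63/57 example gets a posterior well above 1/3, and doubling all counts (same proportions, more genes) raises it further, matching the trend that more genes increase support.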
33 Quartet support vs. posterior quartet frequency ($\theta_1$) [Figure: support as a function of $\theta_1$] Increasing the number of genes (m) increases support. Decreasing discordance increases support.
35 How about n>4? Locality assumption: all four clusters around a branch are correct. [Figure: an internal branch with clusters $C_1, C_2, C_3, C_4$ of sizes $n_1, n_2, n_3, n_4$] Treat branches independently. How should the $k = n_1 n_2 n_3 n_4$ quartets around a branch be combined? The independence assumption is too liberal ($mk$ tosses of the coin). Fully dependent assumption: all quartets give noisy estimates of a single hidden true frequency; simply average their frequencies.
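To see why independence is "too liberal", the sketch combines per-quartet counts both ways. Summing them behaves like $mk$ coin tosses, so the apparent sample size grows with $k$ even though every quartet is read off the same $m$ gene trees; averaging keeps the total at $m$. The function name and tuple layout are hypothetical.

```python
def combine_quartet_counts(counts):
    """Two ways to combine topology counts from the k quartets around a branch.

    counts: list of (m1, m2, m3) tuples, one per quartet, each summing to m.
    Returns (pooled, averaged):
      pooled   - independence assumption: summed counts, i.e. m*k coin tosses;
      averaged - full-dependence assumption: mean counts, total stays m.
    """
    k = len(counts)
    pooled = tuple(sum(c[i] for c in counts) for i in range(3))
    averaged = tuple(p / k for p in pooled)
    return pooled, averaged


pooled, averaged = combine_quartet_counts([(80, 63, 57), (90, 60, 50)])
print(pooled, averaged)
```

Averaged counts may be fractional; a posterior written with real-valued exponents accepts them directly.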
36 Simulation studies: our simulations violate our assumptions. Estimated gene trees are used instead of true gene trees. Estimated species trees are used, so the locality assumption can be violated. Measuring the support accuracy: the number of false positives and false negatives above various thresholds of support. [Figure: simulation pipeline from the true (model) species tree to true gene trees, sequence data, estimated gene trees, and the estimated species tree; example taxa: Finch, Falcon, Owl, Eagle, Pigeon]
37 localpp is more accurate than bootstrapping [Figure: recall vs. false positive rate for MLBS and local PP; annotation "X faster"] Avian simulated dataset (48 taxa, 1000 genes) [Sayyari and Mirarab, MBE, 2016]
38 High precision and recall at high support [Figure: panels A and B, from Fig. 3 of the paper] Precision is at least 99.8% for the 0.95 threshold, and the recall is between 71.5% and 84.7%, depending on …
39 High precision and recall at high support [Figure: panels A and B] FIG. 3. Evaluation of local PP on the A-200 dataset with ASTRAL species trees. See supplementary figures S2-S4, Supplementary Material online, for other species trees. (A) Precision and recall of branches with local PP above a threshold ranging from 0.9 to 1.0 using estimated gene trees (solid) or true gene trees (dotted). (B) ROC curve (recall vs. FPR) for varying thresholds (figure trimmed at 0.4 FPR). Columns show different levels of ILS.
40 High precision and recall at high support [Figure: panels A and B] 201-taxon datasets (SimPhy)
41 Summary: Both branch length and support can be computed quickly. Both are a function of the observed amount of gene tree discordance; support is also a function of the number of genes. Local posterior probability outperforms bootstrapping. Requires strong assumptions (to be relaxed in future). Branch length accuracy depends on the gene tree accuracy. All available at
42 Tandy Warnow, Erfan Sayyari
47 Results (A200): recall (the percentage of all true branches that have support ≥ s); false positive rate (FPR) (the percentage of all false branches that have support ≥ s). [Figure: recall above threshold vs. false positive rate for low, medium, and high ILS and varying # genes, using true and estimated gene trees]
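The recall and FPR definitions can be computed directly from a list of (support, correctness) pairs; precision, also used in the results, is included. We read "support ≥ s" as inclusive, and the function name is hypothetical.

```python
def support_accuracy(branches, threshold):
    """Recall, FPR and precision of branches with support >= threshold.

    branches: list of (support, is_true) pairs, where is_true says whether the
    estimated branch is in the true species tree.
    """
    true_sup = [s for s, ok in branches if ok]
    false_sup = [s for s, ok in branches if not ok]
    tp = sum(s >= threshold for s in true_sup)   # true branches kept
    fp = sum(s >= threshold for s in false_sup)  # false branches kept
    recall = tp / len(true_sup) if true_sup else float("nan")
    fpr = fp / len(false_sup) if false_sup else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return recall, fpr, precision


data = [(0.99, True), (0.97, True), (0.5, True), (0.96, False), (0.3, False)]
print(support_accuracy(data, 0.95))
```

Sweeping the threshold and plotting recall against FPR traces the ROC curves shown in the figures.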
55 MLBS Procedure: first bootstrap each gene: the alignments of gene 1, gene 2, …, gene k are resampled into Replicate 1, …, Replicate M, each containing all k genes. Gene tree estimation is then run on every gene of every replicate. Count how many times Q appeared.
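A skeleton of the MLBS loop, assuming each alignment is a dict from taxon name to an aligned sequence. Only the site resampling is concrete; `infer_gene_tree`, `summarize`, and `has_q` are hypothetical callbacks standing in for an ML gene tree program, a summary method such as ASTRAL, and a check for Q. The loop makes the cost visible: M * k ML gene tree searches.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one gene, one replicate).

    alignment: dict taxon -> aligned sequence (all the same length).
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}


def mlbs_support(alignments, replicates, infer_gene_tree, summarize, has_q, seed=0):
    """Fraction of replicates whose summarized result contains Q.

    infer_gene_tree, summarize and has_q are hypothetical callbacks.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        # every gene is resampled independently within each replicate
        gene_trees = [infer_gene_tree(bootstrap_alignment(aln, rng)) for aln in alignments]
        if has_q(summarize(gene_trees)):
            hits += 1
    return hits / replicates
```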
More informationPhylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University
Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X
More informationDNA-based species delimitation
DNA-based species delimitation Phylogenetic species concept based on tree topologies Ø How to set species boundaries? Ø Automatic species delimitation? druhů? DNA barcoding Species boundaries recognized
More informationPhylogenetic Networks, Trees, and Clusters
Phylogenetic Networks, Trees, and Clusters Luay Nakhleh 1 and Li-San Wang 2 1 Department of Computer Science Rice University Houston, TX 77005, USA nakhleh@cs.rice.edu 2 Department of Biology University
More informationProperties of Consensus Methods for Inferring Species Trees from Gene Trees
Syst. Biol. 58(1):35 54, 2009 Copyright c Society of Systematic Biologists DOI:10.1093/sysbio/syp008 Properties of Consensus Methods for Inferring Species Trees from Gene Trees JAMES H. DEGNAN 1,4,,MICHAEL
More informationLearning Outbreak Regions in Bayesian Spatial Scan Statistics
Maxim Makatchev Daniel B. Neill Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213 USA maxim.makatchev@cs.cmu.edu neill@cs.cmu.edu Abstract The problem of anomaly detection for biosurveillance
More informationMethods to reconstruct phylogene1c networks accoun1ng for ILS
Methods to reconstruct phylogene1c networks accoun1ng for ILS Céline Scornavacca some slides have been kindly provided by Fabio Pardi ISE-M, Equipe Phylogénie & Evolu1on Moléculaires Montpellier, France
More informationThe statistical and informatics challenges posed by ascertainment biases in phylogenetic data collection
The statistical and informatics challenges posed by ascertainment biases in phylogenetic data collection Mark T. Holder and Jordan M. Koch Department of Ecology and Evolutionary Biology, University of
More informationIntegrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2012 University of California, Berkeley
Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2012 University of California, Berkeley B.D. Mishler Feb. 7, 2012. Morphological data IV -- ontogeny & structure of plants The last frontier
More informationTechniques for generating phylogenomic data matrices: transcriptomics vs genomics. Rosa Fernández & Marina Marcet-Houben
Techniques for generating phylogenomic data matrices: transcriptomics vs genomics Rosa Fernández & Marina Marcet-Houben DE NOVO Raw reads Sanitize Filter Assemble Translate Reduce reduncancy Download DATABASES
More informationLecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30
Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 A non-phylogeny
More informationReconstruire le passé biologique modèles, méthodes, performances, limites
Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire
More informationX X (2) X Pr(X = x θ) (3)
Notes for 848 lecture 6: A ML basis for compatibility and parsimony Notation θ Θ (1) Θ is the space of all possible trees (and model parameters) θ is a point in the parameter space = a particular tree
More information9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)
I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by
More informationFine-Scale Phylogenetic Discordance across the House Mouse Genome
Fine-Scale Phylogenetic Discordance across the House Mouse Genome Michael A. White 1,Cécile Ané 2,3, Colin N. Dewey 4,5,6, Bret R. Larget 2,3, Bret A. Payseur 1 * 1 Laboratory of Genetics, University of
More informationWhat is Phylogenetics
What is Phylogenetics Phylogenetics is the area of research concerned with finding the genetic connections and relationships between species. The basic idea is to compare specific characters (features)
More informationPhylogenetic inference: from sequences to trees
W ESTFÄLISCHE W ESTFÄLISCHE W ILHELMS -U NIVERSITÄT NIVERSITÄT WILHELMS-U ÜNSTER MM ÜNSTER VOLUTIONARY FUNCTIONAL UNCTIONAL GENOMICS ENOMICS EVOLUTIONARY Bioinformatics 1 Phylogenetic inference: from sequences
More informationIsolating - A New Resampling Method for Gene Order Data
Isolating - A New Resampling Method for Gene Order Data Jian Shi, William Arndt, Fei Hu and Jijun Tang Abstract The purpose of using resampling methods on phylogenetic data is to estimate the confidence
More informationPhylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center
Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods
More informationSupplementary Materials for
advances.sciencemag.org/cgi/content/full/1/8/e1500527/dc1 Supplementary Materials for A phylogenomic data-driven exploration of viral origins and evolution The PDF file includes: Arshan Nasir and Gustavo
More informationECE521 W17 Tutorial 6. Min Bai and Yuhuai (Tony) Wu
ECE521 W17 Tutorial 6 Min Bai and Yuhuai (Tony) Wu Agenda knn and PCA Bayesian Inference k-means Technique for clustering Unsupervised pattern and grouping discovery Class prediction Outlier detection
More informationA phylogenomic toolbox for assembling the tree of life
A phylogenomic toolbox for assembling the tree of life or, The Phylota Project (http://www.phylota.org) UC Davis Mike Sanderson Amy Driskell U Pennsylvania Junhyong Kim Iowa State Oliver Eulenstein David
More informationFrom Genes to Genomes and Beyond: a Computational Approach to Evolutionary Analysis. Kevin J. Liu, Ph.D. Rice University Dept. of Computer Science
From Genes to Genomes and Beyond: a Computational Approach to Evolutionary Analysis Kevin J. Liu, Ph.D. Rice University Dept. of Computer Science!1 Adapted from U.S. Department of Energy Genomic Science
More informationDetection and Polarization of Introgression in a Five-Taxon Phylogeny
Syst. Biol. 64(4):651 662, 2015 The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com
More informationEfficient Bayesian Species Tree Inference under the Multispecies Coalescent
Syst. Biol. 66(5):823 842, 2017 The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com
More informationMisconceptions on Missing Data in RAD-seq Phylogenetics with a Deep-scale Example from Flowering Plants
Syst. Biol. 66(3):399 412, 2017 The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com
More informationSymmetric Tree, ClustalW. Divergence x 0.5 Divergence x 1 Divergence x 2. Alignment length
ONLINE APPENDIX Talavera, G., and Castresana, J. (). Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Systematic Biology, -. Symmetric
More informationConstructing Evolutionary/Phylogenetic Trees
Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood
More informationBayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies
Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies 1 What is phylogeny? Essay written for the course in Markov Chains 2004 Torbjörn Karfunkel Phylogeny is the evolutionary development
More informationIntegrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley
Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;
More informationA Phylogenetic Network Construction due to Constrained Recombination
A Phylogenetic Network Construction due to Constrained Recombination Mohd. Abdul Hai Zahid Research Scholar Research Supervisors: Dr. R.C. Joshi Dr. Ankush Mittal Department of Electronics and Computer
More informationRecent Advances in Phylogeny Reconstruction
Recent Advances in Phylogeny Reconstruction from Gene-Order Data Bernard M.E. Moret Department of Computer Science University of New Mexico Albuquerque, NM 87131 Department Colloqium p.1/41 Collaborators
More informationGene Tree Parsimony for Incomplete Gene Trees
Gene Tree Parsimony for Incomplete Gene Trees Md. Shamsuzzoha Bayzid 1 and Tandy Warnow 2 1 Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
More information