From Gene Trees to Species Trees. Tandy Warnow The University of Texas at Aus<n

Similar documents
394C, October 2, Topics: Mul9ple Sequence Alignment Es9ma9ng Species Trees from Gene Trees

CS 581 / BIOE 540: Algorithmic Computa<onal Genomics. Tandy Warnow Departments of Bioengineering and Computer Science hhp://tandy.cs.illinois.

CS 394C Algorithms for Computational Biology. Tandy Warnow Spring 2012

Construc)ng the Tree of Life: Divide-and-Conquer! Tandy Warnow University of Illinois at Urbana-Champaign

The Mathema)cs of Es)ma)ng the Tree of Life. Tandy Warnow The University of Illinois

Mul$ple Sequence Alignment Methods. Tandy Warnow Departments of Bioengineering and Computer Science h?p://tandy.cs.illinois.edu

New methods for es-ma-ng species trees from genome-scale data. Tandy Warnow The University of Illinois

first (i.e., weaker) sense of the term, using a variety of algorithmic approaches. For example, some methods (e.g., *BEAST 20) co-estimate gene trees

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

Genome-scale Es-ma-on of the Tree of Life. Tandy Warnow The University of Illinois

Genome-scale Es-ma-on of the Tree of Life. Tandy Warnow The University of Illinois

ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support

Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics

Anatomy of a species tree

Fast coalescent-based branch support using local quartet frequencies

Reconstruction of species trees from gene trees using ASTRAL. Siavash Mirarab University of California, San Diego (ECE)

CS 581 Algorithmic Computational Genomics. Tandy Warnow University of Illinois at Urbana-Champaign

The impact of missing data on species tree estimation

Constrained Exact Op1miza1on in Phylogene1cs. Tandy Warnow The University of Illinois at Urbana-Champaign

Phylogenomics, Multiple Sequence Alignment, and Metagenomics. Tandy Warnow University of Illinois at Urbana-Champaign

Today's project. Test input data Six alignments (from six independent markers) of Curcuma species

SEPP and TIPP for metagenomic analysis. Tandy Warnow Department of Computer Science University of Texas

SEPP and TIPP for metagenomic analysis. Tandy Warnow Department of Computer Science University of Texas

Evolu&on, Popula&on Gene&cs, and Natural Selec&on Computa.onal Genomics Seyoung Kim

Ultra- large Mul,ple Sequence Alignment

Phylogenetics in the Age of Genomics: Prospects and Challenges

CS 581 Algorithmic Computational Genomics. Tandy Warnow University of Illinois at Urbana-Champaign

SNPs versus sequences for phylogeography an explora:on using simula:ons and massively parallel sequencing in a non- model bird

Methods to reconstruct phylogene1c networks accoun1ng for ILS

CSCI1950 Z Computa3onal Methods for Biology* (*Working Title) Lecture 1. Ben Raphael January 21, Course Par3culars

Phylogenetic Geometry

Networks. Can (John) Bruce Keck Founda7on Biotechnology Lab Bioinforma7cs Resource

Upcoming challenges in phylogenomics. Siavash Mirarab University of California, San Diego

Estimating phylogenetic trees from genome-scale data

Taming the Beast Workshop

CS 6140: Machine Learning Spring What We Learned Last Week 2/26/16

Phylogenetic inference

Phylogeny Fig Overview: Inves8ga8ng the Tree of Life Phylogeny Systema8cs

Computational Challenges in Constructing the Tree of Life

In comparisons of genomic sequences from multiple species, Challenges in Species Tree Estimation Under the Multispecies Coalescent Model REVIEW

Estimating phylogenetic trees from genome-scale data

Analysis of a Rapid Evolutionary Radiation Using Ultraconserved Elements: Evidence for a Bias in Some Multispecies Coalescent Methods

To link to this article: DOI: / URL:

Bias/variance tradeoff, Model assessment and selec+on

CSCI1950 Z Computa4onal Methods for Biology Lecture 5

From Genes to Genomes and Beyond: a Computational Approach to Evolutionary Analysis. Kevin J. Liu, Ph.D. Rice University Dept. of Computer Science

Priors in Dependency network learning

Quartet Inference from SNP Data Under the Coalescent Model

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

TheDisk-Covering MethodforTree Reconstruction

Ch. 26 Phylogeny BIOL 221

Gene Tree Parsimony for Incomplete Gene Trees

Classifica(on and predic(on omics style. Dr Nicola Armstrong Mathema(cs and Sta(s(cs Murdoch University

A phylogenomic toolbox for assembling the tree of life

Contentious relationships in phylogenomic studies can be driven by a handful of genes

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Point of View. Why Concatenation Fails Near the Anomaly Zone

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

WenEtAl-biorxiv 2017/12/21 10:55 page 2 #2

Ensemble of Climate Models

Founda'ons of Large- Scale Mul'media Informa'on Management and Retrieval. Lecture #3 Machine Learning. Edward Chang

Least Squares Parameter Es.ma.on

Resolving Deep Nodes in an Ancient Radiation of Neotropical Fishes in the. Presence of Conflicting Signals from Incomplete Lineage Sorting

A Phylogenetic Network Construction due to Constrained Recombination

Test of neutral theory predic3ons for the BCI tree community informed by regional abundance data

Fine-Scale Phylogenetic Discordance across the House Mouse Genome

Properties of Consensus Methods for Inferring Species Trees from Gene Trees

CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES

On the variance of internode distance under the multispecies coalescent

Smith et al. American Journal of Botany 98(3): Data Supplement S2 page 1

Workshop III: Evolutionary Genomics

Evaluating Fast Maximum Likelihood-Based Phylogenetic Programs Using Empirical Phylogenomic Data Sets

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

A bioinformatics approach to the structural and functional analysis of the glycogen phosphorylase protein family

The statistical and informatics challenges posed by ascertainment biases in phylogenetic data collection

Phylogenomics. Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University. Tuesday, January 29, 13

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Techniques for generating phylogenomic data matrices: transcriptomics vs genomics. Rosa Fernández & Marina Marcet-Houben

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

Dr. Amira A. AL-Hosary

CSCI1950 Z Computa3onal Methods for Biology Lecture 24. Ben Raphael April 29, hgp://cs.brown.edu/courses/csci1950 z/ Network Mo3fs

Inferring Phylogenies from RAD Sequence Data

Crowdsourcing Mul/- Label Classifica/on. Jonathan Bragg University of Washington

Exponen'al growth Limi'ng factors Environmental resistance Carrying capacity logis'c growth curve

Computer Vision. Pa0ern Recogni4on Concepts Part I. Luis F. Teixeira MAP- i 2012/13

An Introduc+on to Sta+s+cs and Machine Learning for Quan+ta+ve Biology. Anirvan Sengupta Dept. of Physics and Astronomy Rutgers University

Inference of Gene Tree Discordance and Recombination. Yujin Chung. Doctor of Philosophy (Statistics)

Species Tree Inference using SVDquartets

Bayesian Estimation of Concordance among Gene Trees

Outline. What is Machine Learning? Why Machine Learning? 9/29/08. Machine Learning Approaches to Biological Research: Bioimage Informa>cs and Beyond

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

Points of View Matrix Representation with Parsimony, Taxonomic Congruence, and Total Evidence

BIOINFORMATICS. Scaling Up Accurate Phylogenetic Reconstruction from Gene-Order Data. Jijun Tang 1 and Bernard M.E. Moret 1

Quan&fying Uncertainty. Sai Ravela Massachuse7s Ins&tute of Technology

Species Trees from Highly Incongruent Gene Trees in Rice

Experimental Designs for Planning Efficient Accelerated Life Tests

CS 6140: Machine Learning Spring What We Learned Last Week. Survey 2/26/16. VS. Model

Scientific ethics, tree testing, Open Tree of Life. Workshop on Molecular Evolution Marine Biological Lab, Woods Hole, MA.

CS 6140: Machine Learning Spring 2016

Coalescent Histories on Phylogenetic Networks and Detection of Hybridization Despite Incomplete Lineage Sorting

Transcription:

From Gene Trees to Species Trees Tandy Warnow The University of Texas at Aus<n

Avian Phylogenomics Project Erich Jarvis, HHMI MTP Gilbert, Copenhagen G Zhang, BGI T. Warnow UT- Aus<n S. Mirarab Md. S. Bayzid, UT- Aus<n UT- Aus<n Plus many many other people Approx. 50 species, whole genomes 8000+ genes, UCEs Gene sequence alignments and trees computed using SATé (Liu et al., Science 2009 and Systema<c Biology 2012) Challenges: Maximum likelihood on mul3- million- site sequence alignments Massive gene tree incongruence

Phylogeny (evolu<onary tree) Orangutan Gorilla Chimpanzee Human From the Tree of the Life Website, University of Arizona

(Phylogenetic estimation from whole genomes)

Using mul<ple genes S 1 S 2 S 3 S 4 gene 1" TCTAATGGAA" GCTAAGGGAA" TCTAAGGGAA" TCTAACGGAA" S 4 gene 2! GGTAACCCTC! S 1 S 3 S 4 gene 3! TATTGATACA" TCTTGATACC" TAGTGATGCA" S 7 S 8 TCTAATGGAC" TATAACGGAA" S 5 S 6 S 7 GCTAAACCTC! GGTGACCATC! GCTAAACCTC! S 7 S 8 TAGTGATGCA" CATTCATACC"

Concatena<on gene 1" gene 2! gene 3! S 1 S 2 S 3 S 4 S 5 TCTAATGGAA" GCTAAGGGAA" TCTAAGGGAA" TCTAACGGAA" GGTAACCCTC! TAGTGATGCA"?????????? "?????????? " GCTAAACCTC! TATTGATACA"?????????? "??????????"?????????? " TCTTGATACC"??????????" S 6??????????" GGTGACCATC!??????????" S 7 TCTAATGGAC" GCTAAACCTC! TAGTGATGCA" S 8 TATAACGGAA"?????????? " CATTCATACC"

Red gene tree species tree (green gene tree okay)

Gene Tree Incongruence Gene trees can differ from the species tree due to: Duplica<on and loss Horizontal gene transfer Incomplete lineage sor<ng (ILS)

Lineage Sor<ng Popula<on- level process, also called the Mul<- species coalescent (Kingman, 1982) Gene trees can differ from species trees due to short <mes between specia<on events or large popula<on size; this is called Incomplete Lineage Sor<ng or Deep Coalescence.

The Coalescent Courtesy James Degnan Past Present

Gene tree in a species tree Courtesy James Degnan

Species tree es<ma<on: difficult, even for small datasets! Orangutan Gorilla Chimpanzee Human From the Tree of the Life Website, University of Arizona

Incomplete Lineage Sor<ng (ILS) 2000+ papers in 2013 alone Confounds phylogene<c analysis for many groups: Hominids Birds Yeast Animals Toads Fish Fungi There is substan<al debate about how to analyze phylogenomic datasets in the presence of ILS.

Two compe<ng approaches Species gene 1 gene 2... gene k"..." Concatenation" Analyze" separately"..." Summary Method"

How to compute a species tree?..."

How to compute a species tree?..." Techniques: MDC? Most frequent gene tree? Consensus of gene trees? Other?

Key observa<on: Under the mul<- species coalescent model, the species tree defines a probability distribu.on on the gene trees Courtesy James Degnan

Sta<s<cal Consistency error Data

Sta<s<cal Consistency error Data Data are gene trees, presumed to be randomly sampled true gene trees.

Sta<s<cally consistent under ILS? MP- EST (Liu et al. 2010): maximum likelihood es<ma<on of rooted species tree YES BUCKy- pop (Ané and Larget 2010): quartet- based Bayesian species tree es<ma<on YES MDC NO Greedy NO Concatena<on under maximum likelihood open MRP (supertree method) open

Two compe<ng approaches Species gene 1 gene 2... gene k"..." Concatenation" Analyze" separately"..." Summary Method"

The Debate: Concatena<on vs. Coalescent Es<ma<on In favor of coalescent- based es<ma<on Sta<s<cal consistency guarantees Addresses gene tree incongruence resul<ng from ILS Some evidence that concatena<on can be posi<vely misleading In favor of concatena<on Reasonable results on data High bootstrap support Summary methods (that combine gene trees) can have poor support or miss well- established clades en<rely Some methods (such as *BEAST) are computa<onally too intensive to use

Is Concatena<on Evil? Joseph Heled: YES John Gatesy No Data needed to held understand exis<ng methods and their limita<ons Berer methods are needed

Results on 11- taxon datasets with weak ILS 0.25 0.2 Average FN rate 0.15 0.1 0.05 *BEAST CA ML BUCKy con BUCKy pop MP EST Phylo exact MRP GC 0 5 genes 10 genes 25 genes 50 genes *BEAST more accurate than summary methods (MP- EST, BUCKy, etc) CA- ML: (concatenated analysis) most accurate Datasets from Chung and Ané, 2011 Bayzid & Warnow, Bioinforma<cs 2013

Results on 11- taxon datasets with strongils 0.5 0.4 Average FN rate 0.3 0.2 0.1 *BEAST CA ML BUCKy con BUCKy pop MP EST Phylo exact MRP GC 0 5 genes 10 genes 25 genes 50 genes *BEAST more accurate than summary methods (MP- EST, BUCKy, etc) CA- ML: (concatenated analysis) also very accurate Datasets from Chung and Ané, 2011 Bayzid & Warnow, Bioinforma<cs 2013

Gene Tree Es<ma<on: *BEAST vs. Maximum Likelihood 0.5 0.5 0.4 0.4 Average FN rate 0.3 0.2 5 genes 10 genes 25 genes 50 genes Average FN rate 0.3 0.2 8 genes 32 genes 0.1 0.1 0 *BEAST FastTree RAxML 0 *BEAST FastTree RAxML 11- taxon weakils datasets 17- taxon (very high ILS) datasets *BEAST produces more accurate gene trees than ML on gene sequence alignments 11- taxon datasets from Chung and Ané, Syst Biol 2012 17- taxon datasets from Yu, Warnow, and Nakhleh, JCB 2011

Impact of Gene Tree Es<ma<on Error on MP- EST 0.25 0.2 Average FN rate 0.15 0.1 true estimated 0.05 0 MP EST MP- EST has no error on true gene trees, but MP- EST has 9% error on es<mated gene trees Datasets: 11- taxon strongils condi<ons with 50 genes Similar results for other summary methods (MDC, Greedy, etc.).

Problem: poor gene trees Summary methods combine es<mated gene trees, not true gene trees. The individual gene sequence alignments in the 11- taxon datasets have poor phylogene<c signal, and result in poorly es<mated gene trees. Species trees obtained by combining poorly es<mated gene trees have poor accuracy.

Problem: poor gene trees Summary methods combine es<mated gene trees, not true gene trees. The individual gene sequence alignments in the 11- taxon datasets have poor phylogene<c signal, and result in poorly es<mated gene trees. Species trees obtained by combining poorly es<mated gene trees have poor accuracy.

Problem: poor gene trees Summary methods combine es<mated gene trees, not true gene trees. The individual gene sequence alignments in the 11- taxon datasets have poor phylogene<c signal, and result in poorly es<mated gene trees. Species trees obtained by combining poorly es<mated gene trees have poor accuracy.

TYPICAL PHYLOGENOMICS PROBLEM: many poor gene trees Summary methods combine es<mated gene trees, not true gene trees. The individual gene sequence alignments in the 11- taxon datasets have poor phylogene<c signal, and result in poorly es<mated gene trees. Species trees obtained by combining poorly es<mated gene trees have poor accuracy.

Addressing gene tree es<ma<on error Get berer es<mates of the gene trees Restrict to subset of es<mated gene trees Model error in the es<mated gene trees Modify gene trees to reduce error Bin- and- conquer

Bin- and- Conquer? 1. Assign genes to bins, crea<ng supergene alignments 2. Es<mate trees on each supergene alignment using maximum likelihood 3. Combine the supergene trees together using a summary method Variants: Naïve binning (Bayzid and Warnow, Bioinforma<cs 2013) Sta<s<cal binning (Mirarab, Bayzid, and Warnow, in prepara<on)

Bin- and- Conquer? 1. Assign genes to bins, crea<ng supergene alignments 2. Es<mate trees on each supergene alignment using maximum likelihood 3. Combine the supergene trees together using a summary method Variants: Naïve binning (Bayzid and Warnow, Bioinforma<cs 2013) Sta<s<cal binning (Mirarab, Bayzid, Boussau, and Warnow, submired)

Avian Phylogenomics Project Approx. 50 species, whole genomes 8000+ genes, UCEs Gene sequence alignments and trees computed using SATé (Liu et al., Science 2009 and Systema<c Biology 2012) Approximately 14,000 gene trees, all with very low support (exons average bootstrap support about 25%, introns about 47%) To concatenate or not to concatenate?

Avian Phylogeny GTRGAMMA Maximum likelihood analysis (RAxML) of 37 million basepair alignment (exons, introns, UCEs) highly resolved tree with near 100% bootstrap support. More than 17 years of compute <me, and used 256 GB. Run at HPC centers. Unbinned MP- EST on 14000+ genes: highly incongruent with the concatenated maximum likelihood analysis, poor bootstrap support. Sta<s<cal binning version of MP- EST on 14000+ gene trees highly resolved tree, largely congruent with the concatenated analysis, good bootstrap support Sta<s<cal binning: faster than concatenated analysis, highly parallelized. Avian Phylogenomics Project, in prepara<on

Avian Phylogeny GTRGAMMA Maximum likelihood analysis (RAxML) of 37 million basepair alignment (exons, introns, UCEs) highly resolved tree with near 100% bootstrap support. More than 17 years of compute <me, and used 256 GB. Run at HPC centers. Unbinned MP- EST on 14000+ genes: highly incongruent with the concatenated maximum likelihood analysis, poor bootstrap support. Sta<s<cal binning version of MP- EST on 14000+ gene trees highly resolved tree, largely congruent with the concatenated analysis, good bootstrap support Sta<s<cal binning: faster than concatenated analysis, highly parallelized. Avian Phylogenomics Project, in prepara<on

Avian Phylogeny GTRGAMMA Maximum likelihood analysis (RAxML) of 37 million basepair alignment (exons, introns, UCEs) highly resolved tree with near 100% bootstrap support. More than 17 years of compute <me, and used 256 GB. Run at HPC centers. Unbinned MP- EST on 14000+ genes: highly incongruent with the concatenated maximum likelihood analysis, poor bootstrap support. Sta<s<cal binning version of MP- EST on 14000+ gene trees highly resolved tree, largely congruent with the concatenated analysis, good bootstrap support Sta<s<cal binning: faster than concatenated analysis, highly parallelized. Avian Phylogenomics Project, in prepara<on

To consider Binning reduces the amount of data (number of gene trees) but can improve the accuracy of individual supergene trees. The response to binning differs between methods. Thus, there is a trade- off between data quan.ty and quality, and not all methods respond the same to the trade- off. We know very lirle about the impact of data error on methods. We do not even have proofs of sta<s<cal consistency in the presence of data error.

Warnow Laboratory PhD students: Siavash Mirarab*, Nam Nguyen, and Md. S. Bayzid** Undergrad: Keerthana Kumar Lab Website: hrp://www.cs.utexas.edu/users/phylo Funding: Guggenheim Founda<on, Packard, NSF, Microsox Research New England, David Bruton Jr. Centennial Professorship, and TACC (Texas Advanced Compu<ng Center) TACC and UTCS computa<onal resources * Supported by HHMI Predoctoral Fellowship ** Supported by Fulbright Founda<on Predoctoral Fellowship

Quan<fying Error FN FN: false negative (missing edge) FP: false positive (incorrect edge) 50% error rate FP

Avian Simula<on 14,000 genes MP- EST: Unbinned ~ 11.1% error Binned ~ 6.6% error Greedy: Unbinned ~ 26.6% error Binned ~ 13.3% error 8250 exon- like genes (27% avg. bootstrap support) 3600 UCE- like genes (37% avg. bootstrap support) 2500 intron- like genes (51% avg. bootstrap support)

Avian Simula<on 14,000 genes MP- EST: Unbinned ~ 11.1% error Binned ~ 6.6% error Greedy: Unbinned ~ 26.6% error Binned ~ 13.3% error 8250 exon- like genes (27% avg. bootstrap support) 3600 UCE- like genes (37% avg. bootstrap support) 2500 intron- like genes (51% avg. bootstrap support)

Avian Phylogeny GTRGAMMA Maximum likelihood analysis (RAxML) of 37 million basepair alignment (exons, introns, UCEs) highly resolved tree with near 100% bootstrap support. More than 17 years of compute <me, and used 256 GB. Run at HPC centers. Unbinned MP- EST on 14000+ genes: highly incongruent with the concatenated maximum likelihood analysis, poor bootstrap support. Sta<s<cal binning version of MP- EST on 14000+ gene trees highly resolved tree, largely congruent with the concatenated analysis, good bootstrap support Sta<s<cal binning: faster than concatenated analysis, highly parallelized. Avian Phylogenomics Project, in prepara<on

Basic Ques<ons Is the model tree iden<fiable? Which es<ma<on methods are sta<s<cally consistent under this model? How much data does the method need to es<mate the model tree correctly (with high probability)? What is the computa<onal complexity of an es<ma<on problem?

Addi<onal Sta<s<cal Ques<ons Trade- off between data quality and quan<ty Impact of data selec<on Impact of data error Performance guarantees on finite data (e.g., predic<on of error rates as a func<on of the input data and method) We need a solid mathema<cal framework for these problems.

Summary DCM1- NJ: an absolute fast converging (afc) method, uses chordal graph theory and probabilis<c analysis of algorithms to prove performance guarantees Binning: species tree es<ma<on from mul<ple genes, can improve coalescent- based species tree es<ma<on methods. New ques<ons in phylogene<c es<ma<on about impact of error in input data.

Metazoa Dataset from Salichos & Rokas - Nature 2013 225 genes and 21 species UnBinned MP- EST compared to Concatena<on using RAxML o Poor bootstrap support o o Substan<al conflict with concatena<on (red is conflict - green/black is congruence) Strongly rejects (Tunicate,Craniate), a subgroup that is strongly supported in the literature [Bourlat, Sarah J., et al Nature 444.7115 (2006); Delsuc, Frédéric, et al. Genesis 46.11 (2008); Singh, Tiratha R., et al. BMC genomics 10.1 (2009): 534.] Concatena<on Unbinned MPEST

RAxML on combined datamatrix Binned MP- EST MP- EST unbinned

Binned vs. unbinned analyses 75%- threshold for binning Number of species: 21 for both Number of genes Unbinned: 225 genes Binned:17 supergenes Gene tree average bootstrap support Unbinned: 47% Binned: 78% Species tree bootstrap support Unbinned: avg 83%, 11 above 75%, 10 above 90% Binned: avg 89%, 15 above 75%, 12 above 90%

Naïve binning vs. unbinned: 50 genes 0.2 Average FN rate 0.15 0.1 0.05 Unbinned Binned (10 bins) 0 CA ML *BEAST BUCKY con BUCKY pop MP EST Phylo exact MRP GC Bayzid and Warnow, Bioinforma<cs 2013 11- taxon strongils datasets with 50 genes, 5 genes per bin

Naïve binning vs. unbinned, 100 genes 0.1 Average FN rate 0.08 0.06 0.04 0.02 Unbinned Binned (20 bins) 0 CA ML *BEAST BUCKy con BUCKy pop MP EST Phylo exact MRP GC *BEAST did not converge on these datasets, even with 150 hours. With binning, it converged in 10 hours.

Naïve binning vs. unbinned: 50 genes 0.2 Unbinned Binned (10 bins) 0.15 0.1 0.05 0 BUCKY con BUCKY pop MP EST Phylo exact MRP GC Bayzid and Warnow, Bioinforma<cs 2013 11- taxon strongils datasets with 50 genes, 5 genes per bin

Avian Simula<on study binned vs. unbinned, and RAxML 1X, 200 genes, 250bp 1X, 200 genes, 500bp 1X, 200 genes, 1000bp 30% 20% 10% Tree Error 0% 30% 1X, 600 genes, Mixed 0.5X, 200 genes, 500bp 2X, 200 genes, 500bp 20% 10% 0% Greedy MRP MP EST RAxML Greedy MRP MP EST RAxML Greedy MRP MP EST RAxML Unbinned Statistical Binning Concatenation

Mammals Simula<on

Avian Simula<on