PhyloNet. Yun Yu. Department of Computer Science Bioinformatics Group Rice University

Similar documents
WenEtAl-biorxiv 2017/12/21 10:55 page 2 #2

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

Maximum Likelihood Inference of Reticulate Evolutionary Histories

Today's project. Test input data Six alignments (from six independent markers) of Curcuma species

Anatomy of a species tree

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

SpeciesNetwork Tutorial

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Estimating Evolutionary Trees. Phylogenetic Methods

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29):

An introduction to phylogenetic networks

Quartet Inference from SNP Data Under the Coalescent Model

Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction. Lesser Tenrec (Echinops telfairi)

reconciling trees Stefanie Hartmann postdoc, Todd Vision s lab University of North Carolina the data

ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support

Methods to reconstruct phylogene1c networks accoun1ng for ILS

Intraspecific gene genealogies: trees grafting into networks

Exploring phylogenetic hypotheses via Gibbs sampling on evolutionary networks

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

A (short) introduction to phylogenetics

Phylogeny: traditional and Bayesian approaches

Coalescent Histories on Phylogenetic Networks and Detection of Hybridization Despite Incomplete Lineage Sorting

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection

Fast coalescent-based branch support using local quartet frequencies

Smith et al. American Journal of Botany 98(3): Data Supplement S2 page 1

Taming the Beast Workshop

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Infer relationships among three species: Outgroup:

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Bayesian phylogenetics. the one true tree? Bayesian phylogenetics

Generating phylogenetic trees with Phylomatic and dendrograms of functional traits in R

Package WGDgc. October 23, 2015

Dr. Amira A. AL-Hosary

first (i.e., weaker) sense of the term, using a variety of algorithmic approaches. For example, some methods (e.g., *BEAST 20) co-estimate gene trees

Phylogenetic Tree Reconstruction

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

Package WGDgc. June 3, 2014

PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence

Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies

Constructing Evolutionary/Phylogenetic Trees

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Species Tree Inference by Minimizing Deep Coalescences

Constructing Evolutionary/Phylogenetic Trees

A phylogenomic toolbox for assembling the tree of life

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week:

Concepts and Methods in Molecular Divergence Time Estimation

Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Unsupervised Learning in Spectral Genome Analysis

Consensus Methods. * You are only responsible for the first two

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

An introduction to the picante package

Phylogenetic inference

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Regular networks are determined by their trees

Phylogenetics in the Age of Genomics: Prospects and Challenges

How to read and make phylogenetic trees Zuzana Starostová

Phylogenetic Networks, Trees, and Clusters

Upcoming challenges in phylogenomics. Siavash Mirarab University of California, San Diego

EVOLUTIONARY DISTANCES

G.S. Gilbert, ENVS291 Transition to R vf13 Class 10 Vegan and Picante. Class 10b - Tools for Phylogenetic Ecology

Lecture 11 Friday, October 21, 2011

C.DARWIN ( )

Molecular Evolution & Phylogenetics

Phylogenetic analyses. Kirsi Kostamo

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics


Inference of Parsimonious Species Phylogenies from Multi-locus Data

Learning in Bayesian Networks

Effects of Gap Open and Gap Extension Penalties

Statistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution?

A Phylogenetic Network Construction due to Constrained Recombination

Theory of Evolution Charles Darwin

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

One-minute responses. Nice class{no complaints. Your explanations of ML were very clear. The phylogenetics portion made more sense to me today.

林仲彥. Dec 4,

1. Can we use the CFN model for morphological traits?

How should we organize the diversity of animal life?

Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Using Phylogenomics to Predict Novel Fungal Pathogenicity Genes

Comparative Genomics II

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Phylogenetics Todd Vision Spring Some applications. Uncultured microbial diversity

USE OF CLUSTERING TECHNIQUES FOR PROTEIN DOMAIN ANALYSIS

Phylogenetic inference: from sequences to trees

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Bayesian inference & Markov chain Monte Carlo. Note 1: Many slides for this lecture were kindly provided by Paul Lewis and Mark Holder

Multiple Alignment. Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis

User s Manual for. Continuous. (copyright M. Pagel) Mark Pagel School of Animal and Microbial Sciences University of Reading Reading RG6 6AJ UK

Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2012 University of California, Berkeley

Bayesian Models for Phylogenetic Trees

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference

Phylogenetics: Building Phylogenetic Trees

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016

Transcription:

PhyloNet Yun Yu Department of Computer Science Bioinformatics Group Rice University yy9@rice.edu Symposium And Software School 2016 The University Of Texas At Austin

Installation System requirement: Java 1.8 or later Download executable of PhyloNet (latest version 3.6.0): http://bioinfo.cs.rice.edu/phylonet

Basic Usage Command: > java -jar PhyloNet_3.6.0.jar script.nex Input NEXUS file:

Estimating species phylogenies from gene trees Maximum Likelihood: InferNetwork_ML Maximum Pseudo-Likelihood: InferNetwork_MPL Bayesian: MCMC_GT Maximum Parsimony: Trees: Infer_ST_MDC, Infer_ST_MDC_UR Networks: InferNetwork_MP

ILS

Maximum Parsimony Command Usage infer_st_mdc/infer_st_mdc_ur (genetree1 [, genetree2...]) [-e proportion] [-x] [-b threshold] [-a taxamap] [-ur] [t time] [resultoutputfile] genetree1 [, genetree2...] Comma delimited collection of gene trees. mandatory -a taxamap Gene trees / species tree taxa association. -b threshold Gene trees bootstrap threshold. -e proportion Get optimal and sub-optimal species trees whose scores are less than proportion% worse than the optimal score. -x Use all clusters instead of clusters in gene trees. -ur Allow returning unresolved species tree. Limit search time to timelimit minutes. Optional file destination for command output. -t timelimit resultoutputfile

Maximum Parsimony Input NEXUS File #NEXUS BEGIN TREES; Tree gt0 = (((((((Scer,Spar),Smik),Skud),Sbay),Scas),Sklu),Calb); Tree gt105 = ((((((Scer,Spar),Smik),Skud),Sbay),(Scas,Sklu)),Calb); END; BEGIN PHYLONET; Infer_ST_MDC (gt0- gt105); END; Output (Calb:0,(Sklu:0,(Scas:0,(Sbay:0,(Skud:0,(Smik:0,(Scer:0,Spar:0):2):21):54):0):35):0):0; Total number of extra lineages:112

ILS + Introgression

Phylogenetic Network Rich Newick Format Newick for phylogenetic tree & Extended Newick for phylogenetic network (Cardona et al., 2008) https://wiki.rice.edu/confluence/display/phylonet/rich +Newick+Format 2 ((A:2,((B:1,C:1)P:1)X#H1:0::0.3)Q:2, R 0.3 0.7 (D:2,X#H1:0::0.7)R:2); Q X 1 P A B 1 C D :branch length:support:inheritance probability

ILS + Introgression Parsimony

Maximum Parsimony Command Usage InferNetwork_MP (genetree1 [, genetree2...]) numreticulations [-a taxa map] [-b threshold] [-s startingnetwork] [-n numnetreturned] [-m maxnetexamined] [-d maxdiameter] [-h {s1 [,s2...]}] [-w (w1,w2,w3,w4)] [-f maxfailure] [-x numruns] [-pl numprocessors] [-di] [result output file] genetree1 [, genetree2...] Comma delimited collection of gene trees. mandatory numreticulations Maximum number of reticulation nodes to added. mandatory -s startingnetwork Specify the network to start search. Default value is the optimal MDC tree. -n numnetreturned Number of optimal networks to return. Default value is 1. A set of specified hybrid species. The size of this set equals the number of reticulations in the inferred network -pl numprocessors Number of processors if you want the computation to be done in parallel. Default value is 1. -di Output the Rich Newick string of the inferred network that can be read by Dendroscope. -h {s1 [, s2...]} See https://wiki.rice.edu/confluence/display/phylonet/infernetwork_mp for details.

Maximum Parsimony Input NEXUS File #NEXUS BEGIN TREES; Tree gt0 = (((((((Scer,Spar),Smik),Skud),Sbay),Scas),Sklu),Calb); Tree gt105 = ((((((Scer,Spar),Smik),Skud),Sbay),(Scas,Sklu)),Calb); END; BEGIN PHYLONET; InferNetwork_MP (gt0- gt105) 1; END; Output (((((((Sbay)#H1:::0.3608,Skud),((Spar,Scer),Smik)), #H1:::0.6392),Scas),Sklu),Calb); Total number of extra lineages: 60

ILS + Introgression Likelihood

Maximum Likelihood Command Usage InferNetwork_ML (genetree1 [, genetree2...]) numreticulations [-a taxa map] [-bl] [-b threshold] [-s startingnetwork] [-n numnetreturned] [-h {s1 [,s2...]}] [-w (w1,w2,w3,w4)] [-x numruns] [-m maxnetexamined] [-md movediameter] [-rd reticulationdiameter] [-f maxfailure] [o] [-po] [-p (rel,abs)] [-r maxrounds] [-t maxtryperbr] [-i improvethreshold] [-l maxbl] [-pl numprocessors] [-di] [result output file] -po Branch lengths are sampled during the search. This option allows users to optimize branch lengths and inheritance probabilities of the inferred networks as a post-process -bl The branch lengths of the input gene trees need to be considered in the computation. See https://wiki.rice.edu/confluence/display/phylonet/infernetwork_ml+command for details.

Maximum Likelihood Input NEXUS File #NEXUS BEGIN TREES; Tree gt0 = ((((Scer,Spar),Smik),Skud),Sbay); Tree gt105 = ((Scer,Spar),Smik,Skud,Sbay); END; BEGIN PHYLONET; InferNetwork_ML (gt0- gt105) 1; END; Output Inferred Network #1: ((Sbay:1.0)#H1:1.0::0.6130,((Smik:1.0,(Scer:1.0,Spar:1.0):3.5436):1.0585, (#H1:1.0::0.3869,Skud:1.0):2.1717):5.9272); Total log probability: - 151.57753843275103

ILS + Introgression Pseudo-Likelihood

Maximum Pseudo-Likelihood Command Usage InferNetwork_MPL (genetree1 [, genetree2...]) numreticulations [-a taxa map] [-b threshold] [-s startingnetwork] [-n numnetreturned] [-h {s1 [,s2...]}] [-w (w1,w2,w3,w4)] [-x numruns] [-m maxnetexamined] [-md movediameter] [-rd reticulationdiameter] [-f maxfailure] [-o] [-po] [-p (rel,abs)] [-r maxrounds] [-t maxtryperbr] [-i improvethreshold] [-l maxbl] [-pl numprocessors] [di] [result output file] -po Branch lengths are sampled during the search. This option allows users to optimize branch lengths and inheritance probabilities of the inferred networks under FULL likelihood as a post-process See https://wiki.rice.edu/confluence/display/phylonet/infernetwork_mpl for details.

Maximum Pseudo-Likelihood Input NEXUS File #NEXUS BEGIN TREES; Tree gt0 = ((((Scer,Spar),Smik),Skud),Sbay); Tree gt105 = ((Scer,Spar),Smik,Skud,Sbay); END; BEGIN PHYLONET; InferNetwork_MPL (gt0- gt105) 1 - x 10 - po; END;

Maximum Pseudo-Likelihood Output Inferred Network #1: ((Sbay:1.0)#H1:1.0::0.6130,((Smik:1.0,(Scer:1.0,Spar:1.0):3.5436):1.0585, (#H1:1.0::0.3869,Skud:1.0):2.1717):5.9272); Total log probability: - 151.57753843275103 Inferred Network #2: (((((Smik:1.0,(Scer:1.0,Spar:1.0):3.5517):0.8718)#H1:1.5844::0.6690,Skud:1.0):1.6469,Sbay: 1.0):5.914823983639853,#H1:0.0012::0.3309); Total log probability: - 168.28650850921707 Inferred Network #3: (((Smik:1.0,(Scer:1.0,Spar:1.0):3.5379):1.0444,(Skud:1.0)#H1:1.0::0.7007):0.8788, (#H1:1.0::0.2993,Sbay:1.0):5.9324); Total log probability: - 177.6791272776054 Inferred Network #4: ((Sbay:1.0,(Skud:1.0)#H1:1.0::0.9288):0.2035,((#H1:1.0::0.0712,Smik:1.0):0.0012,(Spar: 1.0,Scer:1.0):3.5535):1.3471); Total log probability: - 199.98142731241273 Inferred Network #5: (((Skud:1.0)#H1:1.0::0.9326,Sbay:1.0):0.1974,((#H1:1.0::0.0674,(Spar:1.0,Scer:1.0):3.5517): 0.0012,Smik:1.0):1.3412); Total log probability: - 199.98564708955553

ILS + Introgression Bayesian

Bayesian Command Usage MCMC_GT (genetree1 [, genetree2...]) [-cl chainlength] [-bl burninlength] [-sf samplefrequency] [-sd seed] [-pp poissonparameter] [-mr maximumreticulation] [-pl parallelthreads] [-tp temperaturelist] [-sn startingnetworklist] [-tm taxonmap] -mr The maximum number of reticulation nodes in the sampled phylogenetic networks. The default value is infinity. See https://wiki.rice.edu/confluence/display/phylonet/mcmc_gt+command for details.

Bayesian Input NEXUS File #NEXUS BEGIN TREES; Tree gt0 = ((((Scer,Spar),Smik),Skud),Sbay); Tree gt105 = ((Scer,Spar),Smik,Skud,Sbay); END; BEGIN PHYLONET; MCMC_GT (gt0- gt105); END;

Bayesian Output Iteration; Posterior; ESS; Likelihood; Prior; ESS; #Reticulation 0; - 252.23994; 0.00000; - 244.23994; - 8.00000; 0.00000; 0; (Sbay:1.0,(Skud:1.0,(Smik:1.0,(Scer:1.0,Spar:1.0):1.0):1.0):1.0);... 1100; - 182.48421; 6.31204; - 154.15857; - 28.32565; 8.02957; 1; ((Sbay:0.8364)I0#H1:0.5373::0.5949,((Smik:1.0,(Scer:1.0,Spar:1.0)I5:3.3052)I4:0.8173,(Skud: 2.7098,I0#H1:0.6856::0.4051)I2:2.9664)I3:2.2888)I1; - - - - - - - - - - - - - - - - - - - - - - - Summarization: - - - - - - - - - - - - - - - - - - - - - - - Burn- in = 100000, Chain length = 1100000, Sample size = 1000 Acceptance rate = 0.55319 - - - - - - - - - - - - - - - Operations - - - - - - - - - - - - - - - Operation:Move- Head; Used:143520; Accepted:1 ACrate:6.967670011148272E- 6... Operation:Delete- Reticulation; Used:5358; Accepted:2626 ACrate:0.49010824934677116 Overall MAP = - 179.89656989770972 ((((Sbay:0.8364)I0#H1:0.6856::0.3776,Skud:2.7098)I2:1.7771,(Smik:1.0,(Scer:1.0,Spar: 1.0)I5:2.9942)I4:1.0385)I3:2.4721,I0#H1:0.5373::0.6224)I1;

Bayesian Output - - - - - - - - - - - - - - Top Topologies: - - - - - - - - - - - - - - Rank = 0; Size = 818; Percent = 0.818; MAP = - 179.89656989770972: ((((Sbay:0.8364)I0#H1:0.6856::0.3776,Skud:2.7098)I2:1.7771,(Smik:1.0,(Scer: 1.0,Spar:1.0)I5:2.9942)I4:1.0385)I3:2.4721,I0#H1:0.5373::0.6224)I1; Ave = - 182.497829758399; ((Sbay:0.8364)I0#H1:0.5373::0.6231,(((Spar:1.0,Scer:1.0)I5:3.3546,Smik: 1.0)I4:1.0773,(Skud:2.7098,I0#H1:0.6857::0.3769)I2:1.9924)I3:2.6931)I1; Rank = 1; Size = 180; Percent = 0.18; MAP = - 191.61104349697848:((((((Spar:1.0,Scer:1.0)I5:3.3889,Smik: 1.0)I4:0.8307)I3#H1:0.9523::0.6741,Skud:0.1783)I2:1.2530,Sbay:0.3632)I0:1.6455,I3#H1:0.0210::0.3258)I1; Ave = - 194.12662772729175; (((Smik:1.0,(Scer:1.0,Spar:1.0)I5:3.3053)I4:0.9721)I3#H1:0.0479::0.3634, (Sbay:0.3632,(Skud:0.1919,I3#H1:1.2856::0.6365)I2:1.4111)I0:1.5633)I1; Rank = 2; Size = 2; Percent = 0.002; MAP = - 189.56637204871708:(((((Skud:2.7098,(Sbay: 0.8363)I0#H2:0.6856::0.3355)I2:3.8612)I6#H1:0.0667::0.3223,((Spar:1.0,Scer:1.0)I5:2.6678,Smik: 1.0)I4:0.4006)I7:1.0679,I6#H1:0.2275::0.6776)I3:2.1196,I0#H2:0.5372::0.6644)I1; Ave = - 190.47255907248774; ((Sbay:0.8363)I0#H1:0.5372::0.8354,(((I0#H1:0.6856::0.1645,Skud: 2.7098)I2:3.7692)I6#H2:0.2659::0.7540,((Smik:1.0,(Scer:1.0,Spar: 1.0)I5:2.4576)I4:0.4274,I6#H2:0.0704::0.2459)I7:1.2316)I3:2.0443)I1;

Scalability Maximum Parsimony: ~30 taxa, 3 or 4 reticulations Maximum Likelihood: ~10 taxa, 2 or 3 reticulations Maximum Pseudo-Likelihood: ~30 taxa Bayesian: ~10 taxa, 2 or 3 reticulations

Testing Your Hypothesis Parsimony: Trees: DeepCoalCount_tree Networks: DeepCoalCount_network Likelihood: CalGTProb

Relevant Publications Likelihood: Yu et. al, PloS Genetics, vol. 8, no. 4, p. e1002660, 2012 Parsimony: Yu et. al, Systematic Biology, vol. 62, no. 5, pp. 738-751, 2013 Likelihood: Yu et. al, PNAS, vol. 111, no. 46, pp. 16448-16453, 2014 Pseudo-likelihood: Yu et. al, BMC Genomics, vol. 16, no. Suppl 10, p. S10, 2015 Application: Wen et. al, Molecular Ecology, vol. 25, pp. 2361-2372, 2016 Bayesian: Wen et. al, PloS Genetics, vol. 12, no. 5, p. e1006006, 2016

Try it yourself!