Methods to reconstruct phylogene1c networks accoun1ng for ILS

Similar documents
LETTERS. ; The complete mitochondrial DNA genome of an unknown hominin from southern Siberia

Today's project. Test input data Six alignments (from six independent markers) of Curcuma species

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

Maximum Likelihood Inference of Reticulate Evolutionary Histories

Mul$ple Sequence Alignment Methods. Tandy Warnow Departments of Bioengineering and Computer Science h?p://tandy.cs.illinois.edu

Anatomy of a species tree

WenEtAl-biorxiv 2017/12/21 10:55 page 2 #2

Coalescent Histories on Phylogenetic Networks and Detection of Hybridization Despite Incomplete Lineage Sorting

ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support

PhyloNet. Yun Yu. Department of Computer Science Bioinformatics Group Rice University

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

Mitochondrial DNA Clocks Imply Linear Speciation Rates Within Kinds

Non-binary Tree Reconciliation. Louxin Zhang Department of Mathematics National University of Singapore

first (i.e., weaker) sense of the term, using a variety of algorithmic approaches. For example, some methods (e.g., *BEAST 20) co-estimate gene trees

Quartet Inference from SNP Data Under the Coalescent Model

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

Estimating Evolutionary Trees. Phylogenetic Methods

From Gene Trees to Species Trees. Tandy Warnow The University of Texas at Aus<n

Taming the Beast Workshop

Evolu&on, Popula&on Gene&cs, and Natural Selec&on Computa.onal Genomics Seyoung Kim

Upcoming challenges in phylogenomics. Siavash Mirarab University of California, San Diego

CSCI1950 Z Computa4onal Methods for Biology Lecture 5

Selection against A 2 (upper row, s = 0.004) and for A 2 (lower row, s = ) N = 25 N = 250 N = 2500

The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection

In comparisons of genomic sequences from multiple species, Challenges in Species Tree Estimation Under the Multispecies Coalescent Model REVIEW

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

CSCI1950 Z Computa3onal Methods for Biology* (*Working Title) Lecture 1. Ben Raphael January 21, Course Par3culars

Improved maximum parsimony models for phylogenetic networks

Point of View. Why Concatenation Fails Near the Anomaly Zone

Workshop III: Evolutionary Genomics

Fast coalescent-based branch support using local quartet frequencies

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci

Denisova cave. h,ps://

Properties of Consensus Methods for Inferring Species Trees from Gene Trees

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Dr. Amira A. AL-Hosary

Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2012 University of California, Berkeley

An introduction to phylogenetic networks

Level 3 Biology, 2014

CS 581 / BIOE 540: Algorithmic Computa<onal Genomics. Tandy Warnow Departments of Bioengineering and Computer Science hhp://tandy.cs.illinois.

Phylogenetics: Building Phylogenetic Trees

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

CSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Phylogenetics in the Age of Genomics: Prospects and Challenges

Phylogeny and Molecular Evolution. Introduction

Supporting Information

A (short) introduction to phylogenetics

Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting

Consistency Index (CI)

BINF6201/8201. Molecular phylogenetic methods

Comparative Genomics II

394C, October 2, Topics: Mul9ple Sequence Alignment Es9ma9ng Species Trees from Gene Trees

The Mathema)cs of Es)ma)ng the Tree of Life. Tandy Warnow The University of Illinois

Phylogenetic Networks, Trees, and Clusters

Properties of normal phylogenetic networks

Bayesian Models for Phylogenetic Trees

Phylogenomics of closely related species and individuals

Human Evolution

Human Evolution. Darwinius masillae. Ida Primate fossil from. in Germany Ca.47 M years old. Cantius, ca 55 mya

EVOLUTIONARY DISTANCES

Is the equal branch length model a parsimony model?

Inference of Parsimonious Species Phylogenies from Multi-locus Data

Efficient Bayesian species tree inference under the multi-species coalescent

Inferring Molecular Phylogeny

Preliminaries. Download PAUP* from: Tuesday, July 19, 16

The Inference of Gene Trees with Species Trees

From Genes to Genomes and Beyond: a Computational Approach to Evolutionary Analysis. Kevin J. Liu, Ph.D. Rice University Dept. of Computer Science

Regular networks are determined by their trees

Theory of Evolution Charles Darwin


Many of the slides that I ll use have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Understanding How Stochasticity Impacts Reconstructions of Recent Species Divergent History. Huateng Huang

Species Tree Inference by Minimizing Deep Coalescences

Estimating Species Phylogeny from Gene-Tree Probabilities Despite Incomplete Lineage Sorting: An Example from Melanoplus Grasshoppers

Genetic history of an archaic hominin group from Denisova Cave in Siberia

Constructing Evolutionary/Phylogenetic Trees

Leber s Hereditary Optic Neuropathy

DNA-based species delimitation

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

Efficient Bayesian Species Tree Inference under the Multispecies Coalescent

Genetic Drift in Human Evolution

Detection and Polarization of Introgression in a Five-Taxon Phylogeny

A Phylogenetic Network Construction due to Constrained Recombination

Gene Regulatory Networks II Computa.onal Genomics Seyoung Kim

To link to this article: DOI: / URL:

Phylogenetics: Parsimony

Mixture Models. Michael Kuhn

Chapter 26 Phylogeny and the Tree of Life

Supplementary Materials: Efficient moment-based inference of admixture parameters and sources of gene flow

Exploring phylogenetic hypotheses via Gibbs sampling on evolutionary networks

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Robust demographic inference from genomic and SNP data

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley

What is Phylogenetics

Posterior predictive checks of coalescent models: P2C2M, an R package

Happy New Year Homo erectus? More evidence for interbreeding with archaics predating the modern human/neanderthal split.

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Concepts and Methods in Molecular Divergence Time Estimation

Transcription:

Methods to reconstruct phylogene1c networks accoun1ng for ILS Céline Scornavacca some slides have been kindly provided by Fabio Pardi ISE-M, Equipe Phylogénie & Evolu1on Moléculaires Montpellier, France February the 2nd, 2017

Phylogene1c networks Trees displayed by a network In a phylogene1c network, a re1culate event is represented as a re1cula1on, where branches converge to give rise to a new lineage: a b c d e f

Phylogene1c networks Trees displayed by a network In a phylogene1c network, a re1culate event is represented as a re1cula1on, where branches converge to give rise to a new lineage: The genome(s) at the start of the new lineage is (are) a composi1on of those of the parent lineages. a b c d e f

Phylogene1c networks Trees displayed by a network In a phylogene1c network, a re1culate event is represented as a re1cula1on, where branches converge to give rise to a new lineage: The genome(s) at the start of the new lineage is (are) a composi1on of those of the parent lineages. The evolu1on of each part independently inherited is described by a gene tree a b c d e f

Phylogene1c networks Trees displayed by a network In a phylogene1c network, a re1culate event is represented as a re1cula1on, where branches converge to give rise to a new lineage: The genome(s) at the start of the new lineage is (are) a composi1on of those of the parent lineages. The evolu1on of each part independently inherited is described by a gene tree a b c d e f a b c d e f

Phylogene1c networks Trees displayed by a network In a phylogene1c network, a re1culate event is represented as a re1cula1on, where branches converge to give rise to a new lineage: The genome(s) at the start of the new lineage is (are) a composi1on of those of the parent lineages. The evolu1on of each part independently inherited is described by a gene tree a b c d e f a b c d e f a b c d e f

Phylogene1c networks Trees displayed by a network In a phylogene1c network, a re1culate event is represented as a re1cula1on, where branches converge to give rise to a new lineage: The genome(s) at the start of the new lineage is (are) a composi1on of those of the parent lineages. The evolu1on of each part independently inherited is described by a gene tree a b c d e f In the absence of deep coalescence and allopolyploidy, the gene trees are a subset of the trees displayed by the network a b c d e f a b c d e f

Phylogene1c networks Phylogene1c network inference N a b c d e f?

Phylogene1c networks Phylogene1c network inference An op1miza1on problem where a candidate network is evaluated on the basis of how well the trees it displays fit the data: N a b c d e f Many possible formula4ons: a b c d e f a b c d e f Data: Trees with 3 taxa: (inferred from other data) a b c Goal: Find the network N with the lower hybridiza1on number such that the triplets are `consistent with one of the trees displayed by N subject to constraints on the complexity of N c f a d e f d f b a c d...

Phylogene1c networks Phylogene1c network inference An op1miza1on problem where a candidate network is evaluated on the basis of how well the trees it displays fit the data: N a b c d e f Many possible formula4ons: a b c d e f a b c d e f Data: Any trees on the same taxa: (inferred from other data) Goal: Find the network N with the lower hybridiza1on number such that the input trees are `consistent with one of the trees displayed by N subject to constraints on the complexity of N a c d e f c f a b d e f...

Phylogene1c networks Phylogene1c network inference An op1miza1on problem where a candidate network is evaluated on the basis of how well the trees it displays fit the data: N a b c d e f Many possible formula4ons: a b c d e f a b c d e f Data: Clusters of taxa: {a, b}, {d, e}, {d, e, f}, {a, b, c, d, e, f}, {e, f}, {c, d, e, f},... Goal: Find the network N with the lower hybridiza1on number such that the input clusters are `explained by one of the trees displayed by N subject to constraints on the complexity of N

Phylogene1c networks Phylogene1c network inference An op1miza1on problem where a candidate network is evaluated on the basis of how well the trees it displays fit the data: N a b c d e f Many possible formula4ons: a b c d e f a b c d e f Data: Sequence alignments: (typically given in blocks) T (N) : Goal: mx Find N that minimizes F (N A 1,A 2,...,A m )= min F (T A i) T 2T (N) i=1 subject to constraints on the complexity of N. F() is the parsimony score. Jin et al. Parsimony Score of Phylogene1c Networks: Hardness Results and a Linear-Time Heuris1c. TCCB. 2009....

Jin et al.maximum likelihood of phylogene1c networks. Bioinforma1cs 2006. Phylogene1c networks Phylogene1c network inference An op1miza1on problem where a candidate network is evaluated on the basis of how well the trees it displays fit the data: N p 1-p a b c d e f T (N) : Many possible formula4ons: a b c d e f a b c d e f Data: Sequence alignments: (typically given in blocks) Goal: Find N that maximises 0 1 my my Pr(A 1,A 2,...,A m N) = Pr(A i N) = @ X Pr(A i T )Pr(T N) A... i=1 i=1 T 2T (N)

Jin et al.maximum likelihood of phylogene1c networks. Bioinforma1cs 2006. Phylogene1c networks Phylogene1c network inference An op1miza1on problem where a candidate network is evaluated on the basis of how well the trees it displays fit the data: h^ps://phylomeme1c.wordpress.com/2015/04/17/phylodag/ h^p://old-bioinfo.cs.rice.edu/nepal/ T (N) : p 1-p N a b c d e f Many possible formula4ons: a b c d e f a b c d e f Data: Sequence alignments: (typically given in blocks) Goal: Find N that maximises 0 1 my my Pr(A 1,A 2,...,A m N) = Pr(A i N) = @ X Pr(A i T )Pr(T N) A... i=1 i=1 T 2T (N)

Problems Some issues Searching the space of phylogene1c networks The space of networks with k re1cula1ons is infinite. Controlling for Model Complexity Because any network with k re1cula1ons provides a more complex model than any network with (k-1) re1cula1ons, we must handle the model selec1on problem (AIC, BIC, K-fold cross-valida1on, ). Iden1fiability issues Pr(A 1,A 2,...,A m N) = my Pr(A i N) = i=1 0 my @ Not accoun1ng for ILS and allopolyploidy i=1 X T 2T (N) 1 Pr(A i T )Pr(T N) A

Iden1fiability problems Different networks can display the same trees Some networks display exactly the same trees: N 1 N 2 d d a b c a b c T 1 T 2 T 3 d d d a b c a b c a b c

Iden1fiability problems Different networks can display the same trees Some networks display exactly the same trees: N 1 N 2 Because N 1 and N 2 display the same trees, they are equally good to any of the inference methods we saw no ma9er the input data T 1 a d d b c a b T 2 T 3 d d d c a b c a b c a b c (Recall that a network is evaluated on the basis of how well the trees it displays fit the data) Data

Iden1fiability problems Different networks can display the same trees Some networks display exactly the same trees: N 1 N 2 Because N 1 and N 2 display the same trees, they are equally good to any of the inference methods we saw no ma9er the input data T 1 a d d d b c a b T 2 T 3 UNIDENTIFIABILITY d d c a b c a b c a b c

Iden1fiability problems Indis1nguishable networks Branch lengths do not eliminate noniden1fiability 5.2.05 5.05.05.05 N 1 N 2 5 2.07 5.05.03 5.05.05 a b c d e a b c d e.2 5.2.2.25.2.05.05 a b c d e.35.2.2 5.05.05 a b c d e.35.25.05 5.05 a d b c e N 1 and N 2 display the same trees (i.e. including branch lengths) and are thus indis4nguishable even to methods accoun1ng for lengths

Iden1fiability problems Indis1nguishable networks Branch lengths do not eliminate noniden1fiability The same hold for inheritance probabili1es 5 p 1-p.2.05 5.05.05.05 N 1 N 2 5 2.07 5.05.03 5.05.05 a b c d e a b c d e.2 5.2.2.25.2.05.05 a b c d e.35.2.2 5.05.05 a b c d e.35.25.05 5.05 a d b c e N 1 and N 2 display the same trees (i.e. including branch lengths) and are thus indis4nguishable even to methods accoun1ng for lengths

Canonical networks Unzipping a network Key observa1on: we can move re1cula1ons up or down (un1l they hit a specia1on node) and the trees displayed by a network remain the same:.2 5.05.05.05.05 5 a b c d e N 1 5.2.05 Δ 5.05.05.05 a b c d e

Canonical networks Unzipping a network Key observa1on: we can move re1cula1ons up or down (un1l they hit a specia1on node) and the trees displayed by a network remain the same:.2 5.05.05.05.05 5 a b c d e N 1 5 5 N*.2.05 Δ 5.05.05.05 a b c d e.2.2.2 5 5.05.05 a b c d e

Canonical networks Unzipping a network Key observa1on: we can move re1cula1ons up or down (un1l they hit a specia1on node) and the trees displayed by a network remain the same: N 1 5.2.05 5.05.05.05 a b c d e N 2 5 2.07.2 5.05.03 5.05.05 a b c d e 5 5 N*.2.05 Δ 5.05.05.05 a b c d e.2.2.2 5 5.05.05 a b c d e

Canonical networks Unzipping a network Key observa1on: we can move re1cula1ons up or down (un1l they hit a specia1on node) and the trees displayed by a network remain the same: N 1 5.2.05 5.05.05.05 a b c d e N 2 5 2.07.2 5.05.03 5.05.05 a b c d e By moving re1cula1ons always down, N 1 and N 2 both end up becoming the same network. 5 N* True in general: indis1nguishable networks always transform into the same network the canonical form of N 1 and N 2..2.2.2 5 5.05.05 a b c d e

Canonical networks What it means for the evolu1onary biologist If N is reconstructed by a "classic" inference method, then even assuming perfect and unlimited data, the best you can hope is that the true phylogene1c network is just one of the many that are indis1nguishable from N 5.2.05 5.05.05.05 a b c d e N 5.2.2.2 5 5.05.05 a b c d e N* The canonical form of N is a unique representa1ve of the networks indis1nguishable from N, that excludes their unrecoverable aspects

Canonical networks Take home message for the computational biologist If N is reconstructed by a "classic" inference method, then even assuming perfect and unlimited data, the best you can hope is that the true phylogene1c network is just one of the many that are indis1nguishable from N Canonical networks Network space Classes of indistinguishable networks "Classic" network inference methods should only a^empt to reconstruct canonical networks.

Dealing with deep coalescence Are gene trees always displayed by the network? The approaches above assume that deep coalescence cannot occur, so gene trees are necessarily displayed trees, but this is not always the case Currently accepted introgression scenario among modern humans, Neanderthals and Denisovans, based on sequenced (ancient) nuclear genomes: Phylogene1c tree for the mitochondrial genome (Krause et al Nature 2010): 1.04 Myr (779 kyr 1.3 Myr) 1 466 kyr (321 618 kyr) 1 1 1 1 0.99 1 1 1 1 1 Afr. Mbuti 2 2 Afr. Mbuti 3 Afr. Hausa 4 Afr. San 5 Afr. Ibo 2 6 Afr. San 2 7 Afr. Mbenzele 8 Afr. Biaka 2 9 Afr. Mbenzele 2 10 Afr. Biaka 11 Afr. Kikuyu 12 Afr. Ibo 13 Afr. Mandenka 14 Afr. Effik 15 Afr. Effik 2 16 Afr. Ewondo 17 Afr. Lisongo 18 Afr. Yoruba 19 Afr. Bamileke 20 Afr. Yoruba 21 Asia Evenki 22 Asia Khirgiz 23 Asia Buriat 24 Am. Warao 2 25 Am. Warao 26 New Guinea Coast 27 Aust. Aborigine 3 Modern 28 Asia Chinese 29 Asia Japanese humans 30 Asia Siberian 31 Asia Japanese 32 Am. Guarani 33 Asia Indian 34 Afr. Mkamba 35 Asia Chukchi 36 New Guinea Coast 2 37 New Guinea High 38 New Guinea High 2 13 39 Eur. Italian 40 EMH Kostenki 14 41 Asia Uzbek 42 Asia Korean 43 Polynesia Samoan 44 Am. Piman 45 Eur. German 46 Asia Georgian 47 Asia Chinese 48 Eur. English 49 Eur. Saami 50 Eur. French 51 Asia Crimean Tatar 52 Eur. Dutch 53 rcrs 54 Aust. Aborigine 55 Aust. Aborigine 2 56 Mezmaiskaya 57 Vi336 58 Feldhofer 2 59 Neanderthals Sidron 1253 60 Vi33.25 61 Feldhofer 1 62 Denisova Denisova

ILS An example of incomplete lineage sor1ng (ILS) (a) Population view (b) Reconciliation representation

ILS ILS and the probability of gene trees A B C A B C A B C a b c b a c Rannal & Yang 2003 (Genealogies) Degnan & Salter 2005 (Topologies) c a b

ILS ILS and the probability of gene trees A B C A B C A B C a b c b a c c a b

ILS ILS and the probability of gene trees a b c d a c d b b c a d b d c a a b c d a b d c a d b c b c d a c d a b a c b d a c b d a d c b b d a c c d b a a d b c

ILS Es1ma1ng the species tree under the coalescent model Since standard phylogene1c methods can be inconsistent under ILS (Degnan & Rosenberg 2006), new methods have been developed to cope for this bias, eg: STEM (Kubatko, Carstens & Knowles 2009) STAR (Liu, Yu, Pearl & Edwards 2009) MP-EST (Liu, Yu & Edwards 2010) They es1mate a phylogene1c species tree given a sample of gene trees (with branch lengths or not) under the coalescent model.

ILS Es1ma1ng the species tree under the coalescent model Since standard phylogene1c methods can be inconsistent under ILS (Degnan & Rosenberg 2006), new methods have been developed to cope for this bias, eg: STEM (Kubatko, Carstens & Knowles 2009) STAR (Liu, Yu, Pearl & Edwards 2009) MP-EST (Liu, Yu & Edwards 2010) The same can be done with networks! A B C Yu et al. PNAS 2014

Phylogene1c networks + ILS Phylogene1c network inference under ILS N a b c d e f Pr(A 1,A 2,...,A m N) = my my X Pr(A i N) = Pr(A i T )Pr(T N). i=1 i=1 T 2T (N) Y Z?

Phylogene1c networks + ILS Phylogene1c network inference under ILS N a b c d e f my my X Pr(A 1,A 2,...,A m N) = Pr(A i N) = Pr(A i T )Pr(T N). i=1 i=1 T 2T (N) my Z Pr(A 1,A 2,...,A m N) = i=1 Pr(A 1,A 2,...,A m N) = G Pr(A i )p(g N). Y m i=1 p(g i N).?

Phylogene1c networks + ILS Phylogene1c network inference under ILS Y Yu & L Nakhleh's group : Yu et al. BMC Bioinforma1cs 2013 (without branch lengths) Yu et al. PNAS 2014 (with branch lengths) Wen el al. PLOS Gene1cs 2016 (Bayesian method) PhyloNet h^p://bioinfo.cs.rice.edu/phylonet Y Y X But also other groups contributed significantly, e.g. L Kubabtko's group and C Ane s group. Y Z Pr(A 1,A 2,...,A m N) = my p(g i N). i=1

Dealing with deep coalescence Are gene trees always displayed by the network? The approaches above assume that deep coalescence cannot occur, so gene trees are necessarily displayed trees, but this is not always the case: Currently accepted introgression scenario among modern humans, Neanderthals and Denisovans, based on sequenced (ancient) nuclear genomes:

Dealing with deep coalescence Are gene trees always displayed by the network? The approaches above assume that deep coalescence cannot occur, so gene trees are necessarily displayed trees, but this is not always the case: Currently accepted introgression scenario among modern humans, Neanderthals and Denisovans, based on sequenced (ancient) nuclear genomes: Phylogene1c tree for the mitochondrial genome (Krause et al Nature 2010): 1.04 Myr (779 kyr 1.3 Myr) 1 466 kyr (321 618 kyr) 1 1 1 1 0.99 1 1 1 1 1 Afr. Mbuti 2 2 Afr. Mbuti 3 Afr. Hausa 4 Afr. San 5 Afr. Ibo 2 6 Afr. San 2 7 Afr. Mbenzele 8 Afr. Biaka 2 9 Afr. Mbenzele 2 10 Afr. Biaka 11 Afr. Kikuyu 12 Afr. Ibo 13 Afr. Mandenka 14 Afr. Effik 15 Afr. Effik 2 16 Afr. Ewondo 17 Afr. Lisongo 18 Afr. Yoruba 19 Afr. Bamileke 20 Afr. Yoruba 21 Asia Evenki 22 Asia Khirgiz 23 Asia Buriat 24 Am. Warao 2 25 Am. Warao 26 New Guinea Coast 27 Aust. Aborigine 3 Modern 28 Asia Chinese 29 Asia Japanese humans 30 Asia Siberian 31 Asia Japanese 32 Am. Guarani 33 Asia Indian 34 Afr. Mkamba 35 Asia Chukchi 36 New Guinea Coast 2 37 New Guinea High 38 New Guinea High 2 13 39 Eur. Italian 40 EMH Kostenki 14 41 Asia Uzbek 42 Asia Korean 43 Polynesia Samoan 44 Am. Piman 45 Eur. German 46 Asia Georgian 47 Asia Chinese 48 Eur. English 49 Eur. Saami 50 Eur. French 51 Asia Crimean Tatar 52 Eur. Dutch 53 rcrs 54 Aust. Aborigine 55 Aust. Aborigine 2 56 Mezmaiskaya 57 Vi336 58 Feldhofer 2 59 Neanderthals Sidron 1253 60 Vi33.25 61 Feldhofer 1 62 Denisova Denisova

Dealing with deep coalescence Deep coalescence (ILS) can solve indis1nguishability The following two scenarios would be indis1nguishable on the basis of the trees they display: A B C A B C * *

Dealing with deep coalescence Deep coalescence (ILS) can solve indis1nguishability The following two scenarios would be indis1nguishable on the basis of the trees they display: Imagine sampling two alleles from the hybrid species * In absence of ILS the only observable gene trees are: A B C A B C * *

Dealing with deep coalescence Deep coalescence (ILS) can solve indis1nguishability The following two scenarios would be indis1nguishable on the basis of the trees they display: Imagine sampling two alleles from the hybrid species * If the two alleles from * do not coalesce, other three gene trees can be observed: A B C A B C * *

Dealing with deep coalescence Deep coalescence (ILS) can solve indis1nguishability The following two scenarios would be indis1nguishable on the basis of the trees they display: Imagine sampling two alleles If this is the from the hybrid species * true network If the two alleles from * do not coalesce, other three gene trees can be observed: A B C A B C * * mostly these

Dealing with deep coalescence Deep coalescence (ILS) can solve indis1nguishability The following two scenarios would be indis1nguishable on the basis of the trees they display: Imagine sampling two alleles If this is the from the hybrid species * true network If the two alleles from * do not coalesce, other three gene trees can be observed: A B C A B C * * mostly these

Dealing with deep coalescence Network Mul1-Species Coalescent (NMSC) Given a network with branch lengths (in coalescent units) and inheritance probabili1es, we can calculate the probability of every possible gene tree: Pr = ¼ Pr = ⅛ Pr = ⅛ Pr = ¼ Pr = ¼ ½ ½ ½ ½ 0 A * B C Pr = 0

Dealing with deep coalescence Network Mul1-Species Coalescent (NMSC) Given a network with branch lengths (in coalescent units) and inheritance probabili1es, we can calculate the probability of every possible gene tree: Pr = ¼ Pr = ⅛ Pr = ⅛ Pr = ¼ Pr = ¼ ½ ½ ½ ½ 0 A * B C The probability of a gene tree T is a weighted average of the probabili1es of T under a number of "parental trees" (14 here)

Dealing with deep coalescence Network Mul1-Species Coalescent (NMSC) Given a network with branch lengths (in coalescent units) and inheritance probabili1es, we can calculate the probability of every possible gene tree: Pr = ¼ Pr = ⅛ Pr = ⅛ Pr = ¼ Pr = ¼ ½ ½ ½ ½ 0 A * B C This may solve the iden1fiability issues for several prac1cal cases. Note that gene trees can fail to be displayed by the network also because of allopolyploidy.

Problems Some issues Searching the space of phylogene1c networks The space of networks with k re1cula1ons is infinite. Controlling for Model Complexity Because any network with k re1cula1ons provides a more complex model than any network with (k-1) re1cula1ons, we must handle the model selec1on problem (AIC, BIC, K-fold cross-valida1on, ). Iden1fiability issues Not accoun1ng for ILS and allopolyploidy

h^p://phylnet.univ-mlv.fr/ h^p://phylonetworks.blogspot.fr THANKS FOR YOUR ATTENTION