Reconstruction of species trees from gene trees using ASTRAL. Siavash Mirarab University of California, San Diego (ECE)

Size: px
Start display at page:

Download "Reconstruction of species trees from gene trees using ASTRAL. Siavash Mirarab University of California, San Diego (ECE)"

Transcription

1 Reconstruction of species trees from gene trees using ASTRAL Siavash Mirarab University of California, San Diego (ECE)

2 Phylogenomics Orangutan Chimpanzee gene 1 gene 2 gene 999 gene 1000 Gorilla Human ACTGCACACCG ACTGC-CCCCG AATGC-CCCCG -CTGCACACGG CTGAGCATCG CTGAGC-TCG ATGAGC-TC- CTGA-CAC-G AGCAGCATCGTG AGCAGC-TCGTG AGCAGC-TC-TG C-TA-CACGGTG CAGGCACGCACGAA AGC-CACGC-CATA ATGGCACGC-C-TA AGCTAC-CACGGAT Inference error gene here refers to a portion of the genome (not a functional gene) More data (#genes) 2

3 Gene tree discordance gene 1 gene1000 Gorilla Human Chimp Orang. Gorilla Chimp Human Orang. 3

4 Gene tree discordance The species tree gene 1 gene1000 Gorilla Human Chimp Orangutan Gorilla Human Chimp Orang. Gorilla Chimp Human Orang. A gene tree 3

5 Gene tree discordance The species tree gene 1 gene1000 Gorilla Human Chimp Orangutan Gorilla Human Chimp Orang. Gorilla Chimp Human Orang. A gene tree Causes of gene tree discordance include: Incomplete Lineage Sorting (ILS) Duplication and loss Horizontal Gene Transfer (HGT) 3

6 Incomplete Lineage Sorting (ILS) A random process related to the coalescence of alleles across various populations Tracing alleles through generations 4

7 Incomplete Lineage Sorting (ILS) A random process related to the coalescence of alleles across various populations Tracing alleles through generations 4

8 Incomplete Lineage Sorting (ILS) A random process related to the coalescence of alleles across various populations Tracing alleles through generations Omnipresent: possible for every tree Likely for short branches or large population sizes 4

9 MSC and Identifiability The statistical process multi-species coalescent (MSC) is used to model ILS. 5

10 MSC and Identifiability The statistical process multi-species coalescent (MSC) is used to model ILS. Any species tree defines a unique distribution on the set of all possible gene trees 5

11 MSC and Identifiability The statistical process multi-species coalescent (MSC) is used to model ILS. Any species tree defines a unique distribution on the set of all possible gene trees In principle, the species tree can be identified despite high discordance from the gene tree distribution 5

12 Multi-gene tree estimation pipelines gene 1 ACTGCACACCG ACTGC-CCCCG AATGC-CCCCG -CTGCACACGG gene 1 supermatrix ACTGCACACCG CTGAGCATCG ACTGC-CCCCG CTGAGC-TCG AATGC-CCCCG ATGAGC-TC- -CTGCACACGGCTGA-CAC-G gene 2 gene 1000 Approach 1: Concatenation Orangutan CAGAGCACGCACGAA AGCA-CACGC-CATA ATGAGCACGC-C-TA AGC-TAC-CACGGAT Phylogeny inference Gorilla Chimpanzee Human gene 2 CTGAGCATCG CTGAGC-TCG ATGAGC-TC- CTGA-CAC-G Gene tree estimation Approach 2: Summary methods Chimp Gorilla gene 1000 CAGGCACGCACGAA AGC-CACGC-CATA ATGGCACGC-C-TA AGCTAC-CACGGAT gene 1 gene 2 gene 1000 Human Gorilla Human Orang. Human Orang. Chimp Orang. Chimp Gorilla Summary method Orangutan Gorilla Chimpanzee Human There are also other very good approaches: co-estimation (e.g., *BEAST), site-based (e.g., SVDQuartets, SNAPP) 6

13 Multi-gene tree estimation pipelines gene 1 ACTGCACACCG ACTGC-CCCCG AATGC-CCCCG -CTGCACACGG gene 2 CTGAGCATCG CTGAGC-TCG ATGAGC-TC- CTGA-CAC-G gene 1 supermatrix ACTGCACACCG CTGAGCATCG ACTGC-CCCCG CTGAGC-TCG AATGC-CCCCG ATGAGC-TC- -CTGCACACGGCTGA-CAC-G gene 2 gene 1000 Gene tree estimation Approach 1: Concatenation Orangutan CAGAGCACGCACGAA AGCA-CACGC-CATA ATGAGCACGC-C-TA AGC-TAC-CACGGAT Phylogeny inference Approach 2: Summary methods Gorilla Chimpanzee Statistically inconsistent [Roch and Steel, 2014] Human Chimp Gorilla Error gene 1000 CAGGCACGCACGAA AGC-CACGC-CATA ATGGCACGC-C-TA AGCTAC-CACGGAT gene 1 gene 2 gene 1000 Human Gorilla Human Orang. Human Orang. Chimp Orang. Chimp Gorilla Summary method Orangutan Gorilla Chimpanzee Human Data Can be statistically consistent given true gene trees Error-free gene trees 6

14 Multi-gene tree estimation pipelines Error gene 1 ACTGCACACCG ACTGC-CCCCG AATGC-CCCCG -CTGCACACGG gene 2 CTGAGCATCG CTGAGC-TCG ATGAGC-TC- CTGA-CAC-G gene 1000 CAGGCACGCACGAA AGC-CACGC-CATA ATGGCACGC-C-TA AGCTAC-CACGGAT gene 1 ACTGCACACCG CTGAGCATCG ACTGC-CCCCG CTGAGC-TCG AATGC-CCCCG ATGAGC-TC- -CTGCACACGGCTGA-CAC-G gene 1 supermatrix gene 2 gene 1000 Gene tree estimation Chimp Approach 1: Concatenation Orangutan CAGAGCACGCACGAA AGCA-CACGC-CATA ATGAGCACGC-C-TA AGC-TAC-CACGGAT Gorilla Approach 2: Summary methods Summary method Phylogeny inference Gorilla Orangutan Human Orang. BUCKy (population tree), Gorilla Chimp gene 2 MP-EST, NJst (ASTRID), Human Orang. Gorilla Chimpanzee Statistically inconsistent [Roch and Steel, 2014] STAR, STELLS, GLASS, Human Chimpanzee Human ASTRAL, Orang. ASTRAL-II, ASTRAL-III Chimp gene 1000 Human Gorilla Data Can be statistically consistent given true gene trees Error-free gene trees 6

15 ASTRAL used by the biologists Plants: Wickett, et al., 2014, PNAS Birds: Prum, et al., 2015, Nature Xenoturbella, Cannon et al., 2016, Nature Xenoturbella, Rouse et al., 2016, Nature Flatworms: Laumer, et al., 2015, elife Shrews: Giarla, et al., 2015, Syst. Bio. Frogs: Yuan et al., 2016, Syst. Bio. Tomatoes: Pease, et al., 2016, PLoS Bio. ASTRAL-II ASTRAL Angiosperms: Huang et al., 2016, MBE Worms: Andrade, et al., 2015, MBE

16 This talk: ASTRAL Outlines of the method Accuracy in simulation studies With strong model violations The impact of Fragmentary sequence data Sampling multiple individuals Visualization 8

17 Unrooted quartets under MSC model For a quartet (4 species), the unrooted species tree topology has at least 1/3 probability in gene trees (Allman, et al. 2010) Chimp Gorilla θ 1 =70% θ 2 =15% θ 3 =15% d=0.8 Chimp Gorilla Orang. Chimp Gorilla Chimp Human Orang. Human Orang. Human Gorilla Human Orang. 9

18 Unrooted quartets under MSC model For a quartet (4 species), the unrooted species tree topology has at least 1/3 probability in gene trees (Allman, et al. 2010) Chimp Gorilla θ 1 =70% θ 2 =15% θ 3 =15% d=0.8 Chimp Gorilla Orang. Chimp Gorilla Chimp Human Orang. Human Orang. Human Gorilla Human Orang. The most frequent gene tree = The most likely species tree 9

19 Unrooted quartets under MSC model For a quartet (4 species), the unrooted species tree topology has at least 1/3 probability in gene trees (Allman, et al. 2010) Chimp Gorilla θ 1 =70% θ 2 =15% θ 3 =15% d=0.8 Chimp Gorilla Orang. Chimp Gorilla Chimp Human Orang. Human Orang. Human Gorilla Human Orang. The most frequent gene tree = The most likely species tree shorter branches more discordance a harder species tree reconstruction problem 9

20 More than 4 species For >4 species, the species tree topology can be different from the most like gene tree (called anomaly zone) (Degnan, 2013) Gorilla Human Chimp Orang. Rhesus 10

21 More than 4 species For >4 species, the species tree topology can be different from the most like gene tree (called anomaly zone) (Degnan, 2013) Gorilla Human Chimp Orang. Rhesus 1. Break gene trees into ( n 4 ) quartets of species 2. Find the dominant tree for all quartets of taxa 3. Combine quartet trees Some tools (e.g.. BUCKy-p [Larget, et al., 2010]) 10

22 More than 4 species For >4 species, the species tree topology can be different from the most like gene tree (called anomaly zone) (Degnan, 2013) Gorilla Human Chimp Orang. Rhesus Alternative: 1. Break gene trees into ( n 4 ) quartets of species weight all 3( n 4 ) quartet topologies 2. Find the dominant tree for all quartets of taxa by their frequency and find the optimal tree 3. Combine quartet trees Some tools (e.g.. BUCKy-p [Larget, et al., 2010]) 10

23 Maximum Quartet Support Species Tree Optimization problem: Find the species tree with the maximum number of induced quartet trees shared with the collection of input gene trees Score(T )= mx Q(T ) \ Q(t i ) 1 the set of quartet trees induced by T a gene tree 11

24 Maximum Quartet Support Species Tree Optimization problem: Find the species tree with the maximum number of induced quartet trees shared with the collection of input gene trees Score(T )= mx Q(T ) \ Q(t i ) 1 the set of quartet trees induced by T a gene tree Theorem: Statistically consistent under the multispecies coalescent model when solved exactly 11

25 ASTRAL-I and ASTRAL-II [Mirarab, et al., Bioinformatics, 2014] [Mirarab and Warnow, Bioinformatics, 2015] Solve the problem exactly using dynamic programming Constrains the search space to make large datasets feasible The constrained version remains statistically consistent Running time: polynomially increases with the number of genes and the number of species 12

26 ASTRAL-I and ASTRAL-II [Mirarab, et al., Bioinformatics, 2014] [Mirarab and Warnow, Bioinformatics, 2015] Solve the problem exactly using dynamic programming Constrains the search space to make large datasets feasible The constrained version remains statistically consistent Running time: polynomially increases with the number of genes and the number of species ASTRAL-II: Increased the search space Improved the running time Can handle polytomies (lack of resolution) in input gene trees 12

27 Simulation study True (model) species tree True gene trees Sequence data Finch Falcon Owl Eagle Pigeon Finch Owl Falcon Eagle Pigeon Es mated species tree Es mated gene trees 13

28 Simulation study True (model) species tree True gene trees Sequence data Finch Falcon Owl Eagle Pigeon Finch Owl Falcon Eagle Pigeon Es mated species tree Es mated gene trees Evaluate using the FN rate: the percentage of branches in the true tree that are missing from the estimated tree 13

29 Number of species impacts estimation error in the species tree 1000 genes, medium levels of ILS, simulated species trees [S. Mirarab, T. Warnow, 2015] 14

30 ASTRAL: accurate and scalable Species tree topological error (FN) 16% 12% 8% 4% ASTRAL II MP EST genes, medium levels of ILS, simulated species trees [S. Mirarab, T. Warnow, 2015] 15

31 ASTRAL: accurate and scalable Species tree topological error (FN) 16% 12% 8% 4% ASTRAL II MP EST number of species 1000 genes, medium levels of ILS, simulated species trees [S. Mirarab, T. Warnow, 2015] 15

32 Comparison to concatenation: depends on the ILS level 30% 1000 genes 200 genes 50 genes Species tree topological error (FN) 20% 10% ASTRAL II NJst CA ML 0% L M H L M H L M H 10M 2M 500K 10M 2M 500K 10M 2M 500K tree length (controls the amount of ILS) more ILS more ILS more ILS 200 species, recent ILS 16

33 Comparison to concatenation: depends on the ILS level 30% 1000 genes 200 genes 50 genes Species tree topological error (FN) 20% 10% ASTRAL II NJst CA ML 0% L M H L M H L M H 10M 2M 500K 10M 2M 500K 10M 2M 500K tree length (controls the amount of ILS) more ILS more ILS more ILS 200 species, recent ILS 16

34 Horizontal Gene Transfer (HGT) [R. Davidson et al., BMC Genomics. 16 (2015)] Model violation: the simulated discordance is due to both ILS and randomly distributed HGT ~30% discordance ~35% ~55% ~70% 50 species, 10 genes

35 Horizontal Gene Transfer (HGT) [R. Davidson et al., BMC Genomics. 16 (2015)] Randomly distributed HGT is tolerated with enough genes # genes 50 species, varying # genes ~30% discordance ~55% ~35% ~70%

36 UCSD Work 1. Statistical support 2. ASTRAL-III Better running time GPU and CPU Parallelism Multi-Individual datasets 3. Impact of fragmentary data 19

37 Branch support Traditional approach: Multi-locus bootstrapping (MLBS) Slow: requires bootstrapping all genes (e.g., 100 m ML trees) Inaccurate and hard to interpret [Mirarab et al., Sys bio, 2014; Bayzid et al., PLoS One, 2015] We can do better! 20 [Mirarab et al., Sys bio, 2014]

38 Local posterior probability Recall quartet frequencies follow a multinomial distribution m 1 = 80 m 2 = 63 m 3 = 57 Chimp Gorilla Orang. Chimp Gorilla Chimp Human θ 1 Orang. Human Gorilla Human θ 2 θ 3 Orang. P ( topology seen in m 1 / m gene trees is the species tree ) = P ( θ 1 > 1/3 ) 21

39 Local posterior probability Recall quartet frequencies follow a multinomial distribution m 1 = 80 m 2 = 63 m 3 = 57 Chimp Gorilla Orang. Chimp Gorilla Chimp Human Orang. Gorilla Human Orang. P ( topology seen in m 1 / m gene trees is the species tree ) = P ( θ 1 > 1/3 ) θ 1 Human Can be analytically solved θ 2 θ 3 We implemented this idea in astral and called the measure the local posterior probability (localpp) 21

40 Quartet support v.s. localpp quartet frequency Increased number of genes (m) increased support Decreased discordance increased support 22

41 localpp is more accurate than bootstrapping 1.00 MLBS Local PP Recall X faster False Positive Rate Avian simulated dataset (48 taxa, 1000 genes) [Sayyari and Mirarab, MBE, 2016] 23

42 ASTRAL can also estimate internal branch lengths estimated branch length (log scale) True gene trees true branch length (log scale) With true gene trees, ASTRAL correctly estimates BL 24

43 ASTRAL can also estimate internal branch lengths estimated branch length (log scale) low gene tree error Medium g.t. error Moderate g.t. error True gene trees High gene tree error true branch length (log scale) With error-prone With true estimated gene trees, gene ASTRAL trees, correctly ASTRAL estimates underestimates BL BL 24

44 ASTRAL-III and beyond (unpublished work) Maryam Rabiee Hashemi John Yin 25 Chao Zhang

45 ASTRAL-III Chao Zhang Improved running time (to be released in July) Can handle polytomies without slowing down Can exploit similarities between gene trees to reduce running time 26

46 ASTRAL-III Chao Zhang Improved running time (to be released in July) Can handle polytomies without slowing down Can exploit similarities between gene trees to reduce running time To be released in August Handling datasets with multiple individuals per species GPU and CPU parallelism 26

47 Low support branches Does it help to contract branches with low support? Yes, but only for 1000 very low supports 16% Mean GT error Low (<50%) Theoretical justifications not clear Species tree error (FN ratio) 12% 8% High (>50%) Simulations: 100 taxa, simphy, ILS: around 46% true discordance FastTree, support from bootstrapping 4% non contraction threshold 27

48 Low support branches Does it help to contract branches with low support? Yes, but only for very low supports 25% 16% Mean GT error Low (<50%) Theoretical justifications not clear Species tree error (FN ratio) 20% 12% 15% 8% 10% High (>50%) Simulations: 100 taxa, simphy, ILS: around 46% true discordance FastTree, support from bootstrapping 4% non contraction threshold 27

49 Multiple individuals Maryam Rabiee Hashemi What if we sample multiple individuals from each species? In recently diverged species individuals may have different trees for each gene Sampling multiple individuals may provide extra signal 28

50 Multiple individuals helpful? Species tree error (RF distance) ind. 2 ind. 5 ind. Variable effort 500 genes; 200 species Fixed effort genes x inds = 1000; 200 species Fixed effort 500 genes; species x inds = 200 Yes, it marginally helps accuracy 29

51 Multiple individuals helpful? Species tree error (RF distance) ind. 2 ind. 5 ind. Variable effort 500 genes; 200 species Fixed effort genes x inds = 1000; 200 species Fixed effort 500 genes; species x inds = 200 Yes, it marginally helps accuracy But not if sequencing effort is kept fixed 29

52 Multiple individuals helpful? Species tree error (RF distance) ind. 2 ind. 5 ind. Variable effort 500 genes; 200 species Fixed effort genes x inds = 1000; 200 species Fixed effort 500 genes; species x inds = 200 Yes, it marginally helps accuracy But not if sequencing effort is kept fixed 29

53 Parallelism 25 John Yin GPU (1000 species) 20 GPU (500 species) 15 Speed up species 500 species # CPU cores (comet) Can infer trees with 10,000 species & 400 genes in a day 30

54 Impact of fragmentary data Erfan Sayyari James B. Whitfield 31

55 Two types of missing data Missing genes: a taxon misses the gene fully (taxon occupancy) Figure from Hosner et al, 2016 Fragmentary data: the species is partially sequenced for some genes 32

56 Does missing data matter? Missing genes: Evidence suggests summary methods are often robust 33

57 Does missing data matter? Missing genes: Evidence suggests summary methods are often robust Fragmentary data: Less extensively studied Hosner et al. indicated: Fragments matter They suggest removing genes with many fragments 33

58 Our study Empirical data: An insect dataset of 144 species and 1478 genes 600 million years of evolution Transcriptomics, so plenty of missing data of both types Simulated data: 100 taxon, medium ILS, 1000 genes Added fragmentation with patterns similar to the insect data 34

59 Fragmentary data hurt gene trees and species trees 1.00 Gene tree error NRF Distance Species tree error: 2.8% error 0.25 no filtering orig seq with fragments Filtering method without fragments 35

60 Fragmentary data hurt gene trees and species trees 1.00 Fragmentary data dramatically increase gene tree and the species tree error Gene tree error NRF Distance Species tree error: 7.0% error Species tree error: 2.8% error 0.25 no filtering orig seq with fragments Filtering method without fragments 35

61 How do we solve this? Hosner et al. suggested removing entire genes. This is too conservative in our opinion. 36

62 How do we solve this? Hosner et al. suggested removing entire genes. This is too conservative in our opinion. We simply remove the fragmentary sequence from the gene and keep the rest of the gene What is fragmentary? Anything that has at most X% of sites We study different X values 36

63 Filtering helps to some level (simulations) FastTree RAxML 0.6 NRF Distance no filtering orig seq Filtering method Gene tree error 37

64 Filtering helps to some level (simulations) FastTree RAxML 0.06 NRF Distance no filtering orig seq Filtering method no-filter Gene tree error Species tree error 37

65 Filtering helps to some level (simulations) FastTree RAxML 0.06 NRF Distance Novel results: RAxML is much better than 0.04 FastTree 0.03 no filtering orig seq Filtering method no-filter Gene tree error Species tree error 37

66 How about real data? 38

67 Gene trees seem to improve (less gene tree discordance) % filtering Proportion of relevant gene trees no-filtering Diptera Mecoptera Siphonaptera Lepidoptera Trichoptera Coleoptera Strepsiptera Neuropterida Hymenoptera Psocodea Hemiptera Thysanoptera Blattodea Mantodea Grylloblattodea Orthoptera Phasmida Embiidina Plecoptera Dermaptera Ephemeroptera Odonata Thysanura Archaeognatha Diplura Collembola Important Ingroup A B C B/C D D/B/C E F G H I J K L M N P Strongly Supported Weakly Supported Weakly Reject Strongly Reject 39

68 Gene trees seem to improve (less gene tree discordance) % filtering Proportion of relevant gene trees no-filtering Diptera Mecoptera Siphonaptera Lepidoptera Trichoptera Coleoptera Strepsiptera Neuropterida Hymenoptera Psocodea Hemiptera Thysanoptera Blattodea Mantodea Grylloblattodea Orthoptera Phasmida Embiidina Plecoptera Dermaptera Ephemeroptera Odonata Thysanura Archaeognatha Diplura Collembola Important Ingroup A B C B/C D D/B/C E F G H I J K L M N P Strongly Supported Weakly Supported Weakly Reject Strongly Reject 39

69 The species tree also improves ASTRAL trees differed with concatenation initially Ingroup A B C B/C D D/B/C After filtering, ASTRAL and concatenation were very similar E F G H I J K L Differences on controversial nodes M N P no-filtering no-filtering RAxML+ASTRAL is more robust than FastTree+ASTRAL 40 FastTree Weakly rejected (<0.95) Supported Strongly rejected (>=0.95) 0% 50% 100% RAxML

70 Visualizing discordance: DiscoVista Erfan Sayyari James B. Whitfield 41

71 DiscoVista Simple and interpretable Easy to run Focus on important branches 42

72 Quartet-based gene tree discordance Input: A species tree A set of gene trees Mapping of species to groups of interest Output: Quartet frequency per interesting branches Dataset (plants): Wicket et al, PNAS,

73 Compare different datasets Datasets (metazoa): Rouse et al, Nature, 2016, Cannon et al, Nature,

74 More traditional gene tree discordance measures Input: Gene trees Hypothetical clades Output: What percent of gene trees are compatible with those clades Dataset (insects): Misof et al, Science,

75 Species tree comparison Datasets (metazoa): Rouse et al, Nature, 2016, Cannon et al, Nature,

76 Summarize ASTRAL reconstructs species trees from gene trees with both high accuracy scalability to large datasets robust to some levels of model violations Like any other summary method, ASTRAL remains sensitive to gene tree estimation error try best to obtain highly accurate gene trees removing fragmentary data seems to help Visualizing discordance may prove illuminating 47

77 Future of ASTRAL Can it be changed to use characters directly? More broadly, can alternative scalable methods be developed for better gene tree estimation? 48

78 Future of ASTRAL Can it be changed to use characters directly? More broadly, can alternative scalable methods be developed for better gene tree estimation? ASTRAL scales to 10K leaves. We have 90K bacterial genomes. Can we scale further? 48

79 Tandy Warnow James B. Whitfield Maryam Rabiee Hashemi Chao Zhang S.M. Bayzid Théo Zimmermann John Yin Erfan Sayyari Shubhanshu Shekhar

80 Theoretical sample complexity results How many genes are enough to reconstruct the tree? ε # genes Balanced Caterpillar Theoretical f

81 Gene trees seem to improve (bootstrap) d 50% 40% Percent 30% 20% 10% no filtering Fragment filtering threshold Bootstrap support =0 <=33 >=75 =100 51

82 Gene trees seem to improve (less gene tree discordance) ) FN distance between species and gene trees occupancy no-filtering Filtering method 52

83 Impact of gene tree error (using true gene trees) ASTRAL II ASTRAL II (true gt) CA ML Species tree topological error (FN) Species tree topo Low ILS Medium ILS High ILS genes 53

84 Impact of gene tree error (using true gene trees) ASTRAL II ASTRAL II (true gt) CA ML Species tree topological error (FN) Species tree topo Low ILS Medium ILS High ILS genes When we divide our 50 replicates into low, medium, or high gene tree estimation error, ASTRAL tends to be better with low error 53

85 Running time (ho ASTRAL-I versus ASTRAL-II genes 1e 07 ASTRAL I ASTRAL II ASTRAL II + true st Species tree topological error (FN) 0.8 igure S2: 0.6 Comparison of various variants of ASTRAL with 200 taxa and varying tree sha 0.0 Low ILS Medium ILS High ILS nd number 0.4 of genes. Species tree accuracy (top) and running times (bottom) are shown. ASTRALrue st shows 0.2the case where the true species tree is added to the search space; this is included to approxim n ideal (e.g. exact) solution to the quartet problem genes s) species, deep ILS 54 1e

86 Running time (ho ASTRAL-I versus ASTRAL-II genes 1e 07 ASTRAL I ASTRAL II ASTRAL II + true st Species tree topological error (FN) 0.8 igure S2: 0.6 Comparison of various variants of ASTRAL with 200 taxa and varying tree sha 0.0 Low ILS Medium ILS High ILS nd number 0.4 of genes. Species tree accuracy (top) and running times (bottom) are shown. ASTRALrue st shows 0.2the case where the true species tree is added to the search space; this is included to approxim n ideal (e.g. exact) solution to the quartet problem genes s) species, deep ILS 54 1e

87 Dynamic programming L A L-A 55

88 Dynamic programming L A L-A A A-A A A-A A A-A Recursively break subsets of species into smaller subsets 55

89 Dynamic programming L A A L-A u L-A A-A A A-A A A-A A A-A Recursively break subsets of species into smaller subsets w(u) : Compare u against input gene trees and compute quartets from gene trees satisfied by u 55

90 Dynamic programming L Exponential A A L-A u L-A A-A A A-A A A-A A A-A Recursively break subsets of species into smaller subsets w(u) : Compare u against input gene trees and compute quartets from gene trees satisfied by u 55

91 Constrained version L A1 A2 A L-A A4 A10 A5 A6 A8 A9 A3 A A-A A A-A A A-A A7 56

92 Constrained version L A1 A2 A L-A A4 A10 A5 A6 A8 A9 A3 A A-A A A-A A A-A A7 Restrict branches in the species tree to a given constraint set X. 56

93 Deep ILS 30% 1000 genes 200 genes 50 genes Species tree topological error (FN) 20% 10% ASTRAL II NJst CA ML 10M 2M 500K 10M 2M 500K 10M 2M 500K tree length (controls the amount of ILS) 200 species, simulated species trees, deep ILS [Mirarab and Warnow, ISMB, 2015] 57

Fast coalescent-based branch support using local quartet frequencies

Fast coalescent-based branch support using local quartet frequencies Fast coalescent-based branch support using local quartet frequencies Molecular Biology and Evolution (2016) 33 (7): 1654 68 Erfan Sayyari, Siavash Mirarab University of California, San Diego (ECE) anzee

More information

ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support

ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support Siavash Mirarab University of California, San Diego Joint work with Tandy Warnow Erfan Sayyari

More information

Upcoming challenges in phylogenomics. Siavash Mirarab University of California, San Diego

Upcoming challenges in phylogenomics. Siavash Mirarab University of California, San Diego Upcoming challenges in phylogenomics Siavash Mirarab University of California, San Diego Gene tree discordance The species tree gene1000 Causes of gene tree discordance include: Incomplete Lineage Sorting

More information

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana

More information

Construc)ng the Tree of Life: Divide-and-Conquer! Tandy Warnow University of Illinois at Urbana-Champaign

Construc)ng the Tree of Life: Divide-and-Conquer! Tandy Warnow University of Illinois at Urbana-Champaign Construc)ng the Tree of Life: Divide-and-Conquer! Tandy Warnow University of Illinois at Urbana-Champaign Phylogeny (evolutionary tree) Orangutan Gorilla Chimpanzee Human From the Tree of the Life Website,

More information

Anatomy of a species tree

Anatomy of a species tree Anatomy of a species tree T 1 Size of current and ancestral Populations (N) N Confidence in branches of species tree t/2n = 1 coalescent unit T 2 Branch lengths and divergence times of species & populations

More information

first (i.e., weaker) sense of the term, using a variety of algorithmic approaches. For example, some methods (e.g., *BEAST 20) co-estimate gene trees

first (i.e., weaker) sense of the term, using a variety of algorithmic approaches. For example, some methods (e.g., *BEAST 20) co-estimate gene trees Concatenation Analyses in the Presence of Incomplete Lineage Sorting May 22, 2015 Tree of Life Tandy Warnow Warnow T. Concatenation Analyses in the Presence of Incomplete Lineage Sorting.. 2015 May 22.

More information

From Gene Trees to Species Trees. Tandy Warnow The University of Texas at Aus<n

From Gene Trees to Species Trees. Tandy Warnow The University of Texas at Aus<n From Gene Trees to Species Trees Tandy Warnow The University of Texas at Aus

More information

Fragmentary Gene Sequences Negatively Impact Gene Tree and Species Tree Reconstruction

Fragmentary Gene Sequences Negatively Impact Gene Tree and Species Tree Reconstruction Fragmentary Gene Sequences Negatively Impact Gene Tree and Species Tree Reconstruction Erfan Sayyari, 1 James B. Whitfield, 2 and Siavash Mirarab*,1 1 Department of Electrical and Computer Engineering,

More information

Unit 3 Insect Orders

Unit 3 Insect Orders Unit 3 Insect Orders General Directions: 1. To complete this study guide, please read the assigned readings for Unit 3 and watch the lecture. If you need additional information to complete this study guide,

More information

Workshop III: Evolutionary Genomics

Workshop III: Evolutionary Genomics Identifying Species Trees from Gene Trees Elizabeth S. Allman University of Alaska IPAM Los Angeles, CA November 17, 2011 Workshop III: Evolutionary Genomics Collaborators The work in today s talk is joint

More information

Phylogenetic Geometry

Phylogenetic Geometry Phylogenetic Geometry Ruth Davidson University of Illinois Urbana-Champaign Department of Mathematics Mathematics and Statistics Seminar Washington State University-Vancouver September 26, 2016 Phylogenies

More information

New methods for es-ma-ng species trees from genome-scale data. Tandy Warnow The University of Illinois

New methods for es-ma-ng species trees from genome-scale data. Tandy Warnow The University of Illinois New methods for es-ma-ng species trees from genome-scale data Tandy Warnow The University of Illinois Phylogeny (evolu9onary tree) Orangutan Gorilla Chimpanzee Human From the Tree of the Life Website,

More information

Constrained Exact Op1miza1on in Phylogene1cs. Tandy Warnow The University of Illinois at Urbana-Champaign

Constrained Exact Op1miza1on in Phylogene1cs. Tandy Warnow The University of Illinois at Urbana-Champaign Constrained Exact Op1miza1on in Phylogene1cs Tandy Warnow The University of Illinois at Urbana-Champaign Phylogeny (evolu1onary tree) Orangutan Gorilla Chimpanzee Human From the Tree of the Life Website,

More information

The impact of missing data on species tree estimation

The impact of missing data on species tree estimation MBE Advance Access published November 2, 215 The impact of missing data on species tree estimation Zhenxiang Xi, 1 Liang Liu, 2,3 and Charles C. Davis*,1 1 Department of Organismic and Evolutionary Biology,

More information

Mul$ple Sequence Alignment Methods. Tandy Warnow Departments of Bioengineering and Computer Science h?p://tandy.cs.illinois.edu

Mul$ple Sequence Alignment Methods. Tandy Warnow Departments of Bioengineering and Computer Science h?p://tandy.cs.illinois.edu Mul$ple Sequence Alignment Methods Tandy Warnow Departments of Bioengineering and Computer Science h?p://tandy.cs.illinois.edu Species Tree Orangutan Gorilla Chimpanzee Human From the Tree of the Life

More information

Jed Chou. April 13, 2015

Jed Chou. April 13, 2015 of of CS598 AGB April 13, 2015 Overview of 1 2 3 4 5 Competing Approaches of Two competing approaches to species tree inference: Summary methods: estimate a tree on each gene alignment then combine gene

More information

Taming the Beast Workshop

Taming the Beast Workshop Workshop and Chi Zhang June 28, 2016 1 / 19 Species tree Species tree the phylogeny representing the relationships among a group of species Figure adapted from [Rogers and Gibbs, 2014] Gene tree the phylogeny

More information

Phylogenetics in the Age of Genomics: Prospects and Challenges

Phylogenetics in the Age of Genomics: Prospects and Challenges Phylogenetics in the Age of Genomics: Prospects and Challenges Antonis Rokas Department of Biological Sciences, Vanderbilt University http://as.vanderbilt.edu/rokaslab http://pubmed2wordle.appspot.com/

More information

A Phylogenetic Network Construction due to Constrained Recombination

A Phylogenetic Network Construction due to Constrained Recombination A Phylogenetic Network Construction due to Constrained Recombination Mohd. Abdul Hai Zahid Research Scholar Research Supervisors: Dr. R.C. Joshi Dr. Ankush Mittal Department of Electronics and Computer

More information

CS 581 Algorithmic Computational Genomics. Tandy Warnow University of Illinois at Urbana-Champaign

CS 581 Algorithmic Computational Genomics. Tandy Warnow University of Illinois at Urbana-Champaign CS 581 Algorithmic Computational Genomics Tandy Warnow University of Illinois at Urbana-Champaign Today Explain the course Introduce some of the research in this area Describe some open problems Talk about

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Phylogenomics, Multiple Sequence Alignment, and Metagenomics. Tandy Warnow University of Illinois at Urbana-Champaign

Phylogenomics, Multiple Sequence Alignment, and Metagenomics. Tandy Warnow University of Illinois at Urbana-Champaign Phylogenomics, Multiple Sequence Alignment, and Metagenomics Tandy Warnow University of Illinois at Urbana-Champaign Phylogeny (evolutionary tree) Orangutan Gorilla Chimpanzee Human From the Tree of the

More information

The Mathema)cs of Es)ma)ng the Tree of Life. Tandy Warnow The University of Illinois

The Mathema)cs of Es)ma)ng the Tree of Life. Tandy Warnow The University of Illinois The Mathema)cs of Es)ma)ng the Tree of Life Tandy Warnow The University of Illinois Phylogeny (evolu9onary tree) Orangutan Gorilla Chimpanzee Human From the Tree of the Life Website, University of Arizona

More information

Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics

Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics Tandy Warnow Founder Professor of Engineering The University of Illinois at Urbana-Champaign http://tandy.cs.illinois.edu

More information

Today's project. Test input data Six alignments (from six independent markers) of Curcuma species

Today's project. Test input data Six alignments (from six independent markers) of Curcuma species DNA sequences II Analyses of multiple sequence data datasets, incongruence tests, gene trees vs. species tree reconstruction, networks, detection of hybrid species DNA sequences II Test of congruence of

More information

CS 581 Algorithmic Computational Genomics. Tandy Warnow University of Illinois at Urbana-Champaign

CS 581 Algorithmic Computational Genomics. Tandy Warnow University of Illinois at Urbana-Champaign CS 581 Algorithmic Computational Genomics Tandy Warnow University of Illinois at Urbana-Champaign Course Staff Professor Tandy Warnow Office hours Tuesdays after class (2-3 PM) in Siebel 3235 Email address:

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Estimating phylogenetic trees from genome-scale data

Estimating phylogenetic trees from genome-scale data Ann. N.Y. Acad. Sci. ISSN 0077-8923 ANNALS OF THE NEW YORK ACADEMY OF SCIENCES Issue: The Year in Evolutionary Biology Estimating phylogenetic trees from genome-scale data Liang Liu, 1,2 Zhenxiang Xi,

More information

Genome-scale Es-ma-on of the Tree of Life. Tandy Warnow The University of Illinois

Genome-scale Es-ma-on of the Tree of Life. Tandy Warnow The University of Illinois Genome-scale Es-ma-on of the Tree of Life Tandy Warnow The University of Illinois Phylogeny (evolu9onary tree) Orangutan Gorilla Chimpanzee Human From the Tree of the Life Website, University of Arizona

More information

In comparisons of genomic sequences from multiple species, Challenges in Species Tree Estimation Under the Multispecies Coalescent Model REVIEW

In comparisons of genomic sequences from multiple species, Challenges in Species Tree Estimation Under the Multispecies Coalescent Model REVIEW REVIEW Challenges in Species Tree Estimation Under the Multispecies Coalescent Model Bo Xu* and Ziheng Yang*,,1 *Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China and Department

More information

On the variance of internode distance under the multispecies coalescent

On the variance of internode distance under the multispecies coalescent On the variance of internode distance under the multispecies coalescent Sébastien Roch 1[0000 000 7608 8550 Department of Mathematics University of Wisconsin Madison Madison, WI 53706 roch@math.wisc.edu

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Genome-scale Es-ma-on of the Tree of Life. Tandy Warnow The University of Illinois

Genome-scale Es-ma-on of the Tree of Life. Tandy Warnow The University of Illinois WABI 2017 Factsheet 55 proceedings paper submissions, 27 accepted (49% acceptance rate) 9 additional poster-only submissions (+8 posters accompanying papers) 56 PC members Each paper reviewed by at least

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting

Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting arxiv:1509.06075v3 [q-bio.pe] 12 Feb 2016 Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting Claudia Solís-Lemus 1 and Cécile Ané 1,2 1 Department of Statistics,

More information

Estimating phylogenetic trees from genome-scale data

Estimating phylogenetic trees from genome-scale data Estimating phylogenetic trees from genome-scale data Liang Liu 1,2, Zhenxiang Xi 3, Shaoyuan Wu 4, Charles Davis 3, and Scott V. Edwards 4* 1 Department of Statistics, University of Georgia, Athens, GA

More information

Quartet Inference from SNP Data Under the Coalescent Model

Quartet Inference from SNP Data Under the Coalescent Model Bioinformatics Advance Access published August 7, 2014 Quartet Inference from SNP Data Under the Coalescent Model Julia Chifman 1 and Laura Kubatko 2,3 1 Department of Cancer Biology, Wake Forest School

More information

Properties of Consensus Methods for Inferring Species Trees from Gene Trees

Properties of Consensus Methods for Inferring Species Trees from Gene Trees Syst. Biol. 58(1):35 54, 2009 Copyright c Society of Systematic Biologists DOI:10.1093/sysbio/syp008 Properties of Consensus Methods for Inferring Species Trees from Gene Trees JAMES H. DEGNAN 1,4,,MICHAEL

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods

More information

The Wonderful World of Insects. James A. Bethke University of California Cooperative Extension Farm Advisor Floriculture and Nursery San Diego County

The Wonderful World of Insects. James A. Bethke University of California Cooperative Extension Farm Advisor Floriculture and Nursery San Diego County The Wonderful World of Insects James A. Bethke University of California Cooperative Extension Farm Advisor Floriculture and Nursery San Diego County Taxonomy The Insects The Orders Part I Taxonomy Scientific

More information

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression) Using phylogenetics to estimate species divergence times... More accurately... Basics and basic issues for Bayesian inference of divergence times (plus some digression) "A comparison of the structures

More information

Estimating Evolutionary Trees. Phylogenetic Methods

Estimating Evolutionary Trees. Phylogenetic Methods Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent

More information

Reconstruire le passé biologique modèles, méthodes, performances, limites

Reconstruire le passé biologique modèles, méthodes, performances, limites Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire

More information

Species Tree Inference using SVDquartets

Species Tree Inference using SVDquartets Species Tree Inference using SVDquartets Laura Kubatko and Dave Swofford May 19, 2015 Laura Kubatko SVDquartets May 19, 2015 1 / 11 SVDquartets In this tutorial, we ll discuss several different data types:

More information

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis 10 December 2012 - Corrections - Exercise 1 Non-vertebrate chordates generally possess 2 homologs, vertebrates 3 or more gene copies; a Drosophila

More information

Many of the slides that I ll use have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Many of the slides that I ll use have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Many of the slides that I ll use have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Comparative Methods on Phylogenetic Networks

Comparative Methods on Phylogenetic Networks Comparative Methods on Phylogenetic Networks Claudia Solís-Lemus Emory University Joint Statistical Meetings August 1, 2018 Phylogenetic What? Networks Why? Part I Part II How? When? PCM What? Phylogenetic

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

SEPP and TIPP for metagenomic analysis. Tandy Warnow Department of Computer Science University of Texas

SEPP and TIPP for metagenomic analysis. Tandy Warnow Department of Computer Science University of Texas SEPP and TIPP for metagenomic analysis Tandy Warnow Department of Computer Science University of Texas Phylogeny (evolutionary tree) Orangutan From the Tree of the Life Website, University of Arizona Gorilla

More information

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X

More information

Understanding How Stochasticity Impacts Reconstructions of Recent Species Divergent History. Huateng Huang

Understanding How Stochasticity Impacts Reconstructions of Recent Species Divergent History. Huateng Huang Understanding How Stochasticity Impacts Reconstructions of Recent Species Divergent History by Huateng Huang A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor

More information

CS 394C Algorithms for Computational Biology. Tandy Warnow Spring 2012

CS 394C Algorithms for Computational Biology. Tandy Warnow Spring 2012 CS 394C Algorithms for Computational Biology Tandy Warnow Spring 2012 Biology: 21st Century Science! When the human genome was sequenced seven years ago, scientists knew that most of the major scientific

More information

Organizing Life s Diversity

Organizing Life s Diversity 17 Organizing Life s Diversity section 2 Modern Classification Classification systems have changed over time as information has increased. What You ll Learn species concepts methods to reveal phylogeny

More information

Evaluating Fast Maximum Likelihood-Based Phylogenetic Programs Using Empirical Phylogenomic Data Sets

Evaluating Fast Maximum Likelihood-Based Phylogenetic Programs Using Empirical Phylogenomic Data Sets Evaluating Fast Maximum Likelihood-Based Phylogenetic Programs Using Empirical Phylogenomic Data Sets Xiaofan Zhou, 1,2 Xing-Xing Shen, 3 Chris Todd Hittinger, 4 and Antonis Rokas*,3 1 Integrative Microbiology

More information

Hokie Bugfest (October 17, 2015)

Hokie Bugfest (October 17, 2015) Hokie Bugfest (October 17, 2015) It s time to get collecting!! Start an insect collection and have it judged at the Hokie Bugfest on October 17. The Bugfest will be held from 10 a.m. to 5 p.m. at the Inn

More information

X X (2) X Pr(X = x θ) (3)

X X (2) X Pr(X = x θ) (3) Notes for 848 lecture 6: A ML basis for compatibility and parsimony Notation θ Θ (1) Θ is the space of all possible trees (and model parameters) θ is a point in the parameter space = a particular tree

More information

Non-binary Tree Reconciliation. Louxin Zhang Department of Mathematics National University of Singapore

Non-binary Tree Reconciliation. Louxin Zhang Department of Mathematics National University of Singapore Non-binary Tree Reconciliation Louxin Zhang Department of Mathematics National University of Singapore matzlx@nus.edu.sg Introduction: Gene Duplication Inference Consider a duplication gene family G Species

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Efficient Bayesian Species Tree Inference under the Multispecies Coalescent

Efficient Bayesian Species Tree Inference under the Multispecies Coalescent Syst. Biol. 66(5):823 842, 2017 The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com

More information

Biology ENTOMOLOGY Dr. Tatiana Rossolimo, Class syllabus

Biology ENTOMOLOGY Dr. Tatiana Rossolimo,   Class syllabus Biology 3327.03 ENTOMOLOGY Dr. Tatiana Rossolimo, e-mail: trossoli@dal.ca Class syllabus Insects are the most biodiverse group of organisms on the Earth. They far surpass other terrestrial animals in abundance

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

394C, October 2, Topics: Mul9ple Sequence Alignment Es9ma9ng Species Trees from Gene Trees

394C, October 2, Topics: Mul9ple Sequence Alignment Es9ma9ng Species Trees from Gene Trees 394C, October 2, 2013 Topics: Mul9ple Sequence Alignment Es9ma9ng Species Trees from Gene Trees Mul9ple Sequence Alignment Mul9ple Sequence Alignments and Evolu9onary Histories (the meaning of homologous

More information

Hokie BugFest (October 20, 2018)

Hokie BugFest (October 20, 2018) Hokie BugFest (October 20, 2018) It s time to get collecting!! Start an insect collection and have it judged at the Hokie BugFest on October 20 th. The BugFest will be held from 10 a.m. to 5 p.m. at the

More information

A phylogenomic toolbox for assembling the tree of life

A phylogenomic toolbox for assembling the tree of life A phylogenomic toolbox for assembling the tree of life or, The Phylota Project (http://www.phylota.org) UC Davis Mike Sanderson Amy Driskell U Pennsylvania Junhyong Kim Iowa State Oliver Eulenstein David

More information

Large-scale gene family analysis of 76 Arthropods

Large-scale gene family analysis of 76 Arthropods Large-scale gene family analysis of 76 Arthropods i5k webinar / September 5, 28 Gregg Thomas @greggwcthomas Indiana University The genomic basis of Arthropod diversity https://www.biorxiv.org/content/early/28/8/4/382945

More information

Phylogenetics: Building Phylogenetic Trees

Phylogenetics: Building Phylogenetic Trees 1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should

More information

arxiv: v1 [math.st] 22 Jun 2018

arxiv: v1 [math.st] 22 Jun 2018 Hypothesis testing near singularities and boundaries arxiv:1806.08458v1 [math.st] Jun 018 Jonathan D. Mitchell, Elizabeth S. Allman, and John A. Rhodes Department of Mathematics & Statistics University

More information

Fine-Scale Phylogenetic Discordance across the House Mouse Genome

Fine-Scale Phylogenetic Discordance across the House Mouse Genome Fine-Scale Phylogenetic Discordance across the House Mouse Genome Michael A. White 1,Cécile Ané 2,3, Colin N. Dewey 4,5,6, Bret R. Larget 2,3, Bret A. Payseur 1 * 1 Laboratory of Genetics, University of

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. OEB 242 Exam Practice Problems Answer Key Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. First, recall

More information

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods

More information

To link to this article: DOI: / URL:

To link to this article: DOI: / URL: This article was downloaded by:[ohio State University Libraries] [Ohio State University Libraries] On: 22 February 2007 Access Details: [subscription number 731699053] Publisher: Taylor & Francis Informa

More information

Techniques for generating phylogenomic data matrices: transcriptomics vs genomics. Rosa Fernández & Marina Marcet-Houben

Techniques for generating phylogenomic data matrices: transcriptomics vs genomics. Rosa Fernández & Marina Marcet-Houben Techniques for generating phylogenomic data matrices: transcriptomics vs genomics Rosa Fernández & Marina Marcet-Houben DE NOVO Raw reads Sanitize Filter Assemble Translate Reduce reduncancy Download DATABASES

More information

Isolating - A New Resampling Method for Gene Order Data

Isolating - A New Resampling Method for Gene Order Data Isolating - A New Resampling Method for Gene Order Data Jian Shi, William Arndt, Fei Hu and Jijun Tang Abstract The purpose of using resampling methods on phylogenetic data is to estimate the confidence

More information

SEPP and TIPP for metagenomic analysis. Tandy Warnow Department of Computer Science University of Texas

SEPP and TIPP for metagenomic analysis. Tandy Warnow Department of Computer Science University of Texas SEPP and TIPP for metagenomic analysis Tandy Warnow Department of Computer Science University of Texas Metagenomics: Venter et al., Exploring the Sargasso Sea: Scientists Discover One Million New Genes

More information

Concepts and Methods in Molecular Divergence Time Estimation

Concepts and Methods in Molecular Divergence Time Estimation Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History Overview 1. Why do we date trees? 2. The molecular clock 3. Local clocks

More information

WenEtAl-biorxiv 2017/12/21 10:55 page 2 #2

WenEtAl-biorxiv 2017/12/21 10:55 page 2 #2 WenEtAl-biorxiv 0// 0: page # Inferring Phylogenetic Networks Using PhyloNet Dingqiao Wen, Yun Yu, Jiafan Zhu, Luay Nakhleh,, Computer Science, Rice University, Houston, TX, USA; BioSciences, Rice University,

More information

Phylogenetic Networks, Trees, and Clusters

Phylogenetic Networks, Trees, and Clusters Phylogenetic Networks, Trees, and Clusters Luay Nakhleh 1 and Li-San Wang 2 1 Department of Computer Science Rice University Houston, TX 77005, USA nakhleh@cs.rice.edu 2 Department of Biology University

More information

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization) STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization) Laura Salter Kubatko Departments of Statistics and Evolution, Ecology, and Organismal Biology The Ohio State University kubatko.2@osu.edu

More information

Misconceptions on Missing Data in RAD-seq Phylogenetics with a Deep-scale Example from Flowering Plants

Misconceptions on Missing Data in RAD-seq Phylogenetics with a Deep-scale Example from Flowering Plants Syst. Biol. 66(3):399 412, 2017 The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com

More information

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1-2010 Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci Elchanan Mossel University of

More information

Hexapoda Origins: Monophyletic, Paraphyletic or Polyphyletic? Rob King and Matt Kretz

Hexapoda Origins: Monophyletic, Paraphyletic or Polyphyletic? Rob King and Matt Kretz Hexapoda Origins: Monophyletic, Paraphyletic or Polyphyletic? Rob King and Matt Kretz Outline Review Hexapod Origins Response to Hexapod Origins How the same data = different trees Arthropod Origins The

More information

PhyloNet. Yun Yu. Department of Computer Science Bioinformatics Group Rice University

PhyloNet. Yun Yu. Department of Computer Science Bioinformatics Group Rice University PhyloNet Yun Yu Department of Computer Science Bioinformatics Group Rice University yy9@rice.edu Symposium And Software School 2016 The University Of Texas At Austin Installation System requirement: Java

More information

Methods to reconstruct phylogene1c networks accoun1ng for ILS

Methods to reconstruct phylogene1c networks accoun1ng for ILS Methods to reconstruct phylogene1c networks accoun1ng for ILS Céline Scornavacca some slides have been kindly provided by Fabio Pardi ISE-M, Equipe Phylogénie & Evolu1on Moléculaires Montpellier, France

More information

TheDisk-Covering MethodforTree Reconstruction

TheDisk-Covering MethodforTree Reconstruction TheDisk-Covering MethodforTree Reconstruction Daniel Huson PACM, Princeton University Bonn, 1998 1 Copyright (c) 2008 Daniel Huson. Permission is granted to copy, distribute and/or modify this document

More information

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 A non-phylogeny

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2009 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley B.D. Mishler Jan. 22, 2009. Trees I. Summary of previous lecture: Hennigian

More information

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5. Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

More information

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Supplementary Note S2 Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Phylogenetic trees reconstructed by a variety of methods from either single-copy orthologous loci (Class

More information

The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome

The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome Dr. Dirk Gevers 1,2 1 Laboratorium voor Microbiologie 2 Bioinformatics & Evolutionary Genomics The bacterial species in the genomic era CTACCATGAAAGACTTGTGAATCCAGGAAGAGAGACTGACTGGGCAACATGTTATTCAG GTACAAAAAGATTTGGACTGTAACTTAAAAATGATCAAATTATGTTTCCCATGCATCAGG

More information

Phylogenetic Assumptions

Phylogenetic Assumptions Substitution Models and the Phylogenetic Assumptions Vivek Jayaswal Lars S. Jermiin COMMONWEALTH OF AUSTRALIA Copyright htregulation WARNING This material has been reproduced and communicated to you by

More information

From Individual-based Population Models to Lineage-based Models of Phylogenies

From Individual-based Population Models to Lineage-based Models of Phylogenies From Individual-based Population Models to Lineage-based Models of Phylogenies Amaury Lambert (joint works with G. Achaz, H.K. Alexander, R.S. Etienne, N. Lartillot, H. Morlon, T.L. Parsons, T. Stadler)

More information

Multiple Sequence Alignment. Sequences

Multiple Sequence Alignment. Sequences Multiple Sequence Alignment Sequences > YOR020c mstllksaksivplmdrvlvqrikaqaktasglylpe knveklnqaevvavgpgftdangnkvvpqvkvgdqvl ipqfggstiklgnddevilfrdaeilakiakd > crassa mattvrsvksliplldrvlvqrvkaeaktasgiflpe

More information