STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

1 STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization) Laura Salter Kubatko Departments of Statistics and Evolution, Ecology, and Organismal Biology The Ohio State University June 7, 2013

2 Outline
- What is STEM-hy?
- Assumptions and Methods
- Data Preparation
- Example 2: Small Example with Missing Data
- Example 3: Bootstrap Analysis
- Background: STEM's Hybrid Species Models

3 What is STEM-hy?
STEM-hy is a program that performs maximum likelihood estimation of the species tree from multilocus data under the coalescent process. It can also evaluate hybrid taxa.
Basic functions:
- Return the ML species tree.
- Search the space of all species trees and return the k trees with the highest likelihoods found.
- Compute the likelihood of a user-specified tree with branch lengths.
- Find optimal branch lengths on a user-specified tree.
- Carry out a bootstrap analysis to obtain bootstrap support values for nodes in the species tree.
- Evaluate hypotheses of hybridization in a model selection framework.

4 Assumptions
- No recombination within loci.
- Free recombination between loci.
- No gene flow following speciation.
- The coalescent process is the only source of variability in single-gene histories.
- There is a single θ for the entire tree, for each locus.
- Evolutionary rates may vary across loci.

5 Methods: ML Estimate of the Species Tree
Liu et al. (2009) showed that the ML estimate of the species tree can be computed by sequentially clustering the minimum observed divergence times between pairs of species across genes. They also showed that, when gene trees are known without error, the ML species tree is a consistent estimator. Mossel & Roch (2010) obtained a similar result; they call their estimator the GLASS tree (an acronym for Global LAteSt Split, based on the algorithm they developed to compute it). STEM computes the ML estimate of the species tree this way.
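
The clustering idea on this slide can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not STEM's implementation: the input format (one dictionary of pairwise lineage coalescence times per locus) and the function names are hypothetical, rate multipliers and θ are ignored, and every pair of species is assumed to be observed at some locus.

```python
from itertools import combinations

def min_species_times(loci, species_of):
    """Minimum coalescence time between each pair of species across all loci.

    loci: list of dicts mapping (lineage1, lineage2) -> coalescence time
    species_of: dict mapping lineage name -> species name
    """
    d = {}
    for times in loci:
        for (x, y), t in times.items():
            a, b = sorted((species_of[x], species_of[y]))
            if a != b:  # pairs within one species carry no speciation-time bound
                d[(a, b)] = min(t, d.get((a, b), float("inf")))
    return d

def cluster_by_minimum(d):
    """Single-linkage clustering; returns nested tuples (left, right, node_time)."""
    species = sorted({s for pair in d for s in pair})
    clusters = {frozenset([s]): s for s in species}

    def dist(c1, c2):
        return min(d[tuple(sorted((a, b)))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Join the two clusters separated by the smallest minimum time.
        c1, c2 = min(combinations(sorted(clusters, key=sorted), 2),
                     key=lambda pair: dist(*pair))
        t = dist(c1, c2)
        clusters[c1 | c2] = (clusters.pop(c1), clusters.pop(c2), t)
    return next(iter(clusters.values()))

# Toy data (made up): three species, one lineage each, two loci.
species_of = {"a1": "A", "b1": "B", "c1": "C"}
locus1 = {("a1", "b1"): 0.4, ("a1", "c1"): 1.0, ("b1", "c1"): 1.0}
locus2 = {("a1", "b1"): 0.6, ("a1", "c1"): 0.8, ("b1", "c1"): 0.8}
d = min_species_times([locus1, locus2], species_of)
tree = cluster_by_minimum(d)
```

In the toy data the A/B minimum comes from locus 1 and the minimum involving C comes from locus 2, which is why the species-level matrix has to be built across loci before clustering.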

6 Methods: Estimation of ML Times for an Arbitrary Species Tree
The results of Liu et al. (2009) can be extended to derive the ML estimates of the speciation times for an arbitrary species tree. The likelihood of any species tree can therefore be computed readily by using this result to obtain ML branch lengths. This matters because it allows alternative phylogenetic hypotheses to be compared.

7 Methods: Searching Species Tree Space for Trees of High Likelihood
A simulated annealing algorithm searches the space of all species trees for trees with high likelihoods. The k best trees found during the search are saved and printed to a file (k is set by the user). Exploration of the likelihood surface is particularly important for many of these problems. The simulated annealing algorithm is similar to the one given in Salter & Pearl (2001).

8 Features of STEM-hy
- No limits (that I know of) on the number of taxa or the number of loci.
- Can handle intraspecific sampling.
- Allows information about the mutation rate of each locus to be used in the analysis.
- Can handle different taxon samples across genes.
- Version 1.1 is written in Java (using Clojure).

9 Data Preparation - Gene Trees
STEM-hy takes as its input one gene tree for each locus, so the first step in an analysis is to estimate a gene tree with branch lengths for each locus. Any method can be used to do this, subject to a couple of requirements:
- Branch lengths are assumed to be in units of expected number of substitutions per site per unit time.
- Branch lengths must be estimated subject to a molecular clock. This is not checked by the program.
- Gene trees must be fully resolved; however, polytomies can be included by setting branch lengths to 0 for an arbitrary resolution of the polytomy.
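
Because the clock requirement is not checked by the program, it may be worth checking gene trees before running an analysis. The sketch below is a hypothetical helper, not part of STEM-hy: it parses a simple Newick string (no quoted names, no internal node labels), optionally preceded by a bracketed rate multiplier, and tests whether all root-to-tip distances agree within a tolerance.

```python
import re

def tip_depths(newick):
    """Map each tip name to its root-to-tip distance in a Newick tree."""
    newick = re.sub(r"^\s*\[[^\]]*\]", "", newick)  # drop a leading [rate] prefix
    tokens = [t for t in re.findall(r"[(),;]|[^(),;]+", newick) if t.strip()]

    def parse(pos):
        # Returns ({tip: distance from the top of this node's branch}, next_pos).
        if tokens[pos] == "(":
            tips, pos = {}, pos + 1
            while True:
                child, pos = parse(pos)
                tips.update(child)
                pos += 1                      # step over "," or ")"
                if tokens[pos - 1] == ")":
                    break
            label = ""
            if pos < len(tokens) and tokens[pos] not in ("(", ")", ",", ";"):
                label, pos = tokens[pos], pos + 1
        else:
            tips, label, pos = None, tokens[pos], pos + 1
        name, _, length = label.partition(":")
        length = float(length) if length.strip() else 0.0
        if tips is None:
            return {name.strip(): length}, pos
        return {tip: depth + length for tip, depth in tips.items()}, pos

    return parse(0)[0]

def is_ultrametric(newick, tol=1e-9):
    """True if every tip is the same distance from the root, within tol."""
    depths = tip_depths(newick).values()
    return max(depths) - min(depths) <= tol

clock_like = is_ultrametric("((A:0.1,B:0.1):0.2,C:0.3);")
not_clock_like = is_ultrametric("((A:0.1,B:0.2):0.2,C:0.3);")
```

A tolerance is used because branch lengths written with a fixed number of decimals rarely sum to exactly equal depths.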

10 Data Preparation - Population Genetics Parameters
A value of the parameter θ = 4Nµ must be provided. Note that this is the per-site θ, not the per-locus value used by some other population genetics programs. It is used to convert gene tree branch lengths to coalescent units (number of 2N generations) by dividing all gene tree branch lengths by θ. Estimates of θ can be obtained by standard methods. Typical values of θ will be between [...] and 0.1.
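
To make the conversion concrete, here is a minimal sketch that divides every branch length in a Newick string by θ, as the slide describes. STEM-hy does this internally; the helper name and the regular expression are illustrative assumptions, and the expression only recognizes plain decimal or exponent notation after a colon.

```python
import re

def to_coalescent_units(newick, theta):
    """Return the Newick string with every branch length divided by theta."""
    def rescale(match):
        return ":" + format(float(match.group(1)) / theta, "g")
    return re.sub(r":(\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)", rescale, newick)

# Made-up tree and theta, for illustration only.
scaled = to_coalescent_units("((A:0.001,B:0.001):0.002,C:0.003);", theta=0.001)
```

The "g" format keeps six significant digits, which hides floating-point noise in the quotient.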

11 Data Preparation - Population Genetics and Mutation Parameters
Each locus can also be given a rate multiplier. These can adjust for:
- Variation in mutation rate across loci.
- Ploidy (e.g., haploid loci such as mtDNA should be given a rate of 0.5).
At the least, one should estimate rate variation from the data by something like the following:
- For each gene, compute the average pairwise sequence divergence of each sequence to the outgroup.
- Divide all of these values by their overall mean, and assign the result as the rate multiplier for each gene.
- Adjust specific genes for ploidy, if necessary.
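
The recipe above can be sketched as follows. This is one possible reading of the slide, not code from STEM-hy: divergence is measured as a simple p-distance over sites where both sequences have an unambiguous base, the ploidy adjustment is applied as a multiplicative factor, and the function names and input layout are hypothetical.

```python
def p_distance(s1, s2):
    """Proportion of differing sites among positions where both have A, C, G or T."""
    pairs = [(a, b) for a, b in zip(s1.upper(), s2.upper())
             if a in "ACGT" and b in "ACGT"]
    return sum(a != b for a, b in pairs) / len(pairs)

def rate_multipliers(alignments, outgroup, ploidy_factor=None):
    """alignments: {gene: {sequence_name: aligned_sequence}}."""
    mean_div = {}
    for gene, seqs in alignments.items():
        dists = [p_distance(seq, seqs[outgroup])
                 for name, seq in seqs.items() if name != outgroup]
        mean_div[gene] = sum(dists) / len(dists)      # average divergence to outgroup
    overall = sum(mean_div.values()) / len(mean_div)  # mean over genes
    factors = ploidy_factor or {}
    return {gene: d / overall * factors.get(gene, 1.0)
            for gene, d in mean_div.items()}

# Made-up alignments: gene1 diverges less from the outgroup than gene2.
alignments = {
    "gene1": {"out": "AAAAAAAAAA", "s1": "AAAAAAAAAT", "s2": "AAAAAAAATT"},
    "gene2": {"out": "CCCCCCCCCC", "s1": "CCCCCCCCGG", "s2": "CCCCCCCGGG"},
}
rates = rate_multipliers(alignments, outgroup="out")
rates_haploid = rate_multipliers(alignments, "out", ploidy_factor={"gene2": 0.5})
```

Normalizing by the overall mean keeps the multipliers centered on 1, so θ retains its meaning as an average over loci.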

12 Start with a small example where we can work things out by hand: four species, eight lineages, and two loci (N = 2). Suppose that the gene trees for the two loci are: [Figure: the two gene trees; not preserved.]

13 STEM
Now we can run STEM and look at the output. First, let's compute the relevant distances by hand.
[Tables: 4 × 4 pairwise distance matrices over S1 to S4, labelled {D_AB^1} and {D_AB^2}, plus a third matrix built up over the following slides. Most entries were lost; surviving entries include 1.2 in the S3 row of the first matrix and 1.1 in the S3 row of the second and third.]

17 Step 1: Prepare the gene trees
Option 1: Place all gene trees in a single file called genetrees.tre.
- Newick format is required.
- One gene tree per line.
- Rate multipliers must be given in brackets in front of each gene tree.
[1.0](((Name1: ,Name2: ): ,(Name3: ,Name4: ): ):0.0010,((Name5:0.0010,Name6:0.0010):0.0014,(MyName7:0.0012,Name8:0.0012):0.0012): );
[1.0]((((Name1: ,Name2: ): ,(Name3:0.0012,Name4:0.0012): ):0.0003,(Name5:0.0010,Name6:0.0010): ): ,(MyName7:0.0011,Name8:0.0011):0.0024);

18 Step 1: Prepare the gene trees
Option 2: Place sets of gene trees in separate files.
- File names are supplied to STEM-hy in the settings file.
- Rate multipliers are also supplied in the settings file.
- All genes in a single file are assumed to have the same rate.
genetrees1.tre:
(((Name1: ,Name2: ): ,(Name3: ,Name4: ): ):0.0010,((Name5:0.0010,Name6:0.0010):0.0014,(MyName7:0.0012,Name8:0.0012):0.0012): );
genetrees2.tre:
((((Name1: ,Name2: ): ,(Name3:0.0012,Name4:0.0012): ):0.0003,(Name5:0.0010,Name6:0.0010): ): ,(MyName7:0.0011,Name8:0.0011):0.0024);

19 Step 2: Prepare the settings file - input option 1
YAML format: headings with indented parameters defined below.
properties:
  run: 1 #0=user-tree, 1=MLE, 2=search, 3=hybridization, 4=bootstrap
  theta:
  num saved trees: 15
  beta:
  seed:
species:
  Species1: Name1, Name2, Name3
  Species2: Name4, Name5
  Species3: Name6, MyName7
  Species4: Name8

20 Step 2: Prepare the settings file - input option 2
YAML format: headings with indented parameters defined below.
properties:
  run: 1 #0=user-tree, 1=MLE, 2=search, 3=hybridization, 4=bootstrap
  theta:
  num saved trees: 15
  beta:
  seed:
species:
  Species1: Name1, Name2, Name3
  Species2: Name4, Name5
  Species3: Name6, MyName7
  Species4: Name8
files:
  genetrees1.tre: 1.0 # notice the space after each :
  genetrees2.tre: 1.0

21 Step 2: Prepare the settings file
Some parameters are used only for certain run settings. They are ignored otherwise and can be omitted from the settings file.
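
A mismatch between the lineage names in the gene trees and those in the species section of the settings file is an easy mistake to make. The following sketch is a hypothetical pre-flight check, not a STEM-hy feature: it pulls tip names out of simple Newick strings with a regular expression and reports tips that no species claims, along with lineages listed under more than one species.

```python
import re

def tip_names(newick):
    """Tip names: text after '(' or ',' and before ':' (simple Newick only)."""
    return re.findall(r"[(,]\s*([^(),:;]+?)\s*:", newick)

def check_mapping(species, gene_trees):
    """species: {species_name: [lineage names]}; gene_trees: list of Newick strings.

    Returns (unmapped_tips, duplicated_lineages), both sorted lists.
    """
    mapped = [name for names in species.values() for name in names]
    duplicated = sorted({n for n in mapped if mapped.count(n) > 1})
    tips = {t for tree in gene_trees for t in tip_names(tree)}
    unmapped = sorted(tips - set(mapped))
    return unmapped, duplicated

# Mapping from the slides; the gene tree and its Name9 tip are made up.
species = {
    "Species1": ["Name1", "Name2", "Name3"],
    "Species2": ["Name4", "Name5"],
    "Species3": ["Name6", "MyName7"],
    "Species4": ["Name8"],
}
trees = ["[1.0](((Name1:1,Name2:1):1,(Name3:1,Name4:1):1):1,"
         "((Name5:1,Name6:1):1,(MyName7:1,(Name8:0.5,Name9:0.5):0.5):1):1);"]
unmapped, duplicated = check_mapping(species, trees)
```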

22 Results
Analysis 1: Find the ML species tree (run with run: 1).
Run at the command line with: java -jar stem-hy.jar
***************************************
** Welcome to STEM 2.0 **
***************************************
The settings file was successfully parsed...
Using theta =
The settings file contained 4 species and 8 lineages.
The species-to-lineage mappings are:
Species4: Name8
Species3: MyName7, Name6
Species2: Name4, Name5
Species1: Name1, Name2, Name3

23 Results
Analysis 1: Find the ML species tree (run with run: 1).
Run at the command line with: java -jar stem.jar
Results are written to the file mle.tre.
****************Results*****************
D AB Matrix:
[ ]
[ ]
[ ]
[ ]
Maximum Likelihood Species Tree (Newick format):
(Species1: ,(Species4: ,(Species2: ,Species3: ): ): );
Log likelihood for tree:
****************** Done ****************

24 Results
Analysis 2: Find the likelihood of all 15 trees (run with run: 2).
Output:
***************************************
** Welcome to STEM 2.0 **
***************************************
The settings file was successfully parsed
Beginning search now (this could take a while)...
Search completed. Here are the results (also written to file search.tre):
[ ] (Species1: ,(Species4: ,(Species2: ,Species3: ): ): );
[ ] (Species1: ,(Species3: ,(Species2: ,Species4: ): ): );
[ ] ((Species4: ,Species1: ): ,(Species2: ,Species3: ): );
[ ] (Species4: ,(Species1: ,(Species2: ,Species3: ): ): );
[ ] (Species4: ,(Species2: ,(Species1: ,Species3: ): ): );
[ ] ((Species1: ,Species3: ): ,(Species2: ,Species4: ): );
[ ] (Species3: ,(Species1: ,(Species2: ,Species4: ): ): );
[ ] (Species2: ,((Species1: ,Species4: ): ,Species3: ): );
[ ] (Species2: ,((Species1: ,Species3: ): ,Species4: ): );

25 Results
Analysis 3: Find the likelihood of a particular species tree.
Place the tree(s) of interest in the file user.tre in the same directory as STEM-hy:
((Species1: ,Species3: ): ,(Species2: ,Species4: ): );
Branch lengths must be included. STEM-hy reports the likelihood of the tree with the user-specified branch lengths, as well as the ML branch lengths along the user tree.

26 Results
***************************************
** Welcome to STEM 2.0 **
***************************************
The settings file was successfully parsed
Read 1 species tree[s] from user.tre
****************Results*****************
User tree: ((Species1: ,Species3: ): ,(Species2: ,Species4: ): )
Log likelihood for tree:
**************Optimized Trees************
Optimized user tree: ((Species1: ,Species3: ): ,(Species2: ,Species4: ): );
Log likelihood:
****************** Done ****************

27 Example 2: Missing Data Example
genetrees.tre:
[1.0](((Name1: ,Name2: ): ,(Name3: ,Name4: ): ):0.0010,((Name5:0.0010,Name6:0.0010):0.0014,(MyName7:0.0012,Name8:0.0012):0.0012): );
[1.0](((Name1: ,Name2: ): ,(Name3:0.0012,Name4:0.0012): ): ,(Name5:0.0010,Name6:0.0010): );
[1.0](((Name1: ,Name2: ): ,(Name3: ,Name4: ): ):0.0010,((Name5:0.0010,Name6:0.0010):0.0014,(MyName7:0.0012,Name8:0.0012):0.0012): );
[1.0]((((Name1: ,Name2: ): ,(Name3:0.0012,Name4:0.0012): ):0.0003,(Name5:0.0010,Name6:0.0010): ): ,(MyName7:0.0011,Name8:0.0011):0.0024);
[1.0](((Name1: ,Name2: ): ,(Name3: ,Name4: ): ):0.0010,((Name5:0.0010,Name6:0.0010):0.0014,(MyName7:0.0012,Name8:0.0012):0.0012): );
[1.0]((((Name1: ,Name2: ): ,(Name3:0.0012,Name4:0.0012): ):0.0003,(Name5:0.0010,Name6:0.0010): ): ,(MyName7:0.0011,Name8:0.0011):0.0024);
[1.0](((Name1: ,Name2: ): ,(Name3: ,Name4: ): ):0.0010,((Name5:0.0010,Name6:0.0010):0.0014,(MyName7:0.0012,Name8:0.0012):0.0012): );
[1.0]((((Name1: ,Name2: ): ,(Name3:0.0012,Name4:0.0012): ):0.0003,(Name5:0.0010,Name6:0.0010): ): ,(MyName7:0.0011,Name8:0.0011):0.0024);

28 Example 2: Missing Data Example
Look at the gene trees. [Figure: three gene tree shapes. The first (tips Name1 to Name8) is shared by 4 loci, the second (tips Name1 to Name6 only, with MyName7 and Name8 missing) by 1 locus, and the third (tips Name1 to Name8) by 3 loci.]

29 Example 2: Missing Data Example
Note: The settings file remains unchanged. Below is the output.
****************Results*****************
D AB Matrix:
[ ]
[ ]
[ ]
[ ]
Maximum Likelihood Species Tree (Newick format):
(Species1: ,(Species4: ,(Species2: ,Species3: ): ): );
log likelihood for tree:
****************** Done ****************

30 Example Data: Heliconius Butterflies
[Figure: tree of H. hecale, H. melpomene, H. heurippa, and H. cydno, with nodes labelled ABCD, BCD, BD, and CD and numbered 1 to 3.]

31 Example 3: Bootstrap Analysis
The current version of STEM-hy can be used to estimate bootstrap proportions on the ML tree, as well as to construct a bootstrap consensus tree.
- Sequence data must be provided in PHYLIP format, with a separate file for each gene.
- Each gene is bootstrapped a user-specified number of times, B, to produce B bootstrap samples (alignments) for each gene.
- Gene trees are estimated for each bootstrap sample using the program SSA, which uses simulated annealing to estimate gene trees under the assumption of a molecular clock.
- B species trees are reconstructed using STEM-hy and printed both to the screen and to the file bootstrap.results.
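
The resampling step in the second bullet is the standard nonparametric bootstrap over alignment columns. The sketch below illustrates only that step, with a hypothetical in-memory alignment rather than a PHYLIP file; gene tree estimation with SSA and the species tree step are not shown.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: {taxon_name: aligned_sequence}, all sequences the same length.
    rng: a random.Random instance, so that runs can be made reproducible.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = rng.choices(range(n_sites), k=n_sites)
    # The same sampled columns are used for every taxon, keeping sites aligned.
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

def bootstrap_samples(alignment, b, seed=0):
    """Return b bootstrap replicates of one gene's alignment."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(b)]

# Made-up alignment for one gene.
aln = {"t1": "ACGTAC", "t2": "ACGTTT", "t3": "ACGTAC"}
replicates = bootstrap_samples(aln, b=5, seed=1)
```

Each gene is resampled separately, so the number of replicates B is the same for every gene while the sampled columns differ between genes.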

32 Example 3: Bootstrap Analysis
For this example, we'll consider four taxa and six genes in Heliconius butterflies. The settings file is shown below (changes were highlighted in blue on the original slide).
properties:
  run: 4 #0=user-tree, 1=MLE, 2=search, 3=hybridization, 4=bootstrap
  bootstrap samples: 100
  phylip files: co 4tax.phy,dll 4tax.phy,inv 4tax.phy,sd 4tax.phy,tpi 4tax.phy,white 4tax.phy
  theta: 0.01
  num saved trees: 15
  beta:
  seed:
species:
  H. melpomene: M95
  H. hecale: Hh
  H. cordula: M187
  H. heurippa: Strib40

33 Example 3: Bootstrap Analysis
Below is the output. All bootstrap trees are written to a file called bootstrap.results and can be read into another program and summarized.
...
The species-to-lineage mappings are:
H. heurippa: Strib40
H. cordula: M187
H. hecale: Hh
H. melpomene: M95
Bootstrapping trees (this might take a while)...
****************Results*****************
The maximum likelihood species tree estimate is:
(H. hecale: ,(H. melpomene: ,(H. heurippa: ,H. cordula: ): ): );
The 100 bootstrapped species trees:
(H. heurippa: ,(H. hecale: ,(H. melpomene: ,H. cydno: ): ): );
(H. hecale: ,(H. melpomene: ,(H. heurippa: ,H. cydno: ): ): );
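
One simple way to summarize the bootstrap trees is to count how often each clade appears. The sketch below is a hypothetical post-processing script, not part of STEM-hy: it assumes one rooted Newick tree per string, no quoted names, and no internal node labels, and the example trees are made up.

```python
import re
from collections import Counter

def clades(newick):
    """Non-trivial clades (frozensets of tip names) in a rooted Newick tree."""
    found, stack, tips = set(), [], set()
    for token in re.findall(r"[(),;]|[^(),;]+", newick):
        if token == "(":
            stack.append(set())           # open a new clade
        elif token == ")":
            done = stack.pop()            # close it and record its tips
            found.add(frozenset(done))
            if stack:
                stack[-1] |= done
        elif token not in (",", ";"):
            name = token.split(":")[0].strip()
            if name and stack:            # empty name: branch length of a clade
                stack[-1].add(name)
                tips.add(name)
    found.discard(frozenset(tips))        # the root clade is uninformative
    return found

def clade_support(trees):
    """Fraction of trees in which each clade appears."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))
    return {clade: n / len(trees) for clade, n in counts.items()}

boot = [
    "(A:1,(B:1,(C:1,D:1):1):1);",
    "(A:1,(B:1,(C:1,D:1):1):1);",
    "(A:1,(C:1,(B:1,D:1):1):1);",
]
support = clade_support(boot)
```

The support for a node on the ML tree is then the entry for that node's set of descendant species.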

34 Some Notes on Program Versions
There are some important differences between STEM v1.1a and STEM v2.0 / STEM-hy v1.0. Multifurcations are handled differently:
- STEM v1.1a and lower: zero-length branches are set to [...].
- STEM v2.0 / STEM-hy v1.0: zero-length branches are treated as missing data.
Other big differences are improvements to the input format and increased functionality in later versions.

35 STEM's Hybrid Species Model
[Figure: a species tree on A, B, C subject to hybridization, with internal branch length τ and hybridization parameter γ; its two parental species trees with weights γ and 1 − γ; gene trees; and the mutation process.]
- Species tree subject to hybridization.
- The hybridization parameter γ models the extent of the contribution from each parent.
- Possible parental species trees.
- Probabilities associated with each gene tree topology for each parental tree under the coalescent model:
  Parental tree with weight γ: $P(C(AB)) = 1 - \tfrac{2}{3}e^{-\tau}$, $P(A(BC)) = \tfrac{1}{3}e^{-\tau}$, $P(B(AC)) = \tfrac{1}{3}e^{-\tau}$.
  Parental tree with weight 1 − γ: $P(C(AB)) = \tfrac{1}{3}e^{-\tau}$, $P(A(BC)) = 1 - \tfrac{2}{3}e^{-\tau}$, $P(B(AC)) = \tfrac{1}{3}e^{-\tau}$.
- Sequence evolution proceeds along gene trees (mutation process).
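
The topology probabilities on these slides can be evaluated directly. The sketch below is illustrative only: it assumes both parental trees share the same internal branch length τ (in coalescent units), as drawn in the figure, and labels each gene tree topology by its sister pair, so "AB" stands for C(AB). The function names are hypothetical.

```python
import math

def topology_probs(tau, sister_pair):
    """Gene tree topology probabilities for three taxa under the coalescent.

    tau: internal branch length of the species tree in coalescent units.
    sister_pair: sister pair in the species tree, e.g. "AB" for ((A,B),C).
    """
    discordant = math.exp(-tau) / 3.0
    probs = {"AB": discordant, "BC": discordant, "AC": discordant}
    probs[sister_pair] = 1.0 - 2.0 * discordant   # the matching topology
    return probs

def hybrid_topology_probs(tau, gamma):
    """Mixture over parental trees ((A,B),C) and (A,(B,C)), weights gamma and 1 - gamma."""
    p1 = topology_probs(tau, "AB")
    p2 = topology_probs(tau, "BC")
    return {k: gamma * p1[k] + (1.0 - gamma) * p2[k] for k in p1}

mixed = hybrid_topology_probs(tau=1.0, gamma=0.5)
```

With γ = 0.5 the two topologies that match a parental tree are equally probable, which is the signature that distinguishes hybridization from incomplete lineage sorting alone, where the two minor topologies are the equally probable ones.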

41 Inference of Trees Subject to Hybridization
Assumptions:
- Hybridization results in a mosaic genome, so that each sampled gene has a probability distribution over which of several parental species trees its history originated from.
- Genes in the sample are independent given the species tree.
- Hybridization events happen only between sister taxa.
- No factors other than coalescence and hybridization lead to incongruence between gene trees and the species tree.

42 Likelihood Calculation for the Three-taxon Case
Let $f(g_i \mid S)$ be the probability density of gene tree $g_i$ given species tree $S$ under the coalescent model (Rannala and Yang, 2003). The likelihood function for the three-taxon case is
$$\prod_{i=1}^{N} \left\{ \gamma f(g_i \mid S_1) + (1 - \gamma) f(g_i \mid S_2) \right\}$$
where $S_1$ and $S_2$ are the two possible parental species trees and $\gamma \in [0, 1]$.
[Figure: the two parental trees on A, B, C, with weights γ and 1 − γ and densities $f(g \mid S_1)$ and $f(g \mid S_2)$.]

45 Beyond Three Taxa...
- We propose a method that incorporates any number of hybridization events, provided they occur between sister taxa.
- Each putative hybridization event is assigned a parameter: $\gamma_1, \gamma_2, \ldots$
- The likelihood is computed by considering all combinations of possible parental species trees, weighted appropriately by the $\gamma_j$ parameters.

46 A Bigger Example
Motivating example. Consider the hybrid species tree: [Figure: trees on taxa A, B, C, D, E, F; not preserved.]

48 The Likelihood Function
[Figure: the four parental species trees $S_1$, $S_2$, $S_3$, $S_4$ on taxa A to F, each labelled with its weight.]
$$\prod_{i=1}^{N} \left\{ \gamma_1 \gamma_2 f(g_i \mid S_1) + \gamma_1 (1 - \gamma_2) f(g_i \mid S_2) + (1 - \gamma_1) \gamma_2 f(g_i \mid S_3) + (1 - \gamma_1)(1 - \gamma_2) f(g_i \mid S_4) \right\}$$
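
The weights in this likelihood generalize to any number of hybridization events: each event contributes either its γ or its complement, giving one weight per combination of parental choices. A minimal sketch follows; the function name and the True/False encoding of the choices are assumptions made for illustration.

```python
from itertools import product

def parental_weights(gammas):
    """Weight of each parental tree, keyed by a tuple of choices.

    For event j, True selects the parent weighted gamma_j and False the
    parent weighted 1 - gamma_j.
    """
    weights = {}
    for choice in product((True, False), repeat=len(gammas)):
        w = 1.0
        for gamma, first_parent in zip(gammas, choice):
            w *= gamma if first_parent else 1.0 - gamma
        weights[choice] = w
    return weights

# Two hybridization events with made-up values of gamma_1 and gamma_2.
w = parental_weights([0.3, 0.6])
```

The four entries correspond, in order, to the weights on $S_1$ through $S_4$ in the formula above, and they always sum to 1.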

49 Comments on Computation
- Parameters in the likelihood function: $\gamma_1$, $\gamma_2$, and the branch lengths.
- For a given hybrid species tree and sample of gene trees with divergence times, maximum likelihood branch lengths can be determined analytically.
- Fitting the likelihood model for a hypothesized hybrid species tree therefore requires numerical optimization of the γ parameters only.
- Implemented in a modified version of the program STEM, called STEM-hy.
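
For a single hybridization event, optimizing γ is a one-dimensional problem. The sketch below is not STEM-hy's optimizer: it treats the per-gene densities under the two parental trees as fixed, strictly positive inputs (the values used are made up) and uses ternary search, which is valid here because the mixture log likelihood is concave in γ.

```python
import math

def mixture_log_likelihood(gamma, f1, f2):
    """Sum over genes of log(gamma * f(g_i|S1) + (1 - gamma) * f(g_i|S2))."""
    return sum(math.log(gamma * a + (1.0 - gamma) * b) for a, b in zip(f1, f2))

def fit_gamma(f1, f2, iterations=200):
    """Maximize the mixture log likelihood over gamma in [0, 1] by ternary search."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if mixture_log_likelihood(m1, f1, f2) < mixture_log_likelihood(m2, f1, f2):
            lo = m1   # the maximum lies to the right of m1
        else:
            hi = m2   # the maximum lies to the left of m2
    return (lo + hi) / 2.0

# Two genes, each strongly favouring a different parental tree.
gamma_mixed = fit_gamma([0.9, 0.1], [0.1, 0.9])
# Every gene favours the first parental tree.
gamma_one_parent = fit_gamma([0.5, 0.4], [0.1, 0.2])
```

When every gene favours the same parental tree, the estimate moves to the boundary, which corresponds to one of the non-hybrid models.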

50 Selecting the Best Hybrid Species Tree
For the example hybrid species tree, pick the best hybrid model from among the possible models using the AIC. (The Tree column of the original table held a drawing of each tree on taxa A to F.)

Model    γ_1     γ_2     Number of Parameters
1 to 4   (entries lost, except one row with γ_1 = 1, γ_2 = 1, and 5 parameters)
5        0       (0,1)   6
6        1       (0,1)   6
7        (0,1)   0       6
8        (0,1)   1       6
9        (0,1)   (0,1)   7

52 STEM-hy: Assumptions
In practice, the $\gamma_i$ are not given (neither are the times of speciation or hybridization events). The algorithm finds MLEs for these parameters. STEM-hy inherits all of STEM's other assumptions (e.g., no gene flow after speciation in the absence of hybridization, and uncertainty in the estimated gene trees is not taken into consideration).

53 STEM-hy: Assumptions
One important point: STEM-hy looks for evidence of hybridization in the presence of incomplete lineage sorting. Because the model in STEM-hy is used to compute likelihoods, the coalescent process is incorporated. The AIC is used to compare models:
$$\mathrm{AIC} = -2 \ln L(M \mid D) + 2k$$
where M is the model, D is the data, and k is the number of parameters in the model. $\ln L(M \mid D)$ is the log likelihood from STEM-hy for the hybridization model under consideration.
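
As a small worked example of the comparison, the sketch below computes the AIC for a set of candidate models and picks the one with the lowest value. The model names and log likelihoods are made up for illustration; in practice they would be read from the STEM-hy output.

```python
def aic(log_likelihood, k):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * log_likelihood + 2 * k

def best_model(models):
    """models: {name: (log_likelihood, number_of_parameters)}.

    Returns the name of the model with the lowest AIC.
    """
    return min(models, key=lambda name: aic(*models[name]))

# Hypothetical results for two parental trees and the hybrid tree.
models = {
    "parental, gamma = 1": (-120.0, 3),
    "parental, gamma = 0": (-118.5, 3),
    "hybrid": (-116.0, 4),
}
scores = {name: aic(*values) for name, values in models.items()}
choice = best_model(models)
```

Here the hybrid model is preferred because its gain in log likelihood outweighs the penalty of 2 for the extra γ parameter.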

54 Input data format is the same as for the previous analyses:
- Gene trees are placed in the file genetrees.tre (option 1), or the files containing the gene trees are listed in the settings file (option 2).
- The settings file (in YAML format) gives the user settings (e.g., θ).
- The run option is set to 3.

55 The user must additionally provide information about hybridization:
- The only option at present is a user-specified tree: the present version of the program assumes that the overall species phylogeny is known.
- The user-specified tree is one of the possible parental trees; it doesn't matter which one.
- The putative hybrid species are identified in the settings.yaml file.

56 [Figure: tree of H. hecale, H. melpomene, H. heurippa, and H. cydno, with nodes labelled ABCD, BCD, BD, and CD and numbered 1 to 3.]

57 STEM-hy Example: Heliconius Butterflies
Example genetrees.tre file:
[ ]((Hheurippa: ,(Hcydno: ,Hmelpomene: ): ): ,Hhecale: );
[ ]((Hmelpomene: ,(Hcydno: ,Hheurippa: ): ):0.001,Hhecale: );
[ ](((Hcydno: ,Hheurippa: ): ,Hmelpomene: ): ,Hhecale: );
[ ](((Hheurippa: ,Hcydno: ): ,Hmelpomene: ): ,Hhecale: );
[ ](((Hheurippa: ,Hmelpomene: ): ,Hcydno: ): ,Hhecale: );
[ ](((Hheurippa: ,Hcydno: ): ,Hmelpomene: ): ,Hhecale: );

58 STEM-hy Example: Heliconius Butterflies
Example settings file:
properties:
  run: 3
  theta:
  beta:
  burnin: 100
  seed:
  bound total iter: 20
  num saved trees: 10
  hybrid species: H. heurippa
  hybrid tree: user-heliconius.tre
species:
  H. melpomene: M95
  H. hecale: Hh
  H. cordula: M187
  H. heurippa: Strib40

59 Example user-heliconius.tre:
(((H. heurippa: ,H. cydno: ): ,H. melpomene: ): ,H. hecale: );

60 ****************Results*****************
...
Parental trees:
gamma(H. heurippa) = 1
((H. cydno: ,(H. heurippa: ,H. melpomene: ): ): ,H. hecale: );
Lik: AIC: k: 3
gamma(H. heurippa) = 0
(((H. heurippa: ,H. cydno: ): ,H. melpomene: ): ,H. hecale: );
Lik: AIC: k: 3
Hybrid trees:
(((H. heurippa: ,H. cydno: ): ,H. melpomene: ): ,H. hecale: );
Lik: gamma(H. heurippa): AIC: k: 4
****************** Done ****************

61 What hybrid species can be considered?
Care must be taken in selecting hybrid species:
- Both members of a sister group cannot be selected as hybrid taxa in a single analysis. However, two analyses can be run (one with each member of the sister group identified as the hybrid), and results will be comparable across runs.
- The outgroup cannot be selected as a hybrid.
Both restrictions arise because, for now, hybridization is considered only between sister taxa. More general hybridization relationships can be considered by hand using the user-specified tree feature of STEM-hy.

62 STEM-hy: Strengths and Weaknesses
STEM-hy makes some fairly strong assumptions:
- Error in estimating gene trees and branch lengths is not incorporated! The possibility of carrying out a bootstrap analysis helps, however.
- Information in the sequence data is not used directly; it is used only as summarized by the estimated gene divergence times.
- There is a single value of θ for the entire tree.
There are trade-offs involved, and STEM-hy does some things well:
- It is quick (even the tree search does not take long).
- It can handle missing data easily and intuitively.
- Simulations demonstrate reasonable performance (unlikely to be misleading; may be uninformative).

64 Challenge Datasets
I've created four datasets under varying conditions:
- M1: No hybridization, long intervals between speciation events.
- M2: No hybridization, short intervals between speciation events.
- M3: Low levels of hybridization; B is a hybrid of A and C (species tree as in M1 and M2).
- M4: Extensive hybridization; B is a hybrid of A and C (species tree as in M1 and M2).
All datasets have 6 species, 2 individuals per species, and 10 loci.
GOAL: match each dataset to one of the conditions listed above.
Solutions are at lkubatko/solutions.html

65 STEM-hy Information, References, etc.
Recommended citations - species tree estimation:
- Kubatko, L. S., B. C. Carstens, and L. L. Knowles. STEM: Species Tree Estimation using Maximum likelihood under coalescence. Bioinformatics 25(7).
- Liu, L., L. Yu, and D. K. Pearl. Maximum tree: a consistent estimator of the species tree. Journal of Mathematical Biology 60(1).
- Mossel, E. and S. Roch. Incomplete lineage sorting: Consistent phylogeny estimation from multiple loci. IEEE/ACM Transactions on Computational Biology and Bioinformatics 7(1).
Recommended citations - hybridization:
- Kubatko, L. S. Identifying Hybridization Events in the Presence of Coalescence via Model Selection. Systematic Biology 58(5).
Thank you!
STEM-hy is available at lkubatko/software/stem/
Questions concerning the programs can be sent to kubatko.2@osu.edu.


More information

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression) Using phylogenetics to estimate species divergence times... More accurately... Basics and basic issues for Bayesian inference of divergence times (plus some digression) "A comparison of the structures

More information

Workshop III: Evolutionary Genomics

Workshop III: Evolutionary Genomics Identifying Species Trees from Gene Trees Elizabeth S. Allman University of Alaska IPAM Los Angeles, CA November 17, 2011 Workshop III: Evolutionary Genomics Collaborators The work in today s talk is joint

More information

Intraspecific gene genealogies: trees grafting into networks

Intraspecific gene genealogies: trees grafting into networks Intraspecific gene genealogies: trees grafting into networks by David Posada & Keith A. Crandall Kessy Abarenkov Tartu, 2004 Article describes: Population genetics principles Intraspecific genetic variation

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1-2010 Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci Elchanan Mossel University of

More information

To link to this article: DOI: / URL:

To link to this article: DOI: / URL: This article was downloaded by:[ohio State University Libraries] [Ohio State University Libraries] On: 22 February 2007 Access Details: [subscription number 731699053] Publisher: Taylor & Francis Informa

More information

Jed Chou. April 13, 2015

Jed Chou. April 13, 2015 of of CS598 AGB April 13, 2015 Overview of 1 2 3 4 5 Competing Approaches of Two competing approaches to species tree inference: Summary methods: estimate a tree on each gene alignment then combine gene

More information

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

More information

Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies

Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies Inferring Phylogenetic Trees Distance Approaches Representing distances in rooted and unrooted trees The distance approach to phylogenies given: an n n matrix M where M ij is the distance between taxa

More information

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

Efficient Bayesian Species Tree Inference under the Multispecies Coalescent

Efficient Bayesian Species Tree Inference under the Multispecies Coalescent Syst. Biol. 66(5):823 842, 2017 The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com

More information

Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009

Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009 Gene Genealogies Coalescence Theory Annabelle Haudry Glasgow, July 2009 What could tell a gene genealogy? How much diversity in the population? Has the demographic size of the population changed? How?

More information

Coalescent Histories on Phylogenetic Networks and Detection of Hybridization Despite Incomplete Lineage Sorting

Coalescent Histories on Phylogenetic Networks and Detection of Hybridization Despite Incomplete Lineage Sorting Syst. Biol. 60(2):138 149, 2011 c The Author(s) 2011. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com

More information

Maximum Likelihood Inference of Reticulate Evolutionary Histories

Maximum Likelihood Inference of Reticulate Evolutionary Histories Maximum Likelihood Inference of Reticulate Evolutionary Histories Luay Nakhleh Department of Computer Science Rice University The 2015 Phylogenomics Symposium and Software School The University of Michigan,

More information

Phylogenomics. Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University. Tuesday, January 29, 13

Phylogenomics. Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University. Tuesday, January 29, 13 Phylogenomics Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University How may we improve our inferences? How may we improve our inferences? Inferences Data How may we improve

More information

Phylogenetic analyses. Kirsi Kostamo

Phylogenetic analyses. Kirsi Kostamo Phylogenetic analyses Kirsi Kostamo The aim: To construct a visual representation (a tree) to describe the assumed evolution occurring between and among different groups (individuals, populations, species,

More information

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods

More information

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D 7.91 Lecture #5 Database Searching & Molecular Phylogenetics Michael Yaffe B C D B C D (((,B)C)D) Outline Distance Matrix Methods Neighbor-Joining Method and Related Neighbor Methods Maximum Likelihood

More information

Estimating Evolutionary Trees. Phylogenetic Methods

Estimating Evolutionary Trees. Phylogenetic Methods Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent

More information

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles John Novembre and Montgomery Slatkin Supplementary Methods To

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods

More information

SpeciesNetwork Tutorial

SpeciesNetwork Tutorial SpeciesNetwork Tutorial Inferring Species Networks from Multilocus Data Chi Zhang and Huw A. Ogilvie E-mail: zhangchi@ivpp.ac.cn January 21, 2018 Introduction This tutorial covers SpeciesNetwork, a fully

More information

Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction. Lesser Tenrec (Echinops telfairi)

Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction. Lesser Tenrec (Echinops telfairi) Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction Lesser Tenrec (Echinops telfairi) Goals: 1. Use phylogenetic experimental design theory to select optimal taxa to

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

In comparisons of genomic sequences from multiple species, Challenges in Species Tree Estimation Under the Multispecies Coalescent Model REVIEW

In comparisons of genomic sequences from multiple species, Challenges in Species Tree Estimation Under the Multispecies Coalescent Model REVIEW REVIEW Challenges in Species Tree Estimation Under the Multispecies Coalescent Model Bo Xu* and Ziheng Yang*,,1 *Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China and Department

More information

Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting

Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting arxiv:1509.06075v3 [q-bio.pe] 12 Feb 2016 Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting Claudia Solís-Lemus 1 and Cécile Ané 1,2 1 Department of Statistics,

More information

The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection

The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection Yun Yu 1, James H. Degnan 2,3, Luay Nakhleh 1 * 1 Department of Computer Science, Rice

More information

Phylogenetic Networks, Trees, and Clusters

Phylogenetic Networks, Trees, and Clusters Phylogenetic Networks, Trees, and Clusters Luay Nakhleh 1 and Li-San Wang 2 1 Department of Computer Science Rice University Houston, TX 77005, USA nakhleh@cs.rice.edu 2 Department of Biology University

More information

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Phylogeny: the evolutionary history of a species

More information

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. OEB 242 Exam Practice Problems Answer Key Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. First, recall

More information

ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support

ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support Siavash Mirarab University of California, San Diego Joint work with Tandy Warnow Erfan Sayyari

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Efficient Bayesian species tree inference under the multi-species coalescent

Efficient Bayesian species tree inference under the multi-species coalescent Efficient Bayesian species tree inference under the multi-species coalescent arxiv:1512.03843v1 [q-bio.pe] 11 Dec 2015 Bruce Rannala 1 and Ziheng Yang 2 1 Department of Evolution & Ecology, University

More information

Phylogeny Tree Algorithms

Phylogeny Tree Algorithms Phylogeny Tree lgorithms Jianlin heng, PhD School of Electrical Engineering and omputer Science University of entral Florida 2006 Free for academic use. opyright @ Jianlin heng & original sources for some

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2011 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley B.D. Mishler March 31, 2011. Reticulation,"Phylogeography," and Population Biology:

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

I. Short Answer Questions DO ALL QUESTIONS

I. Short Answer Questions DO ALL QUESTIONS EVOLUTION 313 FINAL EXAM Part 1 Saturday, 7 May 2005 page 1 I. Short Answer Questions DO ALL QUESTIONS SAQ #1. Please state and BRIEFLY explain the major objectives of this course in evolution. Recall

More information

ESS 345 Ichthyology. Systematic Ichthyology Part II Not in Book

ESS 345 Ichthyology. Systematic Ichthyology Part II Not in Book ESS 345 Ichthyology Systematic Ichthyology Part II Not in Book Thought for today: Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else,

More information

PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence

PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence Are directed quartets the key for more reliable supertrees? Patrick Kück Department of Life Science, Vertebrates Division,

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Evaluation of a Bayesian Coalescent Method of Species Delimitation

Evaluation of a Bayesian Coalescent Method of Species Delimitation Syst. Biol. 60(6):747 761, 2011 c The Author(s) 2011. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com

More information

Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2012 University of California, Berkeley

Integrative Biology 200A PRINCIPLES OF PHYLOGENETICS Spring 2012 University of California, Berkeley Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2012 University of California, Berkeley B.D. Mishler April 12, 2012. Phylogenetic trees IX: Below the "species level;" phylogeography; dealing

More information

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Supplementary Note S2 Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Phylogenetic trees reconstructed by a variety of methods from either single-copy orthologous loci (Class

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Inferring Speciation Times under an Episodic Molecular Clock

Inferring Speciation Times under an Episodic Molecular Clock Syst. Biol. 56(3):453 466, 2007 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150701420643 Inferring Speciation Times under an Episodic Molecular

More information

Evolutionary Tree Analysis. Overview

Evolutionary Tree Analysis. Overview CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based

More information

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels

More information

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X

More information

What is Phylogenetics

What is Phylogenetics What is Phylogenetics Phylogenetics is the area of research concerned with finding the genetic connections and relationships between species. The basic idea is to compare specific characters (features)

More information

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Integrative Biology 200 PRINCIPLES OF PHYLOGENETICS Spring 2018 University of California, Berkeley Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;

More information

Phylogenetics in the Age of Genomics: Prospects and Challenges

Phylogenetics in the Age of Genomics: Prospects and Challenges Phylogenetics in the Age of Genomics: Prospects and Challenges Antonis Rokas Department of Biological Sciences, Vanderbilt University http://as.vanderbilt.edu/rokaslab http://pubmed2wordle.appspot.com/

More information

Consensus methods. Strict consensus methods

Consensus methods. Strict consensus methods Consensus methods A consensus tree is a summary of the agreement among a set of fundamental trees There are many consensus methods that differ in: 1. the kind of agreement 2. the level of agreement Consensus

More information

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5. Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

Phylogeny. November 7, 2017

Phylogeny. November 7, 2017 Phylogeny November 7, 2017 Phylogenetics Phylon = tribe/race, genetikos = relative to birth Phylogenetics: study of evolutionary relationships among organisms, sequences, or anything in between Related

More information

Taxon: generally refers to any named group of organisms, such as species, genus, family, order, etc.. Node: represents the hypothetical ancestor

Taxon: generally refers to any named group of organisms, such as species, genus, family, order, etc.. Node: represents the hypothetical ancestor A quick review Taxon: generally refers to any named group of organisms, such as species, genus, family, order, etc.. Node: represents the hypothetical ancestor Branches: lines diverging from a node Root:

More information

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29):

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29): Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29): Statistical estimation of models of sequence evolution Phylogenetic inference using maximum likelihood:

More information

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei"

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS Masatoshi Nei" Abstract: Phylogenetic trees: Recent advances in statistical methods for phylogenetic reconstruction and genetic diversity analysis were

More information

Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2012 University of California, Berkeley

Integrative Biology 200A PRINCIPLES OF PHYLOGENETICS Spring 2012 University of California, Berkeley Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2012 University of California, Berkeley B.D. Mishler Feb. 7, 2012. Morphological data IV -- ontogeny & structure of plants The last frontier

More information

Estimating Species Phylogeny from Gene-Tree Probabilities Despite Incomplete Lineage Sorting: An Example from Melanoplus Grasshoppers

Estimating Species Phylogeny from Gene-Tree Probabilities Despite Incomplete Lineage Sorting: An Example from Melanoplus Grasshoppers Syst. Biol. 56(3):400-411, 2007 Copyright Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150701405560 Estimating Species Phylogeny from Gene-Tree Probabilities

More information

A Phylogenetic Network Construction due to Constrained Recombination

A Phylogenetic Network Construction due to Constrained Recombination A Phylogenetic Network Construction due to Constrained Recombination Mohd. Abdul Hai Zahid Research Scholar Research Supervisors: Dr. R.C. Joshi Dr. Ankush Mittal Department of Electronics and Computer

More information

C.DARWIN ( )

C.DARWIN ( ) C.DARWIN (1809-1882) LAMARCK Each evolutionary lineage has evolved, transforming itself, from a ancestor appeared by spontaneous generation DARWIN All organisms are historically interconnected. Their relationships

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2011 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley B.D. Mishler Feb. 1, 2011. Qualitative character evolution (cont.) - comparing

More information

DNA-based species delimitation

DNA-based species delimitation DNA-based species delimitation Phylogenetic species concept based on tree topologies Ø How to set species boundaries? Ø Automatic species delimitation? druhů? DNA barcoding Species boundaries recognized

More information

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis 10 December 2012 - Corrections - Exercise 1 Non-vertebrate chordates generally possess 2 homologs, vertebrates 3 or more gene copies; a Drosophila

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Concepts and Methods in Molecular Divergence Time Estimation

Concepts and Methods in Molecular Divergence Time Estimation Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History Overview 1. Why do we date trees? 2. The molecular clock 3. Local clocks

More information

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057 Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number

More information

How to read and make phylogenetic trees Zuzana Starostová

How to read and make phylogenetic trees Zuzana Starostová How to read and make phylogenetic trees Zuzana Starostová How to make phylogenetic trees? Workflow: obtain DNA sequence quality check sequence alignment calculating genetic distances phylogeny estimation

More information

reconciling trees Stefanie Hartmann postdoc, Todd Vision s lab University of North Carolina the data

reconciling trees Stefanie Hartmann postdoc, Todd Vision s lab University of North Carolina the data reconciling trees Stefanie Hartmann postdoc, Todd Vision s lab University of North Carolina 1 the data alignments and phylogenies for ~27,000 gene families from 140 plant species www.phytome.org publicly

More information

Reconstructing the history of lineages

Reconstructing the history of lineages Reconstructing the history of lineages Class outline Systematics Phylogenetic systematics Phylogenetic trees and maps Class outline Definitions Systematics Phylogenetic systematics/cladistics Systematics

More information

Processes of Evolution

Processes of Evolution 15 Processes of Evolution Forces of Evolution Concept 15.4 Selection Can Be Stabilizing, Directional, or Disruptive Natural selection can act on quantitative traits in three ways: Stabilizing selection

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2009 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley B.D. Mishler Jan. 22, 2009. Trees I. Summary of previous lecture: Hennigian

More information

Bayesian Models for Phylogenetic Trees

Bayesian Models for Phylogenetic Trees Bayesian Models for Phylogenetic Trees Clarence Leung* 1 1 McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada ABSTRACT Introduction: Inferring genetic ancestry of different species

More information

Fine-Scale Phylogenetic Discordance across the House Mouse Genome

Fine-Scale Phylogenetic Discordance across the House Mouse Genome Fine-Scale Phylogenetic Discordance across the House Mouse Genome Michael A. White 1,Cécile Ané 2,3, Colin N. Dewey 4,5,6, Bret R. Larget 2,3, Bret A. Payseur 1 * 1 Laboratory of Genetics, University of

More information

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses Anatomy of a tree outgroup: an early branching relative of the interest groups sister taxa: taxa derived from the same recent ancestor polytomy: >2 taxa emerge from a node Anatomy of a tree clade is group

More information

Methods to reconstruct phylogene1c networks accoun1ng for ILS

Methods to reconstruct phylogene1c networks accoun1ng for ILS Methods to reconstruct phylogene1c networks accoun1ng for ILS Céline Scornavacca some slides have been kindly provided by Fabio Pardi ISE-M, Equipe Phylogénie & Evolu1on Moléculaires Montpellier, France

More information

molecular evolution and phylogenetics

molecular evolution and phylogenetics molecular evolution and phylogenetics Charlotte Darby Computational Genomics: Applied Comparative Genomics 2.13.18 https://www.thinglink.com/scene/762084640000311296 Internal node Root TIME Branch Leaves

More information

Theory of Evolution Charles Darwin

Theory of Evolution Charles Darwin Theory of Evolution Charles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (83-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

More information

Bootstrapping and Tree reliability. Biol4230 Tues, March 13, 2018 Bill Pearson Pinn 6-057

Bootstrapping and Tree reliability. Biol4230 Tues, March 13, 2018 Bill Pearson Pinn 6-057 Bootstrapping and Tree reliability Biol4230 Tues, March 13, 2018 Bill Pearson wrp@virginia.edu 4-2818 Pinn 6-057 Rooting trees (outgroups) Bootstrapping given a set of sequences sample positions randomly,

More information

Consistency Index (CI)

Consistency Index (CI) Consistency Index (CI) minimum number of changes divided by the number required on the tree. CI=1 if there is no homoplasy negatively correlated with the number of species sampled Retention Index (RI)

More information

Lecture 6 Phylogenetic Inference

Lecture 6 Phylogenetic Inference Lecture 6 Phylogenetic Inference From Darwin s notebook in 1837 Charles Darwin Willi Hennig From The Origin in 1859 Cladistics Phylogenetic inference Willi Hennig, Cladistics 1. Clade, Monophyletic group,

More information

OMICS Journals are welcoming Submissions

OMICS Journals are welcoming Submissions OMICS Journals are welcoming Submissions OMICS International welcomes submissions that are original and technically so as to serve both the developing world and developed countries in the best possible

More information