A (short) introduction to phylogenetics

Similar documents
Multivariate analysis of genetic data exploring group diversity

Constructing Evolutionary/Phylogenetic Trees

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Constructing Evolutionary/Phylogenetic Trees

Dr. Amira A. AL-Hosary

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Algorithms in Bioinformatics

Phylogenetic Tree Reconstruction

BINF6201/8201. Molecular phylogenetic methods

Phylogenetic inference

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

How to read and make phylogenetic trees Zuzana Starostová

Phylogenetic analyses. Kirsi Kostamo

EVOLUTIONARY DISTANCES

Theory of Evolution Charles Darwin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Multivariate analysis of genetic data: exploring groups diversity

Evolutionary Tree Analysis. Overview

How should we organize the diversity of animal life?

What is Phylogenetics

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

Phylogeny: building the tree of life

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29):

Phylogenetic inference: from sequences to trees

Multiple Sequence Alignment. Sequences

Phylogenies & Classifying species (AKA Cladistics & Taxonomy) What are phylogenies & cladograms? How do we read them? How do we estimate them?


9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Phylogenetics. BIOL 7711 Computational Bioscience

Lecture 6 Phylogenetic Inference

Phylogeny. November 7, 2017

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

Consistency Index (CI)

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Phylogeny: traditional and Bayesian approaches

C.DARWIN ( )

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Intraspecific gene genealogies: trees grafting into networks

Phylogenetic trees 07/10/13

Multivariate analysis of genetic data an introduction

Classification and Phylogeny

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço

Evolutionary Models. Evolutionary Models

Phylogenetics: Building Phylogenetic Trees

Systematics - Bio 615

Classification and Phylogeny

Concepts and Methods in Molecular Divergence Time Estimation

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Introduction to characters and parsimony analysis

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Molecular Evolution, course # Final Exam, May 3, 2006

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

molecular evolution and phylogenetics

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

A Phylogenetic Network Construction due to Constrained Recombination

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Today's project. Test input data Six alignments (from six independent markers) of Curcuma species

An Adaptive Association Test for Microbiome Data

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

Is the equal branch length model a parsimony model?

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft]

Phylogeny Tree Algorithms

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

Bootstrap confidence levels for phylogenetic trees B. Efron, E. Halloran, and S. Holmes, 1996

Lecture 11 Friday, October 21, 2011

Consensus Methods. * You are only responsible for the first two

8/23/2014. Phylogeny and the Tree of Life

Phylogenetic Analysis

Phylogenetic Analysis

Chapter 16: Reconstructing and Using Phylogenies

Estimating Evolutionary Trees. Phylogenetic Methods

Theory of Evolution. Charles Darwin

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,

Chapter 19: Taxonomy, Systematics, and Phylogeny

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Thanks to Paul Lewis and Joe Felsenstein for the use of slides

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Modern Evolutionary Classification. Section 18-2 pgs

Phylogenetic Analysis

Reconstructing Evolutionary Trees. Chapter 14

Multivariate analysis of genetic data: an introduction

Reconstruire le passé biologique modèles, méthodes, performances, limites

Statistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution?

Transcription:

A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field Station 16 Aug 2016 1/39

Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more... Outline 2/39

Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more... Outline 3/39

Phylogenetics: from the origins... From the first growth of the tree, many a limb and branch has decayed and dropped off; and these fallen branches of various sizes may represent those whole orders, families, and genera which have now no living representatives, and which are known to us only in a fossil state. C. Darwin, Notebook, 1837. 4/39

Phylogenetics:...to the present phylogenetic trees are part of the standard toolbox of genetic data analysis represent the evolutionary history of a set of (sampled) taxa Bininda-Emonds et al., 2007, Nature. 5/39

And the main difference is... Current trees look better! (and some other minor differences) 6/39

And the main difference is... Current trees look better! (and some other minor differences) 6/39

And the main difference is... Current trees look better! (and some other minor differences) 6/39

About the minor differences... DNA sequencing revolution huge data banks freely available (e.g. GenBank) easier, cheaper, faster to obtain DNA sequences increasing number of full genomes available Different ways to exploit this information. 7/39

About the minor differences... DNA sequencing revolution huge data banks freely available (e.g. GenBank) easier, cheaper, faster to obtain DNA sequences increasing number of full genomes available Different ways to exploit this information. 7/39

Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more... Outline 8/39

Genetic changes (substitution) accumulate over time Substitution: replacement of a nucleotide (e.g. a t)...attgtacgtaggatctagg......attgaacgtaggatctagg......attgtacgtaccatctagg... Time...attgaacgtagttgctagg......atcgtacgtaccatctggg... 9/39

Genetic changes (substitution) accumulate over time Substitution: replacement of a nucleotide (e.g. a t)...attgtacgtaggatctagg......attgaacgtaggatctagg......attgtacgtaccatctagg... Time...attgaacgtagttgctagg......atcgtacgtaccatctggg... 9/39

Genetic changes (substitution) accumulate over time Substitution: replacement of a nucleotide (e.g. a t)...attgtacgtaggatctagg......attgaacgtaggatctagg......attgtacgtaccatctagg... Time...attgaacgtagttgctagg......atcgtacgtaccatctggg... 9/39

Genetic changes (substitution) accumulate over time Substitution: replacement of a nucleotide (e.g. a t)...attgtacgtaggatctagg......attgaacgtaggatctagg......attgtacgtaccatctagg... Time...attgaacgtagttgctagg......atcgtacgtaccatctggg... 9/39

Genetic changes (substitution) accumulate over time Substitution: replacement of a nucleotide (e.g. a t)...attgtacgtaggatctagg......attgaacgtaggatctagg......attgtacgtaccatctagg... Time...attgaacgtagttgctagg......atcgtacgtaccatctggg... 9/39

Genetic changes (substitution) accumulate over time Substitution: replacement of a nucleotide (e.g. a t)...attgtacgtaggatctagg......attgaacgtaggatctagg......attgtacgtaccatctagg... Time...attgaacgtagttgctagg......atcgtacgtaccatctggg... Substitution patterns reflect the evolutionary history 9/39

Using substitution patterns to reconstruct the evolutionary history...attaaacgtaggatctagg......attaaacgtaggatctagg......attcatacgtaggatcagg......attgtacgtaggatctttt......attgtacgtaggatctttt......attgcatgtaggatctttt... Reconstructed phylogeny 10/39

Using substitution patterns to reconstruct the evolutionary history...attaaacgtaggatctagg......attaaacgtaggatctagg......attcatacgtaggatcagg......attgtacgtaggatctttt......attgtacgtaggatctttt......attgcatgtaggatctttt... Reconstructed phylogeny 10/39

Using substitution patterns to reconstruct the evolutionary history...attaaacgtaggatctagg......attaaacgtaggatctagg......attcatacgtaggatcagg......attgtacgtaggatctttt......attgtacgtaggatctttt......attgcatgtaggatctttt... Reconstructed phylogeny Phylogenetics aim to reconstruct evolutionary trees (phylogenies) from genetic sequence data. 10/39

Using trees to represent the evolutionary history 11/39

Using trees to represent the evolutionary history - "taxa" - "tips" - "leaves" - "Operational Taxonomic Units (OTUs)" 11/39

Using trees to represent the evolutionary history Most Recent Common Ancestors (MRCA) - "nodes" - "Hypothetical Taxonomic Units (HTUs)" 11/39

Using trees to represent the evolutionary history branch - "edge" - length = amount of evolution (not time, as a rule) - length is optional 11/39

Using trees to represent the evolutionary history distances between tips - "patristic" distance: sum of branch lengths - other measures of distance/dissimilarity - vertical axis meaningless 11/39

Using trees to represent the evolutionary history Root - oldest part of the tree - defined using an outgroup - optional 11/39

Using trees to represent the evolutionary history Root - oldest part of the tree - defined using an outgroup - optional How to we build them? 11/39

Workflow Prepare data align sequences: alignment software + manual refinement Build the tree distance-based methods maximum parsimony likelihood-based methods (ML, Bayesian) Analyse the tree assess uncertainty test phylogenetic signal model trait evolution... 12/39

Workflow Prepare data align sequences: alignment software + manual refinement Build the tree distance-based methods maximum parsimony likelihood-based methods (ML, Bayesian) Analyse the tree assess uncertainty test phylogenetic signal model trait evolution... 12/39

Workflow Prepare data align sequences: alignment software + manual refinement Build the tree distance-based methods maximum parsimony likelihood-based methods (ML, Bayesian) Analyse the tree assess uncertainty test phylogenetic signal model trait evolution... 12/39

Workflow Prepare data align sequences: alignment software + manual refinement Build the tree distance-based methods maximum parsimony likelihood-based methods (ML, Bayesian) Analyse the tree assess uncertainty test phylogenetic signal model trait evolution... 12/39

Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more... Outline 13/39

Distance-based phylogenetic reconstruction Approaches relying on agglomerative clustering algorithms (e.g. Single linkage, UPGMA, Neighbor-Joining) Rationale 1. compute pairwise genetic distances D 2. group closest sequences 3. update D 4. go back to 2) until all sequences are grouped 14/39

Distance-based phylogenetic reconstruction 15/39

Distance-based phylogenetic reconstruction 15/39

Distance-based phylogenetic reconstruction 15/39

Distance-based phylogenetic reconstruction 15/39

Distance-based phylogenetic reconstruction 15/39

Distance-based phylogenetic reconstruction 15/39

What is the distance between a node and tips? Hierarchical clustering: k D =... k,g single linkage: D k,g = min(d k,i, D k,j ) complete linkage: D k,g = max(d k,i, D k,j ) g i D i,j j UPGMA: D k,g = D k,i+d k,j 2 Neighbor joining: Transforms original distances to account for heterogeneous rates of evolution. 16/39

What is the distance between a node and tips? Hierarchical clustering: k D =... k,g single linkage: D k,g = min(d k,i, D k,j ) complete linkage: D k,g = max(d k,i, D k,j ) g i D i,j j UPGMA: D k,g = D k,i+d k,j 2 Neighbor joining: Transforms original distances to account for heterogeneous rates of evolution. 16/39

What is the distance between a node and tips? Hierarchical clustering: k D =... k,g single linkage: D k,g = min(d k,i, D k,j ) complete linkage: D k,g = max(d k,i, D k,j ) g i D i,j j UPGMA: D k,g = D k,i+d k,j 2 Neighbor joining: Transforms original distances to account for heterogeneous rates of evolution. 16/39

What is the distance between a node and tips? Hierarchical clustering: k D =... k,g single linkage: D k,g = min(d k,i, D k,j ) complete linkage: D k,g = max(d k,i, D k,j ) g i D i,j j UPGMA: D k,g = D k,i+d k,j 2 Neighbor joining: Transforms original distances to account for heterogeneous rates of evolution. 16/39

What is the distance between a node and tips? Hierarchical clustering: k D =... k,g single linkage: D k,g = min(d k,i, D k,j ) complete linkage: D k,g = max(d k,i, D k,j ) g i D i,j j UPGMA: D k,g = D k,i+d k,j 2 Neighbor joining: Transforms original distances to account for heterogeneous rates of evolution. 16/39

Distance-based phylogenetic reconstruction Advantages simple flexible (many distances and clustering algorithms) fast and scalable (applicable to large datasets) Limitations sensitive to distance/clustering chosen evolutionary rates are not estimated no measure of uncertainty for the tree obtained 17/39

Distance-based phylogenetic reconstruction Advantages simple flexible (many distances and clustering algorithms) fast and scalable (applicable to large datasets) Limitations sensitive to distance/clustering chosen evolutionary rates are not estimated no measure of uncertainty for the tree obtained 17/39

Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more... Outline 18/39

Maximum parsimony phylogenies Approaches relying on finding the tree with the smallest number of character changes (substitutions) Rationale 1. start from a pre-defined tree 2. compute initial parsimony score 3. permute branches and compute parsimony score 4. accept new tree if the parsimony score is improved 5. go back to 3) until convergence 19/39

Maximum parsimony phylogenies score: 8 score: 6 Initial tree score: 5 20/39

Maximum parsimony phylogenies score: 8 score: 6 Initial tree score: 5 20/39

Maximum parsimony phylogenies score: 8 score: 6 Initial tree score: 5 20/39

Maximum parsimony phylogenies score: 8 score: 6 Initial tree score: 5 20/39

Maximum parsimony phylogenies score: 8 score: 6 Initial tree score: 5 20/39

Maximum parsimony phylogenies Advantages applicable to any discontinuous characters (not just DNA) intuitive explanation: simplest evolutionary scenario Limitations evolutionary rates are not estimated no measure of uncertainty for the tree obtained computer-intensive different types of substitutions ignored evolution not necessarily parsimonious sensitive to heterogeneous rates of evolution (long branch attraction) 21/39

Maximum parsimony phylogenies Advantages applicable to any discontinuous characters (not just DNA) intuitive explanation: simplest evolutionary scenario Limitations evolutionary rates are not estimated no measure of uncertainty for the tree obtained computer-intensive different types of substitutions ignored evolution not necessarily parsimonious sensitive to heterogeneous rates of evolution (long branch attraction) 21/39

Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more... Outline 22/39

Likelihood-based phylogenies (ML / Bayesian) Approaches relying on a model of sequence evolution: ML: find tree and evolutionary rates with highest likelihood Bayesian: find tree and evolutionary rates to posterior probability Rationale 1. start from a pre-defined tree 2. compute initial likelihood/posterior 3. permute branches, sample new parameters and compute likelihood/posterior 4. accept new tree and parameters based on likelihood/posterior improvement 5. go back to 3) until convergence 23/39

Likelihood-based phylogenies (ML / Bayesian) Approaches relying on a model of sequence evolution: ML: find tree and evolutionary rates with highest likelihood Bayesian: find tree and evolutionary rates to posterior probability Rationale 1. start from a pre-defined tree 2. compute initial likelihood/posterior 3. permute branches, sample new parameters and compute likelihood/posterior 4. accept new tree and parameters based on likelihood/posterior improvement 5. go back to 3) until convergence 23/39

Likelihood-based phylogenies (ML / Bayesian) Likelihood / Posterior Density 24/39

Likelihood-based phylogenies (ML / Bayesian) Advantages very flexible consistent with a model of evolution statistically consistent (model comparison) parameter estimation (Bayesian) several trees measure of uncertainty Limitations computer-intensive choice of the model of evolution (ML) no measure of uncertainty for the tree obtained (Bayesian) need to find a consensus tree 25/39

Likelihood-based phylogenies (ML / Bayesian) Advantages very flexible consistent with a model of evolution statistically consistent (model comparison) parameter estimation (Bayesian) several trees measure of uncertainty Limitations computer-intensive choice of the model of evolution (ML) no measure of uncertainty for the tree obtained (Bayesian) need to find a consensus tree 25/39

Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more... Outline 26/39

How do we know the tree is robust? Main issue: assess the uncertainty of the tree topology / individual nodes Approaches ML: model selection to compare trees (whole tree) Bayesian methods: between-samples variability (individual nodes) any method: bootstrap (individual nodes) 27/39

How do we know the tree is robust? Main issue: assess the uncertainty of the tree topology / individual nodes Approaches ML: model selection to compare trees (whole tree) Bayesian methods: between-samples variability (individual nodes) any method: bootstrap (individual nodes) 27/39

How do we know the tree is robust? Main issue: assess the uncertainty of the tree topology / individual nodes Approaches ML: model selection to compare trees (whole tree) Bayesian methods: between-samples variability (individual nodes) any method: bootstrap (individual nodes) 27/39

How do we know the tree is robust? Main issue: assess the uncertainty of the tree topology / individual nodes Approaches ML: model selection to compare trees (whole tree) Bayesian methods: between-samples variability (individual nodes) any method: bootstrap (individual nodes) 27/39

How do we know the tree is robust? Main issue: assess the uncertainty of the tree topology / individual nodes Approaches ML: model selection to compare trees (whole tree) Bayesian methods: between-samples variability (individual nodes) any method: bootstrap (individual nodes) 27/39

Bootstrapping phylogenies assess variability due to sampling the genome and conflicting signals relies on analysing resampled datasets Rationale 1. obtain a reference tree 2. resample the sites with replacement 3. obtain a tree for the resampled dataset 4. go back to 2) until the desired number of bootstrapped trees is attained 5. compute the frequency of each bifurcation of the reference tree occuring in bootstrapped trees 28/39

Bootstrapping phylogenies assess variability due to sampling the genome and conflicting signals relies on analysing resampled datasets Rationale 1. obtain a reference tree 2. resample the sites with replacement 3. obtain a tree for the resampled dataset 4. go back to 2) until the desired number of bootstrapped trees is attained 5. compute the frequency of each bifurcation of the reference tree occuring in bootstrapped trees 28/39

Bootstrapping phylogenies...attaaacgtaggatctagg......attaaacgtaggatctagg......attcatacgtaggatcagg......attgtacgtaggatctttt......attgtacgtaggatctttt......attgcatgtaggatctttt... Reconstructed phylogeny sampling sites with replacement compare topologies...ttttaaaccatggatctagg......ttttaaaccatggatctagg......ttttaaaccatggatctagg......ttttaaaccatggatctagg......ttttaaaccatggatctagg......atttcatacgtaggatcagg......ttttaaaccatggatctagg......atttcatacgtaggatcagg......ttttaaaccatggatctagg......atttcatacgtaggatcagg......aaatgtaccatggatcttgt......atttcatacgtaggatcagg......aaatgtaccatggatcttgt......atttcatacgtaggatcagg......aaatgtaccatggatcttgt......aaatgtaccatggatcttgt......aaatgtaccatggatcttgt......aaatgcatcatggatcttgt......aaatgtaccatggatcttgt......aaatgcatcatggatcttgt......aaatgtaccatggatcttgt......aaatgcatcatggatcttgt......aaatgcatcatggatcttgt......aaatgcatcatggatcttgt... Reconstructed phylogeny Reconstructed phylogeny Reconstructed phylogeny Reconstructed phylogeny Reconstructed phylogeny 29/39

Bootstrapping phylogenies Advantages standard simple to implement Limitations possibly computer-intensive assumes that the genome has been sampled randomly (often wrong) 30/39

Bootstrapping phylogenies Advantages standard simple to implement Limitations possibly computer-intensive assumes that the genome has been sampled randomly (often wrong) 30/39

Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more... Outline 31/39

Plotting trees as rooted 9 1 3 10 11 2 12 4 13 7 14 5 9 2 11 6 8 15 6 15 12 3 9 2 1 4 13 7 14 5 10 8 11 12 1 11 15 6 15 6 10 4 8 7 13 14 11 6 3 5 3 15 1 4 13 7 14 5 9 2 9 2 12 5 14 3 7 10 10 8 5 10 8 8 3 9 12 14 2 1 7 13 6 11 13 4 4 15 Never plot an unrooted tree as rooted. 1 12 32/39

Plotting trees as rooted 9 1 3 10 11 2 12 4 13 7 14 5 9 2 11 6 8 15 6 15 12 3 9 2 1 4 13 7 14 5 10 8 11 12 1 11 15 6 15 6 10 4 8 7 13 14 11 6 3 5 3 15 1 4 13 7 14 5 9 2 9 2 12 5 14 3 7 10 10 8 5 10 8 8 3 9 12 14 2 1 7 13 6 11 13 4 4 15 Never plot an unrooted tree as rooted. 1 12 32/39

Interpreting distances 33/39

Interpreting distances 33/39

Interpreting distances meaningless meaningful distance = sum of branch lengths 33/39

The paradox of divergent clusters MRCA and genetic distances may give different information. 34/39

The paradox of divergent clusters D(red, red)>d(red,blue) MRCA and genetic distances may give different information. 34/39

The paradox of divergent clusters D(red, red)>d(red,blue) MRCA and genetic distances may give different information. 34/39

Taking uncertainty into account 193 HIV 1 sequences from DRC (Strimmer & Pybus 2001) 0.2 0.15 0.1 0.05 0 At best, the tree is an estimate of the likely evolutionary history of the taxa studied. 35/39

Taking uncertainty into account 193 HIV 1 sequences from DRC (Strimmer & Pybus 2001) Collapsed tree (threshold length 0.01) 0.2 0.15 0.1 0.05 0 0.2 0.15 0.1 0.05 0 At best, the tree is an estimate of the likely evolutionary history of the taxa studied. 35/39

Taking uncertainty into account 193 HIV 1 sequences from DRC (Strimmer & Pybus 2001) Collapsed tree (threshold length 0.01) 0.2 0.15 0.1 0.05 0 0.2 0.15 0.1 0.05 0 At best, the tree is an estimate of the likely evolutionary history of the taxa studied. 35/39

(Over, Mis)Interpreting temporal trends t6 t62 t4 t48 t72 t29t14 t83 t39 t69 t96 t63 t44 t24 t51 t95 t85 t68 t92 t87 t31 t81 t99 t73 t30t25 t64 t46 t33 t54 t10 t26 t76 t59 t16 t32 t100 t91 t77 t42 t36 t58 t98 t56 t74 t65t79 t37 t80 t12t93 t78 t90 t18 t11 t94 t5 t45 t60t67 t53 t88 t52 t22 t27 t23 t38 t40 t3 t15 t13 t61 t55 t35 t89 t75 t86 t70 t66 t34 t2 t17 t28 t47 t19 t43 t41 t82 t49 t1 t84 t20 t97 t21 t7 t71 t50 t57 12 10 8 6 4 2 0 Time trees only make sense under a near-perfect molecular clock. 36/39

(Over, Mis)Interpreting temporal trends t6 t62 t4 t48 t72 t29t14 t83 t39 t69 t96 t63 t44 t24 t51 t95 t85 t68 t92 t87 t31 t81 t99 t73 t30t25 t64 t46 t33 t54 t10 t26 t76 t59 t16 t32 t100 t91 t77 t42 t36 t58 t98 t56 t74 t65t79 t37 t80 t12t93 t78 t90 t18 t11 t94 t5 t45 t60t67 t53 t88 t52 t22 t27 t23 t38 t40 t3 t15 t13 t61 t55 t35 t89 t75 t86 t70 t66 t34 t2 t17 t28 t47 t19 t43 t41 t82 t49 t1 t84 t20 t97 t21 t7 t71 t50 t57 Mutations to the root 0 2 4 6 8 10 12 > anova(lm(d.root~-1+t.root)) anova(lm(d.root~-1+t.root)) Analysis of Variance Table Response: d.root Df Sum Sq Mean Sq F value Pr(>F) t.root 1 5434.7 5434.7 214.66 < 2.2e-16 *** Residuals 99 2506.4 25.3 12 10 8 6 4 2 0 0 20 40 60 80 Time to the root Time trees only make sense under a near-perfect molecular clock. 36/39

(Over, Mis)Interpreting temporal trends t6 t62 t4 t48 t72 t29t14 t83 t39 t69 t96 t63 t44 t24 t51 t95 t85 t68 t92 t87 t31 t81 t99 t73 t30t25 t64 t46 t33 t54 t10 t26 t76 t59 t16 t32 t100 t91 t77 t42 t36 t58 t98 t56 t74 t65t79 t37 t80 t12t93 t78 t90 t18 t11 t94 t5 t45 t60t67 t53 t88 t52 t22 t27 t23 t38 t40 t3 t15 t13 t61 t55 t35 t89 t75 t86 t70 t66 t34 t2 t17 t28 t47 t19 t43 t41 t82 t49 t1 t84 t20 t97 t21 t7 t71 t50 t57 Mutations to the root 0 2 4 6 8 10 12 > anova(lm(d.root~-1+t.root)) anova(lm(d.root~-1+t.root)) Analysis of Variance Table Response: d.root Df Sum Sq Mean Sq F value Pr(>F) t.root 1 5434.7 5434.7 214.66 < 2.2e-16 *** Residuals 99 2506.4 25.3 12 10 8 6 4 2 0 0 20 40 60 80 Time to the root Time trees only make sense under a near-perfect molecular clock. 36/39

Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more... Outline 37/39

This is only the beginning Many things can be done with trees estimate divergence time model trait evolution (phylogenetic comparative method) reconstruct ancestral states measure diversity infer past demographics/effective population size (coalescence)... and also, other approaches than phylogenetics to analyse genetic data 38/39

This is only the beginning Many things can be done with trees estimate divergence time model trait evolution (phylogenetic comparative method) reconstruct ancestral states measure diversity infer past demographics/effective population size (coalescence)... and also, other approaches than phylogenetics to analyse genetic data 38/39

This is only the beginning Many things can be done with trees estimate divergence time model trait evolution (phylogenetic comparative method) reconstruct ancestral states measure diversity infer past demographics/effective population size (coalescence)... and also, other approaches than phylogenetics to analyse genetic data 38/39

This is only the beginning Many things can be done with trees estimate divergence time model trait evolution (phylogenetic comparative method) reconstruct ancestral states measure diversity infer past demographics/effective population size (coalescence)... and also, other approaches than phylogenetics to analyse genetic data 38/39

This is only the beginning Many things can be done with trees estimate divergence time model trait evolution (phylogenetic comparative method) reconstruct ancestral states measure diversity infer past demographics/effective population size (coalescence)... and also, other approaches than phylogenetics to analyse genetic data 38/39

This is only the beginning Many things can be done with trees estimate divergence time model trait evolution (phylogenetic comparative method) reconstruct ancestral states measure diversity infer past demographics/effective population size (coalescence)... and also, other approaches than phylogenetics to analyse genetic data 38/39

This is only the beginning Many things can be done with trees estimate divergence time model trait evolution (phylogenetic comparative method) reconstruct ancestral states measure diversity infer past demographics/effective population size (coalescence)... and also, other approaches than phylogenetics to analyse genetic data 38/39

Time to get your hands dirty! The pdf of the practical is online: http://adegenet.r-forge.r-project.org/ or Google adegenet documents GDAR August 2016 39/39