Phylogenetic methods in molecular systematics

Niklas Wahlberg, Stockholm University

Acknowledgement
- Many of the slides in this lecture series are modified from slides by others:
  www.dbbm.fiocruz.br/james/lectures.html
  www.molecularevolution.org
- Some slides scavenged from the internet, source no longer known...

The questions
- Refreshing the basics, cladistics
- What is a phylogeny?
- Why are we interested in phylogenies?
- Why should we use molecular data (sequences) to infer phylogenies?

The very basic facts
- What we see today in nature is the outcome of what has happened in the past
- Ecology and evolution are inseparable
- Species are not individual entities without any connections to other species: phylogeny

What is a phylogeny?
- A phylogeny is the historical genealogy of a group of species
- Envisioned as a dichotomously branching tree
- A phylogeny cannot be observed
- A phylogenetic hypothesis can be inferred from observed data

The grand meaning
- Phylogenies: the Tree of Life
- With phylogenies we are attempting to get a good working framework for Life
- Getting to the root of how evolution has worked

The rise of systematics
- Within the last 15 years the number of phylogenetic studies has skyrocketed
- Largely due to the advent of easy DNA sequencing methods
- Promises to help us understand biodiversity and evolutionary processes better

Systematics is...
- Systematics is the scientific study of the kinds and diversity of organisms and of all relationships among them
- Provides the essential framework without which we could not recognise or study biological diversity & evolution

Why molecular systematics?
- Ease of data generation for large numbers of taxa
- Relative ease of generating a large number of independent data sets for given taxa
- Molecular characters behind the morphological characters we see

Molecules as documents of evolutionary history
- We may ask the question where in the now living systems the greatest amount of information of their past history has survived and how it can be extracted
- Best fit are the different types of macromolecules (sequences) which carry the genetic information
- (Zuckerkandl & Pauling 1965)
- Number of sequences in GenBank: currently 100 billion bases from over 165,000 species

Systematics (traditionally) comprises:
- Classification: ordering of organisms into groups (taxa); can be phenetic or phylogenetic
- Nomenclature: the naming of organisms
- Identification: placement of a new organism into a previously described group

Systematics (today)
- In addition to the previous
- Is part of evolutionary biology
- Provides the temporal aspect for studying evolution

The fundamental unit
- Genetic relationships exist between individuals within populations
- These include ancestor-descendant relationships and more indirect relationships based on common ancestry
- Within sexually reproducing populations there is a network of relationships
- Genetic relations within populations can be measured with a coefficient of genetic relatedness

Phylogenetic Relationships
- Phylogenetic relationships exist between lineages (e.g. species, genes)
- These include ancestor-descendant relationships and more indirect relationships based on common ancestry
- Phylogenetic relationships between species or lineages are (expected to be) tree-like
- Phylogenetic relationships are not usually measured with a simple coefficient

[Figure: Phylogeny, Tokogeny]

Phylogenetic Relationships
- Traditionally phylogeny reconstruction was dominated by the search for ancestors, and ancestor-descendant relationships
- In modern phylogenetics there is an emphasis on indirect relationships
- Given that all lineages are related, closeness of phylogenetic relationships is a relative concept

Phylogenetic relationships
- Two lineages are more closely related to each other than to some other lineage if they share a more recent common ancestor: this is the cladistic concept of relationships
- Phylogenetic hypotheses are hypotheses of common ancestry
- [Figure: tree with a hypothetical ancestral lineage; labels: Toad, Oak; (,Toad)Oak]

Characters and Character States
- Organisms comprise sets of features
- When organisms/taxa differ with respect to a feature (e.g. its presence or absence, or different nucleotide bases at specific sites in a sequence) the different conditions are called character states
- The collection of character states with respect to a feature constitutes a character

Character evolution
- Heritable changes (in morphology, gene sequences, etc.) produce different character states
- Similarities and differences in character states provide the basis for inferring phylogeny (i.e. provide evidence of relationships)
- The utility of this evidence depends on how often the evolutionary changes that produce the different character states occur independently

Unique and unreversed characters
- Given a heritable evolutionary change that is unique and unreversed (e.g. the origin of hair) in an ancestral species, the presence of the novel character state in any taxa must be due to inheritance from the ancestor
- Similarly, absence in any taxa must be because the taxa are not descendants of that ancestor
- The novelty is a homology acting as a badge or marker for the descendants of the ancestor
- The taxa with the novelty are a clade (e.g. Mammalia)
- Because hair evolved only once and is unreversed (not subsequently lost) it is homologous and provides unambiguous evidence of relationships
- [Figure: tree including Lizard and Dog; character HAIR (absent/present), with the change or step marked]

Homoplasy - independent evolution
- Homoplasy is similarity that is not homologous (not due to common ancestry)
- It is the result of independent evolution (convergence, parallelism, reversal)
- Homoplasy can provide misleading evidence of phylogenetic relationships (if mistakenly interpreted as homology)
- Loss of tails evolved independently in humans and frogs: there are two steps on the true tree
- [Figure: true tree including Lizard and Dog; character TAIL (adult) (absent/present)]

Homoplasy - misleading evidence of phylogeny
- If misinterpreted as homology, the absence of tails would be evidence for a wrong tree: grouping humans with frogs and lizards with dogs
- [Figure: wrong tree including Lizard and Dog; character TAIL (absent/present)]

Homoplasy - reversal
- Reversals are evolutionary changes back to an ancestral condition
- As with any homoplasy, reversals can provide misleading evidence of relationships
- [Figure: True tree with tips ordered 1 2 3 4 5 6 7 8 9 10; Wrong tree with tips ordered 1 2 7 8 3 4 5 6 9 10]

Homoplasy - a fundamental problem of phylogenetic inference
- If there were no homoplastic similarities, inferring phylogeny would be easy: all the pieces of the jigsaw would fit together neatly
- Distinguishing the misleading evidence of homoplasy from the reliable evidence of homology is a fundamental problem of phylogenetic inference

Homoplasy and Incongruence
- If we assume that there is a single correct phylogenetic tree, then when characters support conflicting phylogenetic trees we know that there must be some misleading evidence of relationships among the incongruent or incompatible characters
- Incongruence between two characters implies that at least one of the characters is homoplastic and that at least one of the trees the characters support is wrong

Incongruence or Incompatibility
- [Figure: tree including Lizard and Dog; character HAIR (absent/present)]
- [Figure: tree including Lizard and Dog; character TAIL (absent/present)]
- These trees and characters are incongruent: both trees cannot be correct, at least one is wrong and at least one character must be homoplastic

Distinguishing homology and homoplasy
- Morphologists use a variety of techniques to distinguish homoplasy and homology
- Homologous features are expected to display detailed similarity (in position, structure, development) whereas homoplastic similarities are more likely to be superficial
- As recognised by Charles Darwin, congruence with other characters provides the most compelling evidence for homology
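
The incongruence between the hair and tail characters can be checked mechanically. The sketch below uses the standard pairwise compatibility test for binary characters (two such characters can fit one tree without homoplasy only if the four state combinations 00, 01, 10 and 11 do not all occur). This test is not named in the slides; the function name and the 0/1 coding (0 = absent, 1 = present) are assumptions made for illustration, and the lactation row is included only as an example of a character congruent with hair.

```python
def compatible(char_a, char_b):
    """Return True if two binary characters can both fit one tree without homoplasy.

    Each character is a dict mapping taxon name -> state (0 or 1).
    Pairwise compatibility test: the characters conflict only when all
    four state combinations occur among the shared taxa.
    """
    shared_taxa = char_a.keys() & char_b.keys()
    combinations = {(char_a[t], char_b[t]) for t in shared_taxa}
    return len(combinations) < 4


# Illustrative coding (assumption): 0 = absent, 1 = present.
hair = {"Lizard": 0, "Frog": 0, "Human": 1, "Dog": 1}
tail = {"Lizard": 1, "Frog": 0, "Human": 0, "Dog": 1}
lactation = {"Lizard": 0, "Frog": 0, "Human": 1, "Dog": 1}

print("hair vs tail compatible:     ", compatible(hair, tail))
print("hair vs lactation compatible:", compatible(hair, lactation))
```

The test says only that at least one of two conflicting characters is homoplastic; it does not say which one. That is why congruence with further characters is needed.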

The importance of congruence
- "The importance, for classification, of trifling characters, mainly depends on their being correlated with several other characters of more or less importance. The value indeed of an aggregate of characters is very evident... a classification founded on any single character, however important that may be, has always failed." Charles Darwin: Origin of Species, Ch. 13

Congruence
- We prefer the true tree because it is supported by multiple congruent characters
- [Figure: tree including Lizard and Dog; MAMMALIA supported by hair, single bone in lower jaw, lactation, etc.]

Homoplasy in molecular data
- Incongruence, and therefore homoplasy, is common in molecular sequence data
- There are a limited number of alternative character states (e.g. only A, G, C, T and gap in DNA)
- Rates of evolution are sometimes high
- Character states are chemically identical: homology and homoplasy are equally similar and cannot be distinguished by detailed study of similarity and differences

Parsimony analysis
- Parsimony methods provide one way of choosing among alternative phylogenetic hypotheses
- The parsimony criterion favours hypotheses that maximise congruence and minimise homoplasy
- It depends on the idea of the fit of a character to a tree

Character Fit
- Initially, we can define the fit of a character to a tree as the minimum number of steps required to explain the observed distribution of character states among taxa
- This is determined by parsimonious character optimization
- Characters differ in their fit to different trees
- [Figure: Tree A and Tree B for Crocodile, Bird, Kangaroo and Bat; character Hair (absent/present) requires 1 step on Tree A and 2 steps on Tree B]
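
The fit of one character to a tree can be computed with Fitch's algorithm for unordered characters. The following is a minimal sketch, not the lecture's own code: trees are written as nested tuples, and the two topologies labelled Tree A and Tree B are assumptions chosen to reproduce the 1-step and 2-step fits quoted for hair.

```python
def fitch_steps(tree, states):
    """Return (possible_states, steps) for one unordered character.

    tree   : a taxon name (leaf) or a 2-tuple of subtrees
    states : dict mapping taxon name -> observed character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a state: no extra step here.
        return shared, left_steps + right_steps
    # No state in common: one change is required at this node.
    return left_set | right_set, left_steps + right_steps + 1


hair = {"Crocodile": "absent", "Bird": "absent",
        "Kangaroo": "present", "Bat": "present"}

# Assumed topologies for illustration.
tree_a = (("Crocodile", "Bird"), ("Kangaroo", "Bat"))
tree_b = (("Crocodile", "Kangaroo"), ("Bird", "Bat"))

print("Tree A:", fitch_steps(tree_a, hair)[1], "step(s)")
print("Tree B:", fitch_steps(tree_b, hair)[1], "step(s)")
```

The step count does not depend on where the four-taxon tree is rooted, so writing the unrooted trees as rooted tuples is only a convenience.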

Parsimonious Character Optimization
- [Figure: taxa A B C D E with character states 0 0 1 1 0; changes marked with *]
- Either origin and reversal: 0 => 1 followed by 1 => 0 (ACCTRAN)
- Or parallelism: 2 separate origins, 0 => 1 (DELTRAN)
- Homoplastic characters often have alternative equally parsimonious optimizations
- Commonly used varieties are ACCTRAN (accelerated transformation) and DELTRAN (delayed transformation)
- Consequently, branch lengths are not always fully determined

Missing data
- Missing data is ignored in tree building but can lead to alternative equally parsimonious optimizations in the absence of homoplasy
- [Figure: taxa A B C D E with character states 1 ? ? 0 0; single origin 0 => 1 on any one of 3 branches]
- Abundant missing data can lead to multiple equally parsimonious trees
- This may be a problem with molecular data taken from e.g. GenBank

Parsimony Analysis
- Given a set of characters, such as aligned sequences, parsimony analysis works by determining the fit (number of steps) of each character on a given tree
- The sum over all characters is called Tree Length
- Most parsimonious trees (MPTs) have the minimum tree length needed to explain the observed distributions of all the characters

Parsimony in practice
- Taxa: Bird, Crocodile, Kangaroo, Bat
- [Table: character-state matrix giving presence (+) or absence (-) of each character in each taxon]

  Character               Fit on Tree 1   Fit on Tree 2
  1 amnion                      1               1
  2 hair                        1               2
  3 lactation                   1               2
  4 placenta                    1               2
  5 antorbital fenestra         1               2
  6 wings                       2               1
  TREE LENGTH                   7              10

- Of these two trees, Tree 1 has the shortest length and is the most parsimonious
- Both trees require some homoplasy (extra steps)
- [Figure: Tree 1 and Tree 2 with the changes in characters 1-6 mapped onto the branches]
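
Tree length is the sum of the per-character fits, and the most parsimonious tree is the one that minimises it. The sketch below makes this concrete for four taxa by scoring all three possible unrooted topologies. The reduced four-character matrix (1 = present, 0 = absent) is an illustrative assumption, not the full matrix from the slide, so the lengths differ from the 7 and 10 quoted above.

```python
def fitch_steps(tree, states):
    """Fitch parsimony for one unordered character: return (state_set, steps)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Sum the fit (number of steps) of every character on the tree."""
    return sum(fitch_steps(tree, states)[1] for states in matrix.values())


# Illustrative matrix (assumption): character -> {taxon: state}.
matrix = {
    "hair":                {"Bird": 0, "Crocodile": 0, "Kangaroo": 1, "Bat": 1},
    "lactation":           {"Bird": 0, "Crocodile": 0, "Kangaroo": 1, "Bat": 1},
    "antorbital fenestra": {"Bird": 1, "Crocodile": 1, "Kangaroo": 0, "Bat": 0},
    "wings":               {"Bird": 1, "Crocodile": 0, "Kangaroo": 0, "Bat": 1},
}

# The three possible unrooted topologies for four taxa.
candidate_trees = [
    (("Bird", "Crocodile"), ("Kangaroo", "Bat")),
    (("Bird", "Kangaroo"), ("Crocodile", "Bat")),
    (("Bird", "Bat"), ("Crocodile", "Kangaroo")),
]

lengths = {tree: tree_length(tree, matrix) for tree in candidate_trees}
best_tree = min(lengths, key=lengths.get)

for tree, length in lengths.items():
    print(length, tree)
print("Most parsimonious:", best_tree)
```

Exhaustive scoring like this is only feasible for a handful of taxa; the number of possible trees grows very quickly with the number of taxa, which is why real analyses rely on heuristic searches.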

Results of parsimony analysis
- One or more most parsimonious trees
- Hypotheses of character evolution associated with each tree (where and how changes have occurred)
- Branch lengths (amounts of change associated with branches)
- Various tree and character statistics describing the fit between tree and data
- Suboptimal trees (optional)

Characters and character states: molecular data
- Seq 1: AGCGAG
- Seq 2: GCGGAC
- [Figure: number of changes (1, 2, 3) between Seq 1 and Seq 2]

Character types
- Characters may differ in the costs (contribution to tree length) made by different kinds of changes
- Wagner (ordered, additive): states 0, 1, 2; a change between adjacent states is one step, a change between 0 and 2 is two steps (morphology, unequal costs)
- Fitch (unordered, non-additive): states A, G, T, C; any change is one step (morphology, molecules; equal costs for all changes)
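
The difference between the two character types comes down to the cost charged for a change between two states. A minimal sketch, with hypothetical function names, assuming Wagner states are coded as integers:

```python
def wagner_cost(state_a, state_b):
    """Ordered (additive) character: cost is the distance between integer states."""
    return abs(state_a - state_b)


def fitch_cost(state_a, state_b):
    """Unordered (non-additive) character: every change costs one step."""
    return 0 if state_a == state_b else 1


# A change from state 0 to state 2 under each character type.
print("Wagner 0 -> 2:", wagner_cost(0, 2), "steps")
print("Fitch  0 -> 2:", fitch_cost(0, 2), "step")
print("Fitch  A -> T:", fitch_cost("A", "T"), "step")
```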

Character types
- Sankoff (generalised): states A, G, T, C with user specified costs, e.g. one step for some changes and five steps for others (morphology, molecules)
- For example, differential weighting of transitions and transversions
- Costs are specified in a stepmatrix
- Costs are usually symmetric but can also be asymmetric (e.g. it costs more to gain than to lose a restriction site)
- [Figure: guanine-cytosine bonding; adenine-thymine bonding]

Stepmatrices
- Stepmatrices specify the costs of changes within a character
- PURINES (Pu): A, G; PYRIMIDINES (Py): T, C
- Transitions: Pu <-> Pu, Py <-> Py
- Transversions: Py <-> Pu

  From \ To   A   C   G   T
  A           0   5   1   5
  C           5   0   5   1
  G           1   5   0   5
  T           5   1   5   0

- Different characters (e.g. 1st, 2nd and 3rd codon positions) can also have different weights

Weighted parsimony
- If all kinds of steps of all characters have equal weight then parsimony:
  - Minimises homoplasy (extra steps)
  - Maximises the amount of similarity due to common ancestry
  - Minimises tree length
- If steps are weighted unequally, parsimony minimises tree length: a weighted sum of the cost of each character
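
With a stepmatrix, the fit of a character is found with Sankoff's dynamic-programming algorithm rather than by counting unit steps. The sketch below uses the costs from the stepmatrix above (transitions 1, transversions 5). The taxon names t1 to t4 and the two trees are hypothetical, chosen only to show that grouping taxa that differ by a transition is cheaper than grouping taxa that differ by a transversion.

```python
import math

STATES = "ACGT"
PURINES = {"A", "G"}


def step_cost(a, b, transition=1, transversion=5):
    """Cost of a change from base a to base b under the stepmatrix."""
    if a == b:
        return 0
    same_class = (a in PURINES) == (b in PURINES)
    return transition if same_class else transversion


def sankoff(tree, leaf_states):
    """Return {state: minimum cost of the subtree if its root has that state}."""
    if isinstance(tree, str):
        return {s: (0 if s == leaf_states[tree] else math.inf) for s in STATES}
    left = sankoff(tree[0], leaf_states)
    right = sankoff(tree[1], leaf_states)
    return {
        s: min(step_cost(s, t) + left[t] for t in STATES)
           + min(step_cost(s, t) + right[t] for t in STATES)
        for s in STATES
    }


def weighted_length(tree, leaf_states):
    """Weighted length of one character: best cost over all root states."""
    return min(sankoff(tree, leaf_states).values())


# Hypothetical site pattern for four hypothetical taxa.
site = {"t1": "A", "t2": "G", "t3": "C", "t4": "T"}
groups_by_transition = (("t1", "t2"), ("t3", "t4"))    # (A,G) and (C,T)
groups_by_transversion = (("t1", "t3"), ("t2", "t4"))  # (A,C) and (G,T)

print(weighted_length(groups_by_transition, site))
print(weighted_length(groups_by_transversion, site))
```

Under equal weights both trees would need three steps for this site, so the stepmatrix is what makes the site informative here.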

Why weight characters?
- Many systematists consider weighting unacceptable, but weighting is unavoidable (unweighted = equal weights)
- Transitions may be more common than transversions
- Different kinds of transitions and transversions may be more or less common
- Rates of change may vary with codon positions
- The fit of different characters on trees may indicate differences in their reliabilities
- [Figure: histogram for Ciliate SSUrDNA data; x-axis: number of steps (0-21); y-axis: number of characters (0-250)]
- However, equal weighting is the commonest procedure

Different kinds of changes differ in their frequencies
- [Figure: from/to matrix of changes among A, C, G and T, divided into transitions and transversions; unambiguous changes on the most parsimonious tree of Ciliate SSUrDNA]

Parsimony - advantages
- Is a simple method with an easily understood operation
- Does not seem to depend on an explicit model of evolution
- Gives both trees and associated hypotheses of character evolution
- Should give reliable results if the data is well structured and homoplasy is either rare or widely (randomly) distributed on the tree
- Is not affected by heterogeneous rates of evolution in different lineages (heterotachy)

Parsimony - disadvantages
- May give misleading results if homoplasy is common or concentrated in particular parts of the tree, e.g. base composition biases, long branch attraction
- Underestimates branch lengths
- Model of evolution is implicit: behaviour of method not well understood
- Parsimony is often justified on purely philosophical grounds (we must prefer the simplest hypotheses), particularly by morphologists
- For most molecular systematists (it seems) this is uncompelling
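
The transition/transversion distinction that motivates differential weighting is easy to tabulate for a pair of aligned sequences. The sketch below is only an illustration with hypothetical function names; it counts observed differences, which can underestimate the true number of changes when a site has changed more than once. The two sequences are Seq 1 and Seq 2 from the earlier slide.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_change(a, b):
    """Classify the difference between two bases at one aligned site."""
    if a == b:
        return "identical"
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_changes(seq_1, seq_2):
    """Count observed transitions and transversions between aligned sequences."""
    if len(seq_1) != len(seq_2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq_1, seq_2):
        kind = classify_change(a, b)
        if kind != "identical":
            counts[kind] += 1
    return counts


counts = count_changes("AGCGAG", "GCGGAC")
print(counts)
```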

Parsimony disadvantages?
- "despite the lack of an explicitly specified evolutionary model in parsimony analysis, there are reasons to believe that parsimony makes very stringent and unrealistic assumptions about substitution processes" (Yang and Goldman 1998)
- "The model of evolutionary process is explicit in maximum-likelihood reconstruction, whereas it is largely implicit in other methods such as parsimony and neighbor joining" (Bull et al 1993)

Parsimony can be inconsistent
- Felsenstein (1978) developed a simple model phylogeny including four taxa and a mixture of short and long branches
- Under this model parsimony will give the wrong tree
- [Figure: model tree for taxa A, B, C and D with branch labels p, p, q, q, q; rates or branch lengths p >> q]
- [Figure: parsimony tree for A, B, C and D, marked Wrong]
- Long branches are attracted but the similarity is homoplastic
- With more of the same kind of data the certainty that parsimony will give the wrong tree increases, so that parsimony is said to be statistically inconsistent (see Bergsten 2005)
- It is now recognised that long-branch attraction (in the Felsenstein Zone) is one of the most serious problems in phylogenetic inference
- Can this be accounted for? Perhaps, by using a model of what may happen over evolutionary time
- Modeling methods are very popular
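
The inconsistency can be illustrated by computing the expected frequencies of the three parsimony-informative site patterns directly. The sketch below makes several assumptions that are not in the slides: a symmetric two-state model, a true tree ((A,C),(B,D)) in which A and B sit on long branches with change probability p, and C, D and the internal branch have change probability q. With unlimited data, four-taxon parsimony picks the tree whose supporting pattern is most frequent.

```python
from itertools import product


def branch_probability(parent, child, change):
    """Probability of the child's state given the parent's, two-state symmetric model."""
    return change if parent != child else 1 - change


def pattern_probability(a, b, c, d, p, q):
    """Probability of one assignment of states to A, B, C, D.

    Assumed model tree: ((A,C),(B,D)); u is the ancestor of A and C,
    v is the ancestor of B and D. A and B are on long branches (p);
    C, D and the internal branch u-v are short (q).
    """
    total = 0.0
    for u, v in product((0, 1), repeat=2):
        total += (0.5
                  * branch_probability(u, v, q)
                  * branch_probability(u, a, p)
                  * branch_probability(u, c, q)
                  * branch_probability(v, b, p)
                  * branch_probability(v, d, q))
    return total


def informative_pattern_frequencies(p, q):
    """Expected frequency of each informative pattern, keyed by the taxon grouped with A."""
    # Factor 2: each pattern has two equally likely 0/1 labellings.
    return {
        "AC": 2 * pattern_probability(0, 1, 0, 1, p, q),  # supports the true tree
        "AB": 2 * pattern_probability(0, 0, 1, 1, p, q),  # groups the long branches
        "AD": 2 * pattern_probability(0, 1, 1, 0, p, q),
    }


long_branches = informative_pattern_frequencies(p=0.4, q=0.05)
equal_branches = informative_pattern_frequencies(p=0.1, q=0.1)
print("p = 0.40, q = 0.05:", long_branches)
print("p = 0.10, q = 0.10:", equal_branches)
```

When p is much larger than q, the pattern grouping the two long branches is expected more often than the pattern supporting the true tree, so adding more sites of the same kind only strengthens support for the wrong tree. With equal branch lengths the true pattern is the most frequent.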