Phylogeny: building the tree of life

Similar documents
BINF6201/8201. Molecular phylogenetic methods

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Phylogeny: traditional and Bayesian approaches

Phylogenetic Tree Reconstruction

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

Dr. Amira A. AL-Hosary

Algorithms in Bioinformatics


POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Multiple Sequence Alignment. Sequences

Evolutionary Tree Analysis. Overview

Theory of Evolution Charles Darwin

Molecular Evolution and Phylogenetic Tree Reconstruction

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

What is Phylogenetics

Phylogenetic inference

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Constructing Evolutionary/Phylogenetic Trees

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Phylogeny and Molecular Evolution. Introduction

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Theory of Evolution. Charles Darwin

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

CSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary

Organizing Life s Diversity

Phylogeny and Molecular Evolution. Introduction

9/19/2012. Chapter 17 Organizing Life s Diversity. Early Systems of Classification

A (short) introduction to phylogenetics

8/23/2014. Phylogeny and the Tree of Life

Phylogenetic Trees. How do the changes in gene sequences allow us to reconstruct the evolutionary relationships between related species?

Chapter 16: Reconstructing and Using Phylogenies

CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES

EVOLUTIONARY DISTANCES

Constructing Evolutionary/Phylogenetic Trees

Lecture 11 Friday, October 21, 2011

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

A Phylogenetic Network Construction due to Constrained Recombination

Phylogenetic analyses. Kirsi Kostamo

molecular evolution and phylogenetics

Chapter 19: Taxonomy, Systematics, and Phylogeny

Phylogenetic trees 07/10/13

Lecture 6 Phylogenetic Inference

Cladistics and Bioinformatics Questions 2013

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Classification, Phylogeny yand Evolutionary History

How should we organize the diversity of animal life?

Phylogenetics. BIOL 7711 Computational Bioscience

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

How to read and make phylogenetic trees Zuzana Starostová

CSCI1950 Z Computa4onal Methods for Biology Lecture 5

Phylogeny Tree Algorithms

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

Modern Evolutionary Classification. Section 18-2 pgs

Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

C.DARWIN ( )

Macroevolution Part I: Phylogenies

Phylogenies & Classifying species (AKA Cladistics & Taxonomy) What are phylogenies & cladograms? How do we read them? How do we estimate them?

Phylogenetics: Building Phylogenetic Trees

Biology. Slide 1 of 24. End Show. Copyright Pearson Prentice Hall

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses

Phylogeny. November 7, 2017

The practice of naming and classifying organisms is called taxonomy.

Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Chapter 26 Phylogeny and the Tree of Life

Lecture V Phylogeny and Systematics Dr. Kopeny

Estimating Evolutionary Trees. Phylogenetic Methods

Name: Class: Date: ID: A

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Lecture 18 Molecular Evolution and Phylogenetics

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

PHYLOGENY & THE TREE OF LIFE

DNA Phylogeny. Signals and Systems in Biology Kushal EE, IIT Delhi

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST

Classification and Phylogeny

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Introduction to characters and parsimony analysis

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

Phylogenetic inference: from sequences to trees

Phylogeny. Properties of Trees. Properties of Trees. Trees represent the order of branching only. Phylogeny: Taxon: a unit of classification

Multiple Alignment. Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

Phylogenetic Networks, Trees, and Clusters

Phylogenetic Analysis

Phylogenetic Analysis

Phylogenetic Analysis

Intraspecific gene genealogies: trees grafting into networks

CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Hierarchical Clustering

Transcription:

Phylogeny: building the tree of life Dr. Fayyaz ul Amir Afsar Minhas Department of Computer and Information Sciences Pakistan Institute of Engineering & Applied Sciences PO Nilore, Islamabad, Pakistan http://faculty.pieas.edu.pk/fayyaz/ Acknowledgements: Most of these slides are taken from Lectures by Dr. Asa-Ben-Hur, CSU.

Phylogeny and Phylogenetic Trees The biological evidence suggests that all life evolved from a common ancestor Phylogeny is the study of the evolutionary histroy of a group of species (taxa) The relationship can be represented using a phylogenetic tree Orangutan Gorilla Chimpanzee Human

Phylogeny and Phylogenetic Trees The leaves of the tree corresponds to extant taxa Internal nodes correspond to speciation events, and represent ancestral taxa. Branches define the relationships among the taxa (descent and ancestry) A phylogenetic tree is a binary tree. The branch length usually represents the amount of time since speciation. Orangutan Gorilla Chimpanzee Human

Objectives Reconstruct the ancestry of species, i.e. find the tree of life and To estimate the time of divergence between organisms since they last shared a common ancestor.

Cladograms and Phylogenetic trees from Phonotypical characteristics See video!

A Phylogenetic Tree a b g d e A (rooted) tree for 5 species. Leaves a, b, g, d, and e, correspond to contemporary species, for which data has been collected Internal nodes correspond to (inferred) ancestors

Using Morphological Characteristics Here, the 0s and 1s are indicating the presence or absence of a character (has feathers?, lays eggs?, curved beak?, flies?,... ).

Raccoon or Bear? Based on anatomical and behavioral characteristics, the panda was originally classified as a raccoon (1870) In 1985, the panda was re-classified as a bear when an analysis based on molecular data was done

The Platypus (Ornithorhynchus anatinus) is a venomous, egg-laying, duck-billed mammal; the sole representative of its family (Ornithorhynchidae) and genus (Ornithorhynchus) Platypus

Need for Other Approaches Hard to resolve relationships using morphology and behavior alone 1. Similar characteristics can evolve independently in distantly related organisms convergent-evolution 2. It is often difficult to find characteristics that are common to all the organisms under study

Inferring Phylogenies Trees can be inferred by several criteria: Morphology of the organisms Can lead to mistakes! Sequence comparison Example: Orc: Elf: Dwarf: Hobbit: Human: ACAGTGACGCCCCAAACGT ACAGTGACGCTACAAACGT CCTGTGACGTAACAAACGA CCTGTGACGTAGCAAACGA CCTGTGACGTAGCAAACGA

Humans

What is phylogeny good for? Evolutionary history ( tree of life ) Population history Origins of diseases Prediction of sequence function MSA Can be applied to organisms, sequences, viruses, languages, etc.

The Tree of Life and Human Health

West Nile virus Lanciotti et al. (1999)

Bird flu phylogeny

Zika Virus Lanciotti, Robert S., Amy J. Lambert, Mark Holodniy, Sonia Saavedra, and Leticia del Carmen Castillo Signor. 2016. Phylogeny of Zika Virus in Western Hemisphere, 2015. Emerging Infectious Diseases 22 (5). doi:10.3201/eid2205.160065. 19

Richardson et al. (2006)

kujira

Zuckerkandl & Pauling (1962) Zuckerkandl & Pauling (1962) demonstrated that molecular sequences provide large amounts of information and help understand evolutionary relationships

Sequence Data Nowadays, biologists rely on molecular sequence data, in particular DNA or RNA sequences, which allows the comparison of a broader range of species. E. coli, yeast, clam shell, human,.

Species vs. Gene Trees The taxonomic units (nodes of the tree) can represent genes, species or populations. A gene tree represents the evolutionary history of a single gene; e.g. the evolution of the globin family. A species tree will generally be built using multiple genes. We will focus mainly on species trees.

Methods for Tree Reconstruction Distance based Tree is constructed on the basis of a distance measure between species. Maximum parsimony The tree that explains the data with the smallest number of evolutionary events. Maximum likelihood Statistical method of phylogeny reconstruction Explicit model for how data set is generated with nucleotide or amino acid substitutions Find topology that maximizes the probability of the observed data given the model and the parameter values (estimated from data)

Distance Based Tree Reconstruction Definition: A distance measure d ij between n species is said to be a tree-derived distance if there exists a tree T with the species at the leaves such that d ij is the sum of the branch lengths along a path between i and j. Tree-derived distances are typically called additive distances. In general we don t have available to us the real treedistance. Given surrogate information about the treedistance, we would like to obtain the best estimate of the phylogenetic tree.

Tree Reconstruction - Ultrametric Case If: Pairwise distances d ij among n extant species satisfy the Ultrametric property Then: There exists a unique rooted tree (up to trivial relabeling) with these species as leaves, that gives these distances. Thus the original tree can be recovered exactly. Proof: See Ewens and Grant Statistical Methods in Bioinformatics (2 nd edition)

UPGMA (Unweighted Pair Group Method using Averages) Uses a hierarchical clustering method. Initially each species is in its own cluster. The algorithm sequentially combines two clusters and forms a new node in the tree. First define distance d ij between two clusters C i and C j to be the average distance between pairs of species from each cluster: where C i denotes the number of species in cluster C i. Note that if C k is the union of C i and C j and C l is another cluster, then This is also known as average linkage cluster analysis Alternatives are maximum or minimum linkage

UPGMA Initialization: Assign each species i to its own cluster C i Define a leaf of T for each species. Place it at height zero. Iteration: Find i,j so that d ij is minimal (in case of ties, choose randomly) Define new cluster k by C k = C i union C j. Define d kl for all l by Define a node k with daughter nodes i and j and place it at height d ij /2. Add k to the current clusters and remove i and j. Termination: When only two clusters i,j remain, place the root at height d ij /2.

UPGMA Example (1) (3) (4) (2)

Neighbor joining Other methods constructs unrooted trees

Parsimony One of the most commonly used methods for tree reconstruction Uses sequence information directly rather than distances that reflect sequence similarity. The idea: find a tree that explain the observed sequences with minimal number of substitutions (Occam s razor)

Parsimony Assign a score to a tree (with ancestral sequences included) as the total number of substitutions along all branches of the tree. The objective: find the tree that minimizes this score Minimization is over all choices of inferred ancestral sequences and tree topologies.

Parsimony Suppose we are given the four sequences: AAG, AAA, GGA, AGA For each possible topology for a rooted tree with 4 leaves, assign sequences to the ancestral nodes such that the total number of changes needed is minimized. Minimize this number over all trees

Parsimony Two step process: 1. Given a tree topology and an assignment of residues to the leaf nodes, find an assignment of residues to internal nodes that minimizes the number of mutations (note parsimony treats each position independently) This is sometimes called the SMALL PARSIMONY PROBLEM. Solvable using dynamic programming 2. Search over all possible trees and find the best tree, that is, the tree having the minimal parsimony score. This is sometimes called the LARGE PARSIMONY PROBLEM.

Large Parsimony Exhaustive Search For each unrooted tree with n leaves, calculate the minimum parsimony score and choose the tree with minimum score. Feasible only for small n. Large parsimony is NP complete Parsimony score is independent of root position.

Maximum Likelihood phylogeny Given a tree T with n leaves, where sequence x j is at leaf node j, and a set of edge lengths denoted by t 1, t 2,..., t 2n-2 (written as t), we want to define P(x T, t) This requires a model of evolution probabilities of mutation events that change sequences along the edges of the evolutionary tree

Probability Model Define: P(x y, t ) the probability that a given ancestral sequence y will evolve into x along an edge of length t. x t y t 1 x 1 x 5 t 4 x 4 t 2 t 3 x 2 x 3 For any set of ancestral sequences assigned to internal nodes, we can calculate the probability of observing such a fully specified tree.

Maximum Likelihood Tree The tree with topology T and edge lengths t that maximizes P(x T, t) Finding the maximum likelihood tree involves a search over: Tree topologies Assignment to ancestor sequences Edge lengths t

Phylogenetic Trees using MEGA MEGA is a commonly used tool for this purpose

MEGA steps BLAST the given sequence to get more sequences must be done carefully Construct MSA using MUSCLE or CLUSTALW MUSCLE is better Construct tree Distance based (distance matrix ) Parsimony based Maximum likelihood based Generate confidence values (using bootstrapping) Optional but recommended

For Dengue viruses http://faculty.pieas.edu.pk/fayyaz/ippy/html_ demos/bioinformatics.html

End of Lecture http://io9.gizmodo.com/5936427/the-evolutionary-history-of-dragons-illustrated-by-a-scientist 45