Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Similar documents
Michael Yaffe Lecture #4 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees


Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

BINF6201/8201. Molecular phylogenetic methods

Evolutionary Tree Analysis. Overview

Phylogenetic analyses. Kirsi Kostamo

Phylogeny Tree Algorithms

A (short) introduction to phylogenetics

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Algorithms in Bioinformatics

Phylogenetic inference

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

C3020 Molecular Evolution. Exercises #3: Phylogenetics

What is Phylogenetics

Theory of Evolution. Charles Darwin

Theory of Evolution Charles Darwin

Phylogenetic Tree Reconstruction

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Dr. Amira A. AL-Hosary

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Phylogenetic Analysis

Phylogeny: traditional and Bayesian approaches

Phylogeny. November 7, 2017

Phylogeny: building the tree of life

Phylogenetics. BIOL 7711 Computational Bioscience

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise

A Phylogenetic Network Construction due to Constrained Recombination

Phylogenetics: Parsimony

Quantifying sequence similarity

Phylogenetic Analysis

Phylogenetic Analysis

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

Lecture 6 Phylogenetic Inference

Introduction to characters and parsimony analysis

Multiple Sequence Alignment. Sequences

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogenetic inference: from sequences to trees

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Molecular Evolution & Phylogenetics

How to read and make phylogenetic trees Zuzana Starostová

Intraspecific gene genealogies: trees grafting into networks

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

X X (2) X Pr(X = x θ) (3)

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003

Cladistics and Bioinformatics Questions 2013

8/23/2014. Phylogeny and the Tree of Life

Evolutionary Models. Evolutionary Models

Is the equal branch length model a parsimony model?

Phylogeny. Properties of Trees. Properties of Trees. Trees represent the order of branching only. Phylogeny: Taxon: a unit of classification

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Lecture 11 Friday, October 21, 2011

Phylogenetics Todd Vision Spring Some applications. Uncultured microbial diversity

Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University

Phylogeny and Molecular Evolution. Introduction

Comparative Bioinformatics Midterm II Fall 2004

EVOLUTIONARY DISTANCES

Consistency Index (CI)

Bayesian Models for Phylogenetic Trees

Classification and Phylogeny

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Classification and Phylogeny

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

TheDisk-Covering MethodforTree Reconstruction

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Processes of Evolution

How should we organize the diversity of animal life?

Finding the best tree by heuristic search

Workshop III: Evolutionary Genomics

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Seuqence Analysis '17--lecture 10. Trees types of trees Newick notation UPGMA Fitch Margoliash Distance vs Parsimony

Chapter 27: Evolutionary Genetics

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Biology 211 (2) Week 1 KEY!

Chapter 26 Phylogeny and the Tree of Life

Thanks to Paul Lewis and Joe Felsenstein for the use of slides

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Chapter 19: Taxonomy, Systematics, and Phylogeny

Molecular Evolution & Phylogenetics Traits, phylogenies, evolutionary models and divergence time between sequences

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses

7. Tests for selection

Transcription:

7.91 Lecture #5 Database Searching & Molecular Phylogenetics Michael Yaffe B C D B C D (((,B)C)D)

Outline Distance Matrix Methods Neighbor-Joining Method and Related Neighbor Methods Maximum Likelihood Branch and Bound Heuristic Seaching Consensus Trees Software (PHYLIP, PUP) The Tree of Life

Transformed Distance Method UPM assumes constant rate Of evolution across all lineages - can lead to wrong tree topologies Can allow different rates of evolution across different lineages if you normalize using an external reference that diverged early i.e. use an outgroup Define d D =average distance B C D Between outgroup and all ingroups d ij = (d ij d id d jd )/2 + d D Now use d ij to do the clustering..basically just comes from the insight that ingroups evolved separately from each other ONLY FTER they diverged from outgroup

Example Species B C B 9 d B is distance Between and B C 8 11 D 12 15 10 B C D 6 Use D as outgroup 3 3 6 Species B 2 1 B 10/3 d D = 37/3 C 16/3 16/3 Now use UPM to build tree

Neighbor s Relation Method Variant of UPM that pairs species in a way that creates a tree with minimal overall branch lengths. Pairs of sequences separated by only 1 node are said to be neighbors. single central branch a c C e terminal branches B b d D For this tree topology d C +d BD = d D +d BC = a + b + c + d + 2e =d B + d CD +2e For neighbor relations, four-point condition will be true: d B +d CD < d C +d BD and d B +d CD < d D +d BC So just have to consider all pairwise arrangements and determine which one satisfies the four-point condition.

Neighbor-Joining Methods Start with star-like tree. Find neighbors sequentially to minimize total length of all branches B C C D B D Studier & Kepler 1988: Q 12 =(N-2)d 12 - Σd 1i - Σd 2i Where any 2 sequences can be 1 and 2 Try all possible sequence combinations. Whichever combination of pairs gives the smallest Q 12 is the final tree!

Maximum Likelihood purely statistical method. Probablilities for every nucleotide substitution in a set of aligned sequences is considered. Calculation of probabilities is complex since ancestor is unknown Test all possible trees and calculate the aggregate probablility. Tree with single highest aggregate probablilty is the most likely to reflect the true phylogenetic tree. VERY COMPUTTIONLLY INTENSE

: a derogatory term from the 1930s and 1940s To describe someone who was especially careful with Spending money. Biologically: ttach preference to an evolutionary pathway That minimizes the number of mutational events since (1) Mutations are rare events, and (2) The more unlikely events a model postulates, the less l likely the model is to be true. : a character-based method, NOT a distance-based method.

For parsimony analysis, positions in a sequence alignment fall into one of two categories: informative and uninformative. Position Sequence 1 2 3 4 5 6 1 2 T 3 T 4 T C T Only 3 possible unrooted trees you can make

Position Sequence 1 2 3 4 5 6 1 2 T 3 T 4 T C T 1 3 1 2 1 2 2 4 3 4 4 3 Which tree is the right one?

Position Sequence 1 2 3 4 5 6 1 2 3 4 T T T C T 1 3 1 2 1 2 2 4 3 3 4 4 Invariant positions contain NO INFORMTION uninformative

Position Sequence 1 2 3 4 5 6 1 2 3 4 T T T C T 1 3 1 2 1 2 2 4 3 3 4 4 Equally uninformative need one mutation in each tree

Position Sequence 1 2 3 4 5 6 1 2 3 4 T T T C T 1 3 1 1 2 * 2 * 2 * * T 4 * * 3 T 4 4 T 3 lso uninformative need two mutations in each tree

Position Sequence 1 2 3 4 5 6 1 2 3 4 T T C T T 1 T 3 1 2 1 2 * T * * * * 2 * * C 4 * * 3 T C 4 4 C T 3 lso uninformative need three mutations in each tree

Position Sequence 1 2 3 4 5 6 1 2 3 T 4 T C T T 1 3 1 1 2 * 2 2 * * 4 * * 3 4 4 3 Informative! need only one mutation in one tree but two In the other trees!

Position Sequence 1 2 3 4 5 6 1 2 3 T 4 T C T T 1 3 1 T 2 1 T * 2 * T * * 2 * T T 4 3 T 4 4 T 3 Informative! need only one mutation in one tree but two In the other trees!

Position Sequence 1 2 3 4 5 6 1 2 3 4 T T C T T So to be Informative, need at least 2 different nucleotides nd each has to be present at least twice. Every tree is considered for every site, maintaining a running score of the number of mutations required. The tree with the smallest number of invoked mutations is the most parsimonious

1 3 1 1 2 * 2 2 * * 4 * * 3 4 4 3 Mathematically, most likely candidate nts at an Internal node are: {descendent node 1} Ω {descendant node 2} IF this is null set, then most likely candidate nts are: {descendent node 1} U {descendant node 2} Σ U = minimum number of substitutions required to account for nts at terminal nodes since they last shared common ancestor Total number of substitutions, informative + uninformative = tree length

1 3 1 1 2 * 2 2 * * 4 * * 3 4 4 3 If you use parsimony but weigh the mutations by some kind of scoring system that accounts for the likelihood of each mutation weighted parsimony By-product of parsimony is inference of nt identity in the ancestral sequence

10 sequences: > 2 million possible trees Need a better way Branch and bound (Hardy and Penny, 1982) Step 1: Determine an upper bound to the length of the most parsimonious tree = L - either chosen randomly, or else using a computationally fast way like UPM Step 2: Starting from a simple tree with just some of the Sequences, grow trees incrementally by adding branches to a smaller tree that describes just some of the sequences. Step 3: If at any point, the number of required substitutions is > L, abandon that tree. Step 4: s soon as you get a tree with fewer substitutions than L, use that tree as the new upper bound to make remainder of the search even more efficient. Works for <= 20 sequences

For > 20 sequences Heuristic Searches ssumption: lternative trees are not all independent of each other. Most parsimonious trees have similar topologies. Step 1: Construct an initial tree as a good guess: UPM, and use it as a starting point. Step 2: Branch-swap subtrees and graft them onto the starting tree, keeping overall topology. See how many are shorter than the starting tree. The prune and re-graft, and see if it keeps getting better. Step 3: Repeat until a round of branch swapping fails to generate any better trees

Often get tens or hundreds of equally parsimonious trees Build a consensus tree any internal node supported by t least half the trees becomes a simple bifurcation.

Phylogenetic Software PHYLIP: Phylogenetics Inference Package free at http://evolution.genetics.washington.edu Includes many programs including various distance methods, maximum likelihood, parsimony, with many of the options we ve discussed. PUP: Phylogenetic nalysis Using http://www.lms.si.edu/pup - NOT FREE Now includes maximum likelihood and distance methods as well

Tree of Life Carl Woese and colleagues, 1970s Used 16S rrn all organisms possess. Found 3 major evolutionary groups: Bacteria Eucarya rchea (including thermophilic bacteria Human Origins: mt DN sequences huan populations differ by ~ 0.33% (very small). reatest differences NOT between current populations on different continents, but between human populations residing in frica out of frica theory Mitochondrial eve and Y-chromosome adam