"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

MOLECULAR PHYLOGENY

EVOLUTION
- theory that groups of organisms change over time so that descendants differ structurally and functionally from their ancestors
- biological process by which organisms inherit morphological and physiological features that define a species
Darwin 1859: On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life
http://www.literature.org/authors/darwin-charles/the-origin-of-species/

Principles of evolution
At the molecular level, evolution is a process of mutation with selection.
* Reproduction
* Variation
* Competition / selective pressure

Phylogeny
Inference of evolutionary relationships.
Molecular phylogeny uses sequence information (as opposed to other characteristics frequently used in the past, such as morphological features).
* Lesk chapter 4
* Handout
* Lecture slides

How is evolutionary time related to molecular changes in DNA and protein?
Comparison of protein and gene sequences from different organisms gave rise to the molecular clock hypothesis: for any given gene or protein, the rate of molecular evolution is approximately constant.

Molecular clock hypothesis
* Rates of change are different for each protein
* These differences reflect functional constraints imposed by natural selection
Richard Dickerson, 1971

A molecular clock may be used to estimate the time of divergence between two species:
r = K / (2T), or T = K / (2r)
where
r = rate of nucleotide substitution (known from fossil records)
K = number of substitutions per site between the two homologous sequences
T = time of divergence between the two species
The factor 2 appears because substitutions accumulate along both lineages after the split.
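The relation T = K / (2r) can be sketched in a few lines of Python. This is a minimal illustration, not a dating method: the function name `divergence_time` and the numbers for `k` and `r` are hypothetical, and a single constant rate is assumed.

```python
def divergence_time(k, r):
    """Return T = K / (2r).

    k: substitutions per site between two homologous sequences
    r: substitution rate per site per year (assumed constant)
    """
    if r <= 0:
        raise ValueError("substitution rate must be positive")
    return k / (2.0 * r)


# Hypothetical values, chosen only for illustration
k = 0.1    # substitutions per site
r = 1e-9   # substitutions per site per year
t = divergence_time(k, r)
print(f"Estimated divergence time: {t:.3g} years")
```

With these assumed values the estimate is about 50 million years; halving the rate doubles the estimated time.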

However, there are cases where the molecular clock is very inaccurate:
- The rate of evolution varies among different organisms. Examples:
  * Viral sequences tend to change very rapidly compared to other life forms
  * Rodents have a faster molecular clock than primates

Goals of molecular phylogeny Deduce the correct trees for all species of life

Nomenclature of trees
* Nodes: external (OTUs), internal, root
* A branch connects 2 nodes
* OTUs (operational taxonomic units) are existing sequences / species / populations / individuals
* An internal node is an inferred ancestor (not observed)
* A tree may be unscaled or scaled

Cladogram: branches are unscaled (OTUs aligned in a vertical column).
Phylogram: branches are scaled; branch lengths are proportional to the number of amino acid or nucleotide changes that occurred between sequences.

Goals of molecular phylogeny
Deduce the correct trees for all species of life:
* Topology
* Branch lengths

A tree is multifurcated if it has a node with three or more daughter branches. (In a bifurcated tree, any branch that divides splits into two daughter branches.)

Root: common ancestor of all sequences in the tree
Rooted tree
* Has a root
* Unique path from the root to each of the other nodes
* Direction of each path corresponds to evolutionary time
Unrooted tree
* No root
* No complete definition of evolutionary path
* Direction of time is not determined

Two methods of rooting an unrooted tree:
* Outgroup. A phylogenetically distant organism is added to the set of sequences; the root is placed on the branch leading to it.
* Midpoint rooting. The root is placed at the midpoint of the longest path between two OTUs.

Comparing the numbers of rooted and unrooted trees: 3 OTUs and 4 OTUs
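The counts behind this comparison follow standard formulas for bifurcating trees with labeled OTUs: (2n-5)!! unrooted and (2n-3)!! rooted topologies for n OTUs. The sketch below computes them; the function names are illustrative and not taken from the lecture.

```python
def odd_double_factorial(k):
    """Return k!! = k * (k-2) * (k-4) * ... * 1 for odd k (1 if k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_unrooted_trees(n):
    """Number of unrooted bifurcating topologies for n >= 3 labeled OTUs."""
    if n < 3:
        raise ValueError("need at least 3 OTUs")
    return odd_double_factorial(2 * n - 5)


def num_rooted_trees(n):
    """Number of rooted bifurcating topologies for n >= 2 labeled OTUs."""
    if n < 2:
        raise ValueError("need at least 2 OTUs")
    return odd_double_factorial(2 * n - 3)


for n in (3, 4, 5, 10):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

For 3 OTUs this gives 1 unrooted and 3 rooted trees, and for 4 OTUs 3 unrooted and 15 rooted trees. The numbers grow so quickly that exhaustive search over all topologies is only feasible for small data sets.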

Phylogenetic analysis
- Step 1. Selection of sequences for analysis
- Step 2. Multiple sequence alignment
- Step 3. Construction of tree
- Step 4. Evaluation of tree

What sequences to use? DNA? RNA? protein?

Slowly changing sequences
* Protein
* Ribosomal RNA, for instance 16S rRNA
Useful for comparing widely divergent species.
Ribosomal database (rdp.cme.msu.edu): > 50,000 aligned sequences

More rapidly changing sequences
* DNA
* Mitochondrial DNA
Useful for comparing more closely related species or populations within a species.

Two homologous protein sequences are more similar than the corresponding DNA sequences. This is to a large extent due to the degeneracy of the genetic code.
Seq 1  GGC AAG CGA AGU
Seq 2  GGA AGA CGU UCA
Seq 1  G   K   R   S
Seq 2  G   R   R   S
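A short sketch makes the comparison concrete by translating the two example sequences with the standard genetic code and measuring identity at both levels. The helper names (`normalize`, `translate`, `identity`) are illustrative; T is treated as U so that DNA or RNA spelling can be used.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def normalize(seq):
    """Upper-case, drop spaces, and write the sequence as RNA."""
    return seq.upper().replace(" ", "").replace("T", "U")


def translate(seq):
    """Translate complete codons of a coding sequence into one-letter amino acids."""
    rna = normalize(seq)
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


seq1 = "GGC AAG CGA AGU"
seq2 = "GGA AGA CGU UCA"
prot1, prot2 = translate(seq1), translate(seq2)
print(prot1, prot2)
print("nucleotide identity:", identity(normalize(seq1), normalize(seq2)))
print("protein identity:   ", identity(prot1, prot2))
```

For these two sequences only 5 of 12 nucleotides match, while 3 of 4 amino acids do.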

Synonymous and non-synonymous changes
Human  atg gga caa aag
Mouse  atg ggc caa gag
Human  M   G   Q   K
Mouse  M   G   Q   E
Comparison of the rates of nonsynonymous substitution (N) versus synonymous substitution (S) may reveal evidence of positive or negative selection:
* S > N: negative selection. Change in amino acid sequence is restricted because the sequence is important for protein function.
* N > S: positive selection. Example: a duplicated gene is under pressure to evolve a new function.
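The human/mouse example can be checked with a small script that compares the two sequences codon by codon and labels each difference as synonymous or nonsynonymous. This is only a naive count under the standard genetic code, not an estimator of substitution rates (real analyses normalize by the number of synonymous and nonsynonymous sites); the function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(dna):
    """Return the list of complete codons in a DNA coding sequence."""
    dna = dna.upper().replace(" ", "").replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def count_codon_changes(dna1, dna2):
    """Count differing codons as (synonymous, nonsynonymous)."""
    synonymous = nonsynonymous = 0
    for c1, c2 in zip(split_codons(dna1), split_codons(dna2)):
        if c1 == c2:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            synonymous += 1      # same amino acid, different codon
        else:
            nonsynonymous += 1   # amino acid changed
    return synonymous, nonsynonymous


human = "atg gga caa aag"
mouse = "atg ggc caa gag"
print(count_codon_changes(human, mouse))
```

Here gga/ggc both encode G (synonymous), while aag/gag change K to E (nonsynonymous).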

Approximate rates of substitution (number of substitutions per site per billion years)
rRNA                                   ~ 0.1
Protein                                0.01-10
Hypervariable regions in mitochondria  10
HIV (RNA virus)                        > 1000

Step 2. Producing the multiple alignment
Phylogeny is one of many applications of multiple alignments. Others:
* Identifying conserved motifs and patterns (PROSITE)
* Profiles (Pfam)
* Prediction of protein secondary structure

Step 2. Producing the multiple alignment
An alignment may be produced
* using software such as CLUSTAL or T-Coffee
* or using protein three-dimensional structure information (structural alignment)

Inspecting and processing the multiple alignment
Critical issues
- the alignment should contain only homologous sequences
- no partial sequences
- overall identity should be significant, ensuring that the alignment is correct
- gaps should be avoided
- columns containing gaps may well be removed prior to phylogenetic analysis
GGGCGGCGAGGCATTTATCGGGGGGTTGCAAAAT
GGGCGGTGAGGCATTTATCGGGGGGTTGCAAAAT
GGGCGGCGAAGCATAAATCGGGGAGTTGCAAAAT
GGGCGGCGAGGCATTTATCGGGGGGTTGCGAAAT
GGGCGGCGAGGCATTTATCGGGGGGCTGCAAAAT
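Removing gap-containing columns before tree building can be sketched as follows. The function name `remove_gap_columns` and the toy alignment are illustrative assumptions; dedicated alignment-trimming tools apply more refined criteria.

```python
def remove_gap_columns(alignment, gap="-"):
    """Return the alignment with every column containing a gap removed.

    alignment: list of equal-length aligned sequences (strings)
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    # zip(*alignment) iterates over columns; keep the gap-free ones
    kept = [column for column in zip(*alignment) if gap not in column]
    if not kept:
        return ["" for _ in alignment]
    # transpose the kept columns back into rows
    return ["".join(row) for row in zip(*kept)]


toy_alignment = [
    "AC-GT",
    "A-CGT",
    "ACCGT",
]
print(remove_gap_columns(toy_alignment))
```

In the toy alignment the second and third columns each contain a gap, so only the first, fourth and fifth columns remain.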

Step 3. Construction of the phylogenetic tree
* Distance methods
* Character methods
  - Maximum parsimony
  - Maximum likelihood

Distance methods
Simplest distance measure: consider every pair of sequences in the multiple alignment and count the number of differences.
Degree of divergence = normalized Hamming distance (D)
D = n/N
where
N = alignment length
n = number of sites with differences
Example:
AGGCTTTTCA
AGCCTTCTCA
D = 2/10 = 0.2
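The distance D = n/N translates directly into code. A minimal sketch, assuming two already aligned sequences of equal length without gaps; the function name `p_distance` is illustrative.

```python
def p_distance(seq1, seq2):
    """Return D = n/N: the fraction of aligned sites that differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    n = sum(a != b for a, b in zip(seq1, seq2))  # sites with differences
    return n / len(seq1)                          # divided by alignment length N


print(p_distance("AGGCTTTTCA", "AGCCTTCTCA"))
```

Applied to every pair of sequences in the alignment, this yields the distance matrix that the distance-based tree methods take as input.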

Problem with this distance measure: as the distance between two sequences increases, the probability increases that more than one mutation has occurred at any one site. The observed number of differences then underestimates the true number of substitutions, so methods have been developed to compensate for this.

Corrected distances
* Jukes and Cantor: all substitutions occur at the same rate
* Kimura two-parameter model: the rate of transitions is different from the rate of transversions
  P = the fraction of sequence positions differing by a transition
  Q = the fraction of sequence positions differing by a transversion
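The slide names the two models without giving their formulas. The sketch below uses the standard textbook forms, d = -(3/4) ln(1 - 4p/3) for Jukes-Cantor and d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)) for the Kimura two-parameter model; the function names and the example values are illustrative assumptions.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from the observed fraction p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(P, Q):
    """Kimura two-parameter distance.

    P: fraction of sites differing by a transition
    Q: fraction of sites differing by a transversion
    """
    a = 1 - 2 * P - Q
    b = 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("distance is undefined for these P and Q")
    return -0.5 * math.log(a * math.sqrt(b))


# Example: 10 sites with one transition and one transversion (p = 0.2)
print(round(jukes_cantor(0.2), 4))
print(round(kimura_2p(0.1, 0.1), 4))
```

Both corrected values are larger than the observed 0.2, reflecting substitutions hidden by multiple hits at the same site. When one third of the differences are transitions (P = p/3, Q = 2p/3), the two formulas give the same distance.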

Distance methods
* UPGMA (unweighted pair group method with arithmetic mean)
* Neighbor-joining
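As a rough illustration of how UPGMA works, the sketch below repeatedly joins the two closest clusters and averages their distances to the remaining clusters, weighted by cluster size. The data structures, the function name `upgma`, and the toy distance matrix are assumptions made for this example; it returns only a nested-tuple topology and the root height, and ties are broken arbitrarily.

```python
from itertools import combinations


def upgma(names, distances):
    """Build a UPGMA tree.

    names: list of OTU names
    distances: dict mapping (name_a, name_b) pairs to distances
    Returns (tree, root_height), where tree is a nested tuple of names.
    """
    # Store distances under unordered pairs so lookup order does not matter
    d = {frozenset(pair): value for pair, value in distances.items()}
    sizes = {name: 1 for name in names}  # cluster -> number of OTUs in it

    height = 0.0
    while len(sizes) > 1:
        # Find the closest pair of current clusters
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        height = d[frozenset((a, b))] / 2  # node sits halfway between a and b
        # Distance from the merged cluster to each remaining cluster:
        # arithmetic mean over all OTU pairs (size-weighted average)
        for c in sizes:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b

    tree = next(iter(sizes))
    return tree, height


# Toy distance matrix, invented for illustration
toy = {
    ("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0,
    ("A", "D"): 10.0, ("B", "D"): 10.0, ("C", "D"): 10.0,
}
tree, root_height = upgma(["A", "B", "C", "D"], toy)
print(tree, root_height)
```

UPGMA implicitly assumes a constant molecular clock, which is why every OTU ends up at the same distance from the root. Neighbor-joining does not make that assumption.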