# Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

Size: px
Start display at page:

Download "Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science"

Transcription

1 Phylogeny and Evolution Gina Cannarozzi ETH Zurich Institute of Computational Science

2 History Aristotle ( BC) classified animals. He found that dolphins do not belong to the fish but to the mammals. Carolus Linneus (1758) introduced binomial classification Darwin 1859 explained evolution as a process of random mutation and natural selection. Zimmerman in the 1930s and Hennig in the 50 s began to define objective measures for reconstructing evolutionary history based on shared attributes of extant and fossil organisms. They worked on cladistics- the systematic classification of organisms based shared derived properties 1965 Zuckerkandl and Pauling were the first to use molecular sequences as indicators of phylogeny

3 Introduction Goal: reconstruct the evolutionary history of life Carl Woese proposed the third domain or kingdom of life based on ribosomal RNA in 1990.

4 Motivation

5 Topology Unrooted Tree Rooted Tree Root Internal node Leaf node topology - shape of tree, branching order between nodes rotation about a branch does not change the topology

6 Tree representations L3 L4 L1 L2 L5 L6 A B C D Figure 7: Least Squares Tree of Matrix D ((A,B)(C,D)) = ((B,A)(C,D)) = ((C,D),(B,A)) Easy Case I: Ultrametric Tree For an ultrametric tree we assume that a) all proteins evolve at the same time (molecular clock), b) all distances can be measured without error and c) all leaves are equidistant from the root. Tree(Tree(Leaf(A,L1+L3,1),L3,Leaf(B,L2+L3,2)), d(a, X) = d(b, X) = d(c, X) 0, Tree(Leaf(D,L6+L4,4),L4,Leaf d(a, Y ) = d(b, Y ) (C,L5+L4,3))) To compute an ultrametric tree, we can apply following algorithm: X Y

7 Tree Components topology - branching pattern of a tree root- place on the tree from which everything evolves- common ancestor of everything at the leaves external nodes, leaves, taxonomic units internal nodes or hypothetical taxonomic units (HTU) represent speciation or gene duplication events branches or edges - can have a length

8 Rooting a tree Most phylogenetic methods produce unrooted trees. This is because they detect differences between sequences, but have no means to orient residue changes relatively to time. There are two ways to root an unrooted tree: use an outgroup- include a group of sequences known to be outside the group of interest assume a molecular clock- all lineages have evolved with the same rate from their common ancestor (usually not a good assumption)

9 Phylogenetic Trees: graphical representation of the evolutionary history of a set of species Monkey Human Chimp Dog Rat Mouse Cow ancestor of mammals Frog ancestor of vertebrates Puffer fish Possum Puffer fish Zebrafish Zebrafish Chicken Chicken Cow Human Chimp Monkey Dog Puffer fish Mouse Puffer fish Rat Vertebrates Possum Frog Vertebrates

10 Phylogeny, Evolution, and Alignments 789: '())*#+*,-+,-.'/(0-12)*++/ ,20.!!""""#""#"!#!""!"#"\$"%%"!!!!"%!%"#!"\$"&!!! '(*,12-1*.6,+.))(3.'1*!!)/+++(63134.) Rice Corn Dog Fly Mosquito alignment implies an evolutionary relationship also represented by Phylogenetic Tree aligns amino acids that diverged from the same residue in (hypothetical) most recent common ancestor darwinian evolution is driven by random mutation and natural selection our model allows for point mutations and insertions/deletions (indels) mutations may be adaptive, neutral or deleterious alignment shows accepted substitutions since divergence proteins evolve under functional constraints - mutations that destroy function do not appear in database via organism death "correct" alignment represents actual events- substitutions, indels impossible to verify -> take alignment with the highest probability that the alignment is correct under our model

11 String Alignments [Rice, Mosquito] triosephosphate isomerase lengths=55,53 simil=117.9, PAM_dist=111, identity=36.4% NGTTDQVDKIVKILNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQVAAQNCW...!..!.!...!.:....!.:.!...!! NGDKASIADLCKVLTTGPLNAD TEVVVGCPAPYLTLARSQLPDSVCVAAQNCY Similarity Score (Likelihood Based) PAM distance (evolutionary distance) For pairwise string alignments, the dynamic programming algorithm guarantees that the highest scoring alignment is found. Local alignment- find the highest scoring substring Global alignment- find the highest score for aligning the complete strings

12 NGTTDQVDKIVKILNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQVAAQNCW.:.:!.:!.! :..!.:. : :..! :::!..:!.! NGDKASIADLCKVLTTGPLNAD TEVVVGCPAPYLTLARSQLPDSVCVAAQNCY PAM distance lengths=55,53 simil=117.9, PAM_dist=111, identity=36.4% Evolutionary NGTTDQVDKIVKILNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQVAAQNCW distance (not time) definition:...!..!.!...!.:. a 1 PAM transformation.. is.!.:.!...! an evolutionary! step where 1% of NGDKASIADLCKVLTTGPLNAD TEVVVGCPAPYLTLARSQLPDSVCVAAQNCY the amino acids are expected to mutate Definition: A 1 PAM (Accepted Point Mutations) transformation is an M is a mutation matrix for which each element describes a probability of evolutionary step where 1(expected) of the amino acids mutate. a mutation This transformation can be described by a 20 x 20 mutation matrix M where M ij = Pr x j x i. M = i=1 f i (1 M ii ) = 0.01 where f i is the naturally occurring frequency of amino acid i Each mutation matrix is associated with some amount of mutation. For historical reasons the unit of mutation is called a PAM (point accepted mutation) unit. The 1 PAM mutation matrix mutates amino acids with

13 by reasons of common ancestry divided by the probabilities that the pairing would occur by chance. Here is how the elements of the Dayhoff Matrix are calculated from Mutation matrices. The probability of alignment by random chance is just the frequencies of the amino acids in the database. The probability of alignment by reasons of ancestry is calculated by summing over all amino acids the probability that amino acid mutated into the ones at this position in the alignment. (see slide) We sum over all amino acids because we do not know what X was. We have a Dayhoff entry for each pair of amino acids. We align sequences such that this probability is maximized. ie We maximize the probability of having evolved from a common ancestor ( a maximum likelihood alignment) against the null hypothesis of being randomly aligned. Similarity score Our score compares two events- the probability of alignment by reasons of common ancestry divided by the probability of alignement by random chance - -A- - sequence 1 - -X- - ancestor X. - -S- - sequence 2 - -A- - -S- Match by Chance P r{a}p r{s} = fa fs Pr{A and S from Ancestor X}! r{x A}P r{x S} X fx P! =!X fx MAX MSX = X fs MAX MXS 2 = fs MAS 2 = fa MSA where fa is the frequency of A in nature Compare Two Events 2 fa MAS CommonAncestry = DAS = 10log10 Chance fa fs dynamic programming maximizes this score and thus maximizes the probability (maximum likelihood) that the two sequences evolved from a common ancestor against the null hypothesis of having occurred by random chance.

14 Dayhoff Matrices 1 PAM C 17.2 S T P A G N D E Q H PAM C 11.5 S T P A G N D E Q H

15 Multiple Sequence alignments Xenopus Gallus Bos Homo Mus Rattus ATGCATGGGCCAACATGACCAGGAGTTGGTGTCGGTCCAAACAGCGTT---GGCTCTCTA ATGCATGGGCCAGCATGACCAGCAGGAGGTAGC---CAAAATAACACCAACATGCAAATG ATGCATCCGCCACCATGACCAGCAGGAGGTAGCACCCAAAACAGCACCAACGTGCAAATG ATGCATCCGCCACCATGACCAGCAGGAGGTAGCACTCAAAACAGCACCAACGTGCAAATG ATGCATCCGCCACCATGACCAGCAGGAGGTAGCACTCAAAACAGCACCAACGTGCAAATG ATGCATCCGCCACCATGACCAGCGGGAGGTAGCTCTCAAAACAGCACCAACGTGCAAATG ****** **** ********* * *** * * *** * * * each column is descended from one position in the sequence of the common ancestor can not be built by algorithms which guarantee optimal score reasonable heuristic algorithms for constructing MSAs existclustal, MAlign, T-Coffee

16 Markovian Model of Evolution mutations occur with probability independent of previous substitutions substitutions occur indepdently at different positions in the polypeptide chain a single substitution matrix represents the probability of amino acid substitution at any position Proteins do not have Markovian Behavior distant residues come together in the 3D fold and influence each other surface amino acids tolerate more variation than interior residues biological function constrains accepted substitutions - active site conservation back mutations are more probable L -> I -> L chemically similar substitutions are more probable nature is too complex to model exactly

17 things that do not fit in our evolutionary Lateral Gene Transfer Reversals (snakes) model Convergent evolution (flight evolved 5 different times)

18 Phylogenetic Trees

19 How to build trees Starting point: molecular sequences (for this discussion) Goal: a phylogenetic tree describing the evolutionary relationships of the taxa

20 How many trees are there? Number of leaves Number of unrooted trees Number of rooted trees 2 NA e e e e+76 n (2n 5)!! (2n 3)!! Unrooted Tree Rooted Tree Conclusion: We can not evaluate every tree topology when searching for the highest scoring tree.

21 Clustering Algorithms For certain types of trees, clustering algorithms will work well Ultrametric Trees Additive Trees Advantage: very fast Disadvantage: most real trees do not satisfy these conditions.

22 Ultrametric Trees d(a, Y ) = d(b, Y ) To compute an ultrametric tree, we can apply following algorithm: X Y D = D = D AX BX CX A B C Figure 8: Ultrametric tree Assume all evolution occurs at the same Choose leaf rate pair with (molecular minimum distance clock) Assume all distances are measured without error 6 Assume all leaves are equidistant from the root UPGMA (unweighted pair group method with arithmetic averages) algorithm for tree building will usually work well for these trees (not mathematically guaranteed)

23 UPGMA Find i and j that have minimum entry D[i,j] in D Create new group (ij) which has nij = ni + nj members connect i and j on the tree to a new node which corresponds to the group (ij). give the two branches connecting i to (ij) and j to (ij) each length Dij/2 compute distances of all nodes k to (ij) - as d[k,ij] = (ni/(ni+nj))*d[k,i] + (nj/(nj+nj))d[k,j] repeat while number of matrix elements is > 1 join d and c a b c d a b c,d a b a c 0 8 b 0 24 d 0 c,d 0 join a and b a,b c,d a,b 0 24 c,d 0

24 Additive Trees assume that Compute pairwise edge lengths distances and construct have tree for no allerror other nodes based on ultrametricity conditions assume that distances in matrix correspond exactly to For non-singleton nodes apply algorithm recursively branch lengths neighbor-joining Easy Case algorithm II: Additivity is guaranteed to recover the For additivity we expect that a) pairwise distances can be measured without true tree if the distance matrix is an exact reflection of error and b) distances correspond exactly to branch lengths. Unfortunately, this the tree is not the case for real biological trees. d(a, B) = L1 + L2 d(a, C) = L1 + L3 + L4 d(b, C) = L2 + L3 + L4 L3 L4 L1 L2 A B C Figure 9: Additive tree

25 neighbor joining algorithm Distance Based Methods: Least Squares (LS) does not assume clock-like evolution For each tip, compute. Choose the and for which is the smallest Join iterms and. Compute the branch length from to the new node ( ) and from to the new node ( ) as: Compute the distance between the new node ( as ) and each of the remaining tips Delete tips and from the tables and replace them by the new node ( ) which is now treated as a tip. if more than 2 nodes remain go back to step 1. Otherwise connect the 2 remaining nodes (say and ) by a branch of length.

26 5 Construct an initial tree Random tree Finding the Optimal Tree Heuristic for specific data types (Neighbor joining or UPGMA) Search for better scoring topologies using 4-, 5-, or 6-optim while evaluating the tree with a given scoring function (parsimony, distance, or likelihood) Continue to optimize under a scoring criterium until the score no longer improves 5.1 General Tree Construction Procedure For any method finding the best tree is NP-hard. Hence the following procedu is performed: 1. Construct an initial tree evaluation step Random tree Heuristic tree (easy cases, Neighbor Joining) 2. Iterative refinement: 4-/5-/6-optim evaluation steps 3. Randomization evaluation steps while improve 6!optim 5!optim while improve Initial Tree Result Tree no improvement while improve 4!optim Figure 6: General Tree Construction Procedure 5.2 Distance Based Methods Input: Distance matrix D: B C D A d(a, B) d(a, C) d(a, D) B d(b, C) d(b, D) C d(c, D)

27 4-optim There are 3 different topologies with 4 subtrees. A C A C L2 L1 L3 L4 L5 B D B A D B D C Divide the tree into 4 subtrees (A, B, C and D) Compute the quality for all possible topologies Select the best configuration Repeat for different subtrees until there is no improvement

28 5-optim and 6-optim 4-optim improves the topologies towards the leaves 5- and 6-optime improve towards the interior of the tree 4-optim 4 subtrees 3 topologies 5-optim 5 subtrees 15 topologies 6-optim 6 subtrees 105 topologies

29 Types of Tree Construction Methods Character based - Parsimony Distance based - least squares Probability based - Maximum Likelihood or Bayesian Distance Parsimony Maximum Likelihood Input pairwise distance matrix character tables (multiple sequence alignment) pairwise dist. matrix multiple sequence alignment Output branch lengths topology topology branch lengths topology

30 asy Case I: Ultrametric Tree Distance trees Input: Distance matrix D describing the measured distance between all taxa of interest A B C B C D D(A,B) D(A,C) D(A,D) D(B,C) D(B,D) D(C,D) D s come from pairwise sequence alignments L1 L3 L2 L5 L4 L6 d(a,b) = L1 + L2 d(a,c) = L1 + L3 + L4 + L5 d(a,d) = L1 + L3 + L4 + L6 d(b,c) = L2 + L3 + L4 + L5 d(b,d) = L2 + L3 + L4 + L6 d(c,d) = L5 + L6 the Ls are fit A B C D Figure 7: Least Squares Tree of Matrix D

31 What to minimize Distance Based Methods: Least Squares (LS) where is what we are trying to minimize, is the number of leaves, is a weighting factor, 1 over the Pam variance, ( ), D is the matrix of experimentally determined distances from the pairwise alignments (for example), d is a matrix of distances calculated from the fit tree.. p.1/1

32 Distance Methods consider pairwise distances as estimates of the branch length separating two species each distance infers the best unrooted tree for that pair of species in effect, we have many estimated 2-species trees and we try to find the best n-species tree implied by them individual distances are not exactly the path lengths in the full n- species tree between any two species we search for the full tree that does the best job of approximating these individual two-species trees search for the branch lengths and topologies that minimize difference between approximated branch lengths and experimental branch lengths for a given topology, it is possible to solve for the branch lengths that minimize Q using standard least squares methods

33 Character Based Methods What is a character? finite number of states discrete backbone skull opening hip socket grasping warmblooded alligator T. rex sparrow chimp human cat

34 each character fits on one branch of a phylogenetic tree changes in character happen only once species with the same character are all under the same subtree Perfect Phylogeny warmblooded backbone skull opening hip socket grasping alligator T. rex sparrow chimp human cat

35 Parsimony For molecular sequence data, each column of the MSA will be considered a character. Xenopus Gallus Bos Homo Mus Rattus ATGCATGGGCCAACATGACCAGGAGTTGGTGTCGGTCCAAACAGCGTT---GGCTCTCTA ATGCATGGGCCAGCATGACCAGCAGGAGGTAGC---CAAAATAACACCAACATGCAAATG ATGCATCCGCCACCATGACCAGCAGGAGGTAGCACCCAAAACAGCACCAACGTGCAAATG ATGCATCCGCCACCATGACCAGCAGGAGGTAGCACTCAAAACAGCACCAACGTGCAAATG ATGCATCCGCCACCATGACCAGCAGGAGGTAGCACTCAAAACAGCACCAACGTGCAAATG ATGCATCCGCCACCATGACCAGCGGGAGGTAGCTCTCAAAACAGCACCAACGTGCAAATG ****** **** ********* * *** * * *** * * *

36 Parsimony The parsimony score is the number of changes of state on the evolutionary tree. The most parsimonious tree is that which minimizes the amount of evolutionary change. The topology is given, parsimony is a method for finding the tree with the least amount of state changes. The highest scoring tree minimizes the number of changes. Occam's Razor- William of Occam ( ): Entities should not be multiplied more than necessary- the fewer assumptions an explanation of a phenomenon depends on, the better it is.

37 Parsimony Algorithm Use labels at leaves to reconstruct the possible labels at internal nodes Compare the labels at each of the two children of each node. If there is an intersection of the two sets of labels, the parent node is labeled with the result of the intersection and there is no penalty If the intersection is empty, then the node is labeled with the union of the two sets of labels and the penalty increases by +1 Continue from the leaves to the root until all nodes have been labeled

38 Parsimony Number of Changes: 3 {G,T} +1 {T} {R,T} +1 {T} {G,T} +1 A B C D E F Characters R T T T G G

39 Optimizing under parsimony For a given topology and alignment position, determine what ancestral residues require the least amount of changes. Compute this for each alignment column (character). Add the number of changes for each position together to obtain the parsimony score (length of the tree). Compute this score for many tree topologies and keep the one(s) with the lowest score.

40 Assigning ancestral states start at the root, if the set contains more than one character, pick one at random Move from the root towards the leaves. If an intersection exists between the chosen state of the parent and the child, choose it. If not, choose another character at random Many trees may exist with the same parsimony score Number of Changes: 3 {G,T} +1 Number of Changes: 3 T {T} T {R,T} +1 {T} {G,T} +1 T T T A B C D E F A B C D E F Characters R T T T G G Characters R T T T G G

41 Parsimony problems Inconsistency C D C D Backflips C A G A B A B A true tree parsimony tree there is no information on branch length, only change or no change

42 Maximum Likelihood Maximum Likelihood: general parameter estimation procedure Parameters are estimated from the data D such that the likelihood L of the data given the parameters is maximized parameters - tree topology and branch lengths input data - aligned molecular sequences goal: find the topology and branch lengths that maximize the likelihood of the data Use Dayhoff matrices to obtain the likelihood of a transition for a given period of time (PAM distance).

43 Maximum Liklihood Y X Z L1 L2 L3 L5 L4 L6 Y Z A B C D L1 L2 L5 Figure 15: Maximum Likelihood Tree For the whole tree T at position i: L6 A B C D Figure 15: Maximum Likelihood Tree L(T ) i = For Pthe r(x whole i ) tree T atpposition r L3 (X i: i Y i )P r L1 (Y i A i )P r L2 (Y i B i ) XL(T ) i = P r(x ) i Y i P r L3 (X i Y i )P r L1 (Y i A i )P r L2 (Y i B i ) X i Y i Z i P r L4 (X i Z i )P r L5 (Z i C i )P r L6 (Z i D i ) Z i P r L4 (X i Z i )P r L5 (Z i C i )P r L6 (Z i D i ) Over all positions and transformed to a log-scale: Over all positions and transformed to a log-scale: log(l(t )) = log(l(t )) = 8 Data selection N log(l(t ) i ) N i=1 log(l(t ) i ) i=1 We carefully have to select the data to build the tree on. We should verify that

44 Selecting data to Reconstruct Species Trees Sequences must be derived from a common ancestor (Homologous) Orthologs - sequences related by a speciation event Paralogs- sequences related by a gene-duplication event

45 Tree of Life

### BINF6201/8201. Molecular phylogenetic methods

BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

### Evolutionary Tree Analysis. Overview

CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based

### Phylogenetic Tree Reconstruction

I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

### Algorithms in Bioinformatics

Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

### Phylogeny: building the tree of life

Phylogeny: building the tree of life Dr. Fayyaz ul Amir Afsar Minhas Department of Computer and Information Sciences Pakistan Institute of Engineering & Applied Sciences PO Nilore, Islamabad, Pakistan

### Phylogenetic trees 07/10/13

Phylogenetic trees 07/10/13 A tree is the only figure to occur in On the Origin of Species by Charles Darwin. It is a graphical representation of the evolutionary relationships among entities that share

### What is Phylogenetics

What is Phylogenetics Phylogenetics is the area of research concerned with finding the genetic connections and relationships between species. The basic idea is to compare specific characters (features)

### Theory of Evolution Charles Darwin

Theory of Evolution Charles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (83-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

### Phylogenetic inference

Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

### Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

### Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

### Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Page Evolutionary Trees Russ. ltman MI S 7 Outline. Why build evolutionary trees?. istance-based vs. character-based methods. istance-based: Ultrametric Trees dditive Trees. haracter-based: Perfect phylogeny

### EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

### Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods

### Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

### THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

### Phylogeny: traditional and Bayesian approaches

Phylogeny: traditional and Bayesian approaches 5-Feb-2014 DEKM book Notes from Dr. B. John Holder and Lewis, Nature Reviews Genetics 4, 275-284, 2003 1 Phylogeny A graph depicting the ancestor-descendent

### 9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

### UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

- Phylogeny? - Systematics? The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogenetic systematics? Connection between phylogeny and classification. - Phylogenetic systematics informs the

### C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

### Phylogeny 9/8/2014. Evolutionary Relationships. Data Supporting Phylogeny. Chapter 26

Phylogeny Chapter 26 Taxonomy Taxonomy: ordered division of organisms into categories based on a set of characteristics used to assess similarities and differences Carolus Linnaeus developed binomial nomenclature,

### Multiple Sequence Alignment. Sequences

Multiple Sequence Alignment Sequences > YOR020c mstllksaksivplmdrvlvqrikaqaktasglylpe knveklnqaevvavgpgftdangnkvvpqvkvgdqvl ipqfggstiklgnddevilfrdaeilakiakd > crassa mattvrsvksliplldrvlvqrvkaeaktasgiflpe

### POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

### "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

### Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic

### Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels

### Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

### Chapter 26: Phylogeny and the Tree of Life

Chapter 26: Phylogeny and the Tree of Life 1. Key Concepts Pertaining to Phylogeny 2. Determining Phylogenies 3. Evolutionary History Revealed in Genomes 1. Key Concepts Pertaining to Phylogeny PHYLOGENY

### Phylogenies & Classifying species (AKA Cladistics & Taxonomy) What are phylogenies & cladograms? How do we read them? How do we estimate them?

Phylogenies & Classifying species (AKA Cladistics & Taxonomy) What are phylogenies & cladograms? How do we read them? How do we estimate them? Carolus Linneaus:Systema Naturae (1735) Swedish botanist &

### 8/23/2014. Phylogeny and the Tree of Life

Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

### Introduction to Molecular Phylogeny

Introduction to Molecular Phylogeny Starting point: a set of homologous, aligned DNA or protein sequences Result of the process: a tree describing evolutionary relationships between studied sequences =

### Phylogenetic Analysis

Phylogenetic Analysis Aristotle Through classification, one might discover the essence and purpose of species. Nelson & Platnick (1981) Systematics and Biogeography Carl Linnaeus Swedish botanist (1700s)

### Phylogenetic Analysis

Phylogenetic Analysis Aristotle Through classification, one might discover the essence and purpose of species. Nelson & Platnick (1981) Systematics and Biogeography Carl Linnaeus Swedish botanist (1700s)

### 1 ATGGGTCTC 2 ATGAGTCTC

We need an optimality criterion to choose a best estimate (tree) Other optimality criteria used to choose a best estimate (tree) Parsimony: begins with the assumption that the simplest hypothesis that

### Theory of Evolution. Charles Darwin

Theory of Evolution harles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (8-6) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

### How should we organize the diversity of animal life?

How should we organize the diversity of animal life? The difference between Taxonomy Linneaus, and Cladistics Darwin What are phylogenies? How do we read them? How do we estimate them? Classification (Taxonomy)

### A (short) introduction to phylogenetics

A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

### Molecular Evolution and Phylogenetic Tree Reconstruction

1 4 Molecular Evolution and Phylogenetic Tree Reconstruction 3 2 5 1 4 2 3 5 Orthology, Paralogy, Inparalogs, Outparalogs Phylogenetic Trees Nodes: species Edges: time of independent evolution Edge length

### Dr. Amira A. AL-Hosary

Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

### How to read and make phylogenetic trees Zuzana Starostová

How to read and make phylogenetic trees Zuzana Starostová How to make phylogenetic trees? Workflow: obtain DNA sequence quality check sequence alignment calculating genetic distances phylogeny estimation

### Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

### Phylogenetic Analysis

Phylogenetic Analysis Aristotle Through classification, one might discover the essence and purpose of species. Nelson & Platnick (1981) Systematics and Biogeography Carl Linnaeus Swedish botanist (1700s)

### Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method

Phylogeny 1 Plan: Phylogeny is an important subject. We have 2.5 hours. So I will teach all the concepts via one example of a chain letter evolution. The concepts we will discuss include: Evolutionary

### Phylogeny. Properties of Trees. Properties of Trees. Trees represent the order of branching only. Phylogeny: Taxon: a unit of classification

Multiple sequence alignment global local Evolutionary tree reconstruction Pairwise sequence alignment (global and local) Substitution matrices Gene Finding Protein structure prediction N structure prediction

### PHYLOGENY & THE TREE OF LIFE

PHYLOGENY & THE TREE OF LIFE PREFACE In this powerpoint we learn how biologists distinguish and categorize the millions of species on earth. Early we looked at the process of evolution here we look at

### CHAPTERS 24-25: Evidence for Evolution and Phylogeny

CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology

### Lecture 6 Phylogenetic Inference

Lecture 6 Phylogenetic Inference From Darwin s notebook in 1837 Charles Darwin Willi Hennig From The Origin in 1859 Cladistics Phylogenetic inference Willi Hennig, Cladistics 1. Clade, Monophyletic group,

### Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

7.91 Lecture #5 Database Searching & Molecular Phylogenetics Michael Yaffe B C D B C D (((,B)C)D) Outline Distance Matrix Methods Neighbor-Joining Method and Related Neighbor Methods Maximum Likelihood

### CHAPTER 26 PHYLOGENY AND THE TREE OF LIFE Connecting Classification to Phylogeny

CHAPTER 26 PHYLOGENY AND THE TREE OF LIFE Connecting Classification to Phylogeny To trace phylogeny or the evolutionary history of life, biologists use evidence from paleontology, molecular data, comparative

### Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;

### Biology 211 (2) Week 1 KEY!

Biology 211 (2) Week 1 KEY Chapter 1 KEY FIGURES: 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 VOCABULARY: Adaptation: a trait that increases the fitness Cells: a developed, system bound with a thin outer layer made of

### Phylogenetics. BIOL 7711 Computational Bioscience

Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium

### Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction

### Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogeny and systematics Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Phylogeny: the evolutionary history of a species

### Concepts and Methods in Molecular Divergence Time Estimation

Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History Overview 1. Why do we date trees? 2. The molecular clock 3. Local clocks

### CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1 Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003 Lecturer: Wing-Kin Sung Scribe: Ning K., Shan T., Xiang

### Phylogeny Tree Algorithms

Phylogeny Tree lgorithms Jianlin heng, PhD School of Electrical Engineering and omputer Science University of entral Florida 2006 Free for academic use. opyright @ Jianlin heng & original sources for some

### Chapter 26 Phylogeny and the Tree of Life

Chapter 26 Phylogeny and the Tree of Life Biologists estimate that there are about 5 to 100 million species of organisms living on Earth today. Evidence from morphological, biochemical, and gene sequence

### Chapter 16: Reconstructing and Using Phylogenies

Chapter Review 1. Use the phylogenetic tree shown at the right to complete the following. a. Explain how many clades are indicated: Three: (1) chimpanzee/human, (2) chimpanzee/ human/gorilla, and (3)chimpanzee/human/

### Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies

Inferring Phylogenetic Trees Distance Approaches Representing distances in rooted and unrooted trees The distance approach to phylogenies given: an n n matrix M where M ij is the distance between taxa

### Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

### Macroevolution Part I: Phylogenies

Macroevolution Part I: Phylogenies Taxonomy Classification originated with Carolus Linnaeus in the 18 th century. Based on structural (outward and inward) similarities Hierarchal scheme, the largest most

### Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

### C.DARWIN ( )

C.DARWIN (1809-1882) LAMARCK Each evolutionary lineage has evolved, transforming itself, from a ancestor appeared by spontaneous generation DARWIN All organisms are historically interconnected. Their relationships

### CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics

CS5263 Bioinformatics Guest Lecture Part II Phylogenetics Up to now we have focused on finding similarities, now we start focusing on differences (dissimilarities leading to distance measures). Identifying

### Chapter 19: Taxonomy, Systematics, and Phylogeny

Chapter 19: Taxonomy, Systematics, and Phylogeny AP Curriculum Alignment Chapter 19 expands on the topics of phylogenies and cladograms, which are important to Big Idea 1. In order for students to understand

### Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying

### Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

### Name: Class: Date: ID: A

Class: _ Date: _ Ch 17 Practice test 1. A segment of DNA that stores genetic information is called a(n) a. amino acid. b. gene. c. protein. d. intron. 2. In which of the following processes does change

### Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

### Bioinformatics course

Bioinformatics course Phylogeny and Comparative genomics 10/23/13 1 Contents-phylogeny Introduction-biology, life classificationtaxonomy Phylogenetic-tree of life, tree representation Why study phylogeny?

### Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

Phylogene)cs IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, 2016 Joyce Nzioki Phylogenetics The study of evolutionary relatedness of organisms. Derived from two Greek words:» Phle/Phylon: Tribe/Race» Genetikos:

### PHYLOGENY AND SYSTEMATICS

AP BIOLOGY EVOLUTION/HEREDITY UNIT Unit 1 Part 11 Chapter 26 Activity #15 NAME DATE PERIOD PHYLOGENY AND SYSTEMATICS PHYLOGENY Evolutionary history of species or group of related species SYSTEMATICS Study

### Organisatorische Details

Organisatorische Details Vorlesung: Di 13-14, Do 10-12 in DI 205 Übungen: Do 16:15-18:00 Laborraum Schanzenstrasse Vorwiegend Programmieren in Matlab/Octave Teilnahme freiwillig. Übungsblätter jeweils

### Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

### A Phylogenetic Network Construction due to Constrained Recombination

A Phylogenetic Network Construction due to Constrained Recombination Mohd. Abdul Hai Zahid Research Scholar Research Supervisors: Dr. R.C. Joshi Dr. Ankush Mittal Department of Electronics and Computer

### Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Inferring phylogeny Constructing phylogenetic trees Tõnu Margus Contents What is phylogeny? How/why it is possible to infer it? Representing evolutionary relationships on trees What type questions questions

### Organizing Life s Diversity

17 Organizing Life s Diversity section 2 Modern Classification Classification systems have changed over time as information has increased. What You ll Learn species concepts methods to reveal phylogeny

### Evolutionary trees. Describe the relationship between objects, e.g. species or genes

Evolutionary trees Bonobo Chimpanzee Human Neanderthal Gorilla Orangutan Describe the relationship between objects, e.g. species or genes Early evolutionary studies The evolutionary relationships between

### Classification, Phylogeny yand Evolutionary History

Classification, Phylogeny yand Evolutionary History The diversity of life is great. To communicate about it, there must be a scheme for organization. There are many species that would be difficult to organize

### Phylogeny is the evolutionary history of a group of organisms. Based on the idea that organisms are related by evolution

Bio 1M: Phylogeny and the history of life 1 Phylogeny S25.1; Bioskill 11 (2ndEd S27.1; Bioskills 3) Bioskills are in the back of your book Phylogeny is the evolutionary history of a group of organisms

### Introduction to Bioinformatics Introduction to Bioinformatics

Dr. rer. nat. Gong Jing Cancer Research Center Medicine School of Shandong University 2012.11.09 1 Chapter 4 Phylogenetic Tree 2 Phylogeny Evidence from morphological ( 形态学的 ), biochemical, and gene sequence

### Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,

### Phylogeny and the Tree of Life

Chapter 26 Phylogeny and the Tree of Life PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

### Phylogeny. November 7, 2017

Phylogeny November 7, 2017 Phylogenetics Phylon = tribe/race, genetikos = relative to birth Phylogenetics: study of evolutionary relationships among organisms, sequences, or anything in between Related

### Phylogeny and Systematics

Chapter 25 Phylogeny and Systematics PowerPoint Lectures for Biology, Seventh Edition Neil Campbell and Jane Reece Lectures by Chris Romero Modified by Maria Morlin racing phylogeny Phylogeny: he evolutionary

### Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

### Phylogenetic inference: from sequences to trees

W ESTFÄLISCHE W ESTFÄLISCHE W ILHELMS -U NIVERSITÄT NIVERSITÄT WILHELMS-U ÜNSTER MM ÜNSTER VOLUTIONARY FUNCTIONAL UNCTIONAL GENOMICS ENOMICS EVOLUTIONARY Bioinformatics 1 Phylogenetic inference: from sequences

### CLASSIFICATION OF LIVING THINGS. Chapter 18

CLASSIFICATION OF LIVING THINGS Chapter 18 How many species are there? About 1.8 million species have been given scientific names Nearly 2/3 of which are insects 99% of all known animal species are smaller

### Effects of Gap Open and Gap Extension Penalties

Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

### MOLECULAR EVOLUTION AND PHYLOGENETICS SERGEI L KOSAKOVSKY POND CSE/BIMM/BENG 181 MAY 27, 2011

MOLECULAR EVOLUTION AND PHYLOGENETICS If we could observe evolution: speciation, mutation, natural selection and fixation, we might see something like this: AGTAGC GGTGAC AGTAGA CGTAGA AGTAGA A G G C AGTAGA

### Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms

### Consistency Index (CI)

Consistency Index (CI) minimum number of changes divided by the number required on the tree. CI=1 if there is no homoplasy negatively correlated with the number of species sampled Retention Index (RI)

### molecular evolution and phylogenetics

molecular evolution and phylogenetics Charlotte Darby Computational Genomics: Applied Comparative Genomics 2.13.18 https://www.thinglink.com/scene/762084640000311296 Internal node Root TIME Branch Leaves

### Gene Families part 2. Review: Gene Families /727 Lecture 8. Protein family. (Multi)gene family

Review: Gene Families Gene Families part 2 03 327/727 Lecture 8 What is a Case study: ian globin genes Gene trees and how they differ from species trees Homology, orthology, and paralogy Last tuesday 1

### Seuqence Analysis '17--lecture 10. Trees types of trees Newick notation UPGMA Fitch Margoliash Distance vs Parsimony

Seuqence nalysis '17--lecture 10 Trees types of trees Newick notation UPGM Fitch Margoliash istance vs Parsimony Phyogenetic trees What is a phylogenetic tree? model of evolutionary relationships -- common

### Unit 9: Evolution Guided Reading Questions (80 pts total)

Name: AP Biology Biology, Campbell and Reece, 7th Edition Adapted from chapter reading guides originally created by Lynn Miriello Unit 9: Evolution Guided Reading Questions (80 pts total) Chapter 22 Descent

### CSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary

CSCI1950 Z Computa4onal Methods for Biology Lecture 4 Ben Raphael February 2, 2009 hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary Parsimony Probabilis4c Method Input Output Sankoff s & Fitch

### Gel Electrophoresis. 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 1

Gel Electrophoresis Used to measure the lengths of DNA fragments. When voltage is applied to DNA, different size fragments migrate to different distances (smaller ones travel farther). 10/28/0310/21/2003