# Inferring Phylogenetic Trees: Distance Approaches


*[Figure: representing distances in rooted and unrooted trees]*

## The distance approach to phylogenies

- **Given:** an $n \times n$ matrix $M$, where $M_{ij}$ is the distance between taxa $i$ and $j$.
- **Problem:** build an edge-weighted tree such that the distance between leaves $i$ and $j$ is as close as possible to $M_{ij}$.
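To make the objective concrete, here is a minimal sketch in Julia, the language used for the code later in this document. The helper names `add_edge!`, `tree_distances` and `fit_error` are hypothetical, and the sum of squared errors is only one possible way to measure "as close as possible". The small tree (leaves A to D, internal nodes E and F) is purely illustrative.

```julia
# add an undirected edge u -- v with weight w to an adjacency list
# (hypothetical layout: node name => list of (neighbor, edge weight))
function add_edge!(adj, u, v, w)
    push!(get!(adj, u, Tuple{String,Float64}[]), (v, w))
    push!(get!(adj, v, Tuple{String,Float64}[]), (u, w))
    return adj
end

# leaf-to-leaf path lengths; in a tree the path between two nodes is
# unique, so a depth-first traversal from each leaf finds it
function tree_distances(adj, leaves)
    n = length(leaves)
    D = zeros(n, n)
    for (a, src) in enumerate(leaves)
        dist = Dict(src => 0.0)
        stack = [src]
        while !isempty(stack)
            u = pop!(stack)
            for (v, w) in adj[u]
                if !haskey(dist, v)
                    dist[v] = dist[u] + w
                    push!(stack, v)
                end
            end
        end
        for (b, dst) in enumerate(leaves)
            D[a, b] = dist[dst]
        end
    end
    return D
end

# sum of squared errors over unordered pairs
# (each pair appears twice in the symmetric matrices, hence the / 2)
fit_error(D, M) = sum(abs2, D .- M) / 2

# example tree: A and B hang off F, C and D hang off E, E joins F
adj = Dict{String,Vector{Tuple{String,Float64}}}()
for (u, v, w) in [("A", "F", 0.1), ("B", "F", 0.2), ("C", "E", 0.3),
                  ("D", "E", 0.4), ("E", "F", 0.5)]
    add_edge!(adj, u, v, w)
end
D = tree_distances(adj, ["A", "B", "C", "D"])
```

A distance method then searches over trees and edge weights to make `fit_error(D, M)` (or some other criterion) small for the input matrix `M`.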

## Where do we get distances?

Distances are commonly obtained from multiple sequence alignments. In the alignment of sequence $i$ with sequence $j$, let

$$f_{ij} = \frac{\#\text{mismatches}}{\#\text{matches} + \#\text{mismatches}}.$$

This could be used directly as a simple measure of sequence distance, $d_{ij} = f_{ij}$. Alternatively, we could use the Jukes-Cantor correction for multiple substitutions at a single position:

$$d_{ij} = -\frac{3}{4}\log\left(1 - \frac{4}{3} f_{ij}\right).$$

## Derivation of the Jukes-Cantor model

- It assumes that all sites are independent and have identical mutation rates.
- It assumes that all possible nucleotide substitutions occur at the same rate $\alpha$ per unit time.

A matrix can then represent the substitution rates:

$$
\begin{array}{c|cccc}
  & A & C & G & T \\ \hline
A & 1-3\alpha & \alpha & \alpha & \alpha \\
C & \alpha & 1-3\alpha & \alpha & \alpha \\
G & \alpha & \alpha & 1-3\alpha & \alpha \\
T & \alpha & \alpha & \alpha & 1-3\alpha
\end{array}
$$

Now suppose that an ancestral sequence diverged $t$ years ago into two related sequences. After this time, let the fraction of identical sites between the two sequences be $q(t)$ and the fraction of different sites be $p(t)$, so that $p(0) = 0$, $q(0) = 1$, and $p(t) + q(t) = 1$ for all $t \ge 0$.

We can calculate $q(t+1)$, the fraction of identical sites after time $t+1$. There are two ways of getting an identical site at time $t+1$:

- **Two identical aligned sites do not mutate.** The probability of this event is $(1-3\alpha)^2 \approx 1-6\alpha$ (neglecting the $\alpha^2$ term). Since a fraction $q(t)$ of sites were identical at time $t$, we expect $(1-6\alpha)\,q(t)$ to remain identical at time $t+1$.
- **One of two different aligned sites mutates to become identical to the other.** The probability of this event is $2\alpha(1-3\alpha)\,p(t) \approx 2\alpha\,p(t)$.

Therefore, the fraction of identical sites at time $t+1$ is

$$q(t+1) = (1-6\alpha)\,q(t) + 2\alpha\,p(t).$$

Using $p(t) = 1 - q(t)$, this allows us to estimate the derivative of $q(t)$ with respect to time as

$$\frac{dq(t)}{dt} \approx q(t+1) - q(t) = 2\alpha - 8\alpha\,q(t).$$

Solving this differential equation subject to the initial condition $q(0) = 1$ gives

$$q(t) = \frac{1}{4}\left(1 + 3e^{-8\alpha t}\right).$$

Notice that $q(t) \to \frac{1}{4}$ as $t \to \infty$, so this model predicts a minimum of 25% identity even when aligning unrelated nucleotide sequences.

Finally, to obtain the Jukes-Cantor correction, note that we would expect $3\alpha t$ substitutions during time $t$ at each site of each sequence. Thus the evolutionary distance between the two sequences under this model is $6\alpha t$. Solving the expression for $q(t)$ for $8\alpha t$ gives

$$6\alpha t = \frac{3}{4}(8\alpha t) = -\frac{3}{4}\log\left(\frac{4q(t)-1}{3}\right) = -\frac{3}{4}\log\left(\frac{4(1-p(t))-1}{3}\right) = -\frac{3}{4}\log\left(1 - \frac{4}{3}p(t)\right).$$

Replacing $p(t)$ by the measured fraction of differences

$$f_{ij} = \frac{\#\text{mismatches}}{\#\text{matches} + \#\text{mismatches}}$$

gives the Jukes-Cantor correction introduced above:

$$d_{ij} = -\frac{3}{4}\log\left(1 - \frac{4}{3} f_{ij}\right).$$

## The molecular clock hypothesis

Some proteins appear to evolve slowly, others rapidly. But for any given protein, the rate of molecular evolution is approximately constant in all evolutionary lineages.
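Under a molecular clock the rate $\alpha$ of the Jukes-Cantor model is constant, and the model then makes a checkable prediction: the observed fraction of differences $p(t) = 1 - q(t)$ levels off near $3/4$, while the corrected distance recovers $6\alpha t$, which grows linearly with time. A minimal Julia sketch computes both from the formulas derived above; the helper names `p_distance`, `jc_distance` and `q_identical` and the rate $\alpha = 0.01$ are illustrative assumptions, and gaps or ambiguous bases are not handled.

```julia
# observed fraction of mismatched sites between two aligned sequences
function p_distance(s1::AbstractString, s2::AbstractString)
    length(s1) == length(s2) ||
        throw(ArgumentError("sequences must be aligned to the same length"))
    mismatches = count(pair -> pair[1] != pair[2], zip(s1, s2))
    return mismatches / length(s1)
end

# Jukes-Cantor corrected distance d = -(3/4) log(1 - (4/3) f);
# the logarithm is undefined once f reaches 3/4, so return Inf there
function jc_distance(f::Real)
    f < 0.75 || return Inf
    return -0.75 * log(1 - 4 * f / 3)
end

# fraction of identical sites q(t) = (1 + 3 exp(-8 alpha t)) / 4
q_identical(alpha, t) = (1 + 3 * exp(-8 * alpha * t)) / 4

# p(t) = 1 - q(t) approaches 3/4, while the corrected distance equals 6 alpha t
alpha = 0.01
for t in (1, 10, 100)
    p = 1 - q_identical(alpha, t)
    println("t = $t: p = ", round(p; digits = 4),
            ", JC = ", round(jc_distance(p); digits = 4),
            ", 6*alpha*t = ", 6 * alpha * t)
end
```

For real data, `jc_distance(p_distance(s1, s2))` gives $d_{ij}$ directly from a pairwise alignment.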

## Ultrametric data

- The molecular clock assumption is not generally true: selection pressures vary across time periods, organisms, genes within an organism, and regions within a gene.
- If it does hold, then the data are said to be *ultrametric*.

**Ultrametric condition:** if your data are ultrametric, then for any triplet of sequences $(i, j, k)$, the distances are either all equal, or two are equal and the remaining one is smaller.

## Unweighted Pair Group Method using Averages (UPGMA)

Given ultrametric data, UPGMA will reconstruct the tree $T$ that is consistent with the data. The basic idea:

- Iteratively pick two clusters of taxa and merge them.
- Create a new node in the tree for the merged cluster.
- The distance $d_{ij}$ between clusters $C_i$ and $C_j$ of taxa is defined as the average distance between pairs of taxa from each cluster:

$$d_{ij} = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i,\; q \in C_j} d_{pq}$$

### UPGMA algorithm

1. Assign each taxon to its own cluster.
2. Define one leaf for each taxon; place it at height 0.
3. While more than two clusters remain:
   - determine the two clusters $i$, $j$ with the smallest $d_{ij}$;
   - define a new cluster $C_k = C_i \cup C_j$;
   - define a node $k$ with children $i$ and $j$, and place $k$ at height $d_{ij}/2$;
   - replace clusters $i$ and $j$ with cluster $k$;
   - compute the distance between $k$ and every other cluster $l$ as $d_{kl} = \dfrac{|C_i|\,d_{il} + |C_j|\,d_{jl}}{|C_i| + |C_j|}$.
4. Join the last two clusters, $i$ and $j$, by a root at height $d_{ij}/2$.

## UPGMA example

*[Figure: worked UPGMA example]*
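A small worked example can also be traced in code. The sketch below uses a hypothetical 4-taxon ultrametric matrix (taxa A to D) and two hypothetical helpers. `is_ultrametric` tests the triplet condition from the ultrametric section: in each triplet, "all equal, or two equal and the third smaller" is the same as saying the two largest distances coincide. `upgma_merges` lists each merge with the height of the new node. For brevity it recomputes every cluster distance as the average over all taxon pairs instead of applying the update formula for $d_{kl}$; both give the same values. It is a teaching trace only, separate from the Newick-producing `upgma` function in the Julia code section.

```julia
# triplet (three-point) test on a symmetric distance matrix, up to a tolerance
function is_ultrametric(D; atol = 1e-9)
    n = size(D, 1)
    for i in 1:n, j in i+1:n, k in j+1:n
        _, b, c = sort([D[i, j], D[i, k], D[j, k]])
        isapprox(b, c; atol = atol) || return false   # two largest must match
    end
    return true
end

# UPGMA merge trace: returns (cluster, cluster, height of new node) per step
function upgma_merges(D::AbstractMatrix, labels::Vector{String})
    clusters = [[i] for i in 1:length(labels)]   # taxon indices per cluster
    names = copy(labels)
    merges = Tuple{String,String,Float64}[]
    # average distance between pairs of taxa drawn from the two clusters
    avg(ci, cj) = sum(D[p, q] for p in ci, q in cj) / (length(ci) * length(cj))
    while length(clusters) > 1
        # find the pair of clusters with the smallest average distance
        best, bi, bj = Inf, 0, 0
        for i in 1:length(clusters), j in i+1:length(clusters)
            d = avg(clusters[i], clusters[j])
            if d < best
                best, bi, bj = d, i, j
            end
        end
        # the new node is placed at height d_ij / 2
        push!(merges, (names[bi], names[bj], best / 2))
        merged = vcat(clusters[bi], clusters[bj])
        newname = "(" * names[bi] * "," * names[bj] * ")"
        deleteat!(clusters, [bi, bj])   # bi < bj, as deleteat! requires
        deleteat!(names, [bi, bj])
        push!(clusters, merged)
        push!(names, newname)
    end
    return merges
end

U = [0 2 4 6; 2 0 4 6; 4 4 0 6; 6 6 6 0]   # hypothetical ultrametric matrix
println(is_ultrametric(U))
println(upgma_merges(U, ["A", "B", "C", "D"]))
```

For this matrix, A and B merge at height 1, C joins them at height 2, and D joins at the root at height 3. The triplet check costs $O(n^3)$, which is fine for small inputs.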


## Newick format for phylogenetic trees

*[Figure: an example phylogenetic tree]*

This tree can be represented by an integer $n$ followed by the adjacency list of a weighted tree with $n$ leaves:

```
4
A->F:0.1
B->F:0.2
C->E:0.3
D->E:0.4
E->F:0.5
```

The tree can also be represented as Newick strings:

| Newick string | What is encoded |
|---|---|
| `(,,(,));` | no names |
| `(A,B,(C,D));` | leaves are named |
| `(A,B,(C,D)E)F;` | leaves and internal nodes are named |
| `(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;` | distance to parent |
| `(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);` | distances and leaf names |
| `(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;` | distances and all node names |

## Julia code for UPGMA trees

The UPGMA code below returns the tree as a Newick string. First we need a function to read in a single integer $n$ followed by an $n \times n$ distance matrix:

```julia
# read a single integer n followed by an n x n distance matrix
function getdistmatrix(fn)
    # read the whole file as a string
    data = read(fn, String)
    # use split to generate a list of tokens and filter to drop empty tokens
    tokens = filter(!isempty, split(data))
    # use tryparse to convert tokens to Float64, skipping any that fail
    nums = Float64[]
    for tok in tokens
        x = tryparse(Float64, tok)
        x === nothing || push!(nums, x)
    end
    # the first number is n; reshape the rest into an n x n matrix
    n = round(Int, nums[1])
    dm = reshape(nums[2:end], (n, n))
    return dm
end
```

```julia
using LinearAlgebra  # for the identity I
using Printf         # for @sprintf

# UPGMA: returns the tree as a Newick string
function upgma(dm)
    dm = float(copy(dm))
    n = size(dm, 1)
    # first nodes are labeled 1..n and each node is placed in its own cluster;
    # heights of leaf nodes are all set to zero;
    # the Newick list starts off as the list of leaf labels
    clusters = Vector{Vector{Int}}()
    newick = String[]
    heights = Float64[]
    nodes = String[]   # 0-based node names, as used in Rosalind's adjacency lists
    for i in 1:n
        push!(clusters, [i])
        push!(newick, "$i")
        push!(nodes, "$(i-1)")
        push!(heights, 0.0)
    end
    # next node to generate has label n+1
    next = n + 1
    # each pass of the loop merges two clusters
    while n > 1
        # add 2 * max to the diagonal zeros before finding
        # the indices and value of the minimum distance
        maxval = maximum(dm)
        dme = dm + 2 * maxval * I
        minval, ind = findmin(dme)
        # store indices of the min as row and col
        row, col = ind[1], ind[2]
        # weights for generating distances to the new cluster
        ncr = length(clusters[row])
        ncc = length(clusters[col])
        # distance to the new cluster, appended as a new row and a new column
        newrow = (ncr * dm[row, :] + ncc * dm[col, :]) / (ncr + ncc)
        dm = vcat(dm, newrow')
        newcol = (ncr * dm[:, row] + ncc * dm[:, col]) / (ncr + ncc)
        dm = hcat(dm, newcol)
        # set the diagonal element of the new row and new column to zero
        dm[n+1, n+1] = 0.0
        # append the new cluster to the cluster list
        push!(clusters, vcat(clusters[row], clusters[col]))
        # compute the height of the new cluster and its Newick representation
        h = minval / 2
        hr = ":" * @sprintf("%.3f", h - heights[row])
        hc = ":" * @sprintf("%.3f", h - heights[col])
        newnode = "(" * newick[row] * hr * ", " * newick[col] * hc * ")$next"
        # append the new Newick string, height and node name
        push!(newick, newnode)
        push!(heights, h)
        push!(nodes, "$(next-1)")
        # deleteat! needs sorted indices: remove the row and col items
        gone = sort([row, col])
        deleteat!(clusters, gone)
        deleteat!(newick, gone)
        deleteat!(heights, gone)
        deleteat!(nodes, gone)
        # remove rows and columns row and col from the distance matrix
        keep = setdiff(1:n+1, gone)
        dm = dm[keep, keep]
        # by now n should drop by one
        n = size(dm, 1)
        # increment the next node label
        next += 1
    end
    # one string is left in the Newick list, representing the whole tree
    return newick[1]
end
```

Trying it out: the code is stored locally in `code/upgma.jl`, and print statements (not shown above) have been added to generate the adjacency list required by Rosalind.

```julia
include("code/upgma.jl")
tree = upgma(getdistmatrix("data/dm1.txt"))
```

Printed adjacency list:

```
3->4:5.000
4->3:5.000
2->4:5.000
4->2:5.000
4->5:2.000
5->4:2.000
0->5:7.000
5->0:7.000
5->6:1.833
6->5:1.833
1->6:8.833
6->1:8.833
```

Returned Newick string:

```
"(((4:5.000, 3:5.000)5:2.000, 1:7.000)6:1.833, 2:8.833)7"
```

Write the Newick string to a file for viewing with FigTree (`write` returns the number of bytes written, here 55):

```julia
open("data/tr1.tree", "w") do f
    write(f, tree)
end
```

*[Figure: a rendering from FigTree]*
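The UPGMA code returns Newick, whereas Rosalind expects an adjacency list; the example tree in the Newick section shows that both encodings carry the same information. As a minimal sketch, the hypothetical function `adjacency_to_newick` converts an adjacency list into a Newick string with all nodes named. It assumes each edge appears once, written `child->parent:weight` as in that example, and keeps the weights as the original text.

```julia
# build a Newick string from "child->parent:weight" edge lines
function adjacency_to_newick(lines::Vector{String})
    children = Dict{String,Vector{Tuple{String,String}}}()  # parent => [(child, weight)]
    has_parent = Set{String}()
    order = String[]   # nodes in first-seen order
    for line in lines
        edge, w = String.(split(line, ':'))
        child, parent = String.(split(edge, "->"))
        push!(get!(children, parent, Tuple{String,String}[]), (child, w))
        push!(has_parent, child)
        for v in (child, parent)
            v in order || push!(order, v)
        end
    end
    # the root is the only node that never appears as a child
    roots = filter(v -> !(v in has_parent), order)
    length(roots) == 1 || error("expected exactly one root, found $(length(roots))")
    # recursively write "(child1:w1,child2:w2,...)name"
    function subtree(v)
        haskey(children, v) || return v   # a leaf is just its name
        parts = [subtree(c) * ":" * w for (c, w) in children[v]]
        return "(" * join(parts, ",") * ")" * v
    end
    return subtree(roots[1]) * ";"
end

edges = ["A->F:0.1", "B->F:0.2", "C->E:0.3", "D->E:0.4", "E->F:0.5"]
println(adjacency_to_newick(edges))   # (A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;
```

Children are emitted in the order their edges first appear, so this example reproduces the last Newick string in the table exactly.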

A second example, using a larger distance matrix:

```julia
tree2 = upgma(getdistmatrix("data/dm2.txt"))
open("data/tr2.tree", "w") do f
    write(f, tree2)
end
```

As before, the added print statements write out the adjacency list for Rosalind. The resulting Newick string, `tree2`:

```
"(((((((11: , 6: )39:56.000, (18: , 17: )27:87.000)44:36.625, ((26: , 13: )28:59.000, 8: )40:63.125)45:23.232, ((23: , 20: )29:5.000, 14: )32: )46:13.718, (15: , 1: )33: )48:18.224, (((9: , 4: )37:58.500, (22: , 12: )35:66.000)42:63.875, (16: , 5: )38: )47:30.924)50:5.191, (((19: , 2: )36:65.125, (24: , 7: )34:67.625)43:79.000, ((21: , 10: )31:78.000, (25: , 3: )30:79.000)41: )49:19.865)51"
```

*[Figure: another rendering from FigTree]*

## Homework

Attempt the following problems from the UKZN-COMP710-bioinformatics course on the Rosalind website. In each case, write Julia code to solve the problem; do not use web-based tools.

- BA7D: Implement UPGMA


### Phylogenetic inference: from sequences to trees

W ESTFÄLISCHE W ESTFÄLISCHE W ILHELMS -U NIVERSITÄT NIVERSITÄT WILHELMS-U ÜNSTER MM ÜNSTER VOLUTIONARY FUNCTIONAL UNCTIONAL GENOMICS ENOMICS EVOLUTIONARY Bioinformatics 1 Phylogenetic inference: from sequences

Introduction to Bioinformatics Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Multiple Sequence Alignment Outline Multiple sequence alignment introduction to msa methods of msa progressive global alignment

### Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction. Lesser Tenrec (Echinops telfairi)

Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction Lesser Tenrec (Echinops telfairi) Goals: 1. Use phylogenetic experimental design theory to select optimal taxa to

### Comparative Genomics II

Comparative Genomics II Advances in Bioinformatics and Genomics GEN 240B Jason Stajich May 19 Comparative Genomics II Slide 1/31 Outline Introduction Gene Families Pairwise Methods Phylogenetic Methods

### 66 Bioinformatics I, WS 09-10, D. Huson, December 1, Evolutionary tree of organisms, Ernst Haeckel, 1866

66 Bioinformatics I, WS 09-10, D. Huson, December 1, 2009 5 Phylogeny Evolutionary tree of organisms, Ernst Haeckel, 1866 5.1 References J. Felsenstein, Inferring Phylogenies, Sinauer, 2004. C. Semple

### Biology 211 (2) Week 1 KEY!

Biology 211 (2) Week 1 KEY Chapter 1 KEY FIGURES: 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 VOCABULARY: Adaptation: a trait that increases the fitness Cells: a developed, system bound with a thin outer layer made of

### Quantifying sequence similarity

Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity

### SUPPLEMENTARY INFORMATION

Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,

### NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana

### Phylogenetic Tree Generation using Different Scoring Methods

International Journal of Computer Applications (975 8887) Phylogenetic Tree Generation using Different Scoring Methods Rajbir Singh Associate Prof. & Head Department of IT LLRIET, Moga Sinapreet Kaur Student

### Week 5: Distance methods, DNA and protein models

Week 5: Distance methods, DNA and protein models Genome 570 February, 2016 Week 5: Distance methods, DNA and protein models p.1/69 A tree and the expected distances it predicts E A 0.08 0.05 0.06 0.03

### Constructing Evolutionary Trees

Constructing Evolutionary Trees 0-0 HIV Evolutionary Tree SIVs (monkeys)! HIV (human)! human infection! human HIV/M human HIV/M chimpanzee SIV chimpanzee SIV human HIV/N human HIV/N chimpanzee SIV chimpanzee

### Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

### Sequence Bioinformatics. Multiple Sequence Alignment Waqas Nasir

Sequence Bioinformatics Multiple Sequence Alignment Waqas Nasir 2010-11-12 Multiple Sequence Alignment One amino acid plays coy; a pair of homologous sequences whisper; many aligned sequences shout out