Lecture 10: Phylogeny

1 Computational Genomics. Prof. Ron Shamir & Prof. Roded Sharan, School of Computer Science, Tel Aviv University. Lecture 10: Phylogeny, 25, 27/12/12

2 Phylogeny. Slides: Adi Akavia; Nir Friedman's slides at HUJI (based on ALGMB 98); Anders Gorm Pedersen, Technical University of Denmark. Sources: Joe Felsenstein, Inferring Phylogenies (2004).

3 Phylogeny. Phylogeny: the ancestral relationships among a set of species, represented by a phylogenetic tree. Leaves: contemporary species. Internal nodes: ancestral species. Branch length: distance between sequences. [Figure: tree with a leaf, a branch and an internal node marked]

8 Phylogeny schools: classical vs. modern. Classical: morphological characters. Modern: molecular sequences.

9 Trees and Models: rooted / unrooted; topology / distance; binary / general; molecular clock.

10 To root or not to root? Unrooted tree: phylogeny without direction.

11 Rooting an Unrooted Tree. We can estimate the position of the root by introducing an outgroup: a species that is definitely more distant from all the species of interest than they are from each other. [Figure: tree on Falcon, Aardvark, Bison, Chimp, Dog and Elephant with the proposed root marked]

12 A Scally et al. Nature 483, (2012) doi: /nature10842

13 How do we figure out these trees? Times?

14 Dangers of Paralogs. Correct species tree: (1,(2,3)). Sequence homology is caused by: orthologs - speciation; paralogs - duplication; xenologs - horizontal transfer (e.g., by a virus). [Figure: gene tree with a gene duplication followed by speciation events; leaves 1A, 2A, 3A, 3B, 2B, 1B]

15 Dangers of Paralogs. Correct species tree: (1,(2,3)). If we only consider genes 1A, 2B and 3A, we infer ((1,3),2). [Figure: gene tree with a gene duplication followed by speciation events; leaves 1A, 2A, 3A, 3B, 2B, 1B]

16 Type of Data. Distance-based: the input is a matrix of distances between species; the distance can be the fraction of residues on which they disagree, the alignment score between them, etc. Character-based: examine each character (e.g., residue) separately.

17 Distance Based Methods

18 Tree-based distances. d(i,j) = sum of arc lengths on the path between i and j. Given d, can we find an exactly matching tree? An approximately matching tree?

19 The Problem. Input: matrix d of distances between species. Goal: find a tree with leaves = species and edge lengths that matches d best. Quality measure, sum of squares: $SSQ(T) = \sum_{i} \sum_{j \neq i} w_{ij} (d_{ij} - t_{ij})^2$, where $t_{ij}$ is the distance in the tree and $w_{ij}$ is a pair weighting. Options for $w_{ij}$: (1) $1$; (2) $1/d_{ij}$; (3) $1/d_{ij}^2$. NP-hard (Day '86). We'll describe common heuristics.
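
The sum-of-squares criterion on this slide can be sketched in Python as follows. This is only an illustration: the function name `ssq`, the dict-of-dicts layout for the input distances `d` and the tree distances `t`, and the `weight` callback are assumptions made for the example, not part of the lecture.

```python
def ssq(d, t, weight=lambda dij: 1.0):
    """Weighted sum of squared differences between input and tree distances.

    d[i][j]: input distance, t[i][j]: path length in the candidate tree.
    The sum runs over ordered pairs i != j, as in the slide's double sum.
    """
    names = list(d)
    return sum(
        weight(d[i][j]) * (d[i][j] - t[i][j]) ** 2
        for i in names
        for j in names
        if i != j
    )


# Hypothetical two-species example: the tree is off by 1 in both directions.
d = {"a": {"b": 3.0}, "b": {"a": 3.0}}
t = {"a": {"b": 2.0}, "b": {"a": 2.0}}
unweighted = ssq(d, t)                               # weighting option (1)
inverse_square = ssq(d, t, lambda dij: 1 / dij**2)   # weighting option (3)
```

Passing the weighting as a callback keeps the three options from the slide interchangeable without touching the summation.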

20 UPGMA Clustering (Sokal & Michener 1958): unweighted pair-group method with arithmetic mean. Approach: form a tree in which species that are closer according to the input distances are closer in the tree. Build the tree bottom-up, each time merging two smaller trees. All leaves are at the same distance from the root.

21 Efficiency lemma. Approach: gradually form clusters (sets of species); repeatedly identify two clusters and merge them. For clusters $C_i$, $C_j$, define the distance between them as the average distance between their members: $d(C_i, C_j) = \frac{1}{|C_i||C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p, q)$. Lemma: if $C_k$ is formed by merging $C_i$ and $C_j$, then for every other cluster $C_l$: $d(C_k, C_l) = \frac{|C_i| \, d(C_i, C_l) + |C_j| \, d(C_j, C_l)}{|C_i| + |C_j|}$. Hence the distances between clusters can be updated in time proportional to the number of clusters.

22 UPGMA algorithm.
Initialize: each leaf i is a cluster $C_i = \{i\}$ with $d(C_i, C_j) = d(i,j)$; set height(i) = 0 for all i.
Iterate: find $C_i$, $C_j$ with the smallest $d(C_i, C_j)$. Introduce a new cluster node $C_k$ that replaces $C_i$ and $C_j$ ($C_k$ represents all the leaves in clusters $C_i$ and $C_j$). Introduce a new tree node $A_{ij}$ with height($A_{ij}$) = $d(C_i, C_j)/2$ ($d(C_i, C_j)$ is the average distance between the leaves of $C_i$ and the leaves of $C_j$). Connect the corresponding tree nodes $C_i$, $C_j$ to $A_{ij}$ with length($C_i$, $A_{ij}$) = height($A_{ij}$) - height($C_i$) and length($C_j$, $A_{ij}$) = height($A_{ij}$) - height($C_j$). For every other cluster $C_l$: $d(C_k, C_l) = \frac{|C_i| \, d(C_i, C_l) + |C_j| \, d(C_j, C_l)}{|C_i| + |C_j|}$ (the distance to any old cluster is the average distance between its leaves and the leaves in $C_i$, $C_j$).
Time: naive $O(n^3)$; one can show $O(n^2 \log n)$ (exercise) and $O(n^2)$ (harder exercise).

23 UPGMA alg (2). Orange nodes represent the groups of nodes that they replaced, and maintain the average distance of the set from other leaf nodes/clusters. [Figure: UPGMA merge steps with cluster nodes C12, C45, C123, tree nodes A12, A123 and heights h12, h123]
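
The steps above can be sketched as a naive $O(n^3)$ Python routine. This is a minimal illustration under stated assumptions: distances are given as a complete symmetric dict of dicts, merged clusters are labeled by nested tuples, and the names `upgma`, `size`, `height` and `edges` are hypothetical, not taken from the slides.

```python
from itertools import combinations


def upgma(d):
    """Naive UPGMA. d[a][b] is the distance between leaves a and b.

    Returns (root, height, edges); edges holds (child, parent, length).
    """
    dist = {a: dict(row) for a, row in d.items()}   # working copy
    size = {a: 1 for a in d}                        # |C| for every cluster
    height = {a: 0.0 for a in d}                    # leaves sit at height 0
    edges = []
    while len(dist) > 1:
        # Closest pair of current clusters.
        ci, cj = min(combinations(dist, 2), key=lambda p: dist[p[0]][p[1]])
        ck = (ci, cj)                               # label of the merged cluster
        height[ck] = dist[ci][cj] / 2.0
        edges.append((ci, ck, height[ck] - height[ci]))
        edges.append((cj, ck, height[ck] - height[cj]))
        size[ck] = size[ci] + size[cj]
        # Size-weighted average from the efficiency lemma.
        new_row = {
            cl: (size[ci] * dist[ci][cl] + size[cj] * dist[cj][cl]) / size[ck]
            for cl in dist
            if cl not in (ci, cj)
        }
        del dist[ci], dist[cj]
        for cl in dist:
            del dist[cl][ci], dist[cl][cj]
            dist[cl][ck] = new_row[cl]
        dist[ck] = new_row
    root = next(iter(dist))
    return root, height, edges


# Hypothetical ultrametric input: a and b are close, c is far from both.
example = {
    "a": {"b": 2.0, "c": 6.0},
    "b": {"a": 2.0, "c": 6.0},
    "c": {"a": 6.0, "b": 6.0},
}
root, height, edges = upgma(example)
```

On this input a and b are merged at height 1 and joined with c at height 3, so every leaf ends up at distance 3 from the root.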

24 UPGMA Algorithm

29 Molecular Clock. UPGMA implicitly assumes that the tree is ultrametric: equal leaf-root distances, i.e., a common uniform clock. Works reasonably well for closely related species.

30 Additivity. A weaker assumption, additivity: distances between species are the sum of the distances between intermediate nodes (even if the tree is not ultrametric). For leaves i, j, k attached to a common internal node by edges of lengths a, b, c: $d(i,j) = a + b$, $d(i,k) = a + c$, $d(j,k) = b + c$.

31 Consequences of Additivity. Suppose the input distances are additive. For any three leaves i, j, k meeting at an internal node m: $d(i,j) = a + b$, $d(i,k) = a + c$, $d(j,k) = b + c$. Thus $d(k,m) = \frac{1}{2}\left(d(i,k) + d(j,k) - d(i,j)\right)$.

32 Consequences of Additivity II. Suppose the input distances are additive. For any four leaves i, j, k, l, let W be the weight of the induced tree and b the length of its internal edge, with i, j on one side and k, l on the other. Then $d(i,j) + d(k,l) = W - b$, $d(i,k) + d(j,l) = W + b$, $d(i,l) + d(j,k) = W + b$. The largest sum must appear twice. [Figure: four-leaf tree on i, j, k, l with edge lengths a, b, c, d, e]

33 Consequences of Additivity III. If we can identify neighbor leaves, then we can use the pairwise distances to reconstruct the tree: remove the neighbors i, j from the leaf set; add their parent k; set $d_{km} = (d_{im} + d_{jm} - d_{ij})/2$ for every remaining leaf m. But can we find neighbor leaves?

34 Closest pairs may not be neighbors! Closest pair: k, j, even though they are not neighbors. [Figure: four-leaf tree on i, j, k, l in which two edges have length 4]

35 Neighbor Joining (Saitou & Nei '87). Let $D(i,j) = d(i,j) - (r_i + r_j)$, where $r_i = \frac{1}{|L|-2} \sum_{k \in L} d(i,k)$ is the corrected average distance of i from all other nodes. Theorem: for additive distances, if D(i,j) is minimum among all pairs of leaves, then i and j are neighbors in the tree.

36 Neighbor Joining. Set L to contain all leaves. Iteration: choose i, j such that D(i,j) is minimum, where $D(i,j) = d(i,j) - (r_i + r_j)$ and $r_i = \frac{1}{|L|-2} \sum_{k \in L} d(i,k)$. Create a new node k and set $d(i,k) = (d(i,j) + r_i - r_j)/2$, $d(j,k) = (d(i,j) + r_j - r_i)/2$, and $d(k,m) = (d(i,m) + d(j,m) - d(i,j))/2$ for every other node m. Remove i, j from L and add k. Update r and D. Termination: when |L| = 2, connect the two remaining nodes. Theorem: the optimal tree is guaranteed if the distances match a tree. Does not assume a clock. Time: $O(n^3)$.
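
The four-leaf consequence ("the largest sum must appear twice") gives a direct test for additivity. The following Python sketch checks it on every quadruple of leaves; the function name `is_additive`, the tolerance `tol` and the example distances are assumptions made for illustration.

```python
from itertools import combinations


def is_additive(d, tol=1e-9):
    """Four-point check: among the three pairings of any four leaves,
    the two largest sums of distances must be equal."""
    for i, j, k, l in combinations(list(d), 4):
        sums = sorted(
            [d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]]
        )
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Hypothetical distances realized by a tree with cherries (A, B) and (C, D).
tree_like = {
    "A": {"B": 17, "C": 21, "D": 27},
    "B": {"A": 17, "C": 12, "D": 18},
    "C": {"A": 21, "B": 12, "D": 14},
    "D": {"A": 27, "B": 18, "C": 14},
}
print(is_additive(tree_like))
```

Here the three sums are 31, 39 and 39, so the smaller sum identifies the two neighbor pairs and the difference 39 - 31 = 2b gives the internal edge length b = 4.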

37-42 Neighbor Joining Algorithm: worked example (slides from the Center for Biological Sequence Analysis). Input distances on leaves A, B, C, D: d(A,B) = 17, d(A,C) = 21, d(A,D) = 27, d(B,C) = 12, d(B,D) = 18, d(C,D) = 14. Compute $u_i$ (the sum of distances from i divided by |L| - 2): $u_A$ = (17+21+27)/2 = 32.5, $u_B$ = (17+12+18)/2 = 23.5, $u_C$ = (21+12+14)/2 = 23.5, $u_D$ = (27+18+14)/2 = 29.5. Compute $D_{ij} - u_i - u_j$ for every pair: AB = -39, AC = -35, AD = -35, BC = -35, BD = -35, CD = -39. Join the minimum pair C, D (tied with A, B) under a new node X, with branch lengths $v_C$ = 0.5 x 14 + 0.5 x (23.5 - 29.5) = 4 and $v_D$ = 0.5 x 14 + 0.5 x (29.5 - 23.5) = 10.

43-46 Neighbor Joining Algorithm: distances from the new node X. $D_{XA} = (D_{CA} + D_{DA} - D_{CD})/2$ = (21 + 27 - 14)/2 = 17; $D_{XB} = (D_{CB} + D_{DB} - D_{CD})/2$ = (12 + 18 - 14)/2 = 8. Remove C and D. Reduced matrix on A, B, X: d(A,B) = 17, d(A,X) = 17, d(B,X) = 8.

47-50 Neighbor Joining Algorithm: second iteration. $u_A$ = (17+17)/1 = 34, $u_B$ = (17+8)/1 = 25, $u_X$ = (17+8)/1 = 25. $D_{ij} - u_i - u_j$: AB = -42, AX = -42, BX = -42. Join A and B under a new node Y, with branch lengths $v_A$ = 0.5 x 17 + 0.5 x (34 - 25) = 13 and $v_B$ = 0.5 x 17 + 0.5 x (25 - 34) = 4.

51-54 Neighbor Joining Algorithm: termination. $D_{YX} = (D_{AX} + D_{BX} - D_{AB})/2$ = (17 + 8 - 17)/2 = 4. Only X and Y remain; connect them by an edge of length 4.

55 Neighbor Joining Algorithm: result. Final unrooted tree: A-Y = 13, B-Y = 4, Y-X = 4, C-X = 4, D-X = 10.
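
The neighbor-joining iteration can be sketched in Python as below. This is an illustrative $O(n^3)$ version, not the lecture's code: the function name `neighbor_joining`, the internal labels `N1`, `N2`, ... and the dict-of-dicts input are assumptions. The example matrix is also an assumption: it uses the values consistent with the $u_i$ sums (32.5, 23.5, 23.5, 29.5) and the branch lengths shown in the worked example.

```python
from itertools import combinations


def neighbor_joining(d):
    """Neighbor joining on a complete symmetric dict of dicts.

    Returns a list of (node, node, length) edges of the unrooted tree.
    Internal nodes get hypothetical labels N1, N2, ...
    """
    dist = {a: dict(row) for a, row in d.items()}   # working copy
    edges = []
    created = 0
    while len(dist) > 2:
        nodes = list(dist)
        # r_i: sum of distances from i, divided by |L| - 2.
        r = {
            i: sum(dist[i][k] for k in nodes if k != i) / (len(nodes) - 2)
            for i in nodes
        }
        # Pair minimizing D(i, j) = d(i, j) - (r_i + r_j); ties go to the first pair.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: dist[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        created += 1
        k = f"N{created}"
        edges.append((i, k, 0.5 * (dist[i][j] + r[i] - r[j])))
        edges.append((j, k, 0.5 * (dist[i][j] + r[j] - r[i])))
        new_row = {
            m: 0.5 * (dist[i][m] + dist[j][m] - dist[i][j])
            for m in nodes
            if m not in (i, j)
        }
        del dist[i], dist[j]
        for m in dist:
            del dist[m][i], dist[m][j]
            dist[m][k] = new_row[m]
        dist[k] = new_row
    a, b = dist                                      # two nodes remain
    edges.append((a, b, dist[a][b]))
    return edges


example = {
    "A": {"B": 17, "C": 21, "D": 27},
    "B": {"A": 17, "C": 12, "D": 18},
    "C": {"A": 21, "B": 12, "D": 14},
    "D": {"A": 27, "B": 18, "C": 14},
}
for edge in neighbor_joining(example):
    print(edge)
```

Because ties are broken by taking the first minimal pair, this sketch joins A and B before C and D, the reverse of the order on the slides; the resulting tree and branch lengths are the same.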

56 Naruya Saitou. Division of Population Genetics, National Institute of Genetics & Department of Genetics, Graduate University for Advanced Studies (Sokendai), Mishima, Japan.

57 Probabilistic approaches

58 Likelihood of a Tree. Given: n aligned sequences $M = X_1, \ldots, X_n$ and a tree T whose leaves are labeled with $X_1, \ldots, X_n$. A reconstruction t consists of a labeling of the internal nodes and branch lengths. Goal: find an optimal reconstruction $t^*$, one maximizing the likelihood $P(M \mid T, t^*)$.

59 Likelihood (2). We need a model for computing $P(M \mid T, t^*)$. Assumptions: each character is independent; the branching is a Markov process, i.e., the probability that a node x has a specific label is a function only of the parent node y and the branch length t between them; the probabilities $P(x \mid y, t)$ are known.

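60 Example. For a tree with nodes $x_1, \ldots, x_5$ (root $x_5$) and branch lengths $t_1, \ldots, t_4$: $P(x_1, \ldots, x_5 \mid T, t^*) = P(x_1 \mid x_4, t_1) \, P(x_2 \mid x_4, t_2) \, P(x_3 \mid x_5, t_3) \, P(x_4 \mid x_5, t_4) \, P(x_5)$.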

61 Calculating the Likelihood. Assume that the branch lengths $t_{uv}$ are known. $P(M \mid T, t^*) = \prod_{\text{character } j} \sum_{\text{reconstruction } R} P(M_j, R \mid T, t^*) = \prod_{j} \sum_{R} \left[ P(\text{root}) \prod_{\text{edge } u \to v} p_{u \to v}(t_{uv}) \right]$. The first equality uses the independence of sites; the second uses the Markov property (independence of each branch).

62 Additional Assumed Properties. Additivity: $\sum_b P(a \mid b, t) \, P(b \mid c, s) = P(a \mid c, s + t)$. Reversibility (symmetry): $q_b \, P(a \mid b, t) = q_a \, P(b \mid a, t)$; this allows one to freely move the root. Provable under broad and reasonable assumptions.

63 Jukes-Cantor Model (Jukes & Cantor '69). Assumptions: each base in the sequence has an equal chance of changing; it changes to each of the other 3 bases with equal probability. Characteristics: each base appears with equal frequency in DNA; the quantity r is the rate of change. Note: r is not the rate of mutation; it is the rate of substitution of a nucleotide throughout a population. During each infinitesimal time interval $\Delta t$ a substitution occurs with probability $3r \, \Delta t$.

64 [Figure: Jukes-Cantor substitution diagram, a = r]

65 Jukes-Cantor Model. $P_{A(t)}$: probability of A at the character at time t. Discrete case: the probability of a change A → B in one time unit is r, so $P_{A(t+1)} = (1 - 3r) P_{A(t)} + r (1 - P_{A(t)})$ and $P_{A(t+1)} - P_{A(t)} = -3r P_{A(t)} + r (1 - P_{A(t)})$. Continuous case: $P'_{A(t)} = -3r P_{A(t)} + r (1 - P_{A(t)}) = -4r P_{A(t)} + r$. Solution of the differential equation (with $P_{A(0)} = 1$): $P_{A(t)} = \frac{1}{4}\left(1 + 3e^{-4rt}\right)$.

66 Probability that the nucleotide remains unchanged over t time units: $P_{same} = \frac{1}{4} + \frac{3}{4} e^{-4rt}$. Probability of a specific change: $P_{A \to B} = \frac{1}{4} - \frac{1}{4} e^{-4rt}$. Probability of change: $P_{change} = \frac{3}{4} - \frac{3}{4} e^{-4rt}$. Note: for $t \to \infty$, $P_{change} \to \frac{3}{4}$.

67 Other Models. Kimura's 2-parameter model: A, G - purines; C, T - pyrimidines. Two different rates: purine-purine or pyrimidine-pyrimidine (transitions); purine-pyrimidine or pyrimidine-purine (transversions). Felsenstein '84 and Hasegawa, Kishino & Yano '85 extend the Kimura model to asymmetric base frequencies. [Figure: substitution diagram over C, T, A, G]
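
The Jukes-Cantor formulas can be checked numerically. The Python sketch below evaluates the closed forms and also iterates the discrete recurrence $P_{A(t+1)} = (1-3r) P_{A(t)} + r(1 - P_{A(t)})$ from $P_{A(0)} = 1$; the function names are assumptions chosen for this example.

```python
import math


def jc_same(r, t):
    """Probability that the nucleotide is unchanged after time t."""
    return 0.25 + 0.75 * math.exp(-4.0 * r * t)


def jc_specific_change(r, t):
    """Probability of one specific change, e.g. A -> B, after time t."""
    return 0.25 - 0.25 * math.exp(-4.0 * r * t)


def jc_same_discrete(r, steps):
    """Iterate the discrete-time recurrence starting from P_A(0) = 1."""
    p = 1.0
    for _ in range(steps):
        p = (1.0 - 3.0 * r) * p + r * (1.0 - p)
    return p


# For a small rate the discrete recurrence tracks the continuous solution.
print(jc_same(1e-4, 5000), jc_same_discrete(1e-4, 5000))
# The unchanged probability plus the three specific changes sums to 1.
print(jc_same(0.1, 2.0) + 3 * jc_specific_change(0.1, 2.0))
```

As t grows, `jc_same` tends to 1/4, which is the statement that $P_{change}$ tends to 3/4.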

68 Charles Cantor Boston University Professor, Biomedical Engineering Professor of Pharmacology, School of Medicine Center for Advanced Biotechnology Ph.D., Biophysical Chemistry, University of California, Berkeley Dr. Cantor is currently Chief Scientific Officer at Sequenom, Inc in San Diego, California. 67

69 Efficient Likelihood Calculation (Felsenstein '73). Use dynamic programming. Define $S_j(a, v) = \Pr(\text{subtree rooted at } v \mid v_j = a)$. Initialization: for each leaf v, set $S_j(a, v) = 1$ if v is labeled by a, else $S_j(a, v) = 0$. Recursion: traverse the tree in postorder; for each node v with children u and w, and for each state x: $S_j(x, v) = \left[ \sum_y S_j(y, u) \, p_{x \to y}(t_{vu}) \right] \left[ \sum_y S_j(y, w) \, p_{x \to y}(t_{vw}) \right]$. Final solution: $L = \prod_j \sum_x S_j(x, \text{root}) \, P(x)$. Complexity: $O(nmk^2)$ for n species, m characters, k states.
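
The dynamic program can be sketched for a single character as follows. This is a hedged illustration: the tree encoding (a leaf is a one-letter string, an internal node is a pair of `(child, branch_length)` tuples), the choice of the Jukes-Cantor model with rate `r = 1.0`, the uniform root prior, and all function names are assumptions made for the example. The full likelihood L is the product of `site_likelihood` over all characters.

```python
import math

STATES = "ACGT"


def jc_transition(x, y, t, r=1.0):
    """Jukes-Cantor probability of state y at a child given state x at its parent."""
    decay = math.exp(-4.0 * r * t)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def conditional_likelihoods(node):
    """Post-order pass: returns {x: S(x, node)} for one character."""
    if isinstance(node, str):                       # leaf with its observed state
        return {x: 1.0 if x == node else 0.0 for x in STATES}
    (u, t_vu), (w, t_vw) = node                     # two (child, branch length) pairs
    s_u = conditional_likelihoods(u)
    s_w = conditional_likelihoods(w)
    return {
        x: sum(s_u[y] * jc_transition(x, y, t_vu) for y in STATES)
        * sum(s_w[y] * jc_transition(x, y, t_vw) for y in STATES)
        for x in STATES
    }


def site_likelihood(tree, prior=None):
    """Sum over root states of S(x, root) * P(x)."""
    prior = prior or {x: 0.25 for x in STATES}
    s_root = conditional_likelihoods(tree)
    return sum(s_root[x] * prior[x] for x in STATES)


# Hypothetical three-leaf tree: ((A:0.1, A:0.2):0.05, C:0.3)
cherry = (("A", 0.1), ("A", 0.2))
tree = ((cherry, 0.05), ("C", 0.3))
print(site_likelihood(tree))
```

Each internal node costs $k^2$ work per child, which is where the $O(nmk^2)$ bound on the slide comes from.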

71 Finding Optimal Branch Lengths. Optimizing the length of a single root-child branch r-v can be done using standard optimization techniques: $\log L = \sum_{j=1}^{m} \log \sum_{x,y} p(x) \, S''_j(x, r) \, p_{x \to y}(t_{rv}) \, S_j(y, v)$. Under the symmetry assumption, each node can be made (temporarily) the root. To heuristically optimize all the branch lengths, repeatedly optimize one branch at a time. Convergence is not guaranteed, but this often works.

72 Character Based Methods

73 Inferring a Phylogenetic Tree. Generic problem, Optimal Phylogenetic Tree. Input: n species, a set of characters and, for each species, the state of each of the characters (plus parameters). Goal: find a fully labeled phylogenetic tree that best explains the data (maximizes a target function). Assumptions: characters are mutually independent; species evolve independently. Example input: A: CAGGTA, B: CAGACA, C: CGGGTA, D: TGCACT, E: TGCGTA.

74 A Simple Example. Five species; three have C and two have T at a specified position. A minimal tree has one evolutionary change. [Figure: trees over five leaves with states C and T]

75 Inferring a Phylogenetic Tree. Naive solution, enumeration. Number of non-isomorphic, labeled, binary, unrooted trees with n leaves: $(2n-5)!! = \prod_{i=3}^{n} (2i-5)$. Rooted: multiply by $(2n-3)$. For n = 20 this is about $10^{21}$. A tree with n leaves has n-2 internal nodes and 2n-3 edges. Proof: by induction on n; a new species can be attached to any existing edge, and each new species adds 2 new edges. [Figure: adding a 3rd species c to a tree on a and b]

76 Parsimony. Goal: explain the data with the minimum number of evolutionary changes (mutations, or mismatches). Parsimony score: $S(T) = \sum_{\{v,u\} \in E(T)} \left| \{ j : v_j \neq u_j \} \right|$. Small parsimony problem: input: leaf sequences and a leaf-labeled tree T; goal: find ancestral sequences implying the minimum number of changes (most parsimonious). Large parsimony problem: input: leaf sequences; goal: find a most parsimonious tree (topology, leaf labeling and internal sequences).

77 Algorithm for the Small Parsimony Problem (Fitch '71). Initialization: scan T in postorder and assign: for a leaf vertex m, $S_m$ = {state at node m}; for an internal node m with children l, r: $S_m = S_l \cup S_r$ if $S_l \cap S_r = \emptyset$ (case i), and $S_m = S_l \cap S_r$ otherwise (case ii). Solution construction: scan the tree in preorder; for the root choose any $x \in S_{root}$; at a node m with parent k, pick the same state as k if possible, otherwise pick arbitrarily from $S_m$. Cost = number of case (i) events. Time: $O(nk)$ (k = number of states).

78 Fitch's Alg. [Figure: example run on a tree with leaf states T, C and A; internal sets {T,C} and {A,T}; changes T → C and T → A]
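
To see how quickly enumeration becomes hopeless, the double-factorial count can be evaluated directly. This small Python sketch uses hypothetical function names and relies only on the product formula from the slide.

```python
def num_unrooted_trees(n):
    """(2n - 5)!! = product of (2i - 5) for i = 3..n."""
    count = 1
    for i in range(3, n + 1):
        count *= 2 * i - 5
    return count


def num_rooted_trees(n):
    """Rooting places the root on one of the 2n - 3 edges."""
    return num_unrooted_trees(n) * (2 * n - 3)


for n in (3, 4, 5, 10, 20):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

The loop mirrors the induction in the proof: going from i-1 to i leaves multiplies the count by the 2i-5 edges to which the new leaf can be attached.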
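
Fitch's two passes can be sketched for a single character as below. This is only an illustration: the tree encoding (a leaf is its state as a string, an internal node is a pair of subtrees), the function names, and the use of `min` as the "arbitrary" choice are assumptions made for the example.

```python
def fitch_sets(node):
    """Post-order pass. Returns (state_set, cost, annotated_children)."""
    if isinstance(node, str):                 # leaf: singleton set, no cost
        return {node}, 0, ()
    left = fitch_sets(node[0])
    right = fitch_sets(node[1])
    common = left[0] & right[0]
    if common:                                # case (ii): intersection
        return common, left[1] + right[1], (left, right)
    # case (i): empty intersection, take the union and pay one change
    return left[0] | right[0], left[1] + right[1] + 1, (left, right)


def fitch_assign(annotated, parent_state=None):
    """Pre-order pass. Returns a leaf state or (state, left, right)."""
    states, _, children = annotated
    # Keep the parent's state if allowed; otherwise pick deterministically.
    state = parent_state if parent_state in states else min(states)
    if not children:
        return state
    return (
        state,
        fitch_assign(children[0], state),
        fitch_assign(children[1], state),
    )


# Five species, three with C and two with T, on a hypothetical topology.
tree = (("C", "C"), ("C", ("T", "T")))
annotated = fitch_sets(tree)
print(annotated[1])             # parsimony cost for this character
print(fitch_assign(annotated))  # one most parsimonious labeling
```

On this tree the only case (i) event is at the node joining C with the (T, T) subtree, so the cost is a single change.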

79 Walter Fitch. May 21, March 10, 2011. One of the most influential evolutionary biologists in the world, who established a new scientific field: molecular phylogenetics. He was a member of the National Academy of Sciences, the American Academy of Arts and Sciences and the American Philosophical Society. He co-founded and was the first president of the Society for Molecular Biology and Evolution, which established the annually awarded Fitch Prize. Additionally, he was a founding editor of Molecular Biology and Evolution. Fitch was at the University of California, Irvine, until his death, preceded by three years at the University of Southern California and 24 years at the University of Wisconsin-Madison.

80 Weighted Parsimony. k states $S_1, \ldots, S_k$; $C_{ij}$ = cost of changing from state i to j. Algorithm (Sankoff-Cedergren 88): we need $S_k(x)$, the best cost for the subtree rooted at x if the state at x is k. For a leaf x, $S_k(x) = 0$ if the state of x is k, and $\infty$ otherwise. Scan T in postorder; at node a with children l, r: $S_k(a) = \min_m \left( S_m(l) + C_{mk} \right) + \min_m \left( S_m(r) + C_{mk} \right)$. Opt = $\min_m S_m(\text{root})$. Time: $O(nk^2)$.

81 [Figure: weighted parsimony example on a tree with leaf states C, G, C, C]

82 David Sankoff. Over the past 30 years, Sankoff formulated and contributed to many of the fundamental problems in computational biology. In sequence comparison, he introduced the quadratic version of the Needleman-Wunsch algorithm, developed the first statistical test for alignments, initiated the study of the limit behavior of random sequences with Vaclav Chvatal and described the multiple alignment problem, based on minimum evolution over a phylogenetic tree. In the study of RNA secondary structure, he developed algorithms based on general energy functions for multiple loops and for simultaneous folding and alignment, and performed the earliest studies of parametric folding and automated phylogenetic filtering. Sankoff and Robert Cedergren collaborated on the first studies of the evolution of the genetic code based on tRNA sequences. His contributions to phylogenetics include early models for horizontal transfer, a general approach for optimizing the nodes of a given tree, a method for rapid bootstrap calculations, a generalization of the nearest neighbor interchange heuristic, various constraint, consensus and supertree problems, the computational complexity of several phylogeny problems with William Day, and a general technique for phylogenetic invariants with Vincent Ferretti. Over the last fifteen years he has focused on the evolution of genomes as the result of chromosomal rearrangement processes. Here he introduced the computational analysis of genomic edit distances, including parametric versions, the distribution of gene numbers in conserved segments in a random model with Joseph Nadeau, phylogeny based on gene order with Mathieu Blanchette and David Bryant, generalizations to include multigene families, including algorithms for analyzing genome duplication and hybridization with Nadia El-Mabrouk, and the statistical analysis of gene clusters with Dannie Durand.
Sankoff is also well known in linguistics for his methods of studying grammatical variation and change in speech communities, the quantification of discourse analysis and production models of bilingual speech.
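
The weighted-parsimony recursion can be sketched for one character as follows. The tree encoding (a leaf is its state as a string, an internal node is a pair of subtrees), the function name `sankoff` and both cost matrices are assumptions made for illustration; the recursion itself follows the formula on the slide, including its use of $C_{mk}$.

```python
import math


def sankoff(tree, states, cost):
    """Returns {k: best cost of the subtree if its root has state k}."""
    if isinstance(tree, str):                 # leaf: 0 for its own state, else infinity
        return {k: 0.0 if k == tree else math.inf for k in states}
    left, right = (sankoff(child, states, cost) for child in tree)
    return {
        k: min(left[m] + cost[m][k] for m in states)
        + min(right[m] + cost[m][k] for m in states)
        for k in states
    }


STATES = "ACGT"
# Unit costs reduce the problem to ordinary (Fitch) parsimony.
unit = {a: {b: 0.0 if a == b else 1.0 for b in STATES} for a in STATES}
# Hypothetical weights: transitions (A<->G, C<->T) cost 1, transversions cost 2.
purines = {"A", "G"}
ts_tv = {
    a: {
        b: 0.0 if a == b else (1.0 if (a in purines) == (b in purines) else 2.0)
        for b in STATES
    }
    for a in STATES
}

tree = (("C", "C"), ("C", ("T", "T")))
print(min(sankoff(tree, STATES, unit).values()))
print(min(sankoff(("A", "C"), STATES, ts_tv).values()))
```

Each internal node evaluates $k^2$ state pairs per child, matching the $O(nk^2)$ bound.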

83 Large Parsimony Problem. Input: n x m matrix M, where $M_{ij}$ = state of the j-th character of species i; $M_i$ = label of i (all labels are distinct). Goal: construct a phylogenetic tree T with n leaves and a label for each node, such that there is a 1-1 correspondence between leaves and labels and the cost of the tree is minimum. NP-hard.

84 Branch & Bound (Hendy-Penny 89). Enumerate all unrooted trees with an increasing number of leaves. Note: the cost of a tree with all leaves ≥ the cost of the subtree with some leaves pruned (and the same labeling). Hence, if the cost of a subtree ≥ the best cost found so far for a full tree, we can prune (ignore) all refinements of that subtree. Enumeration and pruning can be done in O(1) time per visited subtree.

85 Branch swapping. Each internal edge defines 4 subtrees; we can swap two such non-adjacent subtrees. [Figure: trees on A, B, C, D before and after a swap]

86 Nearest Neighbor Interchanges. Handles n-labeled trees. T and T' are neighbors if one can get T' by the following operation on T: take an internal edge with subtrees S, R on one side and U, V on the other (SR|UV) and exchange one subtree across the edge, giving SU|RV or SV|RU. Use the neighborhood structure on the set of solutions (all trees) via hill climbing, annealing, or other heuristics.

87 NN interchange graph

88 Wilson & Cann, Sci Amer 92

89 FIN

Phylogeny Jan 5, 2016

Phylogeny Jan 5, 2016 גנומיקה חישובית Computational Genomics Phylogeny Jan 5, 2016 Slides: Adi Akavia Nir Friedman s slides at HUJI (based on ALGMB 98) Anders Gorm Pedersen,Technical University of Denmark Sources: Joe Felsenstein

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Evolutionary Tree Analysis. Overview

Evolutionary Tree Analysis. Overview CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

More information

Molecular Evolution and Phylogenetic Tree Reconstruction

Molecular Evolution and Phylogenetic Tree Reconstruction 1 4 Molecular Evolution and Phylogenetic Tree Reconstruction 3 2 5 1 4 2 3 5 Orthology, Paralogy, Inparalogs, Outparalogs Phylogenetic Trees Nodes: species Edges: time of independent evolution Edge length

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods

More information

CSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary

CSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary CSCI1950 Z Computa4onal Methods for Biology Lecture 4 Ben Raphael February 2, 2009 hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary Parsimony Probabilis4c Method Input Output Sankoff s & Fitch

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Phylogenetic trees 07/10/13

Phylogenetic trees 07/10/13 Phylogenetic trees 07/10/13 A tree is the only figure to occur in On the Origin of Species by Charles Darwin. It is a graphical representation of the evolutionary relationships among entities that share

More information

BINF6201/8201. Molecular phylogenetic methods

BINF6201/8201. Molecular phylogenetic methods BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

More information

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

CSCI1950 Z Computa4onal Methods for Biology Lecture 5

CSCI1950 Z Computa4onal Methods for Biology Lecture 5 CSCI1950 Z Computa4onal Methods for Biology Lecture 5 Ben Raphael February 6, 2009 hip://cs.brown.edu/courses/csci1950 z/ Alignment vs. Distance Matrix Mouse: ACAGTGACGCCACACACGT Gorilla: CCTGCGACGTAACAAACGC

More information

Phylogenetics. BIOL 7711 Computational Bioscience

Phylogenetics. BIOL 7711 Computational Bioscience Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium

More information

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction

More information

Theory of Evolution Charles Darwin

Theory of Evolution Charles Darwin Theory of Evolution Charles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (83-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science Phylogeny and Evolution Gina Cannarozzi ETH Zurich Institute of Computational Science History Aristotle (384-322 BC) classified animals. He found that dolphins do not belong to the fish but to the mammals.

More information

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods

More information

Hierarchical Clustering

Hierarchical Clustering Hierarchical Clustering Some slides by Serafim Batzoglou 1 From expression profiles to distances From the Raw Data matrix we compute the similarity matrix S. S ij reflects the similarity of the expression

More information

Phylogenetics: Building Phylogenetic Trees

Phylogenetics: Building Phylogenetic Trees 1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5. Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Phylogeny: building the tree of life

Phylogeny: building the tree of life Phylogeny: building the tree of life Dr. Fayyaz ul Amir Afsar Minhas Department of Computer and Information Sciences Pakistan Institute of Engineering & Applied Sciences PO Nilore, Islamabad, Pakistan

More information

What is Phylogenetics

What is Phylogenetics What is Phylogenetics Phylogenetics is the area of research concerned with finding the genetic connections and relationships between species. The basic idea is to compare specific characters (features)

More information

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003 CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1 Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003 Lecturer: Wing-Kin Sung Scribe: Ning K., Shan T., Xiang

More information

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive. Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then

More information

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Lecture Notes: Markov chains

Lecture Notes: Markov chains Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity

Inferring phylogeny. Today's topics. Milestones of molecular evolution studies Contributions to molecular evolution Today's topics Inferring phylogeny Introduction Distance methods Parsimony method Overview of phylogenetic inferences Methodology Methods

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9 Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22 Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and

BMI/CS 776 Lecture 4. Colin Dewey BMI/CS 776 Lecture 4 Colin Dewey 2007.02.01 Outline Common nucleotide substitution models Directed graphical models Ancestral sequence inference Poisson process continuous Markov process X t0 X t1 X t2

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies Inferring Phylogenetic Trees Distance Approaches Representing distances in rooted and unrooted trees The distance approach to phylogenies given: an n n matrix M where M ij is the distance between taxa

Chapter 3: Phylogenetics 3. Computing Phylogeny Prof. Yechiam Yemini (YY) Computer Science Department Columbia University Overview Computing trees Distance-based techniques Maximal Parsimony (MP) techniques

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels

Phylogeny: traditional and Bayesian approaches Phylogeny: traditional and Bayesian approaches 5-Feb-2014 DEKM book Notes from Dr. B. John Holder and Lewis, Nature Reviews Genetics 4, 275-284, 2003 1 Phylogeny A graph depicting the ancestor-descendent

Phylogeny Tree Algorithms Jianlin Cheng, PhD School of Electrical Engineering and Computer Science University of Central Florida 2006 Free for academic use. Copyright @ Jianlin Cheng & original sources for some

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D 7.91 Lecture #5 Database Searching & Molecular Phylogenetics Michael Yaffe B C D B C D (((,B)C)D) Outline Distance Matrix Methods Neighbor-Joining Method and Related Neighbor Methods Maximum Likelihood

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço jcarrico@fm.ul.pt Charles Darwin (1809-1882) Charles Darwin's tree of life in Notebook B, 1837-1838 Ernst Haeckel (1834-1919)

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Tore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Tore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method Phylogeny 1 Plan: Phylogeny is an important subject. We have 2.5 hours. So I will teach all the concepts via one example of a chain letter evolution. The concepts we will discuss include: Evolutionary

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26 Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 4 (Models of DNA and

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony: no model, simply count minimum number

Phylogenetic Networks, Trees, and Clusters Phylogenetic Networks, Trees, and Clusters Luay Nakhleh 1 and Li-San Wang 2 1 Department of Computer Science Rice University Houston, TX 77005, USA nakhleh@cs.rice.edu 2 Department of Biology University

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A GAGATC 3:G A 6:C T Common Ancestor ACGATC 1:A G 2:C A Substitution = Mutation followed 5:T C by Fixation GAAATT 4:A C 1:G A AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon AAAATT GAAATT GAGCTC ACGACC

Effects of Gap Open and Gap Extension Penalties Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

Tools and Algorithms in Bioinformatics Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and

DNA Phylogeny. Signals and Systems in Biology Kushal EE, IIT Delhi DNA Phylogeny Signals and Systems in Biology Kushal Shah @ EE, IIT Delhi Phylogenetics Grouping and Division of organisms Keeps changing with time Splitting, hybridization and termination Cladistics :

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline Evolutionary Trees Russ B. Altman MI S 7 Outline. Why build evolutionary trees? Distance-based vs. character-based methods. Distance-based: Ultrametric Trees, Additive Trees. Character-based: Perfect phylogeny

MOLECULAR EVOLUTION AND PHYLOGENETICS SERGEI L KOSAKOVSKY POND CSE/BIMM/BENG 181 MAY 27, 2011 MOLECULAR EVOLUTION AND PHYLOGENETICS If we could observe evolution: speciation, mutation, natural selection and fixation, we might see something like this: AGTAGC GGTGAC AGTAGA CGTAGA AGTAGA A G G C AGTAGA

Letter to the Editor. Department of Biology, Arizona State University Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona

Lecture 4. Models of DNA and protein change. Likelihood methods Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016 Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,

Consistency Index (CI) Consistency Index (CI) minimum number of changes divided by the number required on the tree. CI=1 if there is no homoplasy negatively correlated with the number of species sampled Retention Index (RI)

Theory of Evolution. Charles Darwin 1858-59: Origin of Species 5 year voyage of H.M.S. Beagle (1831-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

Reconstructing the biological past: models, methods, performance, limits. Olivier Gascuel, Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI), USR 3756, Institut Pasteur & CNRS

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of

AN EXACT SOLVER FOR THE DCJ MEDIAN PROBLEM AN EXACT SOLVER FOR THE DCJ MEDIAN PROBLEM MENG ZHANG College of Computer Science and Technology, Jilin University, China Email: zhangmeng@jlueducn WILLIAM ARNDT AND JIJUN TANG Dept of Computer Science

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). 1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff

Reading for Lecture 13 Release v10 Reading for Lecture 13 Release v10 Christopher Lee November 15, 2011 Contents 1 Evolutionary Trees i 1.1 Evolution as a Markov Process...................................... ii 1.2 Rooted vs. Unrooted Trees........................................

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species Schedule Bioinformatics and Computational Biology: History and Biological Background (JH) 0.0 The Parsimony criterion GKN.0 Stochastic Models of Sequence Evolution GKN 7.0 The Likelihood criterion GKN 0.0

Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University Phylogenetics: Parsimony and Likelihood COMP 571 - Spring 2016 Luay Nakhleh, Rice University The Problem Input: Multiple alignment of a set S of sequences Output: Tree T leaf-labeled with S Assumptions

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

molecular evolution and phylogenetics molecular evolution and phylogenetics Charlotte Darby Computational Genomics: Applied Comparative Genomics 2.13.18 https://www.thinglink.com/scene/762084640000311296 Internal node Root TIME Branch Leaves

The Generalized Neighbor Joining method The Generalized Neighbor Joining method Ruriko Yoshida Dept. of Mathematics Duke University Joint work with Dan Levy and Lior Pachter www.math.duke.edu/ ruriko data mining 1 Challenge We would like to

A Phylogenetic Network Construction due to Constrained Recombination A Phylogenetic Network Construction due to Constrained Recombination Mohd. Abdul Hai Zahid Research Scholar Research Supervisors: Dr. R.C. Joshi Dr. Ankush Mittal Department of Electronics and Computer

How to read and make phylogenetic trees Zuzana Starostová How to read and make phylogenetic trees Zuzana Starostová How to make phylogenetic trees? Workflow: obtain DNA sequence quality check sequence alignment calculating genetic distances phylogeny estimation

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004, Tracing the Evolution of Numerical Phylogenetics: History, Philosophy, and Significance Adam W. Ferguson Phylogenetic Systematics 26 January 2009 Inferring Phylogenies Historical endeavor Darwin- 1837

THEORY. Based on sequence length: according to the length of the sequences being compared, it is of the following two types. Exp 11- THEORY Sequence alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This helps to derive functional, structural and evolutionary relationships between

A Minimum Spanning Tree Framework for Inferring Phylogenies A Minimum Spanning Tree Framework for Inferring Phylogenies Daniel Giannico Adkins Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-010-157

Recent Advances in Phylogeny Reconstruction from Gene-Order Data Bernard M.E. Moret Department of Computer Science University of New Mexico Albuquerque, NM 87131 Department Colloquium p.1/41 Collaborators

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT Contents What is phylogeny? How/why it is possible to infer it? Representing evolutionary relationships on trees What type of questions

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise Bot 421/521 PHYLOGENETIC ANALYSIS I. Origins A. Hennig 1950 (German edition) Phylogenetic Systematics 1966 B. Zimmerman (Germany, 1930 s) C. Wagner (Michigan, 1920-2000) II. Characters and character states

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT 3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode

66 Bioinformatics I, WS 09-10, D. Huson, December 1, Evolutionary tree of organisms, Ernst Haeckel, 1866 66 Bioinformatics I, WS 09-10, D. Huson, December 1, 2009 5 Phylogeny Evolutionary tree of organisms, Ernst Haeckel, 1866 5.1 References J. Felsenstein, Inferring Phylogenies, Sinauer, 2004. C. Semple

X X (2) X Pr(X = x θ) (3) Notes for 848 lecture 6: A ML basis for compatibility and parsimony Notation θ Θ (1) Θ is the space of all possible trees (and model parameters) θ is a point in the parameter space = a particular tree

arxiv: v1 [q-bio.pe] 1 Jun 2014 THE MOST PARSIMONIOUS TREE FOR RANDOM DATA MAREIKE FISCHER, MICHELLE GALLA, LINA HERBST AND MIKE STEEL arxiv:46.27v [q-bio.pe] Jun 24 Abstract. Applying a method to reconstruct a phylogenetic tree from

CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics CS5263 Bioinformatics Guest Lecture Part II Phylogenetics Up to now we have focused on finding similarities, now we start focusing on differences (dissimilarities leading to distance measures). Identifying

Copyright 2000 N. AYDIN. All rights reserved. 1 Introduction to Bioinformatics Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Multiple Sequence Alignment Outline Multiple sequence alignment introduction to msa methods of msa progressive global alignment

Properties of normal phylogenetic networks Properties of normal phylogenetic networks Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu August 13, 2009 Abstract. A phylogenetic network is

Phylogenetics: Likelihood 1 Phylogenetics: Likelihood COMP 571 Luay Nakhleh, Rice University The Problem 2 Input: Multiple alignment of a set S of sequences Output: Tree T leaf-labeled with S Assumptions 3 Characters are mutually

Phylogenetic analyses. Kirsi Kostamo Phylogenetic analyses Kirsi Kostamo The aim: To construct a visual representation (a tree) to describe the assumed evolution occurring between and among different groups (individuals, populations, species,

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS CRYSTAL L. KAHN and BENJAMIN J. RAPHAEL Box 1910, Brown University Department of Computer Science & Center for Computational Molecular Biology

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

arxiv: v5 [q-bio.pe] 24 Oct 2016 On the Quirks of Maximum Parsimony and Likelihood on Phylogenetic Networks Christopher Bryant a, Mareike Fischer b, Simone Linz c, Charles Semple d arxiv:1505.06898v5 [q-bio.pe] 24 Oct 2016 a Statistics

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony (,(,(,))) Trees can be represented in "parenthesis notation". Each set of parentheses represents a branch-point (bifurcation),

Thanks to Paul Lewis and Joe Felsenstein for the use of slides Thanks to Paul Lewis and Joe Felsenstein for the use of slides Review Hennigian logic reconstructs the tree if we know polarity of characters and there is no homoplasy UPGMA infers a tree from a distance

Probabilistic modeling and molecular phylogeny Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU) What is a model? Mathematical
