# Chapter 3: Phylogenetics


3. Computing Phylogeny

Prof. Yechiam Yemini (YY), Computer Science Department, Columbia University

Overview
- Computing trees
- Distance-based techniques
- Maximal Parsimony (MP) techniques
- Maximum likelihood techniques

This chapter is based on Durbin, Chapter 7. Also recommended: The Phylogenetic Handbook, Salemi and Vandamme.

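Can We Tell Evolution From Homology?

[Figure: gene trees illustrating duplication, speciation, and a partial sample of the resulting genes.]

Phylogeny: how do we tell the right tree?

Phylogeny: Computing Trees
- INPUT: a set of aligned sequences, one per taxon
- OUTPUT: a phylogenetic tree with the taxa at its leaves

[Figure: five example sequences and the tree computed from them.]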

Brute Force Approach

Brute force:
- Enumerate all trees
- Compute some measure of evolutionary likelihood for each
- Select the best tree

How many rooted trees are there with n leaves?
- n = 2 leaves => 1 tree
- n = 3 leaves => attach the 3rd leaf to one of 3 edges => 3 trees
- Let $T(n)$ = # rooted trees with n leaves and $E(n)$ = # edges (counting the edge above the root)
- $T(2) = 1$, $E(2) = 3$; $T(3) = 3$, $E(3) = 5$
- Addition of a leaf creates two new edges => $E(n) = E(n-1) + 2$ => $E(n) = 2n - 1$
- $T(n) = T(n-1) \cdot E(n-1) = T(n-1) \cdot (2n-3)$ => $T(n) = 1 \cdot 3 \cdot 5 \cdots (2n-3)$
- For n = 10 leaves this is already about $3.4 \times 10^{7}$ trees; for n = 20, about $8 \times 10^{21}$

Approaches
- Distance-based: the tree should best model an evolutionary distance metric among the taxa
- Character-based [Maximal Parsimony (MP)]: the tree should minimize changes
- Maximum likelihood (ML): the tree should maximize the likelihood of changes
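The recurrence for the number of rooted trees can be checked numerically. The following is a minimal sketch; the function name `num_rooted_trees` is an illustrative choice, not something from the slides.

```python
def num_rooted_trees(n: int) -> int:
    """Count rooted, leaf-labelled binary trees with n leaves: 1 * 3 * 5 * ... * (2n - 3)."""
    if n < 2:
        raise ValueError("need at least two leaves")
    count = 1  # T(2) = 1
    for leaves in range(3, n + 1):
        # A tree with (leaves - 1) leaves has 2 * leaves - 3 edges, counting the
        # edge above the root, and the new leaf can be attached to any of them.
        count *= 2 * leaves - 3
    return count


if __name__ == "__main__":
    for n in (2, 3, 4, 10, 20):
        print(n, num_rooted_trees(n))
```

The count grows faster than exponentially, which is why exhaustive enumeration is only feasible for a handful of taxa.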

I. Distance-Based Techniques

Key idea:
- Compute an evolutionary distance metric $D$ among a set $S$ of taxa
- Compute a tree on $S$ that best fits the distances

Formally:
- Given: an $n \times n$ distance matrix $D$
- Compute: a weighted tree $T$ on n leaves that best fits $D$

How to establish evolutionary distance measures?
- Distance ~ number of changes
- Next chapter: evaluating distance using Markovian evolution models

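Is There a Tree That Perfectly Fits $D$?
- Not every distance metric can be modeled by a tree
- How can we tell which distance metrics model a tree?

The Four-Point Condition
- A distance matrix $D$ corresponding to a tree is called additive
- THEOREM: $D$ is additive if and only if, for every four indices i, j, k, l, the maximum and the median of the three pairwise sums are identical; that is, for a suitable labeling of the four indices, $D_{ij} + D_{kl} \le D_{ik} + D_{jl} = D_{il} + D_{jk}$
- This suggests how to connect the points into a tree that fits $D$: i and j hang on one side of an internal edge, k and l on the other

[Figure: quartet tree on i, j, k, l with the three pairwise sums marked.]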
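The four-point condition translates directly into a test over all quartets. This is a minimal sketch, assuming the matrix is given as a symmetric list of lists; the name `is_additive` and the tolerance parameter are illustrative.

```python
from itertools import combinations


def is_additive(d, tol=1e-9):
    """Return True if the symmetric matrix d satisfies the four-point condition."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted([d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]])
        # The two largest of the three pairwise sums must coincide.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Distances read off a tree ((a,b),(c,d)) with leaf edges 1, 2, 3, 4 and internal edge 5.
tree_like = [[0, 3, 9, 10],
             [3, 0, 10, 11],
             [9, 10, 0, 7],
             [10, 11, 7, 0]]

# The same matrix with one entry perturbed no longer fits any tree exactly.
perturbed = [row[:] for row in tree_like]
perturbed[0][3] = perturbed[3][0] = 12
```

Checking all quartets costs O(n^4) time, which is acceptable for the small matrices used in these examples.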

How Do We Handle Non-Additive $D$?
- Additive metrics are very useful: they provide a perfect fit with a tree model, and the tree is easily computed from $D$
- But evolutionary distance metrics are often non-additive
- How do we handle a non-additive metric?
- Fitch & Margoliash: find a tree $T$ that minimizes the least-squares fit $E(T) = \sum_{i,j} \left(d_{ij}(T) - D_{ij}\right)^2$, where $d_{ij}(T)$ is the path length between leaves i and j in $T$
- This problem is NP-hard, so heuristics are needed
- Fitch & Margoliash (1967): exhaustive search

Closest-Pair Clustering
- Idea: use $D$ to guide closest-pair clustering
- Extend $D$ to clusters by UPGMA/WPGMA averaging

UPGMA Algorithm

Initialization
- Initialize n clusters $C_i = \{S_i\}$
- Initialize $T$ with a leaf for each cluster $C_i$

Iteration
- Find $C_i$, $C_j$ with the smallest distance $D_{ij}$
- Create a new cluster $C_k = C_i \cup C_j$
- Add a new node to $T$ for $C_k$, and connect it to $C_i$, $C_j$
- If all nodes are connected into a tree, exit; otherwise assign $D_{ki} = D_{kj} = D_{ij}/2$ (that is, place node k at height $D_{ij}/2$) and compute the distances $D_{kl}$ to all other clusters $C_l$: $D_{kl} = \dfrac{D_{il}\,|C_i| + D_{jl}\,|C_j|}{|C_i| + |C_j|}$
- Repeat the iteration

UPGMA: Molecular Clock Property
- Uniform distance from the root to the leaves
- Distance to the root ~ evolutionary clock
- Species are assumed to take identical time to evolve
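The UPGMA iteration can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the lecturer's code: the tree is returned as nested tuples `(left, right, height)`, leaves are plain strings, and the function name `upgma` is hypothetical.

```python
def upgma(names, d):
    """Cluster taxa by UPGMA; returns nested tuples (left, right, height)."""
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, cluster size)
    dist = {(i, j): float(d[i][j]) for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        k = next_id
        next_id += 1
        # Size-weighted average distance from the merged cluster to every other one.
        for l in clusters:
            d_il = dist[(min(i, l), max(i, l))]
            d_jl = dist[(min(j, l), max(j, l))]
            dist[(l, k)] = (d_il * size_i + d_jl * size_j) / (size_i + size_j)
        # Drop all entries that still refer to the merged clusters.
        dist = {pair: v for pair, v in dist.items() if i not in pair and j not in pair}
        # The new node sits at height D_ij / 2 above the leaves.
        clusters[k] = ((tree_i, tree_j, d_ij / 2), size_i + size_j)
    (root, _), = clusters.values()
    return root


ultrametric = [[0, 2, 6, 10],
               [2, 0, 6, 10],
               [6, 6, 0, 10],
               [10, 10, 10, 0]]
tree = upgma(["A", "B", "C", "D"], ultrametric)
```

On an ultrametric matrix such as the one above, the node heights recover the clock-like tree exactly; on other inputs the averaging only approximates the distances.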

Notes
- Complexity is $O(n^2)$
- Averaging redistributes distances to overcome non-additivity
- Clustering can lead to substantial errors and is very sensitive to the input distances
- This limits the applications of clustering
- How do we overcome the sensitivity of UPGMA?

[Figure: the real tree compared with the tree produced by UPGMA.]

Improvements Through Bootstrapping
- Bootstrapping: a statistical technique to increase robustness
- Scenario: given a sample $S(\omega)$ and a result $R(S)$ computed from $S$
- Bootstrapping:
  - Resample $S$ to get $S'(\omega)$
  - Evaluate $R(S'(\omega))$
  - Evaluate the match of $R(S)$ with the values $R(S'(\omega))$
- Here $S$ = the n columns of the aligned sequences and $R(S)$ = the tree
- $S'(\omega)$ = a sample of n random columns of $S$, with possible repetitions
- Compute the phylogenetic tree $R(S'(\omega))$
- Use $\{R(S'(\omega))\}$ to compute the consensus/likelihood of the branches of $R(S)$
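The resampling step can be sketched as follows. To keep the example short, the "result" R(S) is a toy stand-in (the pair of sequences with the smallest Hamming distance) rather than a full tree; in practice `result_fn` would be a tree builder and the comparison would be made branch by branch. All function names here are illustrative assumptions.

```python
import random
from itertools import combinations


def resample_columns(alignment, rng):
    """Draw as many columns as the alignment has, with replacement."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in picks) for seq in alignment]


def closest_pair(alignment):
    """Toy stand-in for R(S): indices of the two sequences with the fewest mismatches."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(combinations(range(len(alignment)), 2),
               key=lambda p: hamming(alignment[p[0]], alignment[p[1]]))


def bootstrap_support(alignment, result_fn, replicates=100, seed=0):
    """Fraction of bootstrap replicates whose result matches the original one."""
    rng = random.Random(seed)
    reference = result_fn(alignment)
    hits = sum(result_fn(resample_columns(alignment, rng)) == reference
               for _ in range(replicates))
    return reference, hits / replicates
```

A support value close to 1 suggests the result does not hinge on a few columns; a low value flags a conclusion that the data support only weakly.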

Bootstrapping Example

[Figure: bootstrapping example.]

Closest Pair vs. Evolutionary Neighbors
- Additivity: $D_{ij} + D_{kl} \le D_{ik} + D_{jl} = D_{il} + D_{jk}$
- UPGMA overcomes non-additivity by averaging distances
- But the closest pair may not be evolutionary neighbors
- The evolutionary tree distances may diverge greatly; averaging distorts the neighborhood

[Figure: quartet tree on i, j, k, l with the pairwise distances marked.]

Neighbor Joining [Saitou & Nei 87; Studier & Keppler 88]
- Neighbor-joining heuristic: join the closest clusters that are far from the rest
- Define $R_k = \sum_{i \ne k} D_{ik}$, the divergence of k
- Cluster the nodes k, m that minimize $N_{km} = D_{km} - (R_k + R_m)/(n-2)$
- [Define $r_k = R_k/(n-2)$ and consider $D_{km} - r_k - r_m$]

Neighbor Joining Algorithm

Initialization (same as UPGMA):
- Initialize n clusters $C_i = \{S_i\}$

Iteration:
1. Compute $r_k = \sum_{i \ne k} D_{ik}/(n-2)$ for each cluster k
2. Find $(k, m)$ minimizing $D_{km} - r_k - r_m$
3. Define a new node i and set $D_{is} = 0.5\,(D_{ks} + D_{ms} - D_{km})$ for all s
4. Join node i to k and m with edges of respective lengths $D_{ki} = 0.5\,(D_{km} + r_k - r_m)$ and $D_{mi} = 0.5\,(D_{km} + r_m - r_k)$
5. Repeat until all nodes are connected
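The five steps can be sketched directly. This is a minimal illustration, not an optimized implementation: the tree is returned as a list of `(node, node, length)` edges, new internal nodes are named `N1`, `N2`, ... (a hypothetical naming scheme that assumes no taxon uses those names), and ties are broken by input order.

```python
from itertools import combinations


def neighbor_joining(names, d):
    """Build an unrooted tree by neighbor joining; returns a list of weighted edges."""
    nodes = list(names)
    dist = {frozenset((names[i], names[j])): float(d[i][j])
            for i in range(len(names)) for j in range(i + 1, len(names))}

    def D(a, b):
        return dist[frozenset((a, b))]

    edges, counter = [], 0
    while len(nodes) > 2:
        n = len(nodes)
        # Step 1: divergence of every node, scaled by n - 2.
        r = {k: sum(D(k, s) for s in nodes if s != k) / (n - 2) for k in nodes}
        # Step 2: pair minimizing the rate-corrected distance.
        k, m = min(combinations(nodes, 2), key=lambda p: D(*p) - r[p[0]] - r[p[1]])
        counter += 1
        new = f"N{counter}"
        d_km = D(k, m)
        # Step 4: branch lengths from the new node to k and m.
        edges.append((k, new, 0.5 * (d_km + r[k] - r[m])))
        edges.append((m, new, 0.5 * (d_km + r[m] - r[k])))
        # Step 3: distances from the new node to all remaining nodes.
        rest = [s for s in nodes if s not in (k, m)]
        for s in rest:
            dist[frozenset((new, s))] = 0.5 * (D(k, s) + D(m, s) - d_km)
        nodes = rest + [new]
    # Step 5 ends with two nodes left: connect them directly.
    a, b = nodes
    edges.append((a, b, D(a, b)))
    return edges


# Distances read off a tree ((a,b),(c,d)) with leaf edges 1, 2, 3, 4 and internal edge 5.
additive = [[0, 3, 9, 10],
            [3, 0, 10, 11],
            [9, 10, 0, 7],
            [10, 11, 7, 0]]
edges = neighbor_joining(["a", "b", "c", "d"], additive)
```

On an additive matrix such as this one, the recovered edge lengths match the generating tree; no molecular clock is assumed anywhere in the procedure.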

Example: Step 1, Compute Divergences r

[Table: distance matrix over the six taxa A, B, C, D, E, F with the column sums $\Sigma$ and divergences r. From The Phylogenetic Handbook, Salemi and Vandamme.]

- Step 1: compute $r_k = \sum_{i \ne k} D_{ik}/(n-2)$: sum the columns, then divide by 6 - 2 = 4

Step 2: Find the Neighboring Pair
- Step 2: evaluate the neighboring distance matrix $N_{km} = D_{km} - (r_k + r_m)$ [subtract the r column and row]
- Find $(k, m)$ minimizing $N_{km}$
- Create a new node U and attach it to k, m
- UPGMA would instead connect the closest pair

[Table: matrix $N_{km}$ with its minimum entries marked.]

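Steps 3, 4: Join Neighbors, Update Distances
- Step 3: compute the branch lengths: $D_{AU} = 0.5\,(D_{AB} + r_A - r_B) = 0.5\,(5 - 3) = 1$; $D_{BU} = 0.5\,(D_{AB} + r_B - r_A) = 0.5\,(5 + 3) = 4$
- Step 4: update the distance matrix with $D_{XU} = 0.5\,(D_{AX} + D_{BX} - D_{AB})$: $D_{CU} = 0.5\,(4 + 7 - 5) = 3$; $D_{DU} = 0.5\,(7 + 10 - 5) = 6$; $D_{EU} = 0.5\,(6 + 9 - 5) = 5$; $D_{FU} = 0.5\,(8 + 11 - 5) = 7$

[Figure: partial tree with A and B joined at U, and the updated matrix over U, C, D, E, F.]

Repeat Steps 1/2/3/4
- Step 1: compute $r_k = \sum_{i \ne k} D_{ik}/(n-2)$
- Step 2: compute the neighboring pair: $\min\{N_{XY} = D_{XY} - r_X - r_Y\}$ => (U, C) or (D, E)
- Step 3: join the neighbors U and C at a new node V; compute the branch lengths: $D_{UV} = 0.5\,(D_{UC} + r_U - r_C) = 1$; $D_{CV} = 2$
- Step 4: recompute the distances: $D_{XV} = 0.5\,(D_{UX} + D_{CX} - D_{UC})$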
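The arithmetic of the first iteration can be reproduced with a few helper functions. The matrix below is an assumption: it is a six-taxon matrix chosen to be consistent with the numbers quoted in the worked steps (for example the sums 4 + 7 - 5, 7 + 10 - 5, 6 + 9 - 5 and 8 + 11 - 5), since the slide's own table did not survive transcription. The function names are illustrative.

```python
NAMES = ["A", "B", "C", "D", "E", "F"]
# Assumed distance matrix, consistent with the numbers in the worked example.
D = [[0, 5, 4, 7, 6, 8],
     [5, 0, 7, 10, 9, 11],
     [4, 7, 0, 7, 6, 8],
     [7, 10, 7, 0, 5, 9],
     [6, 9, 6, 5, 0, 8],
     [8, 11, 8, 9, 8, 0]]


def divergences(d):
    """Step 1: column sums divided by n - 2."""
    n = len(d)
    return [sum(row) / (n - 2) for row in d]


def best_pair(d):
    """Step 2: index pair minimizing D_km - r_k - r_m (first pair wins ties)."""
    r = divergences(d)
    n = len(d)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda p: d[p[0]][p[1]] - r[p[0]] - r[p[1]])


def join(d, k, m):
    """Steps 3 and 4: branch lengths to the new node and its distances to the rest."""
    r = divergences(d)
    d_km = d[k][m]
    branch_k = 0.5 * (d_km + r[k] - r[m])
    branch_m = 0.5 * (d_km + r[m] - r[k])
    to_new = {s: 0.5 * (d[k][s] + d[m][s] - d_km)
              for s in range(len(d)) if s not in (k, m)}
    return branch_k, branch_m, to_new


if __name__ == "__main__":
    print(dict(zip(NAMES, divergences(D))))
    k, m = best_pair(D)
    print("join", NAMES[k], NAMES[m], join(D, k, m))
```

Note that A and C are the closest pair in this matrix (distance 4), yet neighbor joining selects A and B, because the correction by r accounts for how far each taxon is from all the others.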

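Repeat Steps 1/2/3/4
- Step 1: compute $r_k = \sum_{i \ne k} D_{ik}/(n-2)$
- Step 2: compute the neighboring pair: $\min\{N_{XY} = D_{XY} - r_X - r_Y\}$ => (D, E)
- Step 3: join the neighbors D and E at a new node W; compute the branch lengths: $D_{DW} = 0.5\,(D_{DE} + r_D - r_E) = 3$; $D_{EW} = 2$
- Step 4: recompute the distances: $D_{XW} = 0.5\,(D_{DX} + D_{EX} - D_{DE})$

[Figure: partial tree after joining D and E, with the updated matrix.]

Repeat Steps 1/2/3/4
- Step 1: compute $r_k = \sum_{i \ne k} D_{ik}/(n-2)$, giving $r_V = r_W = 8$
- Step 2: compute the neighboring pair: $\min\{N_{XY} = D_{XY} - r_X - r_Y\}$ => (V, W)
- Step 3: join the neighbors V and W at a new node Z; compute the branch lengths: $D_{VZ} = 0.5\,(D_{VW} + r_V - r_W) = 1$; $D_{WZ} = 1$
- Step 4: recompute the distances: $D_{XZ} = 0.5\,(D_{VX} + D_{WX} - D_{VW})$

[Figure: partial tree after joining V and W at Z.]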

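Complete

[Figure: the completed neighbor-joining tree over A, B, C, D, E, F with its branch lengths.]

Notes on Neighbor Joining
- Complexity is $O(n^3)$ in the Studier & Keppler formulation
- Does not depend on the molecular clock assumption
- Heavily used in practice [e.g., ClustalW]
- But can be sensitive to non-additivity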

Maximal Parsimony (character-based phylogeny)

Key Idea: Minimize Changes
- Reconsider the problem: find the best tree to explain the evolution of the sequences
- Motivation: focus on the evolution of positions; a distance loses information on the individual evolutionary changes
- Key idea: find the tree that explains the data with minimal changes

[Figure: example sequences and two candidate trees over the same leaves, each annotated with its cost C.]

More Generally
- Taxa are considered as sets of attributes: characters
- Character = DNA position, gene order, morphological feature
- Character state = a value assumed by a character
- Characters evolve through state changes
- An evolutionary tree represents changes in character states
- The MP tree seeks to minimize state changes

MP Example

[Figure: table of taxa against characters with binary states, and a tree with the state changes marked.]

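MP Example

[Figure: two candidate trees for the same taxa, one requiring 7 state changes and one requiring 6 state changes.]

Example: Evolution of a Gene
- Taxa
- Character = position
- State = nucleotide

[Figure: aligned gene sequences for the taxa.]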

Example: Evolution of a Gene (continued)
- Character = position
- State = nucleotide

[Figure: tree over the taxa with nucleotide changes marked.]

Example: MP rearrangements of a chromosome (Pevzner 2003, Genome Research)

The Max Parsimony (MP) Problem

Big MP:
- Input: a set of n aligned sequences of length k
- Output: a phylogenetic tree $T$ such that
  - $T$ has n leaves labeled with the input sequences (taxa)
  - $T$ has internal nodes labeled with sequences of length k (states)
  - $T$ minimizes the total Hamming distance between the labels at the two ends of each edge
- This is a Steiner tree type problem
- Can be shown to be NP-hard [Gusfield, Foulds]
- But often the number of sequences considered is small

Small MP:
- Input: a tree with sequence-labeled leaves
- Output: a labeling of the internal nodes with states that achieves maximum parsimony

MP Basics
- Consider five aligned three-letter sequences, one column at a time
- The first column admits several arrangements and identifies a likely mutation: the MP arrangement needs a single mutation, the alternatives need more
- The second column does not provide clues on likely mutations: it is a non-informative position (an informative position needs at least 2 characters)

[Figure: alternative trees for the first and second columns with mutation counts.]

MP Basics (continued)
- Merge the MP trees of columns 1 & 3: this yields two MP trees

[Figure: MP trees for the individual columns and the two merged MP trees.]

Example (N. Friedman)
- Aardvark: CAGGTA
- Bison: CAGACA
- Chimp: CGGGTA
- Dog: TGCACT
- Elephant: TGCGTA

[Figure: tree over Aardvark, Bison, Chimp, Dog, Elephant with inferred sequences at the internal nodes.]

Example: Evolution of Protein Domains

[Figure: parsimony tree of protein domain architectures with its total cost.]

- C. Chothia et al., Evolution of the Protein Repertoire, Science Vol 300, 13 June 2003
- T. Przytycka et al., Graph Theoretical Insights..., RECOMB 2005, LNBI 3500, pp. 311-325, 2005

Single Site MP: The Fitch Algorithm

Problem:
- Input: a tree $T$ with labeled leaves
- Output: labels of the internal nodes of the MP tree + cost $C$

Step 1: Assign to each node x a set of labels $S(x)$, traversing $T$ in postorder (leaves to root), starting with $C \leftarrow 0$:
- If x is a leaf, then $S(x) = \{\text{label of } x\}$
- If x has children y, z: if $S(y) \cap S(z) \ne \emptyset$ then $S(x) \leftarrow S(y) \cap S(z)$; else $S(x) \leftarrow S(y) \cup S(z)$ and $C \leftarrow C + 1$

Step 2: Assign to each node x a character value $v(x)$, traversing $T$ in preorder (root to leaves):
- If y is the parent of x and $v(y) \in S(x)$ then $v(x) \leftarrow v(y)$; else $v(x) \leftarrow$ any label from $S(x)$
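The two passes can be sketched for a single character as follows. This is an illustrative version under stated assumptions: the tree is a nested tuple of single-letter leaf labels, "any label" is resolved by taking the alphabetically smallest one so the output is deterministic, and the function names are hypothetical.

```python
def fitch_sets(tree):
    """Step 1 (postorder): returns ((label set, left, right), cost)."""
    if isinstance(tree, str):
        return ({tree}, None, None), 0
    left, cost_left = fitch_sets(tree[0])
    right, cost_right = fitch_sets(tree[1])
    common = left[0] & right[0]
    if common:
        return (common, left, right), cost_left + cost_right
    # Disjoint child sets: take the union and pay for one state change.
    return (left[0] | right[0], left, right), cost_left + cost_right + 1


def fitch_assign(node, parent_state=None):
    """Step 2 (preorder): keep the parent's state whenever the node's set allows it."""
    states, left, right = node
    state = parent_state if parent_state in states else min(states)
    if left is None:
        return state
    return (state, fitch_assign(left, state), fitch_assign(right, state))


def fitch(tree):
    """Return (labelled tree as (state, left, right) tuples, parsimony cost)."""
    annotated, cost = fitch_sets(tree)
    return fitch_assign(annotated), cost


labelled, cost = fitch((("A", "G"), ("A", "A")))
```

For an alignment, the same routine would be run once per column and the costs added, which is what the assumption of independent characters permits.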

Step 1: Computing Candidate Labels

[Figure: the postorder pass on an example tree with leaves labeled A and G; each internal node shows its candidate set, such as {A}, {G} or {A, G}, and the running cost C.]

Step 2: Selecting MP Labels

[Figure: the preorder pass on the same tree; each internal node is assigned one label from its candidate set.]

Notes
- The algorithm is fast: $O(nk)$, where n = # nodes and k = # character values
- It selects a particular MP tree (there may be others)
- Run separately for each character, then merge the results
- May be generalized to weighted parsimony: Sankoff's generalization assigns different costs to different changes

[Figure: two alternative MP labelings of the same tree.]

Heuristic MP Algorithms
- Use Steiner-tree heuristic algorithms
- Branch-and-bound search
  - Represent the search space as a tree (nodes at the k-th level represent phylogenetic trees for the first k species)
  - Find the best-scoring search node and use it as a bound
  - Branch to the children of this search node
- Nearest neighbor interchange (NNI): switch subtrees
- Simulated annealing
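Weighted parsimony in the style of Sankoff can be sketched with a small dynamic program over the tree. The cost function below (transitions cost 1, transversions cost 2) is a hypothetical choice for illustration, as are the function names; the tree is a nested tuple of single-letter leaf labels.

```python
INF = float("inf")
PURINES = {"A", "G"}


def unit_cost(a, b):
    """Every change costs 1 (this reduces to ordinary parsimony)."""
    return 0 if a == b else 1


def ts_tv_cost(a, b):
    """Illustrative weights: transitions cost 1, transversions cost 2."""
    if a == b:
        return 0
    return 1 if (a in PURINES) == (b in PURINES) else 2


def sankoff(tree, cost, states="ACGT"):
    """Return {state: minimal cost of the subtree if its root has that state}."""
    if isinstance(tree, str):
        return {s: (0 if s == tree else INF) for s in states}
    left = sankoff(tree[0], cost, states)
    right = sankoff(tree[1], cost, states)
    return {s: min(cost(s, t) + left[t] for t in states)
               + min(cost(s, t) + right[t] for t in states)
            for s in states}


def weighted_parsimony(tree, cost):
    """Minimal total cost over all choices of root state."""
    return min(sankoff(tree, cost).values())
```

With `unit_cost` the score agrees with the change count of the Fitch procedure; other cost functions let biologically rarer changes weigh more.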

Maximal Likelihood Approach

(III) Max Likelihood Approaches (based on N. Friedman slides)
- Key idea: compute the maximum likelihood tree
  - Many models of changes (trees) can yield the observed data
  - Compute the tree that maximizes the likelihood
- Problem 1: given $T$, compute the probability $P(S \mid T)$
  - $S = \{x^1, \ldots, x^n\}$ are the observed sequences
  - Need a probability model of the changes generated by $T$:
    - Background probabilities: $q(a)$
    - Mutation probabilities: $P(a \mid b, t)$
- Problem 2: compute the $T$ that maximizes $P(S \mid T)$
  - This is the complex part

[Figure: example tree with leaves $x^1$, $x^2$, $x^3$ and branch lengths $t_1, \ldots, t_4$.]

Tree Likelihood Computation
- Define $P(L_k \mid a)$ = probability of the subtree below node k given $x_k = a$
- Init: for all leaves k, $P(L_k \mid a) = 1$ if $x_k = a$; 0 otherwise
- Iteration: if k is a node with children i and j, then $P(L_k \mid a) = \sum_{b,c} P(b \mid a, t_i)\, P(L_i \mid b)\, P(c \mid a, t_j)\, P(L_j \mid c)$
- Termination: the likelihood is $P(x^1, \ldots, x^n \mid T, t) = \sum_a P(L_{root} \mid a)\, q(a)$

[Figure: example tree with leaves $x^1$, $x^2$, $x^3$ and branch lengths $t_1, \ldots, t_4$.]

Maximum Likelihood (ML)
- Score each tree by $P(x^1, \ldots, x^n \mid T, t) = \prod_m P(x^1[m], \ldots, x^n[m] \mid T, t)$
  - Assumption of independent positions
- Find the highest-scoring tree
  - Exhaustive search
  - Sampling methods (Metropolis)
  - Approximation (consider only a subset of trees)
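The recursion can be sketched as follows. The slides do not fix a substitution model, so this example assumes the Jukes-Cantor model with uniform background probabilities q(a) = 1/4; the tree encoding (each internal node is a pair of `(child, branch_length)` tuples, leaves are taxon names) and all function names are illustrative.

```python
import math

STATES = "ACGT"


def jc69(a, b, t):
    """Jukes-Cantor probability of state b after time t given state a (assumed model)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay


def partial(node, column):
    """P(L_k | a) for every state a; column maps taxon name -> observed nucleotide."""
    if isinstance(node, str):
        return {a: 1.0 if column[node] == a else 0.0 for a in STATES}
    (left, t_left), (right, t_right) = node
    p_left, p_right = partial(left, column), partial(right, column)
    # The double sum over (b, c) factorizes into a product of two single sums.
    return {a: sum(jc69(a, b, t_left) * p_left[b] for b in STATES)
               * sum(jc69(a, c, t_right) * p_right[c] for c in STATES)
            for a in STATES}


def site_likelihood(tree, column):
    """Termination step: sum over root states weighted by q(a) = 1/4."""
    return sum(0.25 * p for p in partial(tree, column).values())


def log_likelihood(tree, sequences):
    """Sum of per-column log-likelihoods, using the independence of positions."""
    length = len(next(iter(sequences.values())))
    return sum(math.log(site_likelihood(tree, {name: seq[m]
                                               for name, seq in sequences.items()}))
               for m in range(length))


tree = (((("x1", 0.1), ("x2", 0.1)), 0.2), ("x3", 0.3))
score = log_likelihood(tree, {"x1": "ACGT", "x2": "ACGA", "x3": "ACTT"})
```

Working with log-likelihoods avoids numerical underflow when the per-column probabilities are multiplied across long alignments.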

Comparison (Tony Weisstein)

| Neighbor-joining | Maximum parsimony | Maximum likelihood |
| --- | --- | --- |
| Uses only pairwise distances | Uses only shared derived characters | Uses all data |
| Minimizes distance between nearest neighbors | Minimizes total distance | Maximizes tree likelihood given specific parameter values |
| Very fast | Slow | Very slow |
| Easily trapped in local optima | Assumptions fail when evolution is rapid | Highly dependent on assumed evolution model |
| Good for generating tentative tree, or choosing among multiple trees | Best option when tractable (<30 taxa) | Good for very small data sets and for testing trees built using other methods |

Conclusions
- Computing phylogeny is an area of active research
  - Hundreds of algorithms
- New models: phylogenetic networks (generalize trees)
- New challenges: whole-genome phylogeny
  - Account for multi-site changes: replication, transpositions
  - New algorithms
- Applications
  - Epidemiology
  - Cancer diagnosis

### Evolutionary Tree Analysis. Overview

CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based

### Phylogenetic Tree Reconstruction

I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

### 9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

### Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

### Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Page Evolutionary Trees Russ. ltman MI S 7 Outline. Why build evolutionary trees?. istance-based vs. character-based methods. istance-based: Ultrametric Trees dditive Trees. haracter-based: Perfect phylogeny

### Theory of Evolution Charles Darwin

Theory of Evolution Charles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (83-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

### Theory of Evolution. Charles Darwin

Theory of Evolution harles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (8-6) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

### CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1 Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003 Lecturer: Wing-Kin Sung Scribe: Ning K., Shan T., Xiang

### Dr. Amira A. AL-Hosary

Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

### Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods

### Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

ioinformatics -- lecture 9 Phylogenetic trees istance-based tree building Parsimony (,(,(,))) rees can be represented in "parenthesis notation". Each set of parentheses represents a branch-point (bifurcation),

### Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

### CSCI1950 Z Computa4onal Methods for Biology Lecture 5

CSCI1950 Z Computa4onal Methods for Biology Lecture 5 Ben Raphael February 6, 2009 hip://cs.brown.edu/courses/csci1950 z/ Alignment vs. Distance Matrix Mouse: ACAGTGACGCCACACACGT Gorilla: CCTGCGACGTAACAAACGC

### Phylogeny Tree Algorithms

Phylogeny Tree lgorithms Jianlin heng, PhD School of Electrical Engineering and omputer Science University of entral Florida 2006 Free for academic use. opyright @ Jianlin heng & original sources for some

### Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then

### CSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary

CSCI1950 Z Computa4onal Methods for Biology Lecture 4 Ben Raphael February 2, 2009 hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary Parsimony Probabilis4c Method Input Output Sankoff s & Fitch

### Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

7.91 Lecture #5 Database Searching & Molecular Phylogenetics Michael Yaffe B C D B C D (((,B)C)D) Outline Distance Matrix Methods Neighbor-Joining Method and Related Neighbor Methods Maximum Likelihood

### Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

### Algorithms in Bioinformatics

Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods

### EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

### Phylogeny: traditional and Bayesian approaches

Phylogeny: traditional and Bayesian approaches 5-Feb-2014 DEKM book Notes from Dr. B. John Holder and Lewis, Nature Reviews Genetics 4, 275-284, 2003 1 Phylogeny A graph depicting the ancestor-descendent

### Molecular Evolution and Phylogenetic Tree Reconstruction

1 4 Molecular Evolution and Phylogenetic Tree Reconstruction 3 2 5 1 4 2 3 5 Orthology, Paralogy, Inparalogs, Outparalogs Phylogenetic Trees Nodes: species Edges: time of independent evolution Edge length

### Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

### A (short) introduction to phylogenetics

A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

### Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies

Inferring Phylogenetic Trees Distance Approaches Representing distances in rooted and unrooted trees The distance approach to phylogenies given: an n n matrix M where M ij is the distance between taxa

### Consistency Index (CI)

Consistency Index (CI) minimum number of changes divided by the number required on the tree. CI=1 if there is no homoplasy negatively correlated with the number of species sampled Retention Index (RI)

### Phylogeny: building the tree of life

Phylogeny: building the tree of life Dr. Fayyaz ul Amir Afsar Minhas Department of Computer and Information Sciences Pakistan Institute of Engineering & Applied Sciences PO Nilore, Islamabad, Pakistan

### Phylogenetic trees 07/10/13

Phylogenetic trees 07/10/13 A tree is the only figure to occur in On the Origin of Species by Charles Darwin. It is a graphical representation of the evolutionary relationships among entities that share

### Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction

### BINF6201/8201. Molecular phylogenetic methods

BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

### Phylogenetics: Parsimony

1 Phylogenetics: Parsimony COMP 571 Luay Nakhleh, Rice University he Problem 2 Input: Multiple alignment of a set S of sequences Output: ree leaf-labeled with S Assumptions Characters are mutually independent

### NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana

### Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University

Phylogenetics: Parsimony and Likelihood COMP 571 - Spring 2016 Luay Nakhleh, Rice University The Problem Input: Multiple alignment of a set S of sequences Output: Tree T leaf-labeled with S Assumptions

### Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

### Is the equal branch length model a parsimony model?

Table 1: n approximation of the probability of data patterns on the tree shown in figure?? made by dropping terms that do not have the minimal exponent for p. Terms that were dropped are shown in red;

### InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

### Phylogenetic inference

Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

### Walks in Phylogenetic Treespace

Walks in Phylogenetic Treespace lan Joseph aceres Samantha aley John ejesus Michael Hintze iquan Moore Katherine St. John bstract We prove that the spaces of unrooted phylogenetic trees are Hamiltonian

### POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

### Phylogeny Jan 5, 2016

גנומיקה חישובית Computational Genomics Phylogeny Jan 5, 2016 Slides: Adi Akavia Nir Friedman s slides at HUJI (based on ALGMB 98) Anders Gorm Pedersen,Technical University of Denmark Sources: Joe Felsenstein

### Molecular Evolution & Phylogenetics

Molecular Evolution & Phylogenetics Heuristics based on tree alterations, maximum likelihood, Bayesian methods, statistical confidence measures Jean-Baka Domelevo Entfellner Learning Objectives know basic

### Phylogeny. November 7, 2017

Phylogeny November 7, 2017 Phylogenetics Phylon = tribe/race, genetikos = relative to birth Phylogenetics: study of evolutionary relationships among organisms, sequences, or anything in between Related

### Effects of Gap Open and Gap Extension Penalties

Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

### Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Today s topics Inferring phylogeny Introduction! Distance methods! Parsimony method!"#\$%&'(!)* +,-.'/01!23454(6!7!2845*0&4'9#6!:&454(6 ;?@AB=C?DEF Overview of phylogenetic inferences Methodology Methods

### Inference in Graphical Models Variable Elimination and Message Passing Algorithm

Inference in Graphical Models Variable Elimination and Message Passing lgorithm Le Song Machine Learning II: dvanced Topics SE 8803ML, Spring 2012 onditional Independence ssumptions Local Markov ssumption

### Phylogenetics. BIOL 7711 Computational Bioscience

Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium

### Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

### Building Phylogenetic Trees UPGMA & NJ

uilding Phylogenetic Trees UPGM & NJ UPGM UPGM Unweighted Pair-Group Method with rithmetic mean Unweighted = all pairwise distances contribute equally. Pair-Group = groups are combined in pairs. rithmetic

### Phylogenetic analyses. Kirsi Kostamo

Phylogenetic analyses Kirsi Kostamo The aim: To construct a visual representation (a tree) to describe the assumed evolution occurring between and among different groups (individuals, populations, species,

### Lecture 10: Phylogeny

Computational Genomics Prof. Ron Shamir & Prof. Roded Sharan School of Computer Science, Tel Aviv University גנומיקה חישובית פרופ' רון שמיר ופרופ' רודד שרן ביה"ס למדעי המחשב,אוניברסיטת תל אביב Lecture

### Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying

### Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method

Phylogeny 1 Plan: Phylogeny is an important subject. We have 2.5 hours. So I will teach all the concepts via one example of a chain letter evolution. The concepts we will discuss include: Evolutionary

### Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

### TheDisk-Covering MethodforTree Reconstruction

TheDisk-Covering MethodforTree Reconstruction Daniel Huson PACM, Princeton University Bonn, 1998 1 Copyright (c) 2008 Daniel Huson. Permission is granted to copy, distribute and/or modify this document

### BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

BMI/CS 776 Lecture #20 Alignment of whole genomes Colin Dewey (with slides adapted from those by Mark Craven) 2007.03.29 1 Multiple whole genome alignment Input set of whole genome sequences genomes diverged

### THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

### A Phylogenetic Network Construction due to Constrained Recombination

A Phylogenetic Network Construction due to Constrained Recombination Mohd. Abdul Hai Zahid Research Scholar Research Supervisors: Dr. R.C. Joshi Dr. Ankush Mittal Department of Electronics and Computer

### Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely

JOURNAL OF COMPUTATIONAL BIOLOGY Volume 8, Number 1, 2001 Mary Ann Liebert, Inc. Pp. 69 78 Perfect Phylogenetic Networks with Recombination LUSHENG WANG, 1 KAIZHONG ZHANG, 2 and LOUXIN ZHANG 3 ABSTRACT

### Sequential Monte Carlo Algorithms

ayesian Phylogenetic Inference using Sequential Monte arlo lgorithms lexandre ouchard-ôté *, Sriram Sankararaman *, and Michael I. Jordan *, * omputer Science ivision, University of alifornia erkeley epartment

### Reconstruire le passé biologique modèles, méthodes, performances, limites

Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire

### Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell

### Phylogeny. Properties of Trees. Properties of Trees. Trees represent the order of branching only. Phylogeny: Taxon: a unit of classification

Multiple sequence alignment global local Evolutionary tree reconstruction Pairwise sequence alignment (global and local) Substitution matrices Gene Finding Protein structure prediction N structure prediction

### 17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on:

17 Non-collinear alignment This exposition is based on: 1. Darling, A.E., Mau, B., Perna, N.T. (2010) progressivemauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5(6):e11147.

### 66 Bioinformatics I, WS 09-10, D. Huson, December 1, Evolutionary tree of organisms, Ernst Haeckel, 1866

66 Bioinformatics I, WS 09-10, D. Huson, December 1, 2009 5 Phylogeny Evolutionary tree of organisms, Ernst Haeckel, 1866 5.1 References J. Felsenstein, Inferring Phylogenies, Sinauer, 2004. C. Semple

### Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X

### Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

Phylogeny and Evolution Gina Cannarozzi ETH Zurich Institute of Computational Science History Aristotle (384-322 BC) classified animals. He found that dolphins do not belong to the fish but to the mammals.

### Phylogenetics: Likelihood

1 Phylogenetics: Likelihood COMP 571 Luay Nakhleh, Rice University The Problem 2 Input: Multiple alignment of a set S of sequences Output: Tree T leaf-labeled with S Assumptions 3 Characters are mutually

### Phylogenetic inference: from sequences to trees

W ESTFÄLISCHE W ESTFÄLISCHE W ILHELMS -U NIVERSITÄT NIVERSITÄT WILHELMS-U ÜNSTER MM ÜNSTER VOLUTIONARY FUNCTIONAL UNCTIONAL GENOMICS ENOMICS EVOLUTIONARY Bioinformatics 1 Phylogenetic inference: from sequences

### Finding the best tree by heuristic search

Chapter 4 Finding the best tree by heuristic search If we cannot find the best trees by examining all possible trees, we could imagine searching in the space of possible trees. In this chapter we will

### (Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise

Bot 421/521 PHYLOGENETIC ANALYSIS I. Origins A. Hennig 1950 (German edition) Phylogenetic Systematics 1966 B. Zimmerman (Germany, 1930s) C. Wagner (Michigan, 1920-2000) II. Characters and character states

### "Nothing in biology makes sense except in the light of evolution" Theodosius Dobzhansky

MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution" Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendants differ structurally

### Properties of normal phylogenetic networks

Properties of normal phylogenetic networks Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu August 13, 2009 Abstract. A phylogenetic network is

### Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

### Phylogenetics: Building Phylogenetic Trees

1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should

### Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.fa.aln as input and

### DNA Phylogeny. Signals and Systems in Biology Kushal Shah @ EE, IIT Delhi

DNA Phylogeny Signals and Systems in Biology Kushal Shah @ EE, IIT Delhi Phylogenetics Grouping and Division of organisms Keeps changing with time Splitting, hybridization and termination Cladistics :

### What is Phylogenetics

What is Phylogenetics Phylogenetics is the area of research concerned with finding the genetic connections and relationships between species. The basic idea is to compare specific characters (features)

### Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels

### Phylogenetics: Building Phylogenetic Trees. COMP 571 - Fall 2010 Luay Nakhleh, Rice University

Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary

### Letter to the Editor. Department of Biology, Arizona State University

Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona

### Isolating - A New Resampling Method for Gene Order Data

Isolating - A New Resampling Method for Gene Order Data Jian Shi, William Arndt, Fei Hu and Jijun Tang Abstract The purpose of using resampling methods on phylogenetic data is to estimate the confidence

### Clustering. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein. Some slides adapted from Jacques van Helden

Clustering Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein Some slides adapted from Jacques van Helden Small vs. large parsimony A quick review Fitch's algorithm:

Introduction to Bioinformatics Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Multiple Sequence Alignment Outline Multiple sequence alignment introduction to msa methods of msa progressive global alignment

### Sequence Analysis '17--lecture 10. Trees types of trees Newick notation UPGMA Fitch Margoliash Distance vs Parsimony

Sequence Analysis '17--lecture 10 Trees types of trees Newick notation UPGMA Fitch Margoliash Distance vs Parsimony Phylogenetic trees What is a phylogenetic tree? A model of evolutionary relationships -- common

### Reconstruction of certain phylogenetic networks from their tree-average distances

Reconstruction of certain phylogenetic networks from their tree-average distances Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu October 10,

### Copyright notice. Molecular Phylogeny and Evolution. Goals of the lecture. Introduction. Introduction. December 15, 2008

Copyright notice Molecular Phylogeny and Evolution December 15, 2008 Bioinformatics J. Pevsner pevsner@kennedykrieger.org Many of the images in this PowerPoint presentation are from Bioinformatics and Functional

### Organizational Details

Organizational Details. Lecture: Tue 13-14, Thu 10-12 in DI 205. Exercises: Thu 16:15-18:00, lab room Schanzenstrasse. Mainly programming in Matlab/Octave. Participation is voluntary. Exercise sheets each

### Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and

### Who Has Heard of This Problem? Courtesy: Jeremy Kun

P vs. NP 02-201 Who Has Heard of This Problem? Courtesy: Jeremy Kun Runtime Analysis Last time, we saw that there is no solution to the Halting Problem. Halting Problem: Determine if a program will halt.

### "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley B.D. Mishler Jan. 22, 2009. Trees I. Summary of previous lecture: Hennigian

### Perfect Phylogenetic Networks with Recombination

Perfect Phylogenetic Networks with Recombination Lusheng Wang Dept. of Computer Sci. City Univ. of Hong Kong 83 Tat Chee Avenue Hong Kong lwang@cs.cityu.edu.hk Kaizhong Zhang Dept. of Computer Sci. Univ.

### METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

Chapter 12 (Strickberger) Molecular Phylogenies and Evolution METHODS FOR DETERMINING PHYLOGENY In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern

### Introduction to Bioinformatics Introduction to Bioinformatics

Dr. rer. nat. Gong Jing Cancer Research Center Medicine School of Shandong University 2012.11.09 1 Chapter 4 Phylogenetic Tree 2 Phylogeny Evidence from morphological ( 形态学的 ), biochemical, and gene sequence

### Reconstructing Trees from Subtree Weights

Reconstructing Trees from Subtree Weights Lior Pachter David E Speyer October 7, 2003 Abstract The tree-metric theorem provides a necessary and sufficient condition for a dissimilarity matrix to be a tree

### THE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT

COMMUNICATIONS IN INFORMATION AND SYSTEMS © 2009 International Press Vol. 9, No. 4, pp. 295-302, 2009 001 THE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT DAN GUSFIELD AND YUFENG WU Abstract.

### Using Phylogenomics to Predict Novel Fungal Pathogenicity Genes

Using Phylogenomics to Predict Novel Fungal Pathogenicity Genes David DeCaprio, Ying Li, Hung Nguyen (sequenced Ascomycetes genomes courtesy of the Broad Institute) Phylogenomics Combining whole genome

### Inferring Molecular Phylogeny

Dr. Walter Salzburger The tree of life, Gustav Klimt (1907) Inferring Molecular Phylogeny 1. Molecular Markers Immunological comparisons! Nuttall