Lecture 7 Phylogenetic Analysis


Additional Reference
Molecular Evolution: A Phylogenetic Approach, Roderic D. M. Page and Edward C. Holmes

Uses of Phylogenetic Analysis
- Evolutionary trees
- Multiple sequence alignment

Evolutionary Problems
i) The fossil record suggests that modern man diverged from apes about 5-6 million years ago. Modern Homo sapiens emerged between 100,000 and 200,000 years ago.
ii) DNA and sequence alignment by Paabo support this.
iii) Work based on mitochondrial DNA by Wilson et al. suggests that modern man emerged only 200,000 years ago, with the divergence into different races 50,000 years ago.
1. mitochondrial DNA is circular
2. maternal inheritance
3. 10x faster mutation rate than nuclear DNA

Algorithms

                          Types of Data
Tree-building Method      Distances                   Nucleotide sites
Clustering Algorithm      UPGMA, Neighbor Joining     -
Optimality Criterion      Minimum Evolution           Maximum Parsimony, Maximum Likelihood

From Page and Holmes, Molecular Evolution: A Phylogenetic Approach

Preliminaries
A taxon (plural: taxa), or operational taxonomic unit (OTU), is an entity whose distance from other entities can be measured (e.g. a species, an amino acid sequence, a language, etc.).
Comparisons are made on measurements or assumptions concerning rates of evolutionary change. This is complicated by back mutations, parallel mutations, and variations in mutation rate. We will only consider substitutions.

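Amino Acid Sequences
i) For example, the amino acid substitution rate per site per year is $5.3 \times 10^{-9}$ for guinea pig but only $0.33 \times 10^{-9}$ for other organisms.
ii) The evolutionary time $T_u$ is the average time to produce one substitution per 100 amino acids:
$$T_u = \frac{1}{100\,\lambda}$$

Amino Acid Sequences: Example
There are 2 differences in a sequence of 102 amino acids when comparing calf and pea histone H4. Since plants and animals diverged 1 billion years ago, $T_u = 0.5$ billion years, so
$$\lambda = \frac{1}{100\,T_u} = \frac{1}{100 \times 0.5 \times 10^{9}} = 2 \times 10^{-11}$$
iii) Probability of substitution: there are several ways to calculate it. The best way is to use the PAM matrices.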

Nucleotide Sequences
i) Different from amino acid sequences due to redundancy in the genetic code (i.e. several codons can code for a particular amino acid).
ii) Most substitutions in the 3rd position are synonymous (UC* is the RNA coding for serine; the corresponding DNA would be AG*). Since evolution should depend on function, and function is conferred by the amino acid sequence, it has been suggested that the molecular clock should be based on the substitution rate in the third position of the codon. In fact, in the fibrinopeptides, this is as high as the amino acid substitution rate.
iii) In the definition of PAM matrices, one assumes a discrete Markov chain, with the PAM matrix being the transition matrix for the Markov chain.

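Markov Chains
Assume that we have a process that has discrete observable states $x_1, x_2, \ldots$. When we monitor this over time we get a sequence of the states occupied, $q_1, q_2, \ldots$, where each $q_i$ is any of $x_1, x_2, \ldots$. This sequence is a Markov chain. Note that while there can be an infinite number of states, the Markov chain has a countable number of elements.

Markov Chains
Another property of a Markov process is that history does not matter. This means that the state assumed at time $t+1$ depends only on the state assumed at time $t$ (not on any other previous state). This is called the Markov property.
Let $X = \{X_n,\ n = 0, 1, 2, \ldots\}$ be a discrete time random process with state space $S$ whose elements are $s_1, s_2, \ldots$. $X$ is a Markov chain if, for any $n$, the probability that $X_{n+1}$ takes on any value $s_k \in S$ is conditional on the value of $X_n$ but does not depend on the values of $X_{n-1}, X_{n-2}, \ldots$.
The one-time-step transition probabilities are
$$p_{jk}(n) = \Pr\{X_n = s_k \mid X_{n-1} = s_j\}, \quad j, k = 1, 2, \ldots, \quad n = 1, 2, \ldots$$
Since $X_0$ is a random variable, called the initial condition,
$$p_j(0) = \Pr\{X_0 = s_j\}, \quad j = 1, 2, \ldots$$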

Markov Chains
Transition matrix: put the $p_{jk}$ into a matrix $P$. A sequence of amino acids can be thought of as a Markov chain.
Stationary Markov process: the probabilities $p_{jk}(n)$ do not depend on $n$, that is, they are constant. Relatedly, an initial distribution $\pi$ is said to be stationary if $\pi P(t) = \pi$.
Irreducible: every state can be reached from every other state.

Application of Markov processes to evolutionary models
i) The PAM matrix has its substitution probabilities determined from closely related amino acid sequences. It assumes that the substitutions have occurred through one application of the transition matrix (i.e. no multiple substitutions at any given site), and it assumes that evolutionary distance results from repeated application of the same PAM matrix.
ii) A better evolutionary model is needed. (text p 14-144) This requires the use of a continuous Markov process rather than a discrete Markov chain. This still has the Markov property.
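The stationarity condition $\pi P = \pi$ can be illustrated with a small sketch. The two-state transition matrix below is a hypothetical example (it is not taken from the lecture), and the function names are illustrative only. Repeatedly applying $P$ to an initial distribution approaches the stationary distribution when the chain is irreducible and aperiodic.

```python
def step(dist, P):
    """One time step: row vector dist multiplied by transition matrix P."""
    n = len(P)
    return [sum(dist[j] * P[j][k] for j in range(n)) for k in range(n)]


def iterate(dist, P, steps):
    """Apply the same transition matrix repeatedly (time-homogeneous chain)."""
    for _ in range(steps):
        dist = step(dist, P)
    return dist


# Hypothetical two-state chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

pi0 = [1.0, 0.0]            # initial distribution p_j(0)
pi = iterate(pi0, P, 200)   # close to the stationary distribution (5/6, 1/6)
```

Solving $\pi P = \pi$ by hand for this matrix gives $\pi = (5/6, 1/6)$, which is what the iteration converges to.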

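Application of Markov processes to evolutionary models
A time homogeneous Markov process for the stochastic function $X(t)$ consists of a set of states $Q = \{1, 2, \ldots, n\}$, a set of initial state distributions $\pi = (\pi_1, \ldots, \pi_n)$, and transition probability functions
$$P(t) = \begin{pmatrix} p_{1,1}(t) & \cdots & p_{1,n}(t) \\ \vdots & \ddots & \vdots \\ p_{n,1}(t) & \cdots & p_{n,n}(t) \end{pmatrix}$$

Application of Markov processes to evolutionary models
We can apply this to nucleotide sequences. Let $Q = \{1, 2, 3, 4\}$ correspond to $\{A, C, G, T\}$.
$$P(t) = \begin{pmatrix} p_{1,1}(t) & \cdots & p_{1,4}(t) \\ \vdots & \ddots & \vdots \\ p_{4,1}(t) & \cdots & p_{4,4}(t) \end{pmatrix} = \begin{pmatrix} P[A \mid A,t] & P[C \mid A,t] & P[G \mid A,t] & P[T \mid A,t] \\ P[A \mid C,t] & P[C \mid C,t] & P[G \mid C,t] & P[T \mid C,t] \\ P[A \mid G,t] & P[C \mid G,t] & P[G \mid G,t] & P[T \mid G,t] \\ P[A \mid T,t] & P[C \mid T,t] & P[G \mid T,t] & P[T \mid T,t] \end{pmatrix}$$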

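Jukes-Cantor Model
[Diagram: every pair among A, G, C and T is linked by the same rate α.]
Transitions = Transversions

Rates of Nucleic Acid Change
The Jukes-Cantor model assumes equal rates, $u_1 = u_2 = u_3 = u_4$, yielding the rate matrix
$$\Lambda = \begin{pmatrix} -3\alpha & \alpha & \alpha & \alpha \\ \alpha & -3\alpha & \alpha & \alpha \\ \alpha & \alpha & -3\alpha & \alpha \\ \alpha & \alpha & \alpha & -3\alpha \end{pmatrix}$$
Then $p_1 = p_2 = p_3 = p_4 = \tfrac{1}{4}$.
Used in Maximum Likelihood Calculation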
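For the Jukes-Cantor rate matrix above, the transition probabilities $P(t) = e^{\Lambda t}$ have a standard closed form: $\tfrac14 + \tfrac34 e^{-4\alpha t}$ for staying in the same base and $\tfrac14 - \tfrac14 e^{-4\alpha t}$ for changing to any one other base. The sketch below evaluates it; the function name and the values of `alpha` and `t` are illustrative assumptions, not values from the lecture.

```python
import math

BASES = "ACGT"


def jc_transition_matrix(alpha, t):
    """Jukes-Cantor P(t) for rate alpha per pair of bases and elapsed time t."""
    decay = math.exp(-4.0 * alpha * t)
    same = 0.25 + 0.75 * decay   # probability the base is unchanged
    diff = 0.25 - 0.25 * decay   # probability of one specific other base
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


P_short = jc_transition_matrix(alpha=0.1, t=1.0)
P_zero = jc_transition_matrix(alpha=0.1, t=0.0)     # identity matrix
P_long = jc_transition_matrix(alpha=0.1, t=1000.0)  # every entry near 1/4
```

As $t$ grows, every row approaches the stationary distribution $(\tfrac14, \tfrac14, \tfrac14, \tfrac14)$, consistent with $p_1 = p_2 = p_3 = p_4 = \tfrac14$.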

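HKY Model
[Diagram: the purines A and G are linked by rate α, the pyrimidines C and T are linked by rate α, and every purine-pyrimidine pair is linked by rate β.]
Transitions > Transversions, $\alpha > \beta$

Definitions
- Taxa: entities whose distance from other entities can be measured.
- A directed graph $G(V, E)$ consists of a set $V$ of nodes or vertices and a set $E(V)$ of directed edges. Then $(i,j) \in E$ means that there is a directed edge from $i$ to $j$.
- A graph is undirected if the edge relation is symmetric, that is, $(i,j) \in E$ iff $(j,i) \in E$.
- A directed graph is connected if there is a directed path between any two nodes.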

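Definitions
- A directed graph is acyclic if it does not contain a cycle (for example, $(i,j)$, $(j,k)$ and $(k,i)$ all belonging to $E$ would form a cycle).
- A tree is an undirected, connected, acyclic graph.
- A rooted tree has a starting node called the root.
- The parent of a node is the node immediately before it on the path from the root.
- A child of a node is a node that immediately follows it.

Definitions
- An ancestor is any node that comes before a node on the path from the root.
- A leaf or external node is a node that has no children.
- Non-leaf nodes are called internal nodes.
- The depth of a tree is one less than the maximal number of nodes on a path from the root to a leaf.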

Definitions
- An ordered tree is a tree where the children of internal nodes are numbered.
- A binary tree is a tree where each node has at most two children. Otherwise it is multifurcating.

Trees
Question: Draw all binary trees on 1, 2, and 3 taxa.
A phylogenetic tree on n taxa is a tree with leaves labeled by 1, ..., n.

How do you tell if two trees are the same?
If you can convert one tree into another without breaking any branches, they are topologically equivalent.

Phylogenetic Trees
Phylogenetic trees or evolutionary trees are binary trees that describe the relations between species. Trees consist of nodes or vertices and taxa or leaves.

Phylogenetic Trees
To understand the data, we must understand some of the methods behind phylogenetic trees or evolutionary trees:
i) Clustering methods
ii) Maximum likelihood methods
iii) Quartet puzzling

What do we do with phylogenetic trees?
- Measuring evolutionary change on a tree: if the leaves of a tree each signify a sequence, the sum of the weights of the edges on the path between two leaves gives the evolutionary distance between the two sequences.
- Molecular phylogenetics: convert information in sequences into an evolutionary tree for those sequences.

Cluster methods vs. search methods
There are two basic methods for constructing trees.
Cluster methods use an algorithm (set of steps) to generate a tree. These methods are very easy to implement and hence can be computationally efficient. They also typically produce a single tree. A big disadvantage of this method is that it depends upon the order in which we add sequences to the tree. Hence, there could be a different tree that explains the data just as well.
Search methods use some sort of optimality criterion to choose among the set of all possible trees. The optimality criterion gives each tree a score that is based on the comparison of the tree to the data. The advantage of search methods is that they use an explicit function relating the trees to the data (for example, a model of how the sequences evolve). The disadvantage is that they are computationally very expensive (NP-complete problem).

How do we compare different tree methods?
- Efficiency: How fast is the method?
- Power: How much data does the method require?
- Consistency: Will the tree converge on the right answer given enough data?
- Robustness: Will minor violations of the method's assumptions result in poor estimates of phylogeny?
- Falsifiability: Will the method tell us when its assumptions are violated?

How do we assign weights for the edges of our trees?
Distance methods first convert aligned sequences into a pairwise distance matrix, then input that matrix into a tree-building method. The major objections to distance methods are that summarizing a set of sequences by distance data loses information, and that branch lengths estimated by some distance methods might not be evolutionarily interpretable.
Discrete methods consider each nucleotide site (or some function of each site) directly.

Distance Methods
Two distance methods are neighbor joining and minimum evolution.
Minimum evolution finds the tree that minimizes the sum of the branch lengths, where the lengths are calculated from the pairwise distances between the sequences. Linear programming or least squares methods can be used to do this.
Neighbor joining is a clustering method that is computationally fast and gives a unique result. It can use something like the four-point condition, and it clusters the closest elements.

Discrete Methods
The two major discrete methods are maximum parsimony and maximum likelihood. Both of these are search methods.
i) With maximum parsimony we try to reconstruct the evolution at a particular site with the fewest possible evolutionary changes. The advantages of parsimony are that it makes relatively few assumptions about the evolutionary process, it has been studied extensively mathematically, and some very powerful software implementations are available. The major disadvantage of parsimony is that under some models of evolution it is inconsistent, that is, adding more data can still lead to the wrong result.

Discrete Methods
ii) The maximum likelihood approach looks for the tree that makes the data the most probable evolutionary outcome. This approach requires an explicit model of evolution, which is both a strength and a weakness because the results depend on the model used. This method can also be very computationally expensive.

Types of metrics
For the four-point condition or additive metric, given the leaves i, j, k, and l:
$$d(i,j) + d(k,l) \le d(i,k) + d(j,l) = d(i,l) + d(j,k)$$
For an ultrametric metric the ultrametric or 3-point condition holds. That is, given the leaves i, j, and k:
$$d(i,j) \le d(i,k) = d(j,k)$$

Ultrametric trees
Clustering methods attempt to repeatedly cluster the data by grouping the closest elements together. They are used for phylogeny and gene expression microarray analysis.
The pair group method (PGM) is a technique where the pairs are repeatedly amalgamated.
The unweighted pair group method with arithmetic mean (UPGMA) is used to cluster molecular data where the sequence alignment distance between sequences has been determined in a distance matrix.
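The four-point and three-point conditions can be checked mechanically on a distance matrix. The sketch below is one possible way to do it: for every triple (or quadruple) of leaves it sorts the relevant distances (or pairwise sums) and verifies that the two largest are equal. The function names and both example matrices are hypothetical and are not taken from the lecture.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """3-point condition: in every triple the two largest distances are equal."""
    for i, j, k in combinations(range(len(d)), 3):
        a, b, c = sorted([d[i][j], d[i][k], d[j][k]])
        if abs(c - b) > tol:
            return False
    return True


def is_additive(d, tol=1e-9):
    """4-point condition: of the three pairings, the two largest sums are equal."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Hypothetical ultrametric distances: ((0,1),(2,3)) with a constant rate.
ultra = [[0, 2, 6, 6],
         [2, 0, 6, 6],
         [6, 6, 0, 4],
         [6, 6, 4, 0]]

# Hypothetical additive but non-ultrametric distances: quartet ((0,1),(2,3))
# with pendant branch lengths 1, 2, 3, 4 and an internal branch of length 1.
additive = [[0, 3, 5, 6],
            [3, 0, 6, 7],
            [5, 6, 0, 7],
            [6, 7, 7, 0]]
```

Every ultrametric matrix is also additive, but not the other way round, which is why the second example passes only the four-point check.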

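UPGMA
Input: an n x n distance matrix D.
1. Initialize the set C to consist of n singleton clusters.
2. Initialize dist(c,d) on C by defining, for all {i} and {j} in C, dist({i},{j}) = D(i,j).
3. Repeat the following n-1 times:
a) Determine a pair c, d of clusters in C such that dist(c,d) is minimal; define d_min = dist(c,d).
b) Define a new cluster e = c ∪ d; define C = (C - {c,d}) ∪ {e}.
c) Define a node with label e and daughters c and d, where e has distance d_min/2 to its leaves.
d) Define, for all f in C with f different from e, dist(e,f) = dist(f,e) = [dist(c,f) + dist(d,f)]/2.
Many texts call this simple-average update WPGMA and reserve the name UPGMA for the update weighted by cluster size.

UPGMA Example
[Figure: ultrametric topology and its distance table, satisfying $d(i,j) \le d(i,k) = d(j,k)$; the numeric entries did not survive extraction.]
From Clote and Backofen, Computational Molecular Biology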

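UPGMA Example
Given the distance table:
[Distance table for the five taxa a, b, c, d, e; the numeric entries did not survive extraction.]
1. We have five singleton clusters {a}, {b}, {c}, {d}, and {e} from the set C = {a,b,c,d,e}.
2. Get the distances from the distance table.
3. a) Find the closest two clusters and record their distance d_min.
b) Define f as the union of these two clusters; C now contains the three remaining singleton clusters and f.
c) f is the root for the two merged clusters.
d) Define the new distance table.
Repeat step 3.

UPGMA Example
[Tables: the old distance table and the new distance table in which the merged pair is replaced by f; the numeric entries did not survive extraction.]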

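UPGMA Example: Tree formation
[Figure: successive merges create the internal nodes f, g and h and finally the root r; the branch lengths did not survive extraction.]

WPGMA
Input: an n x n distance matrix D.
1. Initialize the set C to consist of n singleton clusters.
2. Initialize dist(c,d) on C by defining, for all {i} and {j} in C, dist({i},{j}) = D(i,j).
3. Repeat the following n-1 times:
a) Determine a pair c, d of clusters in C such that dist(c,d) is minimal; define d_min = dist(c,d).
b) Define a new cluster e = c ∪ d; define C = (C - {c,d}) ∪ {e}.
c) Define a node with label e and daughters c and d, where e has distance d_min/2 to its leaves.
d) Define, for all f in C with f different from e, dist(e,f) = dist(f,e) = [|c| dist(c,f) + |d| dist(d,f)]/[|c| + |d|].
Many texts call this size-weighted update UPGMA and use the name WPGMA for the simple average.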
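The pair-group procedure can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function name, the `size_weighted` flag, the merge-list output format and the example matrix are all assumptions made for this sketch. With `size_weighted=False` it uses the simple-average update of step d); with `size_weighted=True` it uses the update weighted by cluster size.

```python
def pair_group(D, labels, size_weighted=False):
    """Pair-group clustering; returns merges as (left, right, node height)."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}            # id -> (name, size)
    dist = {(i, j): D[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    merges = []
    while len(clusters) > 1:
        # a) closest pair of clusters (ties broken by insertion order)
        (c, d), d_min = min(dist.items(), key=lambda kv: kv[1])
        name_c, size_c = clusters.pop(c)
        name_d, size_d = clusters.pop(d)
        e = next_id                                             # b) new cluster id
        next_id += 1
        # d) distances from the new cluster e to every remaining cluster f
        new_dist = {}
        for f in clusters:
            dcf = dist[tuple(sorted((c, f)))]
            ddf = dist[tuple(sorted((d, f)))]
            if size_weighted:
                new_dist[(f, e)] = (size_c * dcf + size_d * ddf) / (size_c + size_d)
            else:
                new_dist[(f, e)] = (dcf + ddf) / 2
        dist = {k: v for k, v in dist.items() if c not in k and d not in k}
        dist.update(new_dist)
        clusters[e] = ("(%s,%s)" % (name_c, name_d), size_c + size_d)
        merges.append((name_c, name_d, d_min / 2))              # c) height d_min/2
    return merges


# Hypothetical ultrametric distance matrix for four taxa.
D = [[0, 2, 6, 6],
     [2, 0, 6, 6],
     [6, 6, 0, 4],
     [6, 6, 4, 0]]
simple = pair_group(D, ["a", "b", "c", "d"])
weighted = pair_group(D, ["a", "b", "c", "d"], size_weighted=True)
```

On ultrametric input both update rules reconstruct the same tree; they can differ when the data are not ultrametric and the merged clusters have unequal sizes.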

Farris Transform - Example
[Figure: additive, non-ultrametric topology and its distance table, satisfying $d(i,j) + d(k,l) \le d(i,k) + d(j,l) = d(i,l) + d(j,k)$; the numeric entries did not survive extraction.]
From Clote and Backofen, Computational Molecular Biology

Farris Transform - Example
[Figure: the tree built by UPGMA from the same distance table.]
UPGMA incorrectly reconstructs the topology.
From Clote and Backofen, Computational Molecular Biology

Farris Transform
Sometimes the data will satisfy an additive metric and not an ultrametric. This will yield a tree with the incorrect topology if UPGMA or WPGMA is used.
The Farris Transform Distance Method converts the data for an additive, non-ultrametric metric so that it satisfies the ultrametric. Then UPGMA or WPGMA can be used to yield a tree with the correct topology.

Farris Transform
If we have a phylogenetic tree with root r and leaves (taxa) 1, ..., n, and $d_{i,j}$ is the distance between two nodes, then the transformed distance is
$$d'_{i,j} = \frac{d_{i,j} - d_{i,r} - d_{j,r}}{2} + \bar{d}_r \quad \text{where} \quad \bar{d}_r = \frac{1}{n} \sum_{i=1}^{n} d_{i,r}$$
You must assume a root r. This can be the leaf that is farthest from all the others. Unfortunately, depending on the root selected, the method might not give the right topology.
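A minimal sketch of the transform, assuming the formula $d'_{i,j} = (d_{i,j} - d_{i,r} - d_{j,r})/2 + \bar{d}_r$ with the diagonal left at zero. The function name and the four-taxon matrix are hypothetical: the matrix comes from a quartet tree ((A,B),(C,D)) with pendant branch lengths 3, 4, 3, 2 and an internal branch of length 1, and leaf B (index 1) is assumed as the root.

```python
def farris_transform(D, r):
    """Transformed distances d'_ij = (d_ij - d_ir - d_jr)/2 + mean distance to r."""
    n = len(D)
    mean_r = sum(D[i][r] for i in range(n)) / n
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                T[i][j] = (D[i][j] - D[i][r] - D[j][r]) / 2 + mean_r
    return T


# Hypothetical additive, non-ultrametric distances for taxa A, B, C, D.
D = [[0, 7, 7, 6],
     [7, 0, 8, 7],
     [7, 8, 0, 5],
     [6, 7, 5, 0]]

T = farris_transform(D, r=1)
# T groups C with D first (0.5), then A (1.5), with the assumed root B at 5.5.
```

In the transformed table every triple has its two largest distances equal, so a pair-group method applied to `T` joins C and D, then A, then B, which agrees with the unrooted topology ((A,B),(C,D)).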

Farris Transform - Example
[Figure: additive, non-ultrametric topology and its distance table, extended with a row of distances to the assumed root r; the numeric entries did not survive extraction.]
What is the distance to the root?

Farris Transform - Example
[Tables: original distance table with assumed root, and the transformed distance table; the numeric entries did not survive extraction.]
$$d'_{i,j} = \frac{d_{i,j} - d_{i,r} - d_{j,r}}{2} + \bar{d}_r$$
where the value of $\bar{d}_r$ begins 7. (its final digit was lost).

Farris Transform - Example
[Figure: Farris transform tree topology and the transformed distance table, checked against $d(i,j) + d(k,l) \le d(i,k) + d(j,l) = d(i,l) + d(j,k)$; the numeric entries did not survive extraction.]

Farris Transform
Pick a different leaf as the root.
[Tables: original distance table with assumed root, and the transformed distance table; the numeric entries did not survive extraction.]
$$d'_{i,j} = \frac{d_{i,j} - d_{i,r} - d_{j,r}}{2} + \bar{d}_r$$
where the value of $\bar{d}_r$ begins 13. (its final digit was lost).

Farris Transform
Correct Topology!
[Figure: Farris transform tree topology and the transformed distance table, checked against $d(i,j) + d(k,l) \le d(i,k) + d(j,l) = d(i,l) + d(j,k)$; the numeric entries did not survive extraction.]

Phylogeny: Distance Methods
Parsimony, Maximum Likelihood
- Look at changes in each column of alignment
- Metric to estimate population drift
- Computationally more expensive

Neighbor Joining
- Combines computational speed with uniqueness of result
- Clustering method, hence has no optimality criterion
- Often used in conjunction with Minimum Evolution to estimate the minimum evolution tree

Neighbor Joining and Minimum Evolution
- Compute the Neighbor Joining tree and see if any local rearrangement produces a shorter tree.
- Not guaranteed to give the minimum evolution tree.

Neighbor Joining Algorithm
- Related to cluster analysis but removes the assumption of ultrametric data.
- Does assume that the data come close to fitting an additive tree (so an appropriate model of evolution is needed).
- Keeps track of nodes on the tree.
- Considers only closest pairs, and not all possible pairs, in each step of star decomposition.

Neighbor Joining
FROM: http://www.icp.ucl.ac.be/~opperd/private/neighbor.html
Author: Fred Opperdoes
Suppose we have the following tree:
[Figure: example tree on the OTUs A to F.]
B and D have accumulated mutations at a higher rate than A. The three-point criterion is therefore violated, and the UPGMA method cannot be used, since it would group together A and C rather than A and B. In such a case the neighbor-joining method is one of the recommended methods.

Neighbor Joining
The raw data of the tree are represented by the following distance matrix:

     A    B    C    D    E
B    5
C    4    7
D    7   10    7
E    6    9    6    5
F    8   11    8    9    8

We have in total 6 OTUs (N=6).

Neighbor Joining
Step 1: We calculate the net divergence r(i) for each OTU from all other OTUs.
r(A) = 5+4+7+6+8 = 30
r(B) = 42
r(C) = 32
r(D) = 38
r(E) = 34
r(F) = 44

Neighbor Joining
Step 2: Now we calculate a new distance matrix using, for each pair of OTUs, the formula
M(ij) = d(ij) - [r(i) + r(j)]/(N-2)
or, in the case of the pair A,B:
M(AB) = d(AB) - [r(A) + r(B)]/(N-2) = -13

        A       B       C       D       E
B     -13
C     -11.5   -11.5
D     -10     -10     -10.5
E     -10     -10     -10.5   -13
F     -10.5   -10.5   -11     -11.5   -11.5

Neighbor Joining
Now we start with a star tree:
[Figure: star tree with the six OTUs A, B, C, D, E and F all joined to a single central node.]

Neighbor Joining
Step 3: Now we choose as neighbors those two OTUs for which M(ij) is the smallest. These are A and B, and D and E. Let's take A and B as neighbors, and we form a new node called U. Now we calculate the branch lengths from the internal node U to the external OTUs A and B.
S(AU) = d(AB)/2 + [r(A) - r(B)]/2(N-2) = 1
S(BU) = d(AB) - S(AU) = 4

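Neighbor Joining
Step 4: Now we define new distances from U to each other terminal node:
d(CU) = [d(AC) + d(BC) - d(AB)]/2 = 3
d(DU) = [d(AD) + d(BD) - d(AB)]/2 = 6
d(EU) = [d(AE) + d(BE) - d(AB)]/2 = 5
d(FU) = [d(AF) + d(BF) - d(AB)]/2 = 7
and we create a new matrix:

Neighbor Joining

     U    C    D    E
C    3
D    6    7
E    5    6    5
F    7    8    9    8

The resulting tree will be the following:
[Figure: A and B are joined to the new node U with branch lengths 1 and 4; U, C, D, E and F remain attached to the central node.]
N = N-1 = 5
The entire procedure is repeated starting at step 1.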
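Steps 1 to 4 for the first iteration can be reproduced with a short sketch. The distance matrix is the one from the example; the function names are illustrative assumptions, and ties in M are broken by taking the first minimal pair, which here selects A and B as in the text.

```python
names = ["A", "B", "C", "D", "E", "F"]
D = [
    [0, 5, 4, 7, 6, 8],
    [5, 0, 7, 10, 9, 11],
    [4, 7, 0, 7, 6, 8],
    [7, 10, 7, 0, 5, 9],
    [6, 9, 6, 5, 0, 8],
    [8, 11, 8, 9, 8, 0],
]


def net_divergence(D):
    """Step 1: r(i) is the sum of distances from OTU i to all other OTUs."""
    return [sum(row) for row in D]


def rate_corrected(D, r):
    """Step 2: M(ij) = d(ij) - [r(i) + r(j)] / (N - 2) for every pair i < j."""
    n = len(D)
    return {(i, j): D[i][j] - (r[i] + r[j]) / (n - 2)
            for i in range(n) for j in range(i + 1, n)}


def join(D, i, j, r):
    """Steps 3-4: branch lengths from new node U to i and j, and d(kU) for the rest."""
    n = len(D)
    s_iu = D[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    s_ju = D[i][j] - s_iu
    d_u = {k: (D[i][k] + D[j][k] - D[i][j]) / 2
           for k in range(n) if k not in (i, j)}
    return s_iu, s_ju, d_u


r = net_divergence(D)               # [30, 42, 32, 38, 34, 44]
M = rate_corrected(D, r)
i, j = min(M, key=M.get)            # first pair with the smallest M: (0, 1), i.e. A and B
s_au, s_bu, d_u = join(D, i, j, r)  # 1.0, 4.0, and d(CU), d(DU), d(EU), d(FU)
```

Repeating the procedure on the reduced matrix (with U replacing A and B) continues the star decomposition until the tree is fully resolved.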

Quartet Puzzling
Quartet puzzling is a less computationally expensive method than maximum likelihood to determine the phylogenetic tree.
Procedure:
1. Compute the $\binom{n}{4}$ maximum likelihood trees for all possible quartets.
2. (Quartet puzzling step) Combine the quartet trees into an n-taxon tree that tries to conform to all the neighbor relations of all the quartet trees.
3. Repeat steps 1. and 2. many times and use the majority consensus tree.

Quartet Puzzling
[Figure: the original tree topology for 5 taxa, and two of the possible quartets.]

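Quartet Puzzling
All $\binom{5}{4} = 5$ possible quartets
[Figure: the five quartet trees with their neighbor relations N(·,·;·,·); the taxon labels did not survive extraction.]

Quartet Puzzling
Quartet puzzling step procedure:
1. Take one of the quartets, with its neighbor relation N(·,·;·,·).
2. Add a penalty of 1 to every edge such that the addition of the new taxon there would yield the incorrect topology.
3. Repeat for all the neighbor relations.
4. The branch with the lowest weight is the branch where the taxon should be added.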

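Quartet Puzzling
[Figure: a quartet with its neighbor relation; adding the new taxon on one particular branch yields the wrong topology, so that branch receives a penalty.]

Quartet Puzzling
[Figure: penalties accumulated on each branch after applying the remaining neighbor relations; the taxon labels and some of the penalty values did not survive extraction.]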

Quartet Puzzling
- Choose one quartet tree.
- Pick the taxon to add.
- Use all neighbor relations (other than the one deciding the quartet tree used) to find weights on branches.
- Add the taxon to the branch with the lowest penalty.

Minimum Evolution
Given an unrooted metric tree for n sequences, there are (2n-3) branches, each with a branch length $e_i$. The sum of these branch lengths is the length L of the tree. The minimum evolution tree is the tree which minimizes L.

Minimum Evolution
- Similar to parsimony
- But the length comes from pairwise distances between the sequences (not from fit of nucleotide sites)
- Use linear programming or least squares to find the optimal solution

Minimum Evolution
[Figure]

Phylogeny: Character State Methods
Parsimony, Maximum Likelihood
- Look at changes in each column of alignment
- Metric to estimate population drift
- Computationally more expensive

PHYLOGENY: Character States

Taxa 1   ATT-GCCATT
Taxa 2   ATG-GC-ATT
Taxa 3   ATC-TATCTT
Taxa 4   ATCAAATCTT
Taxa 5   ACT-G--ACC

- Informative characters (columns)
- Look at all possible trees
- For each column, calculate cost
- Minimum score = best tree

Maximum Parsimony

Taxa 1   ATT-GCCATT
Taxa 2   ATG-GC-ATT
Taxa 3   ATC-TATCTT
Taxa 4   ATCAAATCTT
Taxa 5   ACT-G--ACC

- Informative characters
- Minimum number of changes
- Multiple substitutions = homoplasy

Maximum Parsimony
- Smallest number of evolutionary changes
- First used on protein data (Eck & Dayhoff, 1966)
- Applied to nucleotide data (Fitch, 1977)
- Brute force search of tree space

Cost of Parsimony Tree
Character by character
[Figure: a five-taxon tree scored for column 3 of the alignment, where Taxa 1 to 5 have the states T, G, C, C, T. The parsimony score is the running sum of the per-column costs; column 3 contributes 3.]

Cost of Parsimony Tree
Character by character
[Figure: the same tree scored for columns 3 to 5, where Taxa 1 to 5 have the states T-G, G-G, C-T, CAA, T-G. The parsimony score over the first five columns is 5; some of the per-column terms did not survive extraction.]
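One common way to compute the per-column cost on a fixed rooted binary tree is Fitch's small-parsimony algorithm, sketched below. It is offered as an illustration and is not necessarily the procedure used for the figures. The tree topology here is hypothetical (the tree in the figures was not recoverable), the short leaf names T1 to T5 are stand-ins for Taxa 1 to 5, and gaps are treated as a fifth character state, which is an assumption of this sketch.

```python
def fitch(tree, states):
    """Return (candidate state set, cost) for a tree given as nested 2-tuples of leaf names."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:                                   # children agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1   # one change needed


def column_cost(tree, alignment, col):
    """Parsimony cost of a single alignment column (0-based index)."""
    return fitch(tree, {name: seq[col] for name, seq in alignment.items()})[1]


def parsimony_score(tree, alignment):
    """Sum of the column costs, character by character."""
    length = len(next(iter(alignment.values())))
    return sum(column_cost(tree, alignment, col) for col in range(length))


alignment = {
    "T1": "ATT-GCCATT",
    "T2": "ATG-GC-ATT",
    "T3": "ATC-TATCTT",
    "T4": "ATCAAATCTT",
    "T5": "ACT-G--ACC",
}
tree = ((("T1", "T2"), "T5"), ("T3", "T4"))   # hypothetical topology

col3 = column_cost(tree, alignment, 2)        # column 3 of the alignment
total = parsimony_score(tree, alignment)
```

Scoring every candidate topology this way and keeping the minimum is the brute force search of tree space; the count for a given column depends on the topology chosen.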

Tree Space
[Figure: different taxa orders and tree topologies for the five taxa.]
Topology Search

Tree Space
- Exhaustive Search (Brute Force)
- Branch and Bound (Efficient?)
- Heuristic Methods (Hill Climbing)
- Genetic Algorithms (GAML)

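Maximum Likelihood
Goal: Construct a phylogenetic tree from DNA sequences whose likelihood is a maximum. (Felsenstein 1981)
Procedure:
- Start with a given topology and use the maximum likelihood method to optimize branch lengths.
- Make local modifications to the topology and re-optimize the branch lengths.
- New taxa are added one by one, optimizing branch lengths and topologies each time.
Assumes an evolutionary process that is a reversible Markov process.
Very computationally expensive to use.

Likelihood of a Tree
We want to find $L(\text{tree}) = \Pr[\text{data} \mid \text{tree}]$.
Given the data 1 = CT, 2 = CG and 3 = AT, consider the tree:
[Figure: tree with leaves 1 (CT), 2 (CG) and 3 (AT) and two internal nodes.]
We can calculate the likelihood of this tree if we fill in the internal nodes.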

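Likelihood of a Tree
Since this is a Markov process, we can consider each site separately from the others, which reduces the complexity of the calculation.
Example
[Figure: tree 1, with two-letter states filled in at the internal nodes and leaves CT, CG and AT, decomposes into tree 2 (first site, leaves C, C, A) and tree 3 (second site, leaves T, G, T).]
Because the sites are independent, the site likelihoods multiply (equivalently, the log-likelihoods add):
$$\Pr[\text{data} \mid \text{tree 1}] = \Pr[\text{data} \mid \text{tree 2}] \times \Pr[\text{data} \mid \text{tree 3}]$$

Likelihood of a site-specific tree
We can calculate the probability of each change from the transition matrix and the distances on each branch. The product of these, multiplied by the probability of the original base, gives the likelihood of the site-specific tree. Since there are two unknown nodes, the double sum over all possible values for each (A, C, T, G) gives the likelihood for the original tree.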
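The double sum over the two unknown internal nodes can be written out directly for the three-sequence example. The sketch below makes several assumptions that are not in the lecture: the Jukes-Cantor model with rate `alpha`, a root whose children are leaf 3 and an internal node that in turn has leaves 1 and 2 as children, and made-up branch lengths `t1` to `t4`.

```python
import math
from itertools import product

BASES = "ACGT"


def jc_prob(x, y, t, alpha=1.0):
    """Jukes-Cantor probability of observing base y after time t, starting from x."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def site_likelihood(s1, s2, s3, t1, t2, t3, t4):
    """Double sum over the root state x and the internal-node state y for one site."""
    total = 0.0
    for x, y in product(BASES, repeat=2):
        total += (0.25                      # probability of the original base x
                  * jc_prob(x, s3, t3)      # root -> leaf 3
                  * jc_prob(x, y, t4)       # root -> internal node
                  * jc_prob(y, s1, t1)      # internal node -> leaf 1
                  * jc_prob(y, s2, t2))     # internal node -> leaf 2
    return total


def tree_likelihood(seq1, seq2, seq3, **branch_lengths):
    """Sites are independent, so the site likelihoods are multiplied."""
    likelihood = 1.0
    for s1, s2, s3 in zip(seq1, seq2, seq3):
        likelihood *= site_likelihood(s1, s2, s3, **branch_lengths)
    return likelihood


lengths = dict(t1=0.1, t2=0.1, t3=0.2, t4=0.1)   # hypothetical branch lengths
L = tree_likelihood("CT", "CG", "AT", **lengths)
```

For larger trees the same quantity is usually computed recursively from the leaves upward rather than by enumerating every combination of internal states, which is where the saving in computational time comes from.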

Maximum Likelihood

Taxa 1   ATT-GCCATT
Taxa 2   ATG-GC-ATT
Taxa 3   ATC-TATCTT
Taxa 4   ATCAAATCTT
Taxa 5   ACT-G--ACC

- Statistical model for changes in nucleotides
  - Transitions/Transversions
  - HKY (Kimura 2 parameter model)
  - Jukes Cantor (1 parameter)
- Likelihood that that change occurred
- Much more computationally intensive than parsimony
- Hypothesis testing

Likelihood of a Tree
$L(\text{tree}) = \Pr[\text{data} \mid \text{tree}]$
- Multiply the likelihood for each character position
- Recursive definition of likelihood
- Saves computational time

Likelihood of a Tree
[Figure: two trees on the five taxa, with numbered nodes.]