Genome Rerrngements: from Biologicl Prolem to Comintoril Algorithms (nd ck) Pvel Pevzner Deprtment of Computer Science, University of Cliforni t Sn Diego
Genome Rerrngements Mouse X chromosome Unknown ncestor ~ 80 M yers go Humn X chromosome ü Wht is the evolutionry scenrio for trnsforming one genome into the other? ü Wht is the orgniztion of the ncestrl genome? ü Are there ny rerrngement hotspots in mmmlin genomes?
Genome Rerrngements: Evolutionry Scenrios Unknown ncestor ~ 80 M yers go ü Wht is the evolutionry scenrio for trnsforming one genome into the other? ü Wht is the orgniztion of the ncestrl genome? ü Are there ny rerrngement hotspots in mmmlin genomes? Reversl flips segment of chromosome
Genome Rerrngements: Ancestrl Reconstruction ü Wht is the evolutionry scenrio for trnsforming one genome into the other? ü Wht is the orgniztion of the ncestrl genome? ü Are there ny rerrngement hotspots in mmmlin genomes? Bourque, Tesler, PP, Genome Res.
Genome Rerrngements: Evolutionry Erthqukes ü Wht is the evolutionry scenrio for trnsforming one genome into the other? ü Wht is the orgniztion of the ncestrl genome? ü Are there ny rerrngement hotspots in mmmlin genomes?
Rerrngement Hotspots in Tumor Genomes ü Rerrngements my disrupt genes nd lter gene regultion. ü Exmple: rerrngement in leukemi yields Phildelphi chromosome: Chr 9 Chr 22 promoter ABL gene promoter BCR gene promoter BCR gene promoter c-1 oncogene ü Thousnds of rerrngements hotspots known for different tumors.
Rerrngement Hotspots in Tumor Genomes MCF7 rest cncer cell line ü Wht is the evolutionry scenrio for trnsforming one genome into the other? ü Wht is the orgniztion of the ncestrl genome? ü Where re the rerrngement hotspots in mmmlin genomes?
Controversy: Evolution vs. Intelligent Design
Three Evolutionry Controversies Primte - Rodent - Crnivore Split Rerrngement Hotspots Whole Genome Duplictions
Primte Rodent Crnivore Split rodent-crnivore split primte-crnivore split Primte - Rodent - Crnivore Split primte-rodent split
Primte Rodent Crnivore Split (who is closer to us: mouse or dog?) rodent-crnivore split primte-crnivore split Primte - Rodent - Crnivore Split primte-rodent split
Primte Rodent vs. Primte Crnivore Split Hutley et l., MBE, My 07: We hve demonstrted with very high confidence tht the rodents diverged efore crnivores nd primtes Lunter, PLOS CB, April 07: It ppers unjustified to continue to consider the phylogeny of primtes, rodents, nd cnines s contentious primte-rodent split primte-crnivore split Arnson et l, PNAS 02 Cnrozzi, PLOS CB 06 Murphy et l,, Science 01 Kumr&Hedges, Nture, 98
Reconstruction of Ancestrl Genomes: Humn / Mouse / Rt
Cn Rerrngement Anlysis Resolve the Primte Rodent Crnivore Controversy?
Susumu Ohno: Two Hypothesis Ohno, 1970, 1973 Rerrngement Hotspots Whole Genome Duplictions ü Rndom Brekge Hypothesis: Genomic rchitectures re shped y rerrngements tht occur rndomly (no frgile regions). ü Whole Genome Dupliction (WGD) Hypothesis: Big leps in evolution would hve een impossile without whole genome duplictions.
Rndom Brekge Model (RBM) Rerrngement Hotspots ü The rndom rekge hypothesis ws emrced y iologists nd hs ecome de fcto theory of chromosome evolution. ü Ndeu & Tylor, Proc. Nt'l Acd. Sciences 1984 ü ü ü First convincing rguments in fvor of the Rndom Brekge Model (RBM) RBM implies tht there is no rerrngement hotspots RBM ws re-iterted in hundreds of ppers
Rndom Brekge Model (RBM) Rerrngement Hotspots ü The rndom rekge hypothesis ws emrced y iologists nd hs ecome de fcto theory of chromosome evolution. ü Ndeu & Tylor, Proc. Nt'l Acd. Sci. 1984 ü ü ü First convincing rguments in fvor of the Rndom Brekge Model (RBM) RBM implies tht there is no rerrngement hotspots RBM ws re-iterted in hundreds of ppers ü PP & Tesler, Proc. Nt'l Acd. Sci. 2003 ü ü ü Rejected RBM nd proposed the Frgile Brekge Model (FBM) Postulted existence of rerrngement hotspots nd vst rekpoint re-use FBM implies tht the humn genome is mosic of solid nd frgile regions.
Are the Rerrngement Hotspots Rel? üthe Frgile Brekge Model did not live long: in 2003 Dvid Snkoff presented convincing rguments ginst FBM: we hve shown tht rekpoint re-use of the sme mgnitude s found in Pevzner nd Tesler, 2003 my very well e rtifcts in context where NO re-use ctully occurred.
Are the Rerrngement Hotspots Rel? üthe Frgile Brekge Model did not live long: in 2003 Dvid Snkoff presented rguments ginst FBM (Snkoff & Trinh, J. Comp. Biol, 2005) we hve shown tht rekpoint re-use of the sme mgnitude s found in Pevzner nd Tesler, 2003 my very well e rtifcts in context where NO re-use ctully occurred. Before you criticize people, you should wlk mile in their shoes. Tht wy, when you criticize them, you re mile wy. And you hve their shoes. J.K. Lmert
Reuttl of the Reuttl of the Reuttl n Peng et l., 2006 (PLOS Computtionl Biology) found n error in Snkoff-Trinh reuttl: If Snkoff & Trinh fixed their ST-Synteny lgorithm, they would confirm rther thn reject Pevzner-Tesler s Frgile Brekge Model n Snkoff, 2006 (PLOS Computtionl Biology): Not only did we foist hstily conceived nd incorrectly executed simultion on n overworked RECOMB conference progrm committee, ut worse nostr mxim culp we oliged tem of high-powered reserchers to clen up fter us!
Controversy continues n Rndom Brekge Model controversy: While Snkoff cknowledged the flw in his rguments ginst RBM, he ppers reluctnt to cknowledge the reuttl of RBM, this time rguing tht more complex rerrngement events (e.g., trnspositions) my crete n ppernce of rekpoint re-use. ü Most recent studies support the Frgile Brekge Model: vn der Wind, 2004, Biley, 2004, Zho et l., 2004, Murphy et l., 2005, Hinsch & Hnnenhlli, 2006, Ruiz-Herrer et l., 2006, Yue & Hf, 2006, Mehn et l., 2007, etc Kikut et l., Genome Res. 2007:... the Ndeu nd Tylor hypothesis is not possile for the explntion of synteny in rt.
Reuttl of the Reuttl of the Reuttl: Controversy Resolved... Or Ws It? ü Nerly ll recent studies support the Frgile Brekge Model: vn der Wind, 2004, Biley, 2004, Zho et l., 2004, Murphy et l., 2005, Hinsch & Hnnenhlli, 2006, Ruiz-Herrer et l., 2006, Yue & Hf, 2006, Mehn et l., 2007, etc Kikut et l., Genome Res. 2007:... the Ndeu nd Tylor hypothesis is not possile for the explntion of synteny
Rndom vs. Frgile Brekge Dete Continues: Complex Rerrngements ü PP & Tesler, PNAS 2003, rgued tht every evolutionry scenrio for trnsforming Mouse into Humn genome with reversls, trnsloctions, fusions, nd fissions must result in lrge numer of rekpoint re-uses, contrdiction to the RBM. ü Snkoff, PLoS Comp. Biol. 2006: We cnnot infer whether mutully rndomized synteny lock orderings derived from two divergent genomes were creted... through processes other thn reversls nd trnsloctions.
Whole Genome Dupliction (WGD) Whole Genome Duplictions
Genome Rerrngements After WGD
Genome Rerrngements After WGD
Genome Rerrngements After WGD
Whole Genome Dupliction Hypothesis Ws Confirmed After Yers of Controversy üthe Whole Genome Dupliction hypothesis first met with skepticism ut ws finlly confirmed y Kellis et l., Nture 2004: Our nlysis resolves the longstnding controversy on the ncestry of the yest genome. n There ws whole-genome dupliction. Wolfe, Nture, 1997 n There ws no whole-genome dupliction. Dujon, FEBS, 2000 n Duplictions occurred independently Lngkjer, JMB, 2000 n Continuous duplictions Dujon, Yest 2003 n Multiple duplictions Friedmn, Gen. Res, 2003 n Spontneous duplictions Koszul, EMBO, 2004
Whole Genome Dupliction Hypothesis Ws Eventully Confirmed...But Some Scientists Are Not Convinced ü Kellis, Birren & Lnder, Nture 2004: Our nlysis resolves the longstnding controversy on the ncestry of the yest genome. ü Mrtin et l., Biology Direct 2007: We elieve tht the proposl of Whole Genome Dupliction in the yest linege is unwrrnted. ü To ddress Mrtin et l., 2007 rguments ginst WGD, it would e useful to reconstruct the preduplicted genomes. n There ws whole-genome dupliction. Wolfe, Nture, 1997 n There ws no whole-genome dupliction. Dujon, FEBS, 2000 n Duplictions occurred independently Lngkjer, JMB, 2000 n Continuous duplictions Dujon, Yest 2003 n Multiple duplictions Friedmn, Gen. Res, 2003 n Spontneous duplictions Koszul, EMBO, 2004 n Our nlysis resolved the controversy Kellis, Nture, 2004 n WGD in the yest linege is unwrrnted Mrtin, Biology Direct, 2007 n...
Genome Hlving Prolem Reconstruction of the pre-duplicted ncestor: Genome Hlving Prolem
From Biologicl Controversies to Algorithmic Prolems Primte - Rodent - Crnivore Split Ancestrl Genome Reconstruction Rerrngement Hotspots Brekpoint Re-use Anlysis Whole Genome Duplictions Genome Hlving Prolem
Algorithmic Bckground: Genome Rerrngements nd Brekpoint Grphs
Unichromosoml Genomes: Reversl Distnce Sorting y reversls: find the shortest series of reversls trnsforming one uni-chromosoml genome into nother. The numer of reversls in such shortest series is the reversl distnce etween genomes. Hnnenhlli nd PP. (FOCS 1995) gve polynomil lgorithm for computing the reversl distnce.. Step 0: 2-4 -3 5-8 -7-6 1 Step 1: 2 3 4 5-8 -7-6 1 Step 2: -5-4 -3-2 -8-7 -6 1 Step 3: -5-4 -3-2 -1 6 7 8 Step 4: 1 2 3 4 5 6 7 8 Sorting y prefix reversls (pncke flipping prolem), Gtes nd Ppdimitriou, Discrete Appl. Mth. 1976
Sorting y reversls Most prsimonious scenrios Step 0: 2-4 -3 5-8 -7-6 1 Step 1: 2 3 4 5-8 -7-6 1 Step 2: -5-4 -3-2 -8-7 -6 1 Step 3: -5-4 -3-2 -1 6 7 8 Step 4: 1 2 3 4 5 6 7 8 The reversl distnce is the minimum numer of reversls required to trnsform one gene order into nother. Here, the distnce is 4.
Multichromosoml Genomes: Genomic Distnce ü Genomic Distnce etween two genomes is the minimum numer of reversls, trnsloctions, fusions, nd fissions required to trnsform one genome into nother. ü Hnnenhlli nd PP (STOC 1995) gve polynomil lgorithm for computing the genomic distnce. ü These lgorithms were followed y mny improvements: Kpln et l. 1999, Bder et l. 2001, Tesler 2002, Ozery-Flto & Shmir 2003, Tnnier & Sgot 2004, Bergeron 2001-07, etc.
HP Theory Is Rther Complicted: Is There Simpler Alterntive? ü HP theory is key tool in most genome rerrngement studies. However, it is rther complicted mking it difficult to pply it in complex setups such s the RBM vs. FBM or WGD controversies. ü To study the outlined evolutionry controversies, we use n lterntive (simpler) pproch clled k-rek nlysis
Simplifying HP Theory: from Liner to Circulr Chromosomes d c A chromosome is represented s cycle with directed red nd undirected lck edges: c d red edges encode locks (genes) lck edges connect djcent locks
Reversls on Circulr Chromosomes c reversl c d d c d d c A reversl replces two lck edges with two other lck edges
Fissions c fission c d d c d c d ü A fission splits single cycle (chromosome) into two. ü A fissions replces two lck edges with two other lck edges.
Trnsloctions / Fusions c fusion c d d c d c d ü A trnsloction/fusion merges two cycles (chromosomes) into single one. ü They lso replce two lck edges with two other lck edges.
2-Breks c 2-rek c d d ü A 2-Brek replces ny 2 lck edges with nother 2 lck edges forming mtching on the sme 4 vertices. ü Reversls, trnsloctions, fusions, nd fissions represent ll possile types of 2-reks (introduced s DCJ opertions y Yncopoulos et l., 2005)
2-Brek Distnce ü The 2-rek distnce d 2 (P,Q) is the minimum numer of 2-reks required to trnsform P into Q. ü In contrst to the genomic distnce, the 2-rek distnce is esy to compute.
Two Genomes s Blck-Red nd Green-Red Cycles P c c d d c Q c d d
Q-style representtion of P P d c c c Q d d
Q-style representtion of P P d c c c Q d d
Brekpoint Grph: Superposition of Genome Grphs Brekpoint Grph P d c c (Bfn & PP, FOCS 1994) c G(P,Q) Q d d
Brekpoint Grph: GLUING Red Edges with the Sme Lels Brekpoint Grph P d c c (Bfn & PP, FOCS 1994) c G(P,Q) Q d d
Blck-Green Alternting Cycles ü Blck nd green edges form perfect mtchings in the rekpoint grph. Therefore, these edges form collection of lck-green lternting cycles ü Let cycles(p,q) e the numer of lck-green cycles. c d cycles(p,q)=2
Rerrngements Chnge Brekpoint Grphs nd cycle(p,q) Trnsforming genome P into genome Q y 2-reks corresponds to trnsforming the rekpoint grph G(P,Q) into the identity rekpoint grph G(Q,Q). c G(P,Q) d cycles(p,q) = 2 c c G(Q,Q) G(P',Q) trivil cycles d d cycles(p',q) = 3 cycles(q,q) = 4 =#locks
Sorting y 2-Breks 2-reks P=P 0 P 1... P d = Q G(P,Q) G(P 1,Q)... G(Q,Q) cycles(p,q) cycles... #locks cycles # of lck-green cycles incresed y #locks - cycles(p,q) How much ech 2-rek cn contriute to the increse in the numer of cycles?
Ech 2-Brek Increses #Cycles y t Most 1 A 2-rek: ü dds 2 new lck edges nd thus cretes t most 2 new cycles (contining two new lck edges) ü removes 2 lck edges nd thus destroys t lest 1 old cycle (contining two old edges): chnge in the numer of cycles: Δcycles 2-1=1.
2-Brek Distnce ü Any 2-rek increses the numer of cycles y t most one (Δcycles 1) ü Any non-trivil cycle cn e split into two cycles with 2- rek (Δcycles = 1) ü Every sorting y 2-reks must increse the numer of cycles y #locks - cycles(p,q) ü The 2-rek distnce etween genomes P ndq: d 2 (P,Q) = #locks - cycles(p,q) (cp. Yncopoulos et l., 2005, Bergeron et l., 2006)
Complex Rerrngements: Trnspositions ü Sorting y Trnspositions Prolem: find the shortest sequence of trnspositions trnsforming one genome into nother. ü First 1.5-pproximtion lgorithm ws given y Bfn nd P.P. SODA 1995. The est known result: 1.375-pproximtion lgorithm of Elis nd Hrtmn, 2005. ü The complexity sttus remins unknown.
Trnspositions c trnsposition c d c d c d d Trnspositions cut off segment of one chromosome nd insert it t some position in the sme or nother chromosome
Trnspositions Are 3-Breks c 3-rek c d d ü 3-Brek replces ny triple of lck edges with nother triple forming mtching on the sme 6 vertices. ü Trnspositions re 3-Breks.
Sorting y 3-Breks 3-reks P=P 0 P 1... P d = Q G(P,Q) G(P 1,Q)... G(Q,Q) cycles(p,q) cycles... #locks cycles # of lck-green cycles incresed y #locks - cycles(p,q) How much ech 3-rek cn contriute to this increse?
Ech 3-Brek Increses #Cycles y t Most 2 A 3-Brek: ü dds 3 new lck edges nd thus cretes t most 3 new cycles (contining three new lck edges) ü removes 3 lck edges nd thus destroys t lest 1 old cycle (contining three old edges): chnge in the numer of cycles: Δcycle 3-1=2.
3-Brek Distnce ü Any 3-rek increses the numer of cycles y t most TWO (Δcycles 2) ü Any non-trivil cycle cn e split into three cycles with 3-rek (Δcycles = 2) ü Every sorting y 3-reks must increse the numer of cycles y #locks - cycles(p,q) ü The 3-rek distnce etween genomes P nd Q: d 3 (P,Q) = (#locks - cycles(p,q))/2
3-Brek Distnce ü Any 3-rek increses the numer of cycles y t most TWO (Δcycles 2) ü Any non-trivil cycle cn e split into three cycles with 3-rek (Δcycles = 2) WRONG STATEMENT ü Every sorting y 3-reks must increse the numer of cycles y #locks - cycles(p,q) ü The 3-rek distnce etween genomes P nd Q: d 3 (P,Q) = (#locks - cycles(p,q))/2
3-Brek Distnce: Focus on Odd Cycles ü A 3-rek cn increse the numer of odd cycles (i.e., cycles with odd numer of lck edges) y t most 2 (Δcycles odd 2) ü A non-trivil odd cycle cn e split into three odd cycles with 3-rek (Δcycles odd = 2) ü An even cycle cn e split into two odd cycles with 3- rek (Δcycles odd =2) ü The 3-Brek Distnce etween genomes P ndq is: d 3 (P,Q) = ( #locks cycles odd (P,Q) ) / 2
Multi-Brek Rerrngements ü k-brek rerrngement opertion mkes k reks in genome nd glues the resulting pieces in new order. ü Rerrngements re rre evolutionry events nd iologists elieve tht k-rek rerrngements re unlikely for k>3 nd reltively rre for k=3 (t lest in the mmmlin evolution). ü Also, in rdition iology, chromosome errtions for k>2 (indictive of chromosome dmge rther thn evolutionry vile vritions) re more common, e.g., complex rerrngements in irrdited humn lymphocytes (Schs et l., 2004; Levy et l., 2004).
Polynomil Algorithm for Multi-Brek Rerrngements ü The formuls for k-rek distnce ecome complex s k grows (without n ovious pttern): The formul for d 20 (P,Q) contins over 1,500 terms! Alekseyev & PP (SODA 07, Theoreticl Computer Science 08) Exct formuls for the k-rek distnce s well s liner-time lgorithm for computing it.
Where Do We Go From Here? Ancestrl Genome Reconstruction Brekpoint Re-use Anlysis Skip Genome Hlving Prolem
Brekpoint Re-use Anlysis Algorithmic Prolem Serching for Rerrngement Hotspots (Frgile Regions) in Humn Genome
Rndom vs. Frgile Brekge Dete: Complex Rerrngements ü PP & Tesler, PNAS 2003, rgued tht every evolutionry scenrio for trnsforming Mouse into Humn genome with reversls nd trnsloctions must result in lrge numer of rekpoint re-uses, contrdiction to the RBM. ü Snkoff, PLoS CB 2006: We cnnot infer whether mutully rndomized synteny lock orderings derived from two divergent genomes were creted... through processes other thn reversls nd trnsloctions. (red: trnspositions or 3-reks)
Rndom Brekge Theory Re-Re-Re-Re-Re-Visited n Ohno, 1970, Ndeu & Tylor, 1984 introduced RBM n Pevzner & Tesler, 2003 (PNAS) rgued ginst RBM n Snkoff & Trinh, 2004 (RECOMB, JCB) rgued ginst Pevzner & Tesler, 2003 rguments ginst RBM n Peng et l., 2006 (PLOS CB) rgued ginst Snkoff & Trinh, 2004 rguments ginst Pevzner & Tesler, 2003 rguments ginst RBM n Snkoff, 2006 (PLOS CB) cknowledged n error in Snkoff&Trinh, 2004 ut cme up with new rgument ginst Pevzner nd Tesler, 2003 rguments ginst RBM n Alekseyev nd Pevzner, 2007 (PLOS CB) rgued ginst Snkoff, 2006 new rgument ginst Pevzner & Tesler, 2003 rguments ginst RBM
Rndom Brekge Theory Re-Re-Re-Re-Re-Visited n Ohno, 1970, Ndeu & Tylor, 1984 introduced RBM n Pevzner & Tesler, PNAS 2003 rgued ginst RBM n Snkoff & Trinh, JCB 2004 rgued ginst Pevzner & Tesler, 2003 rguments ginst RBM n Peng et l., PLOS CB 2006 rgued ginst Snkoff & Trinh, 2004 rguments ginst Pevzner & Tesler, 2003 rguments ginst RBM n Snkoff, PLOS CB 2006 cknowledged n error in Snkoff&Trinh, 2004 ut cme up with new rgument ginst Pevzner nd Tesler, 2003 rguments ginst RBM n Alekseyev nd Pevzner, PLOS CB 2007 rgued ginst Snkoff, 2006 new rgument ginst Pevzner & Tesler, 2003 rguments ginst RBM
Are There Rerrngement Hotspots in Humn Genome?
Are There Rerrngement Hotspots Theorem. Yes in Humn Genome?
Are There Rerrngement Hotspots in Theorem. Yes Humn Genome? Proof: The formul for the 3-rek distnce implies tht there were t lest d 3 (Humn,Mouse) = 139 rerrngements etween humn nd mouse (including trnspositions)
Are There Rerrngement Hotspots in Theorem. Yes Humn Genome? Proof: The formul for 3-rek distnce implies tht there were t lest d 3 (Humn,Mouse) = 139 rerrngements etween humn nd mouse (including trnspositions) Every 3-rek cretes up to 3 rekpoints
Are There Rerrngement Hotspots in Theorem. Yes Humn Genome? Proof: The formul for 3-rek distnce implies tht there were t lest d 3 (Humn,Mouse) = 139 rerrngements etween humn nd mouse (including trnspositions) Every 3-rek cretes up to 3 rekpoints If there were no rekpoint re-use then fter 139 rerrngements we my see 139*3=417 rekpoints
Are There Rerrngement Hotspots in Theorem. Yes Humn Genome? Proof: The formul for 3-rek distnce implies tht there were t lest d 3 (Humn,Mouse) = 139 rerrngements etween humn nd mouse (including trnspositions) Every rerrngement cretes up to 3 rekpoints If there were no rekpoint re-use then fter 139 rerrngements we my see 139*3=417 rekpoints But there re only 281 rekpoints etween Humn nd Mouse!
Are There Rerrngement Hotspots in Theorem. Yes Humn Genome? Proof: The formul for 3-rek distnce implies tht there were t lest d 3 (Humn,Mouse) = 139 rerrngements etween humn nd mouse (including trnspositions) Every rerrngement cretes up to 3 rekpoints If there were no rekpoint re-use then fter 139 rerrngements we my see 139*3=417 rekpoints But there re only 281 rekpoints etween Humn nd Mouse Is 417 lrger thn 281?
Are There Rerrngement Hotspots in Theorem. Yes Humn Genome? Proof: The formul for 3-rek distnce implies tht there were t lest d 3 (Humn,Mouse) = 139 rerrngements etween humn nd mouse (including trnspositions) Every rerrngement cretes up to 3 rekpoints If there were no rekpoint re-use then fter 139 rerrngements we my see 139*3=417 rekpoints But there re only 281 rekpoints etween Humn nd Mouse Is 417 lrger thn 281? Yes, 417 >> 281!
Are There Rerrngement Hotspots in Theorem. Yes Humn Genome? Proof: The formul for 3-rek distnce implies tht there were t lest d 3 (Humn,Mouse) = 139 rerrngements etween humn nd mouse (including trnspositions) Every rerrngement cretes up to 3 rekpoints If there were no rekpoint re-use then fter 139 rerrngements we my see 139*3=417 rekpoints But there re only 281 rekpoints etween Humn nd Mouse Is 417 lrger thn 281? Yes, 417 >> 281!
Trnsforming Humn Genome into Mouse Genome y 3-Breks 3-reks Humn = P 0 P 1... P d = Mouse d = d 3 (Humn,Mouse) = 139 üour proof ssumed tht ech 3-rek mkes 3 rekges in genome, so the totl numer of rekges mde in this trnsformtion is 3 * 139 = 417. üoops! The trnsformtion my include 2-reks (s prticulr cse of 3-reks). If every 3-rek were 2-rek then the totl numer of rekges is only 2*139 = 278 < 281, in which cse there could e no rekpoint re-uses t ll.
d 2 (Humn,Mouse) = 246 d 3 (Humn,Mouse) = 139 minimum numer of rekges = 246 + 139 = 385 Minimizing the Numer of Brekges Prolem. Given genomes P nd Q, find series of k-reks trnsforming P into Q nd mking the smllest numer of rekges. Theorem. Any series of k-reks trnsforming P into Q mkes t lest d k (P,Q)+d 2 (P,Q) rekges Theorem. There exists series of d 3 (P,Q) 3-reks trnsforming P into Q nd mking d 3 (P,Q)+d 2 (P,Q) rekges.
Are There Rerrngement Hotspots in Theorem. Yes Humn Genome? Proof: The formul for 3-rek distnce implies tht there were t lest d 3 (Humn,Mouse) = 139 rerrngements etween humn nd mouse (including trnspositions) Every rerrngement cretes up to 3 rekpoints If there were no rekpoint re-use then fter 139 rerrngements we SHOULD SEE AT LEAST 385 rekpoints (rther thn 417 s efore) But there re only 281 rekpoints etween Humn nd Mouse Yes, 385 >> 281!
Brekpoint Re-uses etween Humn nd Mouse Genomes ü Any trnsformtion of Mouse into Humn genome with 3- reks requires t lest 385 rekges, while there re 281 rekpoints. ü So, there re t lest 385-281=104 rekpoint re-uses (re-use rte 1.37) which is significntly higher thn sttisticlly expected in the RBM. ü Men = 1.23 ü Stndrd devition = 0.02 ü Mximum rekpoint reuse rte = 1.33 (oserved once in 100,000 simultions)
Are there rerrngement hotspots in humn genome? ü We showed tht even if 3-rek rerrngements were frequent, the rgument ginst the RBM still stnds (Alekseyev & PP, PLoS Comput. Biol. 2007) We don t elieve tht 3-reks re frequent ut even if they were, we proved tht there exist frgile regions in the humn genome. It is time to nswer more difficult question: Where re the rerrngement hotspots in the humn genome? (the detiled nlysis ppered in Alekseyev nd PP, Genome Biol., Dec.1, 2010)
Where Do We Go From Here? Ancestrl Genome Reconstruction Brekpoint Re-use Anlysis Skip Genome Hlving Prolem
Ancestrl Genome Reconstruction Algorithmic Prolem: Ancestrl Genome Reconstruction nd Multiple Brekpoint Grphs
Ancestrl Genomes Reconstruction ü Given set of genomes, reconstruct genomes of their common ncestors.
Algorithms for Ancestrl Genomes Reconstruction ü GRAPPA: Tng, Moret, Wrnow et l. (2001) ü MGR: Bourque nd PP (Genome Res. 2002) ü InferCARs: M et l. (Genome Res. 2006) ü EMRAE: Zho nd Bourque (Genome Res. 2007) ü MGRA: Alekseyev nd PP (Genome. Res.2009)
Ancestrl Genomes Reconstruction Prolem (with known tree) ü Input: set of k genomes nd phylogenetic tree T ü Output: genomes t the internl nodes of the tree T tht minimize the totl sum of the 2-rek distnces long the rnches of T ü NP-complete in the simplest cse of k=3. ü Wht mkes it hrd?
Ancestrl Genomes Reconstruction Prolem (with known tree) ü Input: set of k genomes nd phylogenetic tree T ü Output: genomes t the internl nodes of the tree T ü Ojective: minimize the totl sum of the 2-rek distnces long the rnches of T ü NP-complete in the simplest cse of k=3. ü Wht mkes it hrd? BREAKPOINTS RE-USE
Brekpoints Are Footprints of Rerrngements on the Ground of Genomes ü Input: set of k genomes nd phylogenetic tree T ü Output: genomes t the internl nodes of the phylogenetic tree T ü Ojective function: the sum of genomic distnces long the rnches of T (ssuming the most prsimonious rerrngement scenrio) ü NP-complete in the simplest cse of k=3. ü Wht mkes it hrd? BREAKPOINTS RE-USES (resulting in messy footprints )! Ancestrl Genome Reconstructions of MANY Genomes (i.e., for lrge k) my e esier to solve.
How to Construct the Brekpoint Grph for Multiple Genomes? P c R d d c c Q d
Constructing Multiple Brekpoint Grph: rerrnging P in the Q order P c R d d c c c Q d d
Constructing Multiple Brekpoint Grph: rerrnging R in the Q order P c R d d c c c Q d d
Multiple Brekpoint Grph: Still Gluing Red Edges with the Sme Lels P c R d d c c Q c Multiple Brekpoint Grph G(P,Q,R) d d
Multiple Brekpoint Grph of 6 Mmmlin Genomes Multiple Brekpoint Grph G(M,R,D,Q,H,C) of the Mouse, Rt, Dog, mcque, Humn, nd Chimpnzee genomes.
Two Genomes: Two Wys of Sorting y 2-Breks Trnsforming P into Q with lck 2-reks: P = P 0 P 1... P d-1 P d = Q G(P,Q) G(P 1,Q)... G(P d,q) = G(Q,Q) Trnsforming Q into P with green 2-reks: Q = Q 0 Q 1... Q d = P G(P,Q) G(P,Q 1 )... G(P,Q d ) = G(P,P) Let's mke lck-green chimeric trnsformtion
Trnsforming G(P,Q) into (n unknown!) G(X,X) rther thn into ( known) G(Q,Q) s efore ü Let X e ny genome on pth from P to Q: P = P 0 P 1... P m = X = Q m-d... Q 1 Q 0 = Q ü Sorting y 2-reks is equivlent to finding shortest trnsformtion of G(P,Q) into n identity rekpoint grph G(X,X) of priori unknown genome X: G(P 0,Q 0 ) G(P 1,Q 0 ) G(P 1,Q 1 ) G(P 1,Q 2 )... G(X,X) ü The lck nd green 2-reks my ritrrily lternte.
ü For exmple, the groups {M, R} (Mouse nd Rt) nd {Q,H,C} (mcque, Humn, Chimpnzee) correspond to rnches of T while the groups {M, C} nd {R, D} do not. From 2 Genomes To Multiple Genomes ü We find trnsformtion of the multiple rekpoint grph G(P 1,P 2,...,P k ) into ( priori unknown!) identity multiple rekpoint grph G(X,X,...,X): G(P 1,P 2,...,P k )... G(X,X,...,X) ü The evolutionry tree T defines groups of genomes (to which the sme 2-reks my e pplied simultneously).
When All Relile 2-Breks Are Identified nd Undone ü The multiple rekpoint grph is reduced drmticlly! ü The remining (non-trivil) components cn e processed mnully in the cse-y-cse fshion.
Reconstruction Of The Ancestrl Genomes ü The resulting identity rekpoint grph G(X,X,...,X) defines its underlying genome X. ü The reverse trnsformtion is pplied to the genome X to trnsform it into ech of the originl genomes P 1, P 2,..., P k. ü Since X is pssing through ll internl nodes of T, it defines the ncestrl genomes t these nodes.
Reconstructed X Chromosomes ü The Mouse, Rt, Dog, mcque, Humn, Chimpnzee genomes nd their reconstructed ncestors:
If the Evolutionry Tree Is Unknown ü For the set of 7 mmmlin genomes (Mouse, Rt, Dog, mcque, Humn, Chimpnzee, nd Opossum), the evolutionry tree T ws suject of enduring detes ü Depending on the primte rodent crnivore split, three topologies re possile (only two of them re vile).
If The Evolutionry Tree Is Not Known ü For the set of 7 mmmlin genomes: Mouse, Rt, Dog, mcque, Humn, Chimpnzee, nd Opossum, the evolutionry tree T is eing deted. ü Depending on the primte rodent crnivore split, three topologies re possile (only two of them re vile). ü However, these three topologies shre mny common rnches in T (confident rnches). We cn restrict the trnsformtion only to such rnches in order to simplify the rekpoint grph, not reking n evidence for either of the topologies.
Rerrngement Evidence For The Primte- Crnivore Split ü Ech of these three topologies hs n unique rnch tht supports either lue (primte-rodent), or green (rodent-crnivore), or red (primtecrnivore) hypothesis. M M M O H H H D O D O D ü Rerrngement nlysis supports the primte crnivore split.
ü Rerrngement nlysis supports the primte crnivore split. Rerrngement Evidence For The Primte-Crnivore Split ü Ech of these three topologies hs n unique rnch tht supports either lue (primte-rodent), or green (rodent-crnivore), or red (primtecrnivore) hypothesis. M M M O 11 H 16 24 H H D O D O D
Rerrngement Evidence For The Primte- Crnivore Split ü Ech of these three topologies hs n unique rnch tht supports either lue (primte-rodent), or green (rodent-crnivore), or red (primtecrnivore) hypothesis. We nlyze the rerrngements supporting ech of these M rnches: M M O 11 H 16 24 H H D O D O D ü Rerrngement nlysis supports the primte
Where Do We Go From Here? Ancestrl Genome Reconstruction Brekpoint Re-use Anlysis Skip Genome Hlving Prolem
Genome Hlving Prolem Algorithmic Prolem: Genome Hlving Prolem
WGD nd Genome Hlving Prolem ü Whole Genome Dupliction (WGD) of genome R results in perfect duplicted genome R+R where ech chromosome is douled. ü The genome R+R is sujected to rerrngements tht result in duplicted genome P. ü Genome Hlving Prolem: Given duplicted genome P, find perfect duplicted genome R+R minimizing the rerrngement distnce etween R+R nd P.
Genome Hlving Prolem: Previous Results ü Algorithm for reconstructing pre-duplicted genome ws proposed in series of ppers y El- Mrouk nd Snkoff (culminting in SIAM J Comp., 2004). ü The proof of the Genome Hlving Theorem is rther technicl (spns over 30 pges).
Genome Hlving Theorem ü Found n error in the originl Genome Hlving Theorem for unichromosoml genomes nd proved theorem tht dequtely dels with ll genomes. (Alekseyev & PP, SIAM J. Comp. 2007) ü Suggested new (short) proof of the Genome Hlving Theorem - our proof is 5 pges long. (Alekseyev & PP, IEEE Bioinformtics 2007) ü Proved the Genome Hlving Theorem for the hrder cse of trnsposition-like opertions (Alekseyev & PP, SODA 2007) ü Introduced the notion of the contrcted rekpoint grph tht mkes difficult prolems (like the Genome Hlving Prolem) more trnsprent.
2-Brek Distnce Between Duplicted Genomes 2-Brek Distnce etween Duplicted Genomes:
2-Brek Distnce Between Duplicted Genomes 2-Brek Distnce etween Duplicted Genomes:. Difficulty: The notion of the rekpoint grph is not defined for duplicted genomes (the first in P cn correspond to either the first or the second in Q).
2-Brek Distnce Between Duplicted Genomes 2-Brek Distnce etween Duplicted Genomes: Ide: Estlish correspondence etween the genes in P nd Q nd re-lel the corresponding locks (e.g., s 1, 2, 1, 2 )
2-Brek Distnce Between Duplicted Genomes 2-Brek Distnce etween Duplicted Genomes 1 2 1 2 1 1 2 2 Tret the leled genomes s non-duplicted, construct the rekpoint grph nd compute the 2-rek distnce etween them.
Different Lelings Result in Different Brekpoint Grphs ü There re 2 different lelings of the copies of ech lock. ü One of these lelings (corresponding to rekpoint grph with mximum numer of cycles) is n optiml leling. ü Trying ll possile lelings tkes exponentil time: 2 #locks invoctions of the 2-rek distnce lgorithm. ü Leling Prolem. Construct n optiml leling tht results in the mximum numer of cycles in the rekpoint grph (hence, the smllest 2-rek distnce).
Constructing Brekpoint Grph of Duplicted Genomes P Q
Contrcted Genome Grphs: GLUING Red Edges P Q
Contrcted Brekpoint Grph: Gluing Red Edges Yet Agin P Contrcted Brekpoint Grph G (P,Q) Q
Contrcted Brekpoint Grph of Duplicted Genomes ü Edges of 3 colors: lck, green, nd red. ü Red edges form mtching. ü Blck edges form lck cycles. ü Green edges form green cycles.
Wht is the Reltionship Between the Brekpoint Grph nd the Contrcted Brekpoint Grph?
Every Brekpoint Grph Induces Cycle Decomposition of the Contrcted Brekpoint Grph Why do we cre?
Every Brekpoint Grph Induces Cycle Decomposition of the Contrcted Brekpoint Grph Why do we cre? Becuse optiml leling corresponds to mximum cycle decomposition.
Mximum Cycle Decomposition Prolem Open Prolem 1: Given contrcted rekpoint grph, find its mximum lck-green cycle decomposition.?
Leling Prolem Open Prolem 2: Find rekpoint grph tht induces given cycle decomposition of the contrcted rekpoint grph.?
Computing 2-Brek Distnce Between Duplicted Genomes: Two Prolems rekpoint grph inducing mx cycle decomposition mximum cycle decomposition contrcted rekpoint grph
Computing 2-Brek Distnce Between Duplicted Genomes: Two HARD Prolems rekpoint grph inducing mx cycle decomposition mximum cycle decomposition contrcted rekpoint grph These prolems re difficult nd we do not know how to solve them. However, we solved them in the cse of the Genome Hlving Prolem.
2-Brek Genome Hlving Prolem 2-Brek Genome Hlving Prolem: Given duplicted genome P, find perfect duplicted genome R+R minimizing the 2-rek distnce: d 2 (P,R+R) = #locks - cycles(p,r+r) Minimizing d 2 (P,R+R) is equivlent to finding perfect duplicted genome R+R nd leling of P nd R+R tht mximizes cycles(p,r+r).
Contrcted Brekpoint Grph of Perfect Duplicted Genome P G'(P,R+R) R+R
Contrcted Brekpoint Grph of Perfect Duplicted Genome hs Specil Structure ü Red edges form mtching. ü Blck edges form lck cycles. ü Green edges form cycles in generl cse ut now (for R+R) these cycles re formed y prllel (doule) green edges G'(P,R+R) forming mtching
Finding the Best Contrcted Brekpoint Grph for Given Genome P
Blue Prolem: Optiml Contrcted Brekpoint Grph P
Mgent Prolem: Mximum Cycle Decomposition P
Ornge Prolem: Leling P?
Solutions to?,?, nd? re too complex to e presented in this tlk (SIAM J. Comp.2007) P?
Genome Hlving Prolem: Wht is Left Behind (1) 2-Brek Genome Hlving Prolem for Multichromosoml Genomes (prtilly covered in this tlk) (2) 2-Brek Genome Hlving Prolem for Unichromosoml Genomes (more difficult) (3) A Flw in the El-Mrouk Snkoff Theorem for Unichromosoml Genomes (4) Clssifiction of Unichromosoml Genome (fixing the flw in the El-Mrouk Snkoff Theorem) (5) 3-Brek Genome Hlving Prolem
Acknowledgments Mx Alekseyev
Acknowledgments Mx Alekseyev Dvid Snkoff If you re not criticized, you my not e doing much. Donld Rumsfeld
Acknowledgments Mx Alekseyev Dvid Snkoff If you re not criticized, you my not e doing much. Donld Rumsfeld Glenn Tesler, Mth. Dept., UCSD Qin Peng