Suffix Trees

Outline for Today
Review from Last Time: A quick refresher on tries.
Suffix Tries: A simple data structure for string searching.
Suffix Trees: A compact, powerful, and flexible data structure for string algorithms.
Generalized Suffix Trees: An even more flexible data structure.

Review from Last Time

[Figure: example trie.]

Tries
A trie is a tree that stores a collection of strings over some alphabet Σ.
Each node corresponds to a prefix of some string in the set.
Tries are sometimes called prefix trees.
If |Σ| = O(1), all insertions, deletions, and lookups take time O(|w|), where w is the string in question.
Can also determine whether a string w is a prefix of some string in the trie in time O(|w|) by walking the trie and returning whether we didn't fall off.

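As a refresher, here is a minimal sketch of a trie supporting insertion, lookup, and the prefix test described above. The class name `Trie`, its method names, and the sample words are illustrative choices, not taken from the lecture; children are kept in a Python dict, so each step costs expected O(1).

```python
class Trie:
    """Dictionary-based trie; each node maps a character to a child node."""

    def __init__(self):
        self.children = {}
        self.is_word = False  # True if some stored string ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            if ch not in node.children:
                node.children[ch] = Trie()
            node = node.children[ch]
        node.is_word = True

    def _walk(self, w):
        """Follow w from the root; return the node reached, or None if we fall off."""
        node = self
        for ch in w:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def is_prefix(self, w):
        # w is a prefix of some stored string exactly when the walk never falls off.
        return self._walk(w) is not None


trie = Trie()
for word in ["ant", "ante", "anteater", "bed", "bend"]:
    trie.insert(word)
```

Each operation touches one node per character of its argument, which is where the O(|w|) bound comes from.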
Aho-Corasick String Matching
The Aho-Corasick string matching algorithm is an algorithm for finding all occurrences of a set of strings P₁, …, Pₖ inside a string T.
Runtime is O(m + n + z), where m = |T|, n = |P₁| + … + |Pₖ|, and z is the number of matches.

Aho-Corasick String Matching
The runtime of Aho-Corasick can be split apart into two pieces: O(n) preprocessing time to build the matcher, and O(m + z) time to find all matches.
Useful in the case where the patterns are fixed, but the text might change.

Genomics Databases
Many string algorithms these days pertain to computational genomics.
Typically, have a huge database with many very large strings.
More common problem: given a fixed string T to search and changing patterns P₁, …, Pₖ, find all matches in T.
Question: Can we instead preprocess T to make it easy to search for variable patterns?

Suffix Tries

Substrings, Prefixes, and Suffixes
Recall: If x is a substring of w, then x is a suffix of a prefix of w. Write w = αxω; then x is a suffix of αx.
Fact: If x is a substring of w, then x is a prefix of a suffix of w. Write w = αxω; then x is a prefix of xω.
This second fact is of use because tries support efficient prefix searching.

Suffix Tries
A suffix trie of T is a trie of all the suffixes of T.
In time O(n), can determine whether P₁, …, Pₖ exist in T by searching for each one in the trie.

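A minimal sketch of this idea, assuming a nested-dict representation (the function names and the sample string `nonsense$` are illustrative). Construction here is the obvious insert-every-suffix loop, so it takes quadratic time and space; only the query runs in time proportional to the pattern length.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a nested-dict trie."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root


def is_substring(trie, pattern):
    """A pattern occurs in the text iff it is a prefix of some suffix."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False  # fell off the trie
        node = node[ch]
    return True


trie = build_suffix_trie("nonsense$")
print(is_substring(trie, "sense"), is_substring(trie, "nn"))
```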
A Typical Transform
Typically, we append some new character $ ∉ Σ to the end of T, then construct the trie for T$.
Leaf nodes correspond to suffixes.
Internal nodes correspond to prefixes of those suffixes.

Constructing Suffix Tries
Once we build a single suffix trie for string T, we can efficiently detect whether patterns match in time O(n).
Question: How long does it take to construct a suffix trie?
Problem: There's an Ω(m²) lower bound on the worst-case complexity of any algorithm for building suffix tries.

A Degenerate Case
[Figure: suffix trie for aᵐbᵐ$.]
There are Θ(m) copies of nodes chained together as bᵐ$.
Space usage: Ω(m²).

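The quadratic blow-up can be checked empirically. The sketch below counts suffix-trie nodes for strings of the form aᵏbᵏ$; the helper names are hypothetical, and the trie is the same nested-dict representation assumed earlier.

```python
def count_suffix_trie_nodes(text):
    """Count the nodes (root included) of the suffix trie of text."""
    root = {}
    count = 1  # the root
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            if ch not in node:
                node[ch] = {}
                count += 1
            node = node[ch]
    return count


def degenerate(k):
    """The string a^k b^k $ from the degenerate case."""
    return "a" * k + "b" * k + "$"


for k in (1, 2, 4, 8, 16):
    print(k, count_suffix_trie_nodes(degenerate(k)))
```

Every node corresponds to a distinct substring of the input, and counting those substrings by hand gives k² + 4k + 2 nodes for this family, so doubling k roughly quadruples the size of the trie.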
Correcting the Problem
Because suffix tries may have Ω(m²) nodes, all suffix trie algorithms must run in time Ω(m²) in the worst case.
Can we reduce the number of nodes in the trie?

Patricia Tries
A silly node in a trie is a node that has exactly one child.
A Patricia trie (or radix trie) is a trie where all silly nodes are merged with their parents.

Suffix Trees
A suffix tree for a string T is a Patricia trie of T$ where each leaf is labeled with the index where the corresponding suffix starts in T$.
[Figure: suffix tree for nonsense$ (positions 0 to 8), with each leaf labeled by its suffix's start index.]

Properties of Suffix Trees
If |T| = m, the suffix tree has exactly m + 1 leaf nodes.
For any T ≠ ε, all internal nodes in the suffix tree have at least two children.
Number of nodes in a suffix tree is Θ(m).

Suffix Tree Representations
Suffix trees may have Θ(m) nodes, but the labels on the edges can have size ω(1).
This means that a naïve representation of a suffix tree may take ω(m) space.
Useful fact: Each edge in a suffix tree is labeled with a consecutive range of characters from w.
Trick: Represent each edge label α as a pair of integers [start, end] representing where in the string α appears.

Suffix Tree Representations
[Figure: the suffix tree for nonsense$ with each edge stored as a (start, end, child) record.]

Building Suffix Trees
Using this representation, suffix trees can be constructed using space Θ(m).
Claim: There are Θ(m)-time algorithms for building suffix trees.
These algorithms are not trivial. We'll discuss one of them next time.

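To make the [start, end] representation concrete, here is a sketch of a naive suffix tree builder. It is not one of the Θ(m)-time algorithms referred to above: it inserts the suffixes one at a time and splits edges on mismatch, which takes O(m²) time in the worst case but still uses only Θ(m) space. The names `Node` and `build_suffix_tree` are hypothetical, edge labels are half-open ranges `text[start:end]`, and the input is assumed to end in a unique sentinel.

```python
class Node:
    def __init__(self, start, end, leaf=None):
        self.start, self.end = start, end  # edge from parent is text[start:end]
        self.children = {}                 # first character of edge -> child
        self.leaf = leaf                   # suffix start index (leaves only)


def build_suffix_tree(text):
    """Naive O(m^2)-time construction; text must end in a unique sentinel."""
    assert text and text.count(text[-1]) == 1, "need a unique end marker"
    root = Node(0, 0)
    for i in range(len(text)):             # insert suffix text[i:]
        node, j = root, i
        while True:
            child = node.children.get(text[j])
            if child is None:              # no edge to follow: hang a new leaf
                node.children[text[j]] = Node(j, len(text), leaf=i)
                break
            k = child.start
            while k < child.end and text[k] == text[j]:
                k, j = k + 1, j + 1
            if k < child.end:              # mismatch mid-edge: split the edge
                mid = Node(child.start, k)
                node.children[text[mid.start]] = mid
                child.start = k
                mid.children[text[k]] = child
                mid.children[text[j]] = Node(j, len(text), leaf=i)
                break
            node = child                   # consumed the whole edge; keep going
    return root


def collect(root):
    """Return (leaf nodes, internal nodes) via an iterative DFS."""
    leaves, internal, stack = [], [], [root]
    while stack:
        node = stack.pop()
        (leaves if node.leaf is not None else internal).append(node)
        stack.extend(node.children.values())
    return leaves, internal


leaves, internal = collect(build_suffix_tree("nonsense$"))
print(len(leaves), len(internal))
```

Each node stores two integers rather than a copy of its label, so the whole structure stays linear in size even when individual labels are long.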
An Application: String Matching

String Matching
Given a suffix tree, can search to see if a pattern P exists in time O(n).
Gives an O(m + n) string-matching algorithm.
T can be preprocessed in time O(m) to efficiently support binary string matching queries.

String Matching
Claim: After spending O(m) time preprocessing T, can find all matches of a string P in time O(n + z), where z is the number of matches.
Observation 1: Every occurrence of P in T is a prefix of some suffix of T.
Observation 2: Because the prefix is the same each time (namely, P), all those suffixes will be in the same subtree.

Finding All Matches
To find all matches of string P, start by searching the tree for P.
If the search falls off the tree, report no matches.
Otherwise, let v be the node at which the search stops, or the endpoint of the edge where it stops if it ends in the middle of an edge.
Do a DFS and report all leaf numbers found. The indices reported this way give back all positions at which P occurs.

Claim: The DFS to find all leaves in the subtree corresponding to prefix P takes time O(z), where z is the number of matches.
Proof: If the DFS reports z matches, it must have visited z different leaf nodes. Since each internal node of a suffix tree has at least two children, the total number of internal nodes visited during the DFS is at most z − 1. During the DFS, we don't need to actually match the characters on the edges. We just follow the edges, which takes time O(1). Therefore, the DFS visits at most O(z) nodes and edges and spends O(1) time per node or edge, so the total runtime is O(z).

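The search-then-DFS procedure can be sketched as follows. The tree is built with a naive O(m²)-time builder that stands in for the Θ(m) construction; `Node`, `build_suffix_tree`, and `find_all` are hypothetical names, and the text is assumed to end in a unique sentinel.

```python
class Node:
    def __init__(self, start, end, leaf=None):
        self.start, self.end = start, end  # edge from parent is text[start:end]
        self.children = {}                 # first character of edge -> child
        self.leaf = leaf                   # suffix start index (leaves only)


def build_suffix_tree(text):
    """Naive O(m^2)-time construction; text must end in a unique sentinel."""
    assert text and text.count(text[-1]) == 1, "need a unique end marker"
    root = Node(0, 0)
    for i in range(len(text)):
        node, j = root, i
        while True:
            child = node.children.get(text[j])
            if child is None:
                node.children[text[j]] = Node(j, len(text), leaf=i)
                break
            k = child.start
            while k < child.end and text[k] == text[j]:
                k, j = k + 1, j + 1
            if k < child.end:              # mismatch mid-edge: split the edge
                mid = Node(child.start, k)
                node.children[text[mid.start]] = mid
                child.start = k
                mid.children[text[k]] = child
                mid.children[text[j]] = Node(j, len(text), leaf=i)
                break
            node = child
    return root


def find_all(root, text, pattern):
    """Return every position where pattern occurs in text (unsorted)."""
    node, j = root, 0
    while j < len(pattern):                # phase 1: walk down, matching pattern
        child = node.children.get(pattern[j])
        if child is None:
            return []                      # fell off the tree
        k = child.start
        while k < child.end and j < len(pattern):
            if text[k] != pattern[j]:
                return []                  # fell off in the middle of an edge
            k, j = k + 1, j + 1
        node = child                       # endpoint of the edge we stopped on
    matches, stack = [], [node]            # phase 2: DFS, no character matching
    while stack:
        n = stack.pop()
        if n.leaf is not None:
            matches.append(n.leaf)
        stack.extend(n.children.values())
    return matches


text = "nonsense$"
tree = build_suffix_tree(text)
print(sorted(find_all(tree, text, "nse")))
```

The DFS only follows child pointers and never re-reads edge labels, which is what keeps the second phase at O(z).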
Reverse Aho-Corasick
Given patterns P₁, …, Pₖ of total length n, suffix trees can find all matches of those patterns in time O(m + n + z).
Search for all matches of each Pᵢ; total time across all searches is O(n + z).
Acts as a reverse Aho-Corasick:
Aho-Corasick preprocesses the patterns in time O(n), then spends O(m + z) time per tested string.
Suffix trees preprocess the string in time O(m), then spend O(n + z) time per set of tested patterns.

Another Application: Longest Repeated Substring

Longest Repeated Substring
Consider the following problem: Given a string T, find the longest substring w of T that appears in at least two different positions.
Applications to computational biology: more than half of the human genome is formed from repeated DNA sequences!

Longest Repeated Substring
Observation 1: If w is a repeated substring of T, it must be a prefix of at least two different suffixes.
Observation 2: If w is a repeated substring of T, it must correspond to a prefix of a path to an internal node.
Observation 3: If w is a longest repeated substring, it corresponds to a full path to an internal node.

Longest Repeated Substring
For each node v in a suffix tree, let s(v) be the string that it corresponds to.
The string depth of a node v is defined as |s(v)|, the length of the string v corresponds to.
The longest repeated substring in T can be found by finding the internal node in T's suffix tree with the maximum string depth.

Longest Repeated Substring
Here's an O(m)-time algorithm for solving the longest repeated substring problem:
Build the suffix tree for T in time O(m).
Run a DFS over the tree, tracking the string depth as you go, to find the internal node of maximum string depth.
Recover the string that node corresponds to.
Good exercise: How might you find the longest substring of T that repeats at least k times?

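The three steps above can be sketched as follows. As an assumption for brevity, the tree is built with a naive O(m²)-time builder rather than a Θ(m) algorithm, so only the DFS step reflects the stated linear bound. All names are hypothetical. The sketch relies on one property of this particular builder: the edge into a node at string depth d that ends at index `end` spells out the last characters of `text[end - d:end]`, so the node's full path string can be read off directly.

```python
class Node:
    def __init__(self, start, end, leaf=None):
        self.start, self.end = start, end  # edge from parent is text[start:end]
        self.children = {}                 # first character of edge -> child
        self.leaf = leaf                   # suffix start index (leaves only)


def build_suffix_tree(text):
    """Naive O(m^2)-time construction; text must end in a unique sentinel."""
    assert text and text.count(text[-1]) == 1, "need a unique end marker"
    root = Node(0, 0)
    for i in range(len(text)):
        node, j = root, i
        while True:
            child = node.children.get(text[j])
            if child is None:
                node.children[text[j]] = Node(j, len(text), leaf=i)
                break
            k = child.start
            while k < child.end and text[k] == text[j]:
                k, j = k + 1, j + 1
            if k < child.end:              # mismatch mid-edge: split the edge
                mid = Node(child.start, k)
                node.children[text[mid.start]] = mid
                child.start = k
                mid.children[text[k]] = child
                mid.children[text[j]] = Node(j, len(text), leaf=i)
                break
            node = child
    return root


def longest_repeated_substring(text):
    """text must end in a unique sentinel such as '$'."""
    root = build_suffix_tree(text)
    best_depth, best_end = 0, 0
    stack = [(root, 0)]                    # (node, string depth of node)
    while stack:
        node, depth = stack.pop()
        if node.children and depth > best_depth:   # deepest internal node so far
            best_depth, best_end = depth, node.end
        for child in node.children.values():
            stack.append((child, depth + child.end - child.start))
    return text[best_end - best_depth:best_end]


print(longest_repeated_substring("nonsense$"))
```

For the exercise on substrings that repeat at least k times, one natural variation is to also count the leaves below each node and only consider nodes with at least k of them.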
Challenge Problem: Solve this problem in linear time without using suffix trees (or suffix arrays).

Time-Out For Announcements!

OH This Week
I will be splitting my OH into two time slots this week:
Monday: 3:30PM - 4:45PM
Tuesday: 1:30PM - 2:30PM
This is a temporary change; normal OH times resume next week.

PS4 Grading
The TAs have not yet finished grading PS4. Q3 is tough to grade!
We'll have it ready by Wednesday.
Solutions are available up front.

Final Project Logistics
We've released a handout with some suggested data structures or techniques you might want to explore for the final project.
We recommend trying to find a group of 2-3 people and finding some topic that looks interesting.
We'll release details about the formal final project proposal on Wednesday.

Your Questions

How do functional data structures work, and what are some common ones?
Check out Chris Okasaki's book Purely Functional Data Structures for an excellent exposition on the topic. Some data structures like binomial heaps and red/black trees are actually easier to code up in a purely functional setting. Some new structures (like skew binomial random access lists) need to be introduced in place of common structures like arrays.

What's the best way to be prepared for the midterm?
A few suggestions:
1. Make sure you understand the intuition behind the different data structures.
2. Make sure that you can solve all the homework problems, even if you're working in a pair.
3. Look over the readings for each class to get a better understanding of each topic.

Back to CS166!

Generalized Suffix Trees

Suffix Trees for Multiple Strings
Suffix trees store information about a single string and export a huge amount of structural information about that string.
However, many applications require information about the structure of multiple different strings.

Generalized Suffix Trees
A generalized suffix tree for T₁, …, Tₖ is a Patricia trie of all suffixes of T₁$₁, …, Tₖ$ₖ.
Each Tᵢ has a unique end marker.
Leaves are tagged with i:j, meaning the jth suffix of string Tᵢ.
[Figure: generalized suffix tree for nonsense$₁ and offense$₂, with leaves tagged 1:0 through 1:8 and 2:0 through 2:7.]

Generalized Suffix Trees
Claim: A generalized suffix tree for strings T₁, …, Tₖ of total length m can be constructed in time Θ(m).
Use a two-phase algorithm:
Construct a suffix tree for the single string T₁$₁T₂$₂ … Tₖ$ₖ in time Θ(m). This will end up with some invalid suffixes.
Do a DFS over the suffix tree and prune the invalid suffixes. Runs in time O(m) if implemented intelligently.

Applications of Generalized Suffix Trees

Longest Common Substring
Consider the following problem: Given two strings T₁ and T₂, find the longest string w that is a substring of both T₁ and T₂.
Can solve in time O(|T₁| · |T₂|) using dynamic programming.
Can we do better?

Longest Common Substring
[Figure: generalized suffix tree for nonsense$₁ and offense$₂, with internal nodes marked when they have leaves from both strings below them.]

Longest Common Substring
Build a generalized suffix tree for T₁ and T₂ in time O(m).
Annotate each internal node in the tree with whether that node has at least one leaf node from each of T₁ and T₂. Takes time O(m) using DFS.
Run a DFS over the tree to find the marked node with the highest string depth. Takes time O(m) using DFS.
Overall time: O(m).

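A sketch of the annotate-and-search procedure follows, with several simplifying assumptions that should be read as such. The tree is built naively in O(m²) time over the single string T₁ + "\x01" + T₂ + "\x02", where the two control characters stand in for $₁ and $₂ and are assumed not to occur in the inputs. The pruning phase is skipped: any string that contains "\x01" occurs only once in the combined text, so it can never label a path to an internal node, and the annotation is unaffected. The DFS is recursive, which is fine for short inputs but would hit Python's recursion limit on long ones. All names are hypothetical.

```python
class Node:
    def __init__(self, start, end, leaf=None):
        self.start, self.end = start, end  # edge from parent is text[start:end]
        self.children = {}                 # first character of edge -> child
        self.leaf = leaf                   # suffix start index (leaves only)


def build_suffix_tree(text):
    """Naive O(m^2)-time construction; text must end in a unique sentinel."""
    assert text and text.count(text[-1]) == 1, "need a unique end marker"
    root = Node(0, 0)
    for i in range(len(text)):
        node, j = root, i
        while True:
            child = node.children.get(text[j])
            if child is None:
                node.children[text[j]] = Node(j, len(text), leaf=i)
                break
            k = child.start
            while k < child.end and text[k] == text[j]:
                k, j = k + 1, j + 1
            if k < child.end:              # mismatch mid-edge: split the edge
                mid = Node(child.start, k)
                node.children[text[mid.start]] = mid
                child.start = k
                mid.children[text[k]] = child
                mid.children[text[j]] = Node(j, len(text), leaf=i)
                break
            node = child
    return root


def longest_common_substring(t1, t2):
    text = t1 + "\x01" + t2 + "\x02"       # assumed sentinels, absent from inputs
    root = build_suffix_tree(text)
    best = [0, 0]                          # [string depth, end index]

    def visit(node, depth):
        """Return a bitmask: 1 if a T1 leaf is below node, 2 if a T2 leaf is."""
        if node.leaf is not None:
            return 1 if node.leaf <= len(t1) else 2
        mask = 0
        for child in node.children.values():
            mask |= visit(child, depth + child.end - child.start)
        if mask == 3 and depth > best[0]:  # leaves from both strings below
            best[0], best[1] = depth, node.end
        return mask

    visit(root, 0)
    return text[best[1] - best[0]:best[1]]


print(longest_common_substring("nonsense", "offense"))
```

This version folds the annotation pass and the maximum-depth search into a single DFS; the two-pass structure described above gives the same answer.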
Longest Common Extensions

Longest Common Extensions
Given two strings T₁ and T₂ and start positions i and j, the longest common extension of T₁ and T₂, starting at positions i and j, is the length of the longest string w that appears at position i in T₁ and position j in T₂.
We'll denote this value by LCE_{T₁,T₂}(i, j).
Typically, T₁ and T₂ are fixed and multiple (i, j) queries are specified.

Longest Common Extensions
Observation: LCE_{T₁,T₂}(i, j) is the length of the longest common prefix of the suffixes of T₁ and T₂ starting at positions i and j.
The generalized suffix tree of T₁ and T₂ makes it easy to query for these suffixes and stores information about their common prefixes.

An Observation
[Figure: generalized suffix tree for nonsense$₁ and offense$₂.]

An Observation
Notation: Let S[i:] denote the suffix of string S starting at position i.
Claim: LCE_{T₁,T₂}(i, j) is given by the length of the string label of the LCA of the leaves for T₁[i:] and T₂[j:] in the generalized suffix tree of T₁ and T₂.
And hey... don't we have a way of computing these in time O(1)?

Computing LCE's
Given two strings T₁ and T₂, construct a generalized suffix tree for T₁ and T₂ in time O(m).
Construct an LCA data structure for the generalized suffix tree in time O(m). Use Fischer-Heun plus an Euler tour of the nodes in the tree.
Can now query for the node representing the LCE in time O(1).

The Overall Construction
Using an O(m)-time DFS, annotate each node in the suffix tree with its string depth.
To compute LCE:
Find the leaves corresponding to T₁[i:] and T₂[j:].
Find their LCA; let its string depth be d.
Report T₁[i:i + d − 1] or T₂[j:j + d − 1].
Overall, requires O(m) preprocessing time to support O(1) query time.

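The construction can be sketched end to end, with two substitutions that change the running time and are labeled as such: the tree is built naively in O(m²) time, and the LCA is found by walking parent pointers instead of using Fischer-Heun, so each query costs time proportional to the tree height rather than O(1). The sentinels "\x01" and "\x02" stand in for $₁ and $₂ and are assumed absent from the inputs. All names are hypothetical, and the query returns only the length d.

```python
class Node:
    def __init__(self, start, end, leaf=None):
        self.start, self.end = start, end  # edge from parent is text[start:end]
        self.children = {}                 # first character of edge -> child
        self.leaf = leaf                   # suffix start index (leaves only)


def build_suffix_tree(text):
    """Naive O(m^2)-time construction; text must end in a unique sentinel."""
    assert text and text.count(text[-1]) == 1, "need a unique end marker"
    root = Node(0, 0)
    for i in range(len(text)):
        node, j = root, i
        while True:
            child = node.children.get(text[j])
            if child is None:
                node.children[text[j]] = Node(j, len(text), leaf=i)
                break
            k = child.start
            while k < child.end and text[k] == text[j]:
                k, j = k + 1, j + 1
            if k < child.end:              # mismatch mid-edge: split the edge
                mid = Node(child.start, k)
                node.children[text[mid.start]] = mid
                child.start = k
                mid.children[text[k]] = child
                mid.children[text[j]] = Node(j, len(text), leaf=i)
                break
            node = child
    return root


def make_lce(t1, t2):
    """Preprocess t1 and t2; return a function lce(i, j)."""
    text = t1 + "\x01" + t2 + "\x02"       # assumed sentinels, absent from inputs
    root = build_suffix_tree(text)
    parent, depth, leaf_of = {root: None}, {root: 0}, {}
    stack = [root]
    while stack:                           # DFS: parents, string depths, leaves
        node = stack.pop()
        if node.leaf is not None:
            leaf_of[node.leaf] = node
        for child in node.children.values():
            parent[child] = node
            depth[child] = depth[node] + child.end - child.start
            stack.append(child)

    def lce(i, j):
        a, b = leaf_of[i], leaf_of[len(t1) + 1 + j]
        ancestors = set()
        while a is not None:               # naive LCA: mark a's ancestors...
            ancestors.add(a)
            a = parent[a]
        while b not in ancestors:          # ...then climb from b until we hit one
            b = parent[b]
        return depth[b]                    # string depth of the LCA

    return lce


lce = make_lce("nonsense", "offense")
print(lce(4, 3), lce(2, 4), lce(0, 0))
```

Replacing the two `while` loops in `lce` with a constant-time LCA structure is the only change needed to reach the O(1) query bound.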
An Application: Longest Palindromic Substring

Palindromes
A palindrome is a string that's the same forwards and backwards.
A palindromic substring of a string T is a substring of T that's a palindrome.
Surprisingly, of great importance in computational biology.
[Figure: paired DNA strands A C T G / T G A C.]

Longest Palindromic Substring
The longest palindromic substring problem is the following: Given a string T, find the longest substring of T that is a palindrome.
How might we solve this problem?

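An Initial Idea
To deal with the issue of strings going forwards and backwards, start off by forming T and Tᴿ, the reverse of T.
Initial Idea: Find the longest common substring of T and Tᴿ.
Unfortunately, this doesn't work: a common substring of T and Tᴿ need not be a palindrome.
T = abbccbbabccbba
Tᴿ = abbccbabbccbba
Common substring that is not a palindrome: abbccb
The longest common substring can fail in the same way: for T = abacdfgdcaba, the longest common substrings of T and Tᴿ are abacd and dcaba, and neither is a palindrome.
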
Palindrome Centers and Radii
For now, let's focus on even-length palindromes.
An even-length palindrome substring wwᴿ of a string T has a center and radius:
Center: The spot between the duplicated center characters.
Radius: The length of the string going out in each direction.
Idea: For each center, find the largest corresponding radius.

Palindrome Centers and Radii
T = abbaccabccb
Tᴿ = bccbaccabba

An Algorithm
In time O(m), construct Tᴿ.
Preprocess T and Tᴿ in time O(m) to support LCE queries.
For each spot between two characters in T, find the longest palindrome centered at that location by executing LCE queries on the corresponding locations in T and Tᴿ.
Each query takes time O(1) if it just reports the length. Total time: O(m).
Report the longest string found this way.
Total time: O(m).

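Here is a sketch of the center-by-center scan for even-length palindromes. To keep it short, the LCE query is answered by direct character comparison rather than by the O(1) suffix-tree query, so this version runs in O(m²) time in the worst case; only the way centers are mapped to positions in T and Tᴿ is the point. The function names are hypothetical.

```python
def lce(a, b, i, j):
    """Length of the longest common prefix of a[i:] and b[j:].

    Direct comparison, used here as a stand-in for the O(1) LCE query.
    """
    d = 0
    while i + d < len(a) and j + d < len(b) and a[i + d] == b[j + d]:
        d += 1
    return d


def longest_even_palindrome(t):
    rev = t[::-1]
    m = len(t)
    best_center, best_radius = 0, 0
    for c in range(1, m):                  # center sits between t[c-1] and t[c]
        # t[c:] reads rightward from the center; rev[m-c:] is t[:c] reversed,
        # i.e. it reads leftward from the center.
        radius = lce(t, rev, c, m - c)
        if radius > best_radius:
            best_center, best_radius = c, radius
    return t[best_center - best_radius:best_center + best_radius]


print(longest_even_palindrome("abbaccabccb"))
```

Odd-length palindromes can be handled with the same scan by centering on a character instead of a gap, comparing t[c+1:] against the reversal of t[:c].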
Suffix Trees: The Catch

Space Usage
Suffix trees are memory hogs.
Suppose Σ = {A, C, G, T, $}.
Each internal node needs 15 machine words: for each character, words for the start/end index and a child pointer.
This is still O(m), but it's a huge hidden constant.

Combating Space Usage
In 1990, Udi Manber and Gene Myers introduced the suffix array as a space-efficient alternative to suffix trees.
Requires one word per character; typically, an extra word is stored as well (details Wednesday).
Can't support all operations permitted by suffix trees, but has much better performance.
Curious? Details are next time!

Next Time
Suffix Arrays: A space-efficient alternative to suffix trees.
LCP Arrays: A useful auxiliary data structure for speeding up suffix arrays.
Constructing Suffix Trees: How on earth do you build suffix trees in time O(m)?
Constructing Suffix Arrays: Start by building suffix arrays in time O(m)...
Constructing LCP Arrays: ...and adding in LCP arrays in time O(m).