Dynamic Fully-Compressed Suffix Trees
Luís M. S. Russo, Gonzalo Navarro, Arlindo L. Oliveira
INESC-ID/IST, {lsr,aml}@algos.inesc-id.pt
Dept. of Computer Science, University of Chile, gnavarro@dcc.uchile.cl
19th Annual Symposium on Combinatorial Pattern Matching
Outline
1 Motivation
  The Problem We Studied
  Previous Work and FCSTs
  Fully-Compressed Suffix Tree Basics
2 Dynamic FCSTs
  The Problem
  Dynamic CSAs
  Updating the Sampling
3 Conclusions
  Summary
Suffix Trees are Important
Suffix trees are important for several string problems:
- pattern matching
- longest common substring
- super maximal repeats
- bioinformatics applications
- etc.
Example (Suffix Tree for abbbab)
[Figure: the suffix tree, with its leaves in suffix array order.]
     0 1 2 3 4 5 6
A:   6 4 0 5 3 2 1
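As a sanity check of the array A, here is a minimal Python sketch that builds a suffix array by naive sorting. It assumes the example text is abbbab followed by a terminator $ that sorts before every letter; the example string did not survive extraction, so this text is an inference from A, and `suffix_array` is an illustrative name rather than anything from the talk.

```python
def suffix_array(text):
    """Return the start positions of the suffixes of text in lexicographic order.

    Naive O(n^2 log n) construction, suitable for tiny examples only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


# Assumed example text; '$' sorts before 'a' and 'b' in ASCII.
A = suffix_array("abbbab$")
print(A)  # [6, 4, 0, 5, 3, 2, 1]
```

Under that assumption the sketch reproduces the array on the slide, which is what makes abbbab a plausible reading of the missing string.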
Representation Problems
Problem (Suffix trees need too much space)
- Pointer-based representations require $O(n \log n)$ bits.
- This is much larger than the indexed string.
- State-of-the-art implementations require $[8, 10]\, n \log \sigma$ bits.
Compressed Representations
Sadakane proposed a way to represent compressed suffix trees in $nH_k + 6n + o(n \log \sigma)$ bits.
A dynamic representation, by Chan et al., requires $nH_k + \Theta(n) + o(n \log \sigma)$ bits and suffers an $O(\log n)$ slowdown.
Compressed Suffix Tree: Tree Structure + Compressed Index
- Balanced parentheses representation
- Nodes represented as intervals
The Fully-Compressed Suffix Tree representation requires only $nH_k + o(n \log \sigma)$ bits. The representation uses the following scheme:
Fully-Compressed Suffix Tree: Tree Structure + Compressed Index
- Sampling, LSA
- Nodes represented as intervals
We present dynamic FCSTs that require only $nH_k + o(n \log \sigma)$ bits, with an $O(\log n)$ slowdown.
Node Representation
A node is represented as an interval of leaves of the suffix tree.
Example: the interval [3, 6] represents a node.
     0 1 2 3 4 5 6
A:   6 4 0 5 3 2 1
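The interval view can be illustrated with a small Python sketch that finds, by binary search over a naively built suffix array, the range of leaves whose suffixes start with a given path label. The text abbbab$ is an assumption (inferred from the array A), and `node_interval` is a hypothetical helper, not an operation of the FCST itself.

```python
from bisect import bisect_left, bisect_right


def node_interval(text, label):
    """Return (left, right), the inclusive suffix-array range of the suffixes
    that start with label, or None if label does not occur in text."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Truncated suffixes stay sorted, so equality means "starts with label".
    prefixes = [text[i:i + len(label)] for i in sa]
    left = bisect_left(prefixes, label)
    right = bisect_right(prefixes, label) - 1
    return (left, right) if left <= right else None


print(node_interval("abbbab$", "b"))   # (3, 6)
print(node_interval("abbbab$", "ab"))  # (1, 2)
```

Under the assumed text, the interval [3, 6] from the example is the node whose path label is b, and the empty label gives the root interval [0, 6].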
Compressed Indexes
Compressed indexes are compressed representations of the leaves of a suffix tree. Their success relies on:
- Succinct structures, based on RANK and SELECT.
- Data compression, which represents T in $O(uH_k)$ bits.
Examples: FM-index, Compressed Suffix Arrays, LZ-index, etc.
Sadakane used compressed suffix arrays. We need a compressed index that supports $\psi$ and LF, for example the Alphabet-Friendly FM-Index.
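To make the two required operations concrete, the following Python sketch derives ψ and LF from a plain (uncompressed) suffix array. A compressed index answers the same queries without storing the array; the cyclic wrap-around at the text boundaries is a simplifying assumption of this sketch, and `psi_and_lf` is an illustrative name.

```python
def psi_and_lf(sa):
    """Return (psi, lf) as lists over suffix-array positions.

    psi[i] is the position of the suffix starting one character later than
    the suffix at position i; lf[i] is the position of the suffix starting
    one character earlier. Both wrap around cyclically (an assumption).
    """
    n = len(sa)
    rank = [0] * n  # inverse suffix array: text position -> suffix-array position
    for i, pos in enumerate(sa):
        rank[pos] = i
    psi = [rank[(pos + 1) % n] for pos in sa]
    lf = [rank[(pos - 1) % n] for pos in sa]
    return psi, lf


A = [6, 4, 0, 5, 3, 2, 1]  # the suffix array from the example
psi, lf = psi_and_lf(A)
print(psi)  # [2, 3, 6, 0, 1, 4, 5]
print(lf)   # [3, 4, 0, 1, 5, 6, 2]
```

The two maps are inverses of each other, which is why an index that supports one direction cheaply is often paired with a structure for the other.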
Suffix Tree self-similarity: LCA and SLINK
Lemma. When $\mathrm{LCA}(v, v') \neq \mathrm{ROOT}$ we have that
$\mathrm{SLINK}(\mathrm{LCA}(v, v')) = \mathrm{LCA}(\mathrm{SLINK}(v), \mathrm{SLINK}(v'))$
[Figure: nodes $v$ and $v'$ with path labels $\alpha X$ and $\alpha Y$, and their images under $\psi$.]
This self-similarity explains why we can store only some nodes.
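A small worked instance may help. It assumes the running example text is abbbab$ (inferred from the suffix array A) and identifies each node with its path label, so that SLINK drops the first letter and LCA is the longest common prefix of the two labels:

```latex
\begin{align*}
v &= \texttt{ab\$}, \qquad v' = \texttt{abbbab\$}, \\
\mathrm{SLINK}(\mathrm{LCA}(v, v')) &= \mathrm{SLINK}(\texttt{ab}) = \texttt{b}, \\
\mathrm{LCA}(\mathrm{SLINK}(v), \mathrm{SLINK}(v')) &= \mathrm{LCA}(\texttt{b\$}, \texttt{bbbab\$}) = \texttt{b}.
\end{align*}
```

Both sides reach the node labeled b, as the lemma states.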
Sampling
FCSTs use a sampling such that in any sequence
v → SLINK(v) → SLINK(SLINK(v)) → SLINK(SLINK(SLINK(v))) → ...
of size δ there is at least one sampled node.
Fundamental lemma
Lemma. Let $\mathrm{SLINK}^r(\mathrm{LCA}(v, v')) = \mathrm{ROOT}$ and $d = \min(\delta, r + 1)$. Then
$$\mathrm{SDEP}(\mathrm{LCA}(v, v')) = \max_{0 \le i < d} \{\, i + \mathrm{SDEP}(\mathrm{LCSA}(\mathrm{SLINK}^i(v), \mathrm{SLINK}^i(v'))) \,\}$$
Proof.
$$\begin{aligned}
\mathrm{SDEP}(\mathrm{LCA}(v, v')) &= i + \mathrm{SDEP}(\mathrm{SLINK}^i(\mathrm{LCA}(v, v'))) \\
&= i + \mathrm{SDEP}(\mathrm{LCA}(\mathrm{SLINK}^i(v), \mathrm{SLINK}^i(v'))) \\
&\ge i + \mathrm{SDEP}(\mathrm{LCSA}(\mathrm{SLINK}^i(v), \mathrm{SLINK}^i(v')))
\end{aligned}$$
The last inequality is an equality for some $i < d$.
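The lemma can be checked by brute force on a small text. The Python sketch below rests on several assumptions that are not from the talk: nodes are identified with their path labels, the text is abbbab$, and the sampling keeps every node whose string depth is a multiple of δ. That sampling satisfies the "one sampled node in every SLINK chain of size δ" condition but is not the authors' sampling; all function names are illustrative.

```python
from itertools import product


def suffix_tree_nodes(text):
    """Path labels of all suffix-tree nodes (text must end in a unique terminator)."""
    n = len(text)
    followers = {}
    for i in range(n):
        for j in range(i + 1, n):
            followers.setdefault(text[i:j], set()).add(text[j])
    nodes = {""}                                                  # root
    nodes.update(text[i:] for i in range(n))                      # leaves
    nodes.update(s for s, f in followers.items() if len(f) > 1)   # branching nodes
    return nodes


def lcp(x, y):
    """Longest common prefix; for two node labels this is the label of their LCA."""
    k = 0
    while k < min(len(x), len(y)) and x[k] == y[k]:
        k += 1
    return x[:k]


def lcsa(x, y, sampled):
    """Deepest sampled node that is an ancestor of LCA(x, y); the root is always sampled."""
    anc = lcp(x, y)
    while anc not in sampled:
        anc = anc[:-1]
    return anc


def lemma_holds(text, delta):
    nodes = suffix_tree_nodes(text)
    sampled = {v for v in nodes if len(v) % delta == 0}  # assumed sampling
    for v, w in product(nodes, repeat=2):
        r = len(lcp(v, w))          # SLINK^r(LCA(v, w)) = ROOT
        d = min(delta, r + 1)
        # SLINK^i of a node drops the first i letters of its label.
        best = max(i + len(lcsa(v[i:], w[i:], sampled)) for i in range(d))
        if best != r:               # r is also SDEP(LCA(v, w))
            return False
    return True


print(lemma_holds("abbbab$", 2))
```

The check includes the pairs with v = v', which is the case used to compute SDEP(v) itself.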
Kernel Operations
With the previous lemma FCSTs compute the following operations:
- $\mathrm{SDEP}(v) = \mathrm{SDEP}(\mathrm{LCA}(v, v)) = \max_{0 \le i < d} \{\, i + \mathrm{SDEP}(\mathrm{LCSA}(\psi^i(v_l), \psi^i(v_r))) \,\}$.
- $\mathrm{LCA}(v, v') = \mathrm{LF}(v[0..i-1], \mathrm{LCSA}(\psi^i(\min\{v_l, v'_l\}), \psi^i(\max\{v_r, v'_r\})))$, for the $i$ in the lemma.
- $\mathrm{SLINK}(v) = \mathrm{LCA}(\psi(v_l), \psi(v_r))$.
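The SLINK identity can be tried out on intervals with a naive Python sketch. It assumes the text abbbab$, a plain suffix array with a cyclic ψ, and an LCA computed by direct string comparison instead of the sampled machinery; it is meant for internal, non-root nodes, and the helper names are hypothetical.

```python
def build(text):
    """Return the suffix array and psi (cyclic at the end of the text)."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    rank = {pos: i for i, pos in enumerate(sa)}
    psi = [rank[(pos + 1) % len(text)] for pos in sa]
    return sa, psi


def lca_of_leaves(text, sa, i, j):
    """Interval of the LCA of leaves i and j (suffix-array positions), found naively."""
    x, y = text[sa[i]:], text[sa[j]:]
    k = 0
    while k < min(len(x), len(y)) and x[k] == y[k]:
        k += 1
    label = x[:k]
    hits = [p for p in range(len(sa)) if text[sa[p]:].startswith(label)]
    return hits[0], hits[-1]


def slink(text, sa, psi, node):
    """SLINK(v) = LCA(psi(v_l), psi(v_r)) for a node given as an interval (v_l, v_r)."""
    left, right = node
    return lca_of_leaves(text, sa, psi[left], psi[right])


text = "abbbab$"  # assumed example text
sa, psi = build(text)
print(slink(text, sa, psi, (1, 2)))  # node "ab" -> (3, 6), the node "b"
print(slink(text, sa, psi, (3, 6)))  # node "b"  -> (0, 6), the root
```

Only the two extreme leaves of the interval are mapped through ψ, which is what lets the operation run over a compressed index that stores no explicit tree.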
Dynamic FCSTs
Problem (FCSTs are static)
How to insert or remove a text T from an FCST that is indexing a collection C of texts?
- Use Weiner's algorithm, or delete suffixes from the largest to the smallest.
- Update the CSA and the sampling.
Use dynamic CSAs.
Theorem (Mäkinen, Navarro). A dynamic CSA over a collection C can be stored in $nH_k(C) + o(n \log \sigma)$ bits, with times $t = \Psi = O(((\log_\sigma \log n)^{-1} + 1) \log n)$, $\Phi = O((\log_\sigma \log n) \log^2 n)$, and inserting/deleting texts T in $O(|T|(t + \Psi))$.
Let's take a closer look at the sampling.
Reverse tree
How do we guarantee the sampling condition, with at most $O(n/\delta)$ nodes?
We use a purely conceptual reverse tree.
Definition. The reverse tree $T^R$ is the minimal labeled tree that, for every node $v$ of a suffix tree, contains a node $v^R$ denoting the reverse string of the path-label of $v$.
Reverse tree
Example (Suffix Tree for abbbab and its reverse tree)
[Figure: the suffix tree and its reverse tree, with nodes numbered 0 to 6.]
Note that the SLINKs correspond to moving upwards on the reverse tree.
We sample the nodes for which $\mathrm{TDEP}(v^R) \equiv_{\delta/2} 0$ and $\mathrm{HEIGHT}(v^R) \ge \delta/2$.
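The sampling rule can be tried out with a short Python sketch. It reads the rule as "TDEP(v^R) is a multiple of δ/2 and HEIGHT(v^R) is at least δ/2" (the relation symbols were lost in extraction, so this reading is an assumption), takes δ even, identifies nodes with their path labels, and uses the assumed text abbbab$. In the reverse tree the parent of v^R is SLINK(v)^R, so TDEP(v^R) equals the length of the path label of v. All function names are illustrative.

```python
def suffix_tree_nodes(text):
    """Path labels of all suffix-tree nodes (text must end in a unique terminator)."""
    n = len(text)
    followers = {}
    for i in range(n):
        for j in range(i + 1, n):
            followers.setdefault(text[i:j], set()).add(text[j])
    nodes = {""} | {text[i:] for i in range(n)}
    nodes.update(s for s, f in followers.items() if len(f) > 1)
    return nodes


def reverse_tree_sample(nodes, delta):
    """Sample nodes by depth and height in the reverse tree (delta even, >= 2)."""
    half = delta // 2
    height = {v: 0 for v in nodes}
    # Visit deeper nodes first so every child is final before its parent.
    for v in sorted(nodes, key=len, reverse=True):
        if v:
            parent = v[1:]  # SLINK(v): one step up in the reverse tree
            height[parent] = max(height[parent], height[v] + 1)
    return {v for v in nodes if len(v) % half == 0 and height[v] >= half}


def chains_are_covered(nodes, sampled, delta):
    """Check that every SLINK chain of size delta (cut at the root) hits a sample."""
    for v in nodes:
        chain = [v[i:] for i in range(min(delta, len(v) + 1))]
        if not any(u in sampled for u in chain):
            return False
    return True


nodes = suffix_tree_nodes("abbbab$")
sampled = reverse_tree_sample(nodes, 4)
print(sorted(sampled, key=len))  # ['', 'b$', 'bab$']
print(chains_are_covered(nodes, sampled, 4))
```

With δ = 4 only three of the eleven nodes are sampled, yet every SLINK chain of four nodes contains one of them; the height condition is what keeps shallow side branches such as ab and bb from being sampled.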
What happens when nodes are inserted or deleted?
Only the leaves of the reverse tree change.
This sampling does not respect the $\mathrm{HEIGHT}(v^R) \ge \delta/2$ condition.
To insert a node we do an upwards scan and sample nodes if necessary.
To delete a node we keep reference counters to guarantee that it is safe to unsample a node.
Other contributions
- We study the problem of changing $\log n$.
- We give a new way to compute LSA.
- We obtain a generalized branching, which determines $v_1.v_2$ for nodes $v_1$ and $v_2$ and can be computed directly over CSAs in the same time as regular branching.
Summary
We presented dynamic fully-compressed suffix trees that:
- occupy $uH_k + o(u \log \sigma)$ bits;
- support the usual operations in reasonable time.
Acknowledgments
- Veli Mäkinen and Johannes Fischer for pointing out the generalized branching problem.
- FCT grant SFRH/BPD/34373/2006 and project ARN, PTDC/EIA/67722/2006.
- Millennium Institute for Cell Dynamics and Biotechnology, Grant ICM P05-001-F, Mideplan, Chile.
Thanks for listening.