TASM: Top-k Approximate Subtree Matching

Size: px
Start display at page:

Download "TASM: Top-k Approximate Subtree Matching"

Transcription

1 TASM: Top-k Approximate Subtree Matching Nikolaus Augsten 1 Denilson Barbosa 2 Michael Böhlen 3 Themis Palpanas 4 1 Free University of Bozen-Bolzano, Italy augsten@inf.unibz.it 2 University of Alberta, Canada denilson@cs.ualberta.ca 3 University of Zurich, Switzerland boehlen@ifi.uzh.ch 4 University of Trento, Italy themis@disi.unitn.eu ICDE 2010, March 3 Long Beach, CA, USA Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

2 Outline 1 Motivation and Problem Definition 2 TASM-Postorder Upper Bound on Subtree Size 3 Experiments 4 Conclusion and Future Work Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

3 Outline Motivation and Problem Definition 1 Motivation and Problem Definition 2 TASM-Postorder Upper Bound on Subtree Size 3 Experiments 4 Conclusion and Future Work Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

4 Motivation Query (XML fragment) authors article author author ICDE Tim John Motivation and Problem Definition booktitle top-k matches? Document (very large XML) DBLP 28M nodes, 531MB Rank the top-k matches for the article query in the DBLP document! Example Answer: k = 3 inproceedings authors author author ICDE Tim booktitle article authors author author booktitle inproceedings authors author author author ICDE John Tim John TKDE Tim John Peter (1 error) (2 errors) (3 errors) booktitle Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

5 Motivation and Problem Definition TASM: Top-k Approximate Subtree Matching Definition (TASM: Top-k Approximate Subtree Matching) Given: query tree Q, document tree T, size k of ranking Goal: Compute a top-k ranking R = (T 1,T 2,...,T k ) of all subtrees T i of document T with respect to query Q using the tree edit distance for the ranking. Subtree T i : a node and all its descendants largest subtree is document itself top-k ranking R = (T 1,T i,..., T k ) subtrees sorted by distance to query best k subtrees: T i / R ted(q, T k ) ted(q, T i ) Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

6 Motivation and Problem Definition Ranking Function: Tree Edit Distance (TED) del(authors) ren(icde) article article article authors booktitle author author ICDE Tim John author Tim author John booktitle ICDE author Tim author John Tree Edit Distance: Minimum number of node edit operations (insert, rename, delete) that transform one tree into the other. TASM computes TED between query and document subtrees booktitle TKDE Size and number of computed subtrees define TASM complexity Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

7 Motivation and Problem Definition State of the Art TASM-Dynamic: dynamic programming solution 1 computes distance to every subtree of the document use smaller subtrees to compute larger ones rank subtrees by visiting memoization table Space complexity: O(mn), m: query size, n: document size Space complexity limits application to databases in database applications n is huge (database size!) TASM-Dynamic maintains two m n matrixes in RAM > 6GB RAM for our tiny query (m = 8) on DBLP (n = ) For database size solutions dynamic programming is too expensive. State-of-the-art algorithms do not scale! 1 Zhang and Shasha 1989, Demaine et al Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

8 Motivation and Problem Definition Problem Definition Find a solution for TASM (Top-k Approximate Subtree Matching) that scales to very large documents runs in small memory ranks subtrees correctly (no heuristics!) Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

9 Outline TASM-Postorder 1 Motivation and Problem Definition 2 TASM-Postorder Upper Bound on Subtree Size 3 Experiments 4 Conclusion and Future Work Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

10 Outline TASM-Postorder Upper Bound on Subtree Size 1 Motivation and Problem Definition 2 TASM-Postorder Upper Bound on Subtree Size 3 Experiments 4 Conclusion and Future Work Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

11 TASM-Postorder Upper Bound on Subtree Size Subtree Size Upper Bound in Three Steps worst match 1. Rank first k subtrees of T in postorder: R = (T 1,T 2,...,T k ) (i) ted(q,t k ) Q + T k Q delete Q insert T k T k 2. Final ranking R = (T 1,T 2,...,T k ) (=TASM result) T k k T i s in R are better than worst match T k of R (ii) ted(q,t i ) ted(q,t k ) Q + T k 3. Size upper bound for subtree T i at least: insert missing nodes T i Q T i Q ted(q,t i ) Q T i T i ted(q,t i ) + Q 2 Q + T k 2 Q + k Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

12 TASM-Postorder Upper Bound on Subtree Size Upper Bound on Subtree Size Theorem (Upper Bound on Subtree Size) TASM needs to consider only small document subtrees of size τ or less: τ = 2 Q + k Upper bound is very powerful: independent of document size and structure! linear in query size and k Example: top-10 with example query Q = 8 on DBLP (28M nodes) with bound: max subtree size τ = = 26 without bound: maximum subtree size is 28M (whole document)! Document-independent upper bound on subtree size! Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

13 Outline TASM-Postorder 1 Motivation and Problem Definition 2 TASM-Postorder Upper Bound on Subtree Size 3 Experiments 4 Conclusion and Future Work Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

14 TASM-Postorder Document Format: Postorder Queue dblp article proceedings book auth title conf article article title John X1 VLDB auth title auth title X2 Peter X3 Mike X4 John,1 auth,2 X1,1 title,2 article,5 VLDB,1 conf,2 Peter,1 auth,2 X3,1 title,2 article,5 Mike,1 auth,2 X4,1 title,2 article,5 proc,13 X2,1 title,2 book,3 dblp,22 Postorder queue: queue of (label,size)-pairs dequeue removes leftmost element, e.g., (John, 1) no random access! Relevant and state-of-the-art for XML Parsing full subtree known only at closing tag closing tags appear in postorder Implementation is efficient and heavily used for XML streams plain XML files (e.g., SAX) XML in database (Dewey, interval encoding,...) Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

15 Candidate Subtrees TASM-Postorder Candidate subtrees are all subtrees T i of the document with T i τ AND T i is not contained in a larger subtree T j τ Pruning: find candidate subtrees Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

16 TASM-Postorder Simple Pruning Approach article 5 dblp 22 proceedings 18 book 21 auth 2 John 1 title 4 X1 3 conf 7 VLDB 6 article 12 auth 9 Peter 8 title 11 X3 10 article 17 auth 14 Mike 13 title 16 X4 15 title 20 X2 19 Simple pruning approach: (τ = 6 in example above) add nodes to memory buffer until non-candidate ( T i > τ) is added subtrees of non-candidate with T i τ are candidate subtrees Problem: memory buffer can grow very large! must keep subtrees in memory until non-candidate ancestor is read worst case: memory buffer stores O(n) nodes (frequent in data-centric XML!) Example: DBLP, τ = 50 99% of nodes are still in buffer when root node is read! Simple pruning not feasible for large documents! Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

17 TASM-Postorder Efficient Pruning is Tricky! Problem: when can we remove a node from the buffer? when we see T i τ, we don t yet know about parent (postorder!) subtree of parent might be smaller than τ! Our Solution does not wait for parent prefix ring buffer: fixed size buffer pruning rule: prune based on following nodes Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

18 TASM-Postorder Pruning in Small Memory prefix ring buffer (τ = 6) VLDB,1 John,1 auth,2 X1,1 title,4 article,5 e s Prefix ring buffer of size τ + 1 (main memory) stores prefix (τ nodes in postorder) of the document two operations append new node remove leftmost subtree/node Pruning rule: If leftmost node in full ring buffer is leaf: leftmost subtree is candidate subtree non-leaf: leftmost node is non-candidate node Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

19 TASM-Postorder Pruning Rule Intuition Candidate subtree: leftmost node is a leaf T i : leftmost subtree, starts with leftmost node T j : smallest subtree that contains T i due to postorder: T j contains all nodes in buffer since T i τ and T j > τ: T i is a candidate Non-candidate node: leftmost node is a non-leaf leftmost non-leaf is parent of previously removed nodes we remove either candidate subtrees and non-candidate nodes in both cases: parent is a non-candidate Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

20 TASM-Postorder Example article auth title conf John X1 τ = 6 dblp proceedings article VLDB auth title auth title X2 Peter X3 article title Mike X4 append book 1. fill ring buffer 2. check leftmost node leaf: candidate subtree to result non-leaf: non-candidate remove 3. until queue and buffer empty postorder queue (input) article,5 Mike,1 auth,2 prefix ring buffer (main memory) Peter,1 auth,2 X3,1 title,2 e VLDB,1 s conf,2 candidate subtrees: (output) article auth title John X1 Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

21 TASM-Postorder TASM-Postorder TASM-postorder 1. empty ranking R, tightening upper bound τ = τ 2. for each candidate subtree T i a. if R = k: update τ = min(τ, max(r) + Q ) b. compute tree edit distance for all subtrees of T i within τ c. update ranking R Theorem (TASM-Postorder) The space complexity of TASM-postorder is independent of the document size: O(m 2 + mk) (m: query size, k: result size) TASM-postorder scales to very large documents! Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

22 Outline Experiments 1 Motivation and Problem Definition 2 TASM-Postorder Upper Bound on Subtree Size 3 Experiments 4 Conclusion and Future Work Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

23 Pruning Effectiveness Experiments Prefix ring buffer pruning is very effective! Maximum subtree reduced from 37M to 18 nodes. Dataset: PSD protein sequences, 37M nodes, 683MB Compute TASM ( Q = 4, k = 1) TASM-dynamic (state of the art) TASM-postorder (our solution) Histogram of computed subtrees number of subtrees 1e7 1e6 1e5 1e4 1e3 1e2 1e1 TASM-Dynamic largest subtree: 37M entire document 1e0 1e0 1e1 1e2 1e3 1e4 1e5 1e6 1e7 subtree size (nodes) number of subtrees TASM-Postorder 1e7 1e6 1e5 largest subtree: 18 1e4 1e3 1e2 1e1 1e0 1e0 1e1 1e2 1e3 1e4 1e5 1e6 1e7 subtree size (nodes) Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

24 Experiments Scalability: TASM-Postorder vs. TASM-Dynamic TASM-postorder much faster than TASM-dynamic. Dataset: XMark (synthetic XML for benchmark) Vary query size and document size Compute TASM (k = 5) 1e3 TASM-dynamic (state of the art) TASM-postorder (our solution) Measure wall clock time 1e3 time (seconds) 1e2 1e1 1e0 dyn, T:224MB dyn, T:112MB pos, T:224MB pos, T:112MB query size (nodes) time (seconds) 1e2 1e1 1e0 dyn, Q =8 dyn, Q =4 pos, Q =8 pos, Q = document size (MB) Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

25 Experiments Scalability with Result Size k TASM-postorder scales well with k. Increasing k by 4 orders of magnitude only doubles runtime. time (seconds) dyn, T:224MB dyn, T:112MB pos, T:224MB pos, T:112MB 0 1e0 1e1 1e2 1e3 1e4 k Dataset: XMark (synthetic XML for benchmark) Vary k (size of ranking) Compute TASM ( Q = 16) TASM-dynamic (state of the art) TASM-postorder (our solution) Measure wall clock time Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

26 Experiments Space complexity: TASM-Postorder vs. TASM-Dynamic TASM-postorder: space independent of document! 4e3 1e3 3GB Dataset: XMark (synthetic XML for benchmark) memory (MB) 1e2 1e1 8MB dyn, Q =16 dyn, Q =4 pos, Q =16 pos, Q =4 Vary document size Compute TASM (k = 5) TASM-dynamic (state of the art) TASM-postorder (our solution) Measure main memory usage 1e document size (MB) Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

27 Outline Conclusion and Future Work 1 Motivation and Problem Definition 2 TASM-Postorder Upper Bound on Subtree Size 3 Experiments 4 Conclusion and Future Work Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

28 Conclusion Conclusion and Future Work Conclusion Prefix Ring Buffer for space efficient pruning Dynamic programming does not scale for database size solutions. Upper bound τ: limit maximum subtree size for TASM TASM-postorder: highly scalable TASM algorithm TASM-postorder makes TASM feasible. Future Work New research opportunities: tune tree edit distance to different applications index the document: can we avoid a document scan? parallel TASM algorithm: where to split document? Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

29 Erik D. Demaine, Shay Mozes, Benjamin Rossman, and Oren Weimann. An optimal decomposition algorithm for tree edit distance. In ICALP, volume 4596 of LNCS, pages , Wroclaw, Poland, July Springer. K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM J. on Computing, 18(6): , Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28

Approximation: Theory and Algorithms

Approximation: Theory and Algorithms Approximation: Theory and Algorithms The Tree Edit Distance (II) Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 7 April 17, 2009 Nikolaus Augsten (DIS) Approximation:

More information

An Optimal Decomposition Algorithm for Tree Edit Distance

An Optimal Decomposition Algorithm for Tree Edit Distance An Optimal Decomposition Algorithm for Tree Edit Distance ERIK D. DEMAINE Massachusetts Institute of Technology and SHAY MOZES Brown University and BENJAMIN ROSSMAN Massachusetts Institute of Technology

More information

Outline. Approximation: Theory and Algorithms. Application Scenario. 3 The q-gram Distance. Nikolaus Augsten. Definition and Properties

Outline. Approximation: Theory and Algorithms. Application Scenario. 3 The q-gram Distance. Nikolaus Augsten. Definition and Properties Outline Approximation: Theory and Algorithms Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 3 March 13, 2009 2 3 Nikolaus Augsten (DIS) Approximation: Theory and

More information

XReason: A Semantic Approach that Reasons with Patterns to Answer XML Keyword Queries

XReason: A Semantic Approach that Reasons with Patterns to Answer XML Keyword Queries XReason: A Semantic Aroach that Reasons with Patterns to Answer XML Keyword Queries Cem Aksoy 1, Aggeliki Dimitriou 2, Dimitri Theodoratos 1, Xiaoying Wu 3 1 New Jersey Institute of Technology, Newark,

More information

Approximation: Theory and Algorithms

Approximation: Theory and Algorithms Approimation: Theor and Algorithms Edit Distance Compleit, Upper and Lower Bounds Nikolaus Augsten Free Universit of Bozen-Bolzano Facult of Computer Science DIS Unit 8 April, 009 Nikolaus Augsten (DIS)

More information

Optimal Color Range Reporting in One Dimension

Optimal Color Range Reporting in One Dimension Optimal Color Range Reporting in One Dimension Yakov Nekrich 1 and Jeffrey Scott Vitter 1 The University of Kansas. yakov.nekrich@googlemail.com, jsv@ku.edu Abstract. Color (or categorical) range reporting

More information

INF2220: algorithms and data structures Series 1

INF2220: algorithms and data structures Series 1 Universitetet i Oslo Institutt for Informatikk I. Yu, D. Karabeg INF2220: algorithms and data structures Series 1 Topic Function growth & estimation of running time, trees (Exercises with hints for solution)

More information

Outline. Approximation: Theory and Algorithms. Motivation. Outline. The String Edit Distance. Nikolaus Augsten. Unit 2 March 6, 2009

Outline. Approximation: Theory and Algorithms. Motivation. Outline. The String Edit Distance. Nikolaus Augsten. Unit 2 March 6, 2009 Outline Approximation: Theory and Algorithms The Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 2 March 6, 2009 1 Nikolaus Augsten (DIS) Approximation: Theory and

More information

Biased Quantiles. Flip Korn Graham Cormode S. Muthukrishnan

Biased Quantiles. Flip Korn Graham Cormode S. Muthukrishnan Biased Quantiles Graham Cormode cormode@bell-labs.com S. Muthukrishnan muthu@cs.rutgers.edu Flip Korn flip@research.att.com Divesh Srivastava divesh@research.att.com Quantiles Quantiles summarize data

More information

Approximation: Theory and Algorithms

Approximation: Theory and Algorithms Approximation: Theory and Algorithms The String Edit Distance Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 2 March 6, 2009 Nikolaus Augsten (DIS) Approximation:

More information

Advanced Implementations of Tables: Balanced Search Trees and Hashing

Advanced Implementations of Tables: Balanced Search Trees and Hashing Advanced Implementations of Tables: Balanced Search Trees and Hashing Balanced Search Trees Binary search tree operations such as insert, delete, retrieve, etc. depend on the length of the path to the

More information

Towards Indexing Functions: Answering Scalar Product Queries Arijit Khan, Pouya Yanki, Bojana Dimcheva, Donald Kossmann

Towards Indexing Functions: Answering Scalar Product Queries Arijit Khan, Pouya Yanki, Bojana Dimcheva, Donald Kossmann Towards Indexing Functions: Answering Scalar Product Queries Arijit Khan, Pouya anki, Bojana Dimcheva, Donald Kossmann Systems Group ETH Zurich Moving Objects Intersection Finding Position at a future

More information

Skylines. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland

Skylines. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland Yufei Tao ITEE University of Queensland Today we will discuss problems closely related to the topic of multi-criteria optimization, where one aims to identify objects that strike a good balance often optimal

More information

Type Based XML Projection

Type Based XML Projection Type Based XML Projection VLDB 2006, Seoul, Korea Véronique Benzaken Giuseppe Castagna Dario Colazzo Kim Nguy ên : Équipe Bases de Données, LRI, Université Paris-Sud 11, Orsay, France : Équipe Langage,

More information

5 Spatial Access Methods

5 Spatial Access Methods 5 Spatial Access Methods 5.1 Quadtree 5.2 R-tree 5.3 K-d tree 5.4 BSP tree 3 27 A 17 5 G E 4 7 13 28 B 26 11 J 29 18 K 5.5 Grid file 9 31 8 17 24 28 5.6 Summary D C 21 22 F 15 23 H Spatial Databases and

More information

Similarity Search. The String Edit Distance. Nikolaus Augsten. Free University of Bozen-Bolzano Faculty of Computer Science DIS. Unit 2 March 8, 2012

Similarity Search. The String Edit Distance. Nikolaus Augsten. Free University of Bozen-Bolzano Faculty of Computer Science DIS. Unit 2 March 8, 2012 Similarity Search The String Edit Distance Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 2 March 8, 2012 Nikolaus Augsten (DIS) Similarity Search Unit 2 March 8,

More information

Divide-and-Conquer Algorithms Part Two

Divide-and-Conquer Algorithms Part Two Divide-and-Conquer Algorithms Part Two Recap from Last Time Divide-and-Conquer Algorithms A divide-and-conquer algorithm is one that works as follows: (Divide) Split the input apart into multiple smaller

More information

Multi-Dimensional Top-k Dominating Queries

Multi-Dimensional Top-k Dominating Queries Noname manuscript No. (will be inserted by the editor) Multi-Dimensional Top-k Dominating Queries Man Lung Yiu Nikos Mamoulis Abstract The top-k dominating query returns k data objects which dominate the

More information

Outline. Computer Science 331. Cost of Binary Search Tree Operations. Bounds on Height: Worst- and Average-Case

Outline. Computer Science 331. Cost of Binary Search Tree Operations. Bounds on Height: Worst- and Average-Case Outline Computer Science Average Case Analysis: Binary Search Trees Mike Jacobson Department of Computer Science University of Calgary Lecture #7 Motivation and Objective Definition 4 Mike Jacobson (University

More information

A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window

A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch 1 and Srikanta Tirthapura 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY

More information

5 Spatial Access Methods

5 Spatial Access Methods 5 Spatial Access Methods 5.1 Quadtree 5.2 R-tree 5.3 K-d tree 5.4 BSP tree 3 27 A 17 5 G E 4 7 13 28 B 26 11 J 29 18 K 5.5 Grid file 9 31 8 17 24 28 5.6 Summary D C 21 22 F 15 23 H Spatial Databases and

More information

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 13 Indexes for Multimedia Data 13 Indexes for Multimedia

More information

1 Approximate Quantiles and Summaries

1 Approximate Quantiles and Summaries CS 598CSC: Algorithms for Big Data Lecture date: Sept 25, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri Suppose we have a stream a 1, a 2,..., a n of objects from an ordered universe. For simplicity

More information

Exam 1. March 12th, CS525 - Midterm Exam Solutions

Exam 1. March 12th, CS525 - Midterm Exam Solutions Name CWID Exam 1 March 12th, 2014 CS525 - Midterm Exam s Please leave this empty! 1 2 3 4 5 Sum Things that you are not allowed to use Personal notes Textbook Printed lecture notes Phone The exam is 90

More information

Effective computation of biased quantiles over data streams

Effective computation of biased quantiles over data streams Effective computation of biased quantiles over data streams Graham Cormode cormode@bell-labs.com S. Muthukrishnan muthu@cs.rutgers.edu Flip Korn flip@research.att.com Divesh Srivastava divesh@research.att.com

More information

Multimedia Databases 1/29/ Indexes for Multimedia Data Indexes for Multimedia Data Indexes for Multimedia Data

Multimedia Databases 1/29/ Indexes for Multimedia Data Indexes for Multimedia Data Indexes for Multimedia Data 1/29/2010 13 Indexes for Multimedia Data 13 Indexes for Multimedia Data 13.1 R-Trees Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

More information

Chapter 5 Data Structures Algorithm Theory WS 2017/18 Fabian Kuhn

Chapter 5 Data Structures Algorithm Theory WS 2017/18 Fabian Kuhn Chapter 5 Data Structures Algorithm Theory WS 2017/18 Fabian Kuhn Priority Queue / Heap Stores (key,data) pairs (like dictionary) But, different set of operations: Initialize-Heap: creates new empty heap

More information

An Efficient Algorithm for Tree Mapping in XML Databases

An Efficient Algorithm for Tree Mapping in XML Databases Journal of Computer Science 3 (7): 487-493, 2007 ISSN 1549-3636 Science Publications, 2007 Corresponding Author: An Efficient Algorithm for Tree Mapping in XML Databases Yangjun Chen University of Winnipeg,

More information

Optimal Tree-decomposition Balancing and Reachability on Low Treewidth Graphs

Optimal Tree-decomposition Balancing and Reachability on Low Treewidth Graphs Optimal Tree-decomposition Balancing and Reachability on Low Treewidth Graphs Krishnendu Chatterjee Rasmus Ibsen-Jensen Andreas Pavlogiannis IST Austria Abstract. We consider graphs with n nodes together

More information

Nearest Neighbor Search with Keywords in Spatial Databases

Nearest Neighbor Search with Keywords in Spatial Databases 776 Nearest Neighbor Search with Keywords in Spatial Databases 1 Sphurti S. Sao, 2 Dr. Rahila Sheikh 1 M. Tech Student IV Sem, Dept of CSE, RCERT Chandrapur, MH, India 2 Head of Department, Dept of CSE,

More information

Finding Pareto Optimal Groups: Group based Skyline

Finding Pareto Optimal Groups: Group based Skyline Finding Pareto Optimal Groups: Group based Skyline Jinfei Liu Emory University jinfei.liu@emory.edu Jun Luo Lenovo; CAS jun.luo@siat.ac.cn Li Xiong Emory University lxiong@emory.edu Haoyu Zhang Emory University

More information

past balancing schemes require maintenance of balance info at all times, are aggresive use work of searches to pay for work of rebalancing

past balancing schemes require maintenance of balance info at all times, are aggresive use work of searches to pay for work of rebalancing 1 Splay Trees Sleator and Tarjan, Self Adjusting Binary Search Trees JACM 32(3) 1985 The claim planning ahead. But matches previous idea of being lazy, letting potential build up, using it to pay for expensive

More information

Implementing Approximate Regularities

Implementing Approximate Regularities Implementing Approximate Regularities Manolis Christodoulakis Costas S. Iliopoulos Department of Computer Science King s College London Kunsoo Park School of Computer Science and Engineering, Seoul National

More information

Fundamental Algorithms

Fundamental Algorithms Fundamental Algorithms Chapter 5: Searching Michael Bader Winter 2014/15 Chapter 5: Searching, Winter 2014/15 1 Searching Definition (Search Problem) Input: a sequence or set A of n elements (objects)

More information

ENS Lyon Camp. Day 2. Basic group. Cartesian Tree. 26 October

ENS Lyon Camp. Day 2. Basic group. Cartesian Tree. 26 October ENS Lyon Camp. Day 2. Basic group. Cartesian Tree. 26 October Contents 1 Cartesian Tree. Definition. 1 2 Cartesian Tree. Construction 1 3 Cartesian Tree. Operations. 2 3.1 Split............................................

More information

Evaluating Complex Queries against XML Streams with Polynomial Combined Complexity

Evaluating Complex Queries against XML Streams with Polynomial Combined Complexity INSTITUT FÜR INFORMATIK Lehr- und Forschungseinheit für Programmier- und Modellierungssprachen Oettingenstraße 67, D 80538 München Evaluating Complex Queries against XML Streams with Polynomial Combined

More information

CSE 421 Greedy: Huffman Codes

CSE 421 Greedy: Huffman Codes CSE 421 Greedy: Huffman Codes Yin Tat Lee 1 Compression Example 100k file, 6 letter alphabet: File Size: ASCII, 8 bits/char: 800kbits 2 3 > 6; 3 bits/char: 300kbits better: 2.52 bits/char 74%*2 +26%*4:

More information

arxiv: v1 [cs.ds] 27 Mar 2017

arxiv: v1 [cs.ds] 27 Mar 2017 Tree Edit Distance Cannot be Computed in Strongly Subcubic Time (unless APSP can) Karl Bringmann Paweł Gawrychowski Shay Mozes Oren Weimann arxiv:1703.08940v1 [cs.ds] 27 Mar 2017 Abstract The edit distance

More information

Analysis of tree edit distance algorithms

Analysis of tree edit distance algorithms Analysis of tree edit distance algorithms Serge Dulucq 1 and Hélène Touzet 1 LaBRI - Universit Bordeaux I 33 0 Talence cedex, France Serge.Dulucq@labri.fr LIFL - Universit Lille 1 9 6 Villeneuve d Ascq

More information

Multi-Approximate-Keyword Routing Query

Multi-Approximate-Keyword Routing Query Bin Yao 1, Mingwang Tang 2, Feifei Li 2 1 Department of Computer Science and Engineering Shanghai Jiao Tong University, P. R. China 2 School of Computing University of Utah, USA Outline 1 Introduction

More information

Search Trees. Chapter 10. CSE 2011 Prof. J. Elder Last Updated: :52 AM

Search Trees. Chapter 10. CSE 2011 Prof. J. Elder Last Updated: :52 AM Search Trees Chapter 1 < 6 2 > 1 4 = 8 9-1 - Outline Ø Binary Search Trees Ø AVL Trees Ø Splay Trees - 2 - Outline Ø Binary Search Trees Ø AVL Trees Ø Splay Trees - 3 - Binary Search Trees Ø A binary search

More information

Tree Edit Distance Cannot be Computed in Strongly Subcubic Time (unless APSP can)

Tree Edit Distance Cannot be Computed in Strongly Subcubic Time (unless APSP can) Tree Edit Distance Cannot be Computed in Strongly Subcubic Time (unless APSP can) Karl Bringmann Paweł Gawrychowski Shay Mozes Oren Weimann Abstract The edit distance between two rooted ordered trees with

More information

Problem. Problem Given a dictionary and a word. Which page (if any) contains the given word? 3 / 26

Problem. Problem Given a dictionary and a word. Which page (if any) contains the given word? 3 / 26 Binary Search Introduction Problem Problem Given a dictionary and a word. Which page (if any) contains the given word? 3 / 26 Strategy 1: Random Search Randomly select a page until the page containing

More information

SIGNAL COMPRESSION Lecture 7. Variable to Fix Encoding

SIGNAL COMPRESSION Lecture 7. Variable to Fix Encoding SIGNAL COMPRESSION Lecture 7 Variable to Fix Encoding 1. Tunstall codes 2. Petry codes 3. Generalized Tunstall codes for Markov sources (a presentation of the paper by I. Tabus, G. Korodi, J. Rissanen.

More information

A General Lower Bound on the I/O-Complexity of Comparison-based Algorithms

A General Lower Bound on the I/O-Complexity of Comparison-based Algorithms A General Lower ound on the I/O-Complexity of Comparison-based Algorithms Lars Arge Mikael Knudsen Kirsten Larsent Aarhus University, Computer Science Department Ny Munkegade, DK-8000 Aarhus C. August

More information

FluXQuery: An Optimizing XQuery Processor

FluXQuery: An Optimizing XQuery Processor FluXQuery: An Optimizing XQuery Processor for Streaming XML Data Report by Cătălin Hriţcu (catalin.hritcu@gmail.com) International Max Planck Research School for Computer Science 1 Introduction While for

More information

Multivalued Decision Diagrams. Postoptimality Analysis Using. J. N. Hooker. Tarik Hadzic. Cork Constraint Computation Centre

Multivalued Decision Diagrams. Postoptimality Analysis Using. J. N. Hooker. Tarik Hadzic. Cork Constraint Computation Centre Postoptimality Analysis Using Multivalued Decision Diagrams Tarik Hadzic Cork Constraint Computation Centre J. N. Hooker Carnegie Mellon University London School of Economics June 2008 Postoptimality Analysis

More information

Dictionary: an abstract data type

Dictionary: an abstract data type 2-3 Trees 1 Dictionary: an abstract data type A container that maps keys to values Dictionary operations Insert Search Delete Several possible implementations Balanced search trees Hash tables 2 2-3 trees

More information

Lecture 1 : Data Compression and Entropy

Lecture 1 : Data Compression and Entropy CPS290: Algorithmic Foundations of Data Science January 8, 207 Lecture : Data Compression and Entropy Lecturer: Kamesh Munagala Scribe: Kamesh Munagala In this lecture, we will study a simple model for

More information

Lower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine)

Lower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine) Lower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine) Mihai Pǎtraşcu, MIT, web.mit.edu/ mip/www/ Index terms: partial-sums problem, prefix sums, dynamic lower bounds Synonyms: dynamic trees 1

More information

META: An Efficient Matching-Based Method for Error-Tolerant Autocompletion

META: An Efficient Matching-Based Method for Error-Tolerant Autocompletion : An Efficient Matching-Based Method for Error-Tolerant Autocompletion Dong Deng Guoliang Li He Wen H. V. Jagadish Jianhua Feng Department of Computer Science, Tsinghua National Laboratory for Information

More information

Projection for Nested Word Automata Speeds up XPath Evaluation on XML Streams

Projection for Nested Word Automata Speeds up XPath Evaluation on XML Streams Projection for Nested Word Automata Speeds up XPath Evaluation on XML Streams Tom Sebastian, Joachim Niehren To cite this version: Tom Sebastian, Joachim Niehren. Projection for Nested Word Automata Speeds

More information

Space Efficient Algorithms for Ordered Tree Comparison

Space Efficient Algorithms for Ordered Tree Comparison Algorithmica (2008) 51: 283 297 DOI 10.1007/s00453-007-9100-z Space Efficient Algorithms for Ordered Tree Comparison Lusheng Wang Kaizhong Zhang Received: 27 January 2006 / Accepted: 27 June 2006 / Published

More information

Dynamic Structures for Top-k Queries on Uncertain Data

Dynamic Structures for Top-k Queries on Uncertain Data Dynamic Structures for Top-k Queries on Uncertain Data Jiang Chen 1 and Ke Yi 2 1 Center for Computational Learning Systems, Columbia University New York, NY 10115, USA. criver@cs.columbia.edu 2 Department

More information

Carnegie Mellon Univ. Dept. of Computer Science Database Applications. SAMs - Detailed outline. Spatial Access Methods - problem

Carnegie Mellon Univ. Dept. of Computer Science Database Applications. SAMs - Detailed outline. Spatial Access Methods - problem Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #26: Spatial Databases (R&G ch. 28) SAMs - Detailed outline spatial access methods problem dfn R-trees Faloutsos 2

More information

Finding Frequent Items in Probabilistic Data

Finding Frequent Items in Probabilistic Data Finding Frequent Items in Probabilistic Data Qin Zhang, Hong Kong University of Science & Technology Feifei Li, Florida State University Ke Yi, Hong Kong University of Science & Technology SIGMOD 2008

More information

Binary Search Trees. Dictionary Operations: Additional operations: find(key) insert(key, value) erase(key)

Binary Search Trees. Dictionary Operations: Additional operations: find(key) insert(key, value) erase(key) Binary Search Trees Dictionary Operations: find(key) insert(key, value) erase(key) Additional operations: ascend() get(index) (indexed binary search tree) delete(index) (indexed binary search tree) Complexity

More information

Heaps and Priority Queues

Heaps and Priority Queues Heaps and Priority Queues Motivation Situations where one has to choose the next most important from a collection. Examples: patients in an emergency room, scheduling programs in a multi-tasking OS. Need

More information

Maintaining Significant Stream Statistics over Sliding Windows

Maintaining Significant Stream Statistics over Sliding Windows Maintaining Significant Stream Statistics over Sliding Windows L.K. Lee H.F. Ting Abstract In this paper, we introduce the Significant One Counting problem. Let ε and θ be respectively some user-specified

More information

Outline. Similarity Search. Outline. Motivation. The String Edit Distance

Outline. Similarity Search. Outline. Motivation. The String Edit Distance Outline Similarity Search The Nikolaus Augsten nikolaus.augsten@sbg.ac.at Department of Computer Sciences University of Salzburg 1 http://dbresearch.uni-salzburg.at WS 2017/2018 Version March 12, 2018

More information

Data Structures and Algorithms " Search Trees!!

Data Structures and Algorithms  Search Trees!! Data Structures and Algorithms " Search Trees!! Outline" Binary Search Trees! AVL Trees! (2,4) Trees! 2 Binary Search Trees! "! < 6 2 > 1 4 = 8 9 Ordered Dictionaries" Keys are assumed to come from a total

More information

Search Trees. EECS 2011 Prof. J. Elder Last Updated: 24 March 2015

Search Trees. EECS 2011 Prof. J. Elder Last Updated: 24 March 2015 Search Trees < 6 2 > 1 4 = 8 9-1 - Outline Ø Binary Search Trees Ø AVL Trees Ø Splay Trees - 2 - Learning Outcomes Ø From this lecture, you should be able to: q Define the properties of a binary search

More information

Lecture 2 September 4, 2014

Lecture 2 September 4, 2014 CS 224: Advanced Algorithms Fall 2014 Prof. Jelani Nelson Lecture 2 September 4, 2014 Scribe: David Liu 1 Overview In the last lecture we introduced the word RAM model and covered veb trees to solve the

More information

Factorized Relational Databases Olteanu and Závodný, University of Oxford

Factorized Relational Databases   Olteanu and Závodný, University of Oxford November 8, 2013 Database Seminar, U Washington Factorized Relational Databases http://www.cs.ox.ac.uk/projects/fd/ Olteanu and Závodný, University of Oxford Factorized Representations of Relations Cust

More information

QuickScorer: a fast algorithm to rank documents with additive ensembles of regression trees

QuickScorer: a fast algorithm to rank documents with additive ensembles of regression trees QuickScorer: a fast algorithm to rank documents with additive ensembles of regression trees Claudio Lucchese, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto HPC Lab, ISTI-CNR, Pisa, Italy & Tiscali

More information

K-Nearest Neighbor Temporal Aggregate Queries

K-Nearest Neighbor Temporal Aggregate Queries Experiments and Conclusion K-Nearest Neighbor Temporal Aggregate Queries Yu Sun Jianzhong Qi Yu Zheng Rui Zhang Department of Computing and Information Systems University of Melbourne Microsoft Research,

More information

Computing the Entropy of a Stream

Computing the Entropy of a Stream Computing the Entropy of a Stream To appear in SODA 2007 Graham Cormode graham@research.att.com Amit Chakrabarti Dartmouth College Andrew McGregor U. Penn / UCSD Outline Introduction Entropy Upper Bound

More information

6.889 Lecture 4: Single-Source Shortest Paths

6.889 Lecture 4: Single-Source Shortest Paths 6.889 Lecture 4: Single-Source Shortest Paths Christian Sommer csom@mit.edu September 19 and 26, 2011 Single-Source Shortest Path SSSP) Problem: given a graph G = V, E) and a source vertex s V, compute

More information

Scalable Algorithms for Distribution Search

Scalable Algorithms for Distribution Search Scalable Algorithms for Distribution Search Yasuko Matsubara (Kyoto University) Yasushi Sakurai (NTT Communication Science Labs) Masatoshi Yoshikawa (Kyoto University) 1 Introduction Main intuition and

More information

/463 Algorithms - Fall 2013 Solution to Assignment 4

/463 Algorithms - Fall 2013 Solution to Assignment 4 600.363/463 Algorithms - Fall 2013 Solution to Assignment 4 (30+20 points) I (10 points) This problem brings in an extra constraint on the optimal binary search tree - for every node v, the number of nodes

More information

8: Splay Trees. Move-to-Front Heuristic. Splaying. Splay(12) CSE326 Spring April 16, Search for Pebbles: Move found item to front of list

8: Splay Trees. Move-to-Front Heuristic. Splaying. Splay(12) CSE326 Spring April 16, Search for Pebbles: Move found item to front of list : Splay Trees SE Spring 00 pril, 00 Move-to-Front Heuristic Search for ebbles: Fred Wilma ebbles ino Fred Wilma ino ebbles Move found item to front of list Frequently searched items will move to start

More information

arxiv: v2 [cs.ds] 8 Apr 2016

arxiv: v2 [cs.ds] 8 Apr 2016 Optimal Dynamic Strings Paweł Gawrychowski 1, Adam Karczmarz 1, Tomasz Kociumaka 1, Jakub Łącki 2, and Piotr Sankowski 1 1 Institute of Informatics, University of Warsaw, Poland [gawry,a.karczmarz,kociumaka,sank]@mimuw.edu.pl

More information

Analysis of Software Artifacts

Analysis of Software Artifacts Analysis of Software Artifacts System Performance I Shu-Ngai Yeung (with edits by Jeannette Wing) Department of Statistics Carnegie Mellon University Pittsburgh, PA 15213 2001 by Carnegie Mellon University

More information

Rank and Select Operations on Binary Strings (1974; Elias)

Rank and Select Operations on Binary Strings (1974; Elias) Rank and Select Operations on Binary Strings (1974; Elias) Naila Rahman, University of Leicester, www.cs.le.ac.uk/ nyr1 Rajeev Raman, University of Leicester, www.cs.le.ac.uk/ rraman entry editor: Paolo

More information

Querying. 1 o Semestre 2008/2009

Querying. 1 o Semestre 2008/2009 Querying Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2008/2009 Outline 1 2 3 4 5 Outline 1 2 3 4 5 function sim(d j, q) = 1 W d W q W d is the document norm W q is the

More information

Speculative Parallelism in Cilk++

Speculative Parallelism in Cilk++ Speculative Parallelism in Cilk++ Ruben Perez & Gregory Malecha MIT May 11, 2010 Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 1 / 33 Parallelizing Embarrassingly Parallel

More information

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany Syllabus Fri. 21.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 27.10. (2) A.1 Linear Regression Fri. 3.11. (3) A.2 Linear Classification Fri. 10.11. (4) A.3 Regularization

More information

Introduction to Randomized Algorithms III

Introduction to Randomized Algorithms III Introduction to Randomized Algorithms III Joaquim Madeira Version 0.1 November 2017 U. Aveiro, November 2017 1 Overview Probabilistic counters Counting with probability 1 / 2 Counting with probability

More information

Efficient Evaluation of Semi-Skylines

Efficient Evaluation of Semi-Skylines Efficient Evaluation of Semi-Skylines Markus Endres and Werner Kießling Fifth International Workshop on Ranking in Databases (2011) Outline 1. Skyline Queries and Semi-Skylines 2. The Staircube Algorithm

More information

Binary Search Trees. Motivation

Binary Search Trees. Motivation Binary Search Trees Motivation Searching for a particular record in an unordered list takes O(n), too slow for large lists (databases) If the list is ordered, can use an array implementation and use binary

More information

CPT+: A Compact Model for Accurate Sequence Prediction

CPT+: A Compact Model for Accurate Sequence Prediction CPT+: A Compact Model for Accurate Sequence Prediction Ted Gueniche 1, Philippe Fournier-Viger 1, Rajeev Raman 2, Vincent S. Tseng 3 1 University of Moncton, Canada 2 University of Leicester, UK 3 National

More information

CS Data Structures and Algorithm Analysis

CS Data Structures and Algorithm Analysis CS 483 - Data Structures and Algorithm Analysis Lecture VII: Chapter 6, part 2 R. Paul Wiegand George Mason University, Department of Computer Science March 22, 2006 Outline 1 Balanced Trees 2 Heaps &

More information

Online Sorted Range Reporting and Approximating the Mode

Online Sorted Range Reporting and Approximating the Mode Online Sorted Range Reporting and Approximating the Mode Mark Greve Progress Report Department of Computer Science Aarhus University Denmark January 4, 2010 Supervisor: Gerth Stølting Brodal Online Sorted

More information

Similarity Search. The String Edit Distance. Nikolaus Augsten.

Similarity Search. The String Edit Distance. Nikolaus Augsten. Similarity Search The String Edit Distance Nikolaus Augsten nikolaus.augsten@sbg.ac.at Dept. of Computer Sciences University of Salzburg http://dbresearch.uni-salzburg.at Version October 18, 2016 Wintersemester

More information

Maintaining Frequent Itemsets over High-Speed Data Streams

Maintaining Frequent Itemsets over High-Speed Data Streams Maintaining Frequent Itemsets over High-Speed Data Streams James Cheng, Yiping Ke, and Wilfred Ng Department of Computer Science Hong Kong University of Science and Technology Clear Water Bay, Kowloon,

More information

Dynamic Ordered Sets with Exponential Search Trees

Dynamic Ordered Sets with Exponential Search Trees Dynamic Ordered Sets with Exponential Search Trees Arne Andersson Computing Science Department Information Technology, Uppsala University Box 311, SE - 751 05 Uppsala, Sweden arnea@csd.uu.se http://www.csd.uu.se/

More information

Each internal node v with d(v) children stores d 1 keys. k i 1 < key in i-th sub-tree k i, where we use k 0 = and k d =.

Each internal node v with d(v) children stores d 1 keys. k i 1 < key in i-th sub-tree k i, where we use k 0 = and k d =. 7.5 (a, b)-trees 7.5 (a, b)-trees Definition For b a an (a, b)-tree is a search tree with the following properties. all leaves have the same distance to the root. every internal non-root vertex v has at

More information

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Chapter 6. Frequent Pattern Mining: Concepts and Apriori Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Pattern Discovery: Definition What are patterns? Patterns: A set of

More information

On the Study of Tree Pattern Matching Algorithms and Applications

On the Study of Tree Pattern Matching Algorithms and Applications On the Study of Tree Pattern Matching Algorithms and Applications by Fei Ma B.Sc., Simon Fraser University, 2004 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of

More information

Highly-scalable branch and bound for maximum monomial agreement

Highly-scalable branch and bound for maximum monomial agreement Highly-scalable branch and bound for maximum monomial agreement Jonathan Eckstein (Rutgers) William Hart Cynthia A. Phillips Sandia National Laboratories Sandia National Laboratories is a multi-program

More information

Computing Techniques for Parallel and Distributed Systems with an Application to Data Compression. Sergio De Agostino Sapienza University di Rome

Computing Techniques for Parallel and Distributed Systems with an Application to Data Compression. Sergio De Agostino Sapienza University di Rome Computing Techniques for Parallel and Distributed Systems with an Application to Data Compression Sergio De Agostino Sapienza University di Rome Parallel Systems A parallel random access machine (PRAM)

More information

Ukkonen's suffix tree construction algorithm

Ukkonen's suffix tree construction algorithm Ukkonen's suffix tree construction algorithm aba$ $ab aba$ 2 2 1 1 $ab a ba $ 3 $ $ab a ba $ $ $ 1 2 4 1 String Algorithms; Nov 15 2007 Motivation Yet another suffix tree construction algorithm... Why?

More information

XML Security Views. Queries, Updates, and Schema. Benoît Groz. University of Lille, Mostrare INRIA. PhD defense, October 2012

XML Security Views. Queries, Updates, and Schema. Benoît Groz. University of Lille, Mostrare INRIA. PhD defense, October 2012 XML Security Views Queries, Updates, and Schema Benoît Groz University of Lille, Mostrare INRIA PhD defense, October 2012 Benoît Groz (Mostrare) XML Security Views PhD defense, October 2012 1 / 45 Talk

More information

Tree Edit Distance Cannot be Computed in Strongly Subcubic Time (unless APSP can) Karl Bringmann, Paweł Gawrychowski, Shay Mozes, Oren Weimann

Tree Edit Distance Cannot be Computed in Strongly Subcubic Time (unless APSP can) Karl Bringmann, Paweł Gawrychowski, Shay Mozes, Oren Weimann Karl Bringmann, Paweł Gawrychowsi, Shay Mozes, Oren Weimann Minimum edits to transform one tree into the other Minimum edits to transform one tree into the other rooted, ordered trees with node labels

More information

Notes on Logarithmic Lower Bounds in the Cell Probe Model

Notes on Logarithmic Lower Bounds in the Cell Probe Model Notes on Logarithmic Lower Bounds in the Cell Probe Model Kevin Zatloukal November 10, 2010 1 Overview Paper is by Mihai Pâtraşcu and Erik Demaine. Both were at MIT at the time. (Mihai is now at AT&T Labs.)

More information

Cardinality Networks: a Theoretical and Empirical Study

Cardinality Networks: a Theoretical and Empirical Study Constraints manuscript No. (will be inserted by the editor) Cardinality Networks: a Theoretical and Empirical Study Roberto Asín, Robert Nieuwenhuis, Albert Oliveras, Enric Rodríguez-Carbonell Received:

More information

Math Models of OR: Branch-and-Bound

Math Models of OR: Branch-and-Bound Math Models of OR: Branch-and-Bound John E. Mitchell Department of Mathematical Sciences RPI, Troy, NY 12180 USA November 2018 Mitchell Branch-and-Bound 1 / 15 Branch-and-Bound Outline 1 Branch-and-Bound

More information

Diversified Top k Graph Pattern Matching

Diversified Top k Graph Pattern Matching Diversified Top k Graph Pattern Matching Wenfei Fan 1,2 Xin Wang 1 Yinghui Wu 3 1 University of Edinburgh 2 RCBD and SKLSDE Lab, Beihang University 3 UC Santa Barbara {wenfei@inf, x.wang 36@sms, y.wu 18@sms}.ed.ac.uk

More information

Smaller and Faster Lempel-Ziv Indices

Smaller and Faster Lempel-Ziv Indices Smaller and Faster Lempel-Ziv Indices Diego Arroyuelo and Gonzalo Navarro Dept. of Computer Science, Universidad de Chile, Chile. {darroyue,gnavarro}@dcc.uchile.cl Abstract. Given a text T[1..u] over an

More information

Sequence analysis and Genomics

Sequence analysis and Genomics Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute

More information