TASM: Top-k Approximate Subtree Matching
1 TASM: Top-k Approximate Subtree Matching
Nikolaus Augsten (1), Denilson Barbosa (2), Michael Böhlen (3), Themis Palpanas (4)
(1) Free University of Bozen-Bolzano, Italy, augsten@inf.unibz.it
(2) University of Alberta, Canada, denilson@cs.ualberta.ca
(3) University of Zurich, Switzerland, boehlen@ifi.uzh.ch
(4) University of Trento, Italy, themis@disi.unitn.eu
ICDE 2010, March 3, Long Beach, CA, USA
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE / 28
2 Outline
1. Motivation and Problem Definition
2. TASM-Postorder
   - Upper Bound on Subtree Size
3. Experiments
4. Conclusion and Future Work
4 Motivation
Query (XML fragment): article(authors(author(Tim), author(John)), booktitle(ICDE))
Document (very large XML): DBLP, 28M nodes, 531MB
Rank the top-k matches for the article query in the DBLP document!
Example answer for k = 3:
[Figure: three matching document subtrees, with 1 error, 2 errors, and 3 errors respectively]
5 TASM: Top-k Approximate Subtree Matching
Definition (TASM: Top-k Approximate Subtree Matching)
Given: query tree $Q$, document tree $T$, size $k$ of the ranking.
Goal: compute a top-k ranking $R = (T_1, T_2, \dots, T_k)$ of all subtrees $T_i$ of document $T$ with respect to query $Q$, using the tree edit distance for the ranking.
- Subtree $T_i$: a node and all its descendants; the largest subtree is the document itself.
- Top-k ranking $R = (T_1, T_2, \dots, T_k)$: subtrees sorted by distance to the query.
- Best k subtrees: $T_i \notin R \Rightarrow \mathrm{ted}(Q, T_k) \le \mathrm{ted}(Q, T_i)$
6 Ranking Function: Tree Edit Distance (TED)
[Figure: del(authors) transforms article(authors(author(Tim), author(John)), booktitle(ICDE)) into article(author(Tim), author(John), booktitle(ICDE)); ren(ICDE) then yields article(author(Tim), author(John), booktitle(TKDE))]
Tree edit distance: minimum number of node edit operations (insert, rename, delete) that transform one tree into the other.
- TASM computes TED between the query and document subtrees.
- Size and number of computed subtrees define TASM complexity.
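The definition of the tree edit distance can be illustrated with a minimal Python sketch. It is not the Zhang and Shasha or Demaine et al. algorithm cited in the talk, only a plain memoized recursion over forests with unit costs, which is adequate for tiny trees such as the example query. The helper names `tree`, `forest_size`, `forest_distance`, and `ted` are assumptions made for this illustration.

```python
from functools import lru_cache

# A tree is a tuple (label, children); children is a tuple of trees.
# A forest is a tuple of trees. Tuples are hashable, so lru_cache works.

def tree(label, *children):
    """Hypothetical helper: build a tree node from a label and child trees."""
    return (label, tuple(children))

def forest_size(forest):
    """Number of nodes in a forest."""
    return sum(1 + forest_size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return forest_size(g)          # insert every node of g
    if not g:
        return forest_size(f)          # delete every node of f
    v_label, v_children = f[-1]        # rightmost root of f
    w_label, w_children = g[-1]        # rightmost root of g
    # Delete v: its children take its place in the forest.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert w: symmetric to deletion.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Map v to w: rename if labels differ, then match the remaining parts.
    rename = (forest_distance(v_children, w_children)
              + forest_distance(f[:-1], g[:-1])
              + (v_label != w_label))
    return min(delete, insert, rename)

def ted(t1, t2):
    """Tree edit distance between two trees."""
    return forest_distance((t1,), (t2,))

query = tree("article",
             tree("authors",
                  tree("author", tree("Tim")),
                  tree("author", tree("John"))),
             tree("booktitle", tree("ICDE")))
match = tree("article",
             tree("author", tree("Tim")),
             tree("author", tree("John")),
             tree("booktitle", tree("TKDE")))

print(ted(query, match))  # 2: delete "authors", rename "ICDE" to "TKDE"
```

The recursion always works on the rightmost root of each forest, so the number of distinct subproblems stays polynomial for a fixed pair of trees; the space-efficient algorithms referenced in the slides organize the same subproblems far more carefully.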
7 State of the Art
TASM-Dynamic: dynamic programming solution [1]
- computes the distance to every subtree of the document
- uses smaller subtrees to compute larger ones
- ranks subtrees by visiting the memoization table
Space complexity: O(mn), where m is the query size and n is the document size
Space complexity limits application to databases:
- in database applications n is huge (database size!)
- TASM-Dynamic maintains two m × n matrices in RAM
- > 6GB RAM for our tiny query (m = 8) on DBLP (28M nodes)
For database-size documents, dynamic programming is too expensive.
State-of-the-art algorithms do not scale!
[1] Zhang and Shasha 1989, Demaine et al.
8 Problem Definition
Find a solution for TASM (Top-k Approximate Subtree Matching) that
- scales to very large documents
- runs in small memory
- ranks subtrees correctly (no heuristics!)
11 Subtree Size Upper Bound in Three Steps
1. Rank the first k subtrees of T in postorder: $R' = (T'_1, T'_2, \dots, T'_k)$, with worst match $T'_k$.
   (i) $\mathrm{ted}(Q, T'_k) \le |Q| + |T'_k|$ (delete $Q$, insert $T'_k$), and $|T'_k| \le k$
2. Final ranking $R = (T_1, T_2, \dots, T_k)$ (= TASM result): the $T_i$'s in $R$ are better than the worst match $T'_k$ of $R'$.
   (ii) $\mathrm{ted}(Q, T_i) \le \mathrm{ted}(Q, T'_k) \le |Q| + |T'_k|$
3. Size upper bound for subtree $T_i$: at least $|T_i| - |Q|$ missing nodes must be inserted, so $|T_i| - |Q| \le \mathrm{ted}(Q, T_i)$ and therefore
   $|T_i| \le \mathrm{ted}(Q, T_i) + |Q| \le 2|Q| + |T'_k| \le 2|Q| + k$
12 Upper Bound on Subtree Size
Theorem (Upper Bound on Subtree Size): TASM needs to consider only small document subtrees of size τ or less:
$\tau = 2|Q| + k$
Upper bound is very powerful:
- independent of document size and structure!
- linear in query size and k
Example: top-10 with example query $|Q| = 8$ on DBLP (28M nodes)
- with bound: max subtree size $\tau = 2 \cdot 8 + 10 = 26$
- without bound: maximum subtree size is 28M (whole document)!
Document-independent upper bound on subtree size!
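The arithmetic of the theorem can be checked with a few lines of Python. The function name `subtree_size_bound` is an assumption made for this sketch, and unit edit costs are assumed as on the slides.

```python
def subtree_size_bound(query_size, k):
    """Upper bound tau = 2|Q| + k on the size of subtrees TASM must consider."""
    return 2 * query_size + k

# Example from the slide: top-10 ranking for a query with 8 nodes.
tau = subtree_size_bound(query_size=8, k=10)
print(tau)  # 26

# The bound does not depend on the document: the same tau applies
# whether the document has 22 nodes or 28 million.
```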
14 Document Format: Postorder Queue
[Figure: example document tree dblp(article(auth(John), title(X1)), proceedings(conf(VLDB), article(auth(Peter), title(X3)), article(auth(Mike), title(X4))), book(title(X2)))]
Postorder queue of the example document:
(John,1) (auth,2) (X1,1) (title,2) (article,5) (VLDB,1) (conf,2) (Peter,1) (auth,2) (X3,1) (title,2) (article,5) (Mike,1) (auth,2) (X4,1) (title,2) (article,5) (proc,13) (X2,1) (title,2) (book,3) (dblp,22)
Postorder queue: queue of (label, size) pairs
- dequeue removes the leftmost element, e.g., (John, 1)
- no random access!
Relevant and state of the art for XML:
- Parsing: the full subtree is known only at the closing tag; closing tags appear in postorder
- Implementation is efficient and heavily used for XML streams, plain XML files (e.g., SAX), and XML in databases (Dewey, interval encoding, ...)
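As a small illustration of the postorder queue, the following Python sketch derives the (label, size) pairs from an in-memory tree. The nested-tuple tree representation and the function name `postorder_pairs` are assumptions for this example; a real implementation would produce the pairs while streaming the document, for instance at the closing tags seen by a SAX parser, instead of building a list.

```python
def postorder_pairs(node):
    """Return the (label, size) pairs of a tree in postorder.

    A node is a tuple (label, [children]); size counts the node itself
    plus all of its descendants.
    """
    label, children = node
    pairs = []
    size = 1
    for child in children:
        child_pairs = postorder_pairs(child)
        size += child_pairs[-1][1]   # last pair is the child's root
        pairs.extend(child_pairs)
    pairs.append((label, size))
    return pairs

def leaf(label):
    return (label, [])

# Example document from the slide.
dblp = ("dblp", [
    ("article", [("auth", [leaf("John")]), ("title", [leaf("X1")])]),
    ("proceedings", [
        ("conf", [leaf("VLDB")]),
        ("article", [("auth", [leaf("Peter")]), ("title", [leaf("X3")])]),
        ("article", [("auth", [leaf("Mike")]), ("title", [leaf("X4")])]),
    ]),
    ("book", [("title", [leaf("X2")])]),
])

queue = postorder_pairs(dblp)
print(queue[:5])   # [('John', 1), ('auth', 2), ('X1', 1), ('title', 2), ('article', 5)]
print(queue[-1])   # ('dblp', 22)
```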
15 Candidate Subtrees
Candidate subtrees are all subtrees $T_i$ of the document with $|T_i| \le \tau$ AND $T_i$ is not contained in a larger subtree $T_j$ with $|T_j| \le \tau$.
Pruning: find candidate subtrees
16 Simple Pruning Approach
[Figure: example document tree with postorder numbers: John 1, auth 2, X1 3, title 4, article 5, VLDB 6, conf 7, Peter 8, auth 9, X3 10, title 11, article 12, Mike 13, auth 14, X4 15, title 16, article 17, proceedings 18, X2 19, title 20, book 21, dblp 22]
Simple pruning approach (τ = 6 in the example above):
- add nodes to a memory buffer until a non-candidate ($|T_i| > \tau$) is added
- subtrees of the non-candidate with $|T_i| \le \tau$ are candidate subtrees
Problem: the memory buffer can grow very large!
- must keep subtrees in memory until a non-candidate ancestor is read
- worst case: the memory buffer stores O(n) nodes (frequent in data-centric XML!)
Example: DBLP, τ = 50: 99% of nodes are still in the buffer when the root node is read!
Simple pruning is not feasible for large documents!
17 Efficient Pruning is Tricky!
Problem: when can we remove a node from the buffer?
- when we see $|T_i| \le \tau$, we don't yet know about the parent (postorder!)
- the subtree of the parent might be smaller than τ!
Our solution does not wait for the parent:
- prefix ring buffer: fixed-size buffer
- pruning rule: prune based on following nodes
18 Pruning in Small Memory
[Figure: prefix ring buffer (τ = 6) holding John,1 auth,2 X1,1 title,2 article,5 VLDB,1, with pointers s and e]
Prefix ring buffer of size τ + 1 (main memory)
- stores a prefix (τ nodes in postorder) of the document
- two operations: append a new node, remove the leftmost subtree/node
Pruning rule: if the leftmost node in the full ring buffer is a
- leaf: the leftmost subtree is a candidate subtree
- non-leaf: the leftmost node is a non-candidate node
19 Pruning Rule Intuition
Candidate subtree: the leftmost node is a leaf
- $T_i$: leftmost subtree, starts with the leftmost node
- $T_j$: smallest subtree that contains $T_i$
- due to postorder: $T_j$ contains all nodes in the buffer
- since $|T_i| \le \tau$ and $|T_j| > \tau$: $T_i$ is a candidate
Non-candidate node: the leftmost node is a non-leaf
- the leftmost non-leaf is the parent of previously removed nodes
- we remove either candidate subtrees or non-candidate nodes
- in both cases: the parent is a non-candidate
20 Example
[Figure: example document tree dblp(article(auth(John), title(X1)), proceedings(conf(VLDB), article(auth(Peter), title(X3)), article(auth(Mike), title(X4))), book(title(X2))), τ = 6]
1. fill the ring buffer (append)
2. check the leftmost node
   - leaf: candidate subtree, move to result
   - non-leaf: non-candidate, remove
3. repeat until queue and buffer are empty
Snapshot of the run:
- postorder queue (input): article,5 Mike,1 auth,2 ...
- prefix ring buffer (main memory): VLDB,1 conf,2 Peter,1 auth,2 X3,1 title,2 (pointers s and e)
- candidate subtrees (output): article(auth(John), title(X1))
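The pruning rule and the example run can be sketched in Python as follows. This is an illustrative reading of the slides, not the authors' implementation: a `collections.deque` holding at most τ nodes stands in for the ring buffer with τ + 1 slots, the leftmost subtree is taken to be the largest subtree in the buffer whose leftmost leaf is the leftmost buffered node, and the names `candidate_subtrees` and `DBLP_QUEUE` are assumptions.

```python
from collections import deque

def candidate_subtrees(postorder, tau):
    """Yield candidate subtrees as lists of (label, size) pairs in postorder.

    postorder: iterable of (label, size) pairs, read strictly left to right.
    tau: maximum subtree size; at most tau nodes are buffered at any time.
    """
    queue = enumerate(postorder, start=1)   # attach postorder positions
    buffer = deque()
    exhausted = False
    while True:
        # Step 1: fill the buffer from the postorder queue.
        while not exhausted and len(buffer) < tau:
            item = next(queue, None)
            if item is None:
                exhausted = True
            else:
                pos, (label, size) = item
                buffer.append((pos, label, size))
        if not buffer:
            return
        # Step 2: check the leftmost node.
        first_pos, _, first_size = buffer[0]
        if first_size == 1:
            # Leaf: find the largest buffered subtree that starts here.
            # A node at position pos with size s has leftmost leaf pos - s + 1.
            root = 0
            for idx, (pos, _, size) in enumerate(buffer):
                if pos - size + 1 == first_pos:
                    root = idx
            subtree = [buffer.popleft() for _ in range(root + 1)]
            yield [(label, size) for _, label, size in subtree]
        else:
            # Non-leaf: its descendants are gone, so it is a non-candidate.
            buffer.popleft()

# Postorder queue of the example document from the slides.
DBLP_QUEUE = [
    ("John", 1), ("auth", 2), ("X1", 1), ("title", 2), ("article", 5),
    ("VLDB", 1), ("conf", 2),
    ("Peter", 1), ("auth", 2), ("X3", 1), ("title", 2), ("article", 5),
    ("Mike", 1), ("auth", 2), ("X4", 1), ("title", 2), ("article", 5),
    ("proc", 13), ("X2", 1), ("title", 2), ("book", 3), ("dblp", 22),
]

roots = [subtree[-1] for subtree in candidate_subtrees(DBLP_QUEUE, tau=6)]
print(roots)
# [('article', 5), ('conf', 2), ('article', 5), ('article', 5), ('book', 3)]
```

With τ = 6 the buffer never holds more than six nodes, and the two large nodes (proc with 13 nodes, dblp with 22) are discarded as soon as they reach the front of the buffer.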
21 TASM-Postorder
TASM-postorder:
1. empty ranking $R$, tightening upper bound $\tau' = \tau$
2. for each candidate subtree $T_i$:
   a. if $|R| = k$: update $\tau' = \min(\tau, \max(R) + |Q|)$
   b. compute the tree edit distance for all subtrees of $T_i$ within $\tau'$
   c. update ranking $R$
Theorem (TASM-Postorder): the space complexity of TASM-postorder is independent of the document size: $O(m^2 + mk)$ (m: query size, k: result size)
TASM-postorder scales to very large documents!
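Steps 2a and 2c can be sketched in Python as below. The sketch covers only the bookkeeping around the ranking; the distance computation of step 2b is left out. Keeping the ranking as a max-heap from `heapq` and the names `update_ranking` and `tightened_bound` are assumptions for this illustration, and max(R) is read as the largest distance currently in the ranking.

```python
import heapq

def update_ranking(ranking, k, distance, subtree_id):
    """Keep the k smallest distances seen so far.

    ranking is a heap of (-distance, subtree_id), so ranking[0] is always
    the current worst match.
    """
    entry = (-distance, subtree_id)
    if len(ranking) < k:
        heapq.heappush(ranking, entry)
    elif distance < -ranking[0][0]:
        heapq.heapreplace(ranking, entry)   # drop the worst, add the new one

def tightened_bound(ranking, k, tau, query_size):
    """tau' = min(tau, max(R) + |Q|) once the ranking holds k subtrees."""
    if len(ranking) < k:
        return tau
    worst_distance = -ranking[0][0]
    return min(tau, worst_distance + query_size)

# Toy run: |Q| = 8, k = 2, so tau = 2 * 8 + 2 = 18.
query_size, k = 8, 2
tau = 2 * query_size + k
ranking = []
bounds = []
for subtree_id, distance in [("a", 5), ("b", 3), ("c", 1)]:
    update_ranking(ranking, k, distance, subtree_id)
    bounds.append(tightened_bound(ranking, k, tau, query_size))

print(bounds)  # [18, 13, 11]
print(sorted((-d, s) for d, s in ranking))  # [(1, 'c'), (3, 'b')]
```

The bound shrinks as better matches enter the ranking, so later candidate subtrees need distance computations only for subtrees of at most τ' nodes.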
23 Pruning Effectiveness
Prefix ring buffer pruning is very effective! Maximum subtree reduced from 37M to 18 nodes.
- Dataset: PSD protein sequences, 37M nodes, 683MB
- Compute TASM ($|Q| = 4$, k = 1) with TASM-dynamic (state of the art) and TASM-postorder (our solution)
[Figure: histograms of computed subtrees, number of subtrees vs. subtree size (nodes), both axes logarithmic. TASM-Dynamic: largest subtree 37M (entire document). TASM-Postorder: largest subtree 18.]
24 Scalability: TASM-Postorder vs. TASM-Dynamic
TASM-postorder is much faster than TASM-dynamic.
- Dataset: XMark (synthetic XML for benchmark)
- Vary query size and document size
- Compute TASM (k = 5) with TASM-dynamic (state of the art) and TASM-postorder (our solution)
- Measure wall clock time
[Figure: two plots of time (seconds, log scale). Left: time vs. query size (nodes) for dyn and pos at document sizes T: 224MB and T: 112MB. Right: time vs. document size (MB) for dyn and pos at different query sizes (|Q| = 8, |Q| = 4).]
25 Scalability with Result Size k
TASM-postorder scales well with k. Increasing k by 4 orders of magnitude only doubles runtime.
- Dataset: XMark (synthetic XML for benchmark)
- Vary k (size of ranking)
- Compute TASM ($|Q| = 16$) with TASM-dynamic (state of the art) and TASM-postorder (our solution)
- Measure wall clock time
[Figure: time (seconds) vs. k (1e0 to 1e4, log scale) for dyn and pos at document sizes T: 224MB and T: 112MB.]
26 Space Complexity: TASM-Postorder vs. TASM-Dynamic
TASM-postorder: space independent of document!
- Dataset: XMark (synthetic XML for benchmark)
- Vary document size
- Compute TASM (k = 5) with TASM-dynamic (state of the art) and TASM-postorder (our solution)
- Measure main memory usage
[Figure: memory (MB, log scale) vs. document size (MB) for dyn and pos at |Q| = 16 and |Q| = 4; annotated values: 3GB and 8MB.]
28 Conclusion and Future Work
Conclusion
- Prefix ring buffer for space-efficient pruning
- Dynamic programming does not scale for database-size solutions.
- Upper bound τ: limits the maximum subtree size for TASM
- TASM-postorder: highly scalable TASM algorithm
TASM-postorder makes TASM feasible.
Future Work
New research opportunities:
- tune the tree edit distance to different applications
- index the document: can we avoid a document scan?
- parallel TASM algorithm: where to split the document?
29
Erik D. Demaine, Shay Mozes, Benjamin Rossman, and Oren Weimann. An optimal decomposition algorithm for tree edit distance. In ICALP, volume 4596 of LNCS, Wroclaw, Poland, July. Springer.
K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM J. on Computing, 18(6), 1989.
Optimal Dynamic Strings Paweł Gawrychowski 1, Adam Karczmarz 1, Tomasz Kociumaka 1, Jakub Łącki 2, and Piotr Sankowski 1 1 Institute of Informatics, University of Warsaw, Poland [gawry,a.karczmarz,kociumaka,sank]@mimuw.edu.pl
More informationAnalysis of Software Artifacts
Analysis of Software Artifacts System Performance I Shu-Ngai Yeung (with edits by Jeannette Wing) Department of Statistics Carnegie Mellon University Pittsburgh, PA 15213 2001 by Carnegie Mellon University
More informationRank and Select Operations on Binary Strings (1974; Elias)
Rank and Select Operations on Binary Strings (1974; Elias) Naila Rahman, University of Leicester, www.cs.le.ac.uk/ nyr1 Rajeev Raman, University of Leicester, www.cs.le.ac.uk/ rraman entry editor: Paolo
More informationQuerying. 1 o Semestre 2008/2009
Querying Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2008/2009 Outline 1 2 3 4 5 Outline 1 2 3 4 5 function sim(d j, q) = 1 W d W q W d is the document norm W q is the
More informationSpeculative Parallelism in Cilk++
Speculative Parallelism in Cilk++ Ruben Perez & Gregory Malecha MIT May 11, 2010 Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 1 / 33 Parallelizing Embarrassingly Parallel
More informationLars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
Syllabus Fri. 21.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 27.10. (2) A.1 Linear Regression Fri. 3.11. (3) A.2 Linear Classification Fri. 10.11. (4) A.3 Regularization
More informationIntroduction to Randomized Algorithms III
Introduction to Randomized Algorithms III Joaquim Madeira Version 0.1 November 2017 U. Aveiro, November 2017 1 Overview Probabilistic counters Counting with probability 1 / 2 Counting with probability
More informationEfficient Evaluation of Semi-Skylines
Efficient Evaluation of Semi-Skylines Markus Endres and Werner Kießling Fifth International Workshop on Ranking in Databases (2011) Outline 1. Skyline Queries and Semi-Skylines 2. The Staircube Algorithm
More informationBinary Search Trees. Motivation
Binary Search Trees Motivation Searching for a particular record in an unordered list takes O(n), too slow for large lists (databases) If the list is ordered, can use an array implementation and use binary
More informationCPT+: A Compact Model for Accurate Sequence Prediction
CPT+: A Compact Model for Accurate Sequence Prediction Ted Gueniche 1, Philippe Fournier-Viger 1, Rajeev Raman 2, Vincent S. Tseng 3 1 University of Moncton, Canada 2 University of Leicester, UK 3 National
More informationCS Data Structures and Algorithm Analysis
CS 483 - Data Structures and Algorithm Analysis Lecture VII: Chapter 6, part 2 R. Paul Wiegand George Mason University, Department of Computer Science March 22, 2006 Outline 1 Balanced Trees 2 Heaps &
More informationOnline Sorted Range Reporting and Approximating the Mode
Online Sorted Range Reporting and Approximating the Mode Mark Greve Progress Report Department of Computer Science Aarhus University Denmark January 4, 2010 Supervisor: Gerth Stølting Brodal Online Sorted
More informationSimilarity Search. The String Edit Distance. Nikolaus Augsten.
Similarity Search The String Edit Distance Nikolaus Augsten nikolaus.augsten@sbg.ac.at Dept. of Computer Sciences University of Salzburg http://dbresearch.uni-salzburg.at Version October 18, 2016 Wintersemester
More informationMaintaining Frequent Itemsets over High-Speed Data Streams
Maintaining Frequent Itemsets over High-Speed Data Streams James Cheng, Yiping Ke, and Wilfred Ng Department of Computer Science Hong Kong University of Science and Technology Clear Water Bay, Kowloon,
More informationDynamic Ordered Sets with Exponential Search Trees
Dynamic Ordered Sets with Exponential Search Trees Arne Andersson Computing Science Department Information Technology, Uppsala University Box 311, SE - 751 05 Uppsala, Sweden arnea@csd.uu.se http://www.csd.uu.se/
More informationEach internal node v with d(v) children stores d 1 keys. k i 1 < key in i-th sub-tree k i, where we use k 0 = and k d =.
7.5 (a, b)-trees 7.5 (a, b)-trees Definition For b a an (a, b)-tree is a search tree with the following properties. all leaves have the same distance to the root. every internal non-root vertex v has at
More informationChapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining
Chapter 6. Frequent Pattern Mining: Concepts and Apriori Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Pattern Discovery: Definition What are patterns? Patterns: A set of
More informationOn the Study of Tree Pattern Matching Algorithms and Applications
On the Study of Tree Pattern Matching Algorithms and Applications by Fei Ma B.Sc., Simon Fraser University, 2004 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of
More informationHighly-scalable branch and bound for maximum monomial agreement
Highly-scalable branch and bound for maximum monomial agreement Jonathan Eckstein (Rutgers) William Hart Cynthia A. Phillips Sandia National Laboratories Sandia National Laboratories is a multi-program
More informationComputing Techniques for Parallel and Distributed Systems with an Application to Data Compression. Sergio De Agostino Sapienza University di Rome
Computing Techniques for Parallel and Distributed Systems with an Application to Data Compression Sergio De Agostino Sapienza University di Rome Parallel Systems A parallel random access machine (PRAM)
More informationUkkonen's suffix tree construction algorithm
Ukkonen's suffix tree construction algorithm aba$ $ab aba$ 2 2 1 1 $ab a ba $ 3 $ $ab a ba $ $ $ 1 2 4 1 String Algorithms; Nov 15 2007 Motivation Yet another suffix tree construction algorithm... Why?
More informationXML Security Views. Queries, Updates, and Schema. Benoît Groz. University of Lille, Mostrare INRIA. PhD defense, October 2012
XML Security Views Queries, Updates, and Schema Benoît Groz University of Lille, Mostrare INRIA PhD defense, October 2012 Benoît Groz (Mostrare) XML Security Views PhD defense, October 2012 1 / 45 Talk
More informationTree Edit Distance Cannot be Computed in Strongly Subcubic Time (unless APSP can) Karl Bringmann, Paweł Gawrychowski, Shay Mozes, Oren Weimann
Karl Bringmann, Paweł Gawrychowsi, Shay Mozes, Oren Weimann Minimum edits to transform one tree into the other Minimum edits to transform one tree into the other rooted, ordered trees with node labels
More informationNotes on Logarithmic Lower Bounds in the Cell Probe Model
Notes on Logarithmic Lower Bounds in the Cell Probe Model Kevin Zatloukal November 10, 2010 1 Overview Paper is by Mihai Pâtraşcu and Erik Demaine. Both were at MIT at the time. (Mihai is now at AT&T Labs.)
More informationCardinality Networks: a Theoretical and Empirical Study
Constraints manuscript No. (will be inserted by the editor) Cardinality Networks: a Theoretical and Empirical Study Roberto Asín, Robert Nieuwenhuis, Albert Oliveras, Enric Rodríguez-Carbonell Received:
More informationMath Models of OR: Branch-and-Bound
Math Models of OR: Branch-and-Bound John E. Mitchell Department of Mathematical Sciences RPI, Troy, NY 12180 USA November 2018 Mitchell Branch-and-Bound 1 / 15 Branch-and-Bound Outline 1 Branch-and-Bound
More informationDiversified Top k Graph Pattern Matching
Diversified Top k Graph Pattern Matching Wenfei Fan 1,2 Xin Wang 1 Yinghui Wu 3 1 University of Edinburgh 2 RCBD and SKLSDE Lab, Beihang University 3 UC Santa Barbara {wenfei@inf, x.wang 36@sms, y.wu 18@sms}.ed.ac.uk
More informationSmaller and Faster Lempel-Ziv Indices
Smaller and Faster Lempel-Ziv Indices Diego Arroyuelo and Gonzalo Navarro Dept. of Computer Science, Universidad de Chile, Chile. {darroyue,gnavarro}@dcc.uchile.cl Abstract. Given a text T[1..u] over an
More informationSequence analysis and Genomics
Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute
More information