Small-Space Dictionary Matching (Dissertation Proposal)

1 Small-Space Dictionary Matching (Dissertation Proposal) Graduate Center of CUNY 1/24/2012

2 Problem Definition: Dictionary Matching
Input: Dictionary $D = \{P_1, P_2, \ldots, P_d\}$ containing $d$ patterns. Text $T$ of length $n$.
Output: All positions in the text at which a dictionary pattern occurs.

3 Applications of Dictionary Matching
* Searching for specific phrases in a book
* Scanning a file for virus signatures
* Network intrusion detection systems
* Searching a DNA sequence for a set of motifs

4 Small-Space Challenge: Limited storage capacity in devices. Massive proliferation of data. Goal: efficient algorithms with respect to both time and space.

5 Research Plan
Thesis Goals:
1. Succinct 2D dictionary matching
2. Succinct dynamic 2D dictionary matching
3. for succinct 1D dictionary matching

6 Small-Space 1D
1D single pattern matching in linear time and $O(1)$ working space:
* Galil and Seiferas (1981)
* Crochemore and Perrin (1991)
* Rytter (2003)
* Gasieniec and Kolpakov (2004)

7 Empirical Entropy
How is space measured in succinct data structures and algorithms? By empirical entropy.
Definition: The empirical entropy of a string, $H_0$ or $H_k$, describes the minimum number of bits that are needed to encode the string within context.
$$H_0(S) = -\sum_{i=1}^{\sigma} \frac{n_i}{n} \log \frac{n_i}{n}$$
where $n$ is the length of $S$, $\sigma$ is the alphabet size, and $n_i$ is the number of occurrences of the $i$-th symbol in $S$.

8 Empirical Entropy
$$H_0(S) = -\sum_{i=1}^{\sigma} \frac{n_i}{n} \log \frac{n_i}{n}$$
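As an illustration of the $H_0$ definition, here is a minimal Python sketch. It assumes base-2 logarithms (bits per symbol), which the slides do not state explicitly, and the function name `h0` is chosen here for illustration only.

```python
import math
from collections import Counter

def h0(s):
    """Zeroth-order empirical entropy of s, in bits per symbol (log base 2 assumed)."""
    n = len(s)
    if n == 0:
        return 0.0
    # Sum over the distinct symbols of (n_i / n) * log(n_i / n), negated.
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())
```

A string over a single symbol has entropy 0, while a string in which two symbols are equally frequent, such as "abab", needs 1 bit per symbol under this measure.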

9 Empirical Entropy
$H_k(S)$: a lower bound on the compression we can achieve for each symbol using a code that depends on the $k$ symbols preceding it.
$$H_k(S) = \frac{1}{n} \sum_{w \in \Sigma^k} |w_S| \, H_0(w_S)$$
where $w_S$ is the sequence of characters following occurrences of $w$ in $S$.

10 Empirical Entropy
For any string $S$ and $k \ge 0$: $H_k(S) \le H_{k-1}(S) \le \cdots \le H_0(S) \le \log |\Sigma|$
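The $H_k$ definition and the chain of inequalities can be checked numerically. The sketch below is an illustration only: it assumes base-2 logarithms, collects for every length-$k$ context $w$ the characters that follow it, and applies the formula directly. The names `h0` and `hk` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def h0(seq):
    """Zeroth-order empirical entropy of a sequence, in bits per symbol."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())

def hk(s, k):
    """k-th order empirical entropy: (1/n) * sum over contexts w of |w_S| * H_0(w_S)."""
    n = len(s)
    if n == 0:
        return 0.0
    if k == 0:
        return h0(s)
    followers = defaultdict(list)  # context w -> characters following occurrences of w
    for i in range(k, n):
        followers[s[i - k:i]].append(s[i])
    return sum(len(f) * h0(f) for f in followers.values()) / n
```

For example, in "abababab" every character is fully determined by its predecessor, so `hk(s, 1)` is 0 even though `hk(s, 0)` is 1 bit per symbol.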

11 Small-Space 1D
1D dictionary matching in small space:
Space (bits) | Search Time | Reference
$O(l \log l)$ | $O(n + occ)$ | Aho-Corasick (1975)
$O(l)$ | $O((n + occ) \log^2 l)$ | Chan et al. (2007)
$l H_k(D) + o(l \log \sigma) + O(d \log l)$ | $O(n (\log^{\epsilon} l + \log d) + occ)$ | Hon et al. (2008)
$l (H_0 + O(1)) + O(d \log(l/d))$ | $O(n + occ)$ | Belazzougui (2010)
$l H_k(D) + O(l)$ | $O(n + occ)$ | Hon et al. (2010)
$d$ is the number of patterns in $D$. $l$ is the total size of the dictionary.

12 Small-Space 1D Dictionary
Compressed AC automaton [Hon et al. (2010)]: no slowdown!
Separates the three functions of the AC automaton:
* goto
* report
* fail
Encodes each function differently.
Space complexity meets $H_k(D)$, the kth-order empirical entropy.
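To make the three functions concrete, the following Python sketch builds a plain, uncompressed Aho-Corasick automaton and keeps goto, fail, and report as three separate tables. It is not the compressed encoding of Hon et al. (2010); the names `build_ac` and `search` are assumptions made for this illustration.

```python
from collections import deque

def build_ac(patterns):
    goto = [{}]    # goto[state][char] -> next state (trie edges)
    fail = [0]     # fail[state] -> state of the longest proper suffix present in the trie
    report = [[]]  # report[state] -> indices of patterns that end at this state
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                report.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        report[state].append(idx)
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure state.
            report[nxt] = report[nxt] + report[fail[nxt]]
            queue.append(nxt)
    return goto, fail, report

def search(text, patterns):
    """Return (start position, pattern) for every occurrence, in one pass over text."""
    goto, fail, report = build_ac(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in report[state]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits
```

Keeping the three tables apart, rather than bundling them into node objects, mirrors the separation named on the slide and shows that each could be swapped for a different encoding.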

13 Dynamic Dictionary Matching
1D dynamic dictionary matching:
Preprocess Time | Update Time | Search Time | Ref.
$O(l \log l)$ | $O(p \log l)$ | $O((n + occ) \log l)$ | AF91
$O(l \log \sigma)$ | $O(p \log l)$ | $O((n + occ) \log l)$ | IS94
$O(l \log \sigma)$ | $O(p \frac{\log l}{\log \log l})$ | $O((n + occ) \frac{\log l}{\log \log l})$ | AFIPS95
$O(l)$ | $O(p)$ | $O(n + occ)$ | SV96
$d$ is the number of patterns in $D$. $l$ is the total size of the dictionary. $p$ is the size of a pattern to add / remove.

14 Dynamic Dictionary Matching
Previous dynamic algorithms require $O(l \log l)$ extra space.
1D dynamic dictionary matching in small space:
Chan et al. (2007)
* Update Time: $O(p \log^2 l)$
* Search Time: $O((n + occ) \log^2 l)$
* Space: $O(l \sigma)$
Hon et al. (2009)
* Update Time: $O(p \log \sigma + \log l)$
* Search Time: $O(n \log l + occ)$
* Space: meets kth-order empirical entropy

15 Small-Space 2D
2D linear-time single pattern matching, Crochemore et al. (1995):
* Preprocessing: linear time within log space.
* Text scanning: linear time, $O(1)$ extra space.
Using small-space 2D single pattern matching for a set of patterns requires several scans of the text.

16 2D Dictionary Matching
Existing 2D dictionary matching algorithms:
* Bird (1977) / Baker (1978)
* Amir, Farach (1992)
* Giancarlo (1995)
* Idury, Schaffer (1995)
All require working space proportional to the input.

17 2D Dictionary Matching
Bird (1977) / Baker (1978)
* For rectangular patterns of the same size in one dimension.
* Linearize dictionary and label text.
* Use 1D dictionary matching.
* Linear time and space.

18 2D Dictionary Matching
Amir, Farach (1992)
* For square patterns of different sizes.
* Linearization around diagonals.
* Use suffix tree and AC automaton.
* Linear time and space.

19 2D Dictionary Matching
Giancarlo (1995)
* For square patterns of different sizes.
* Uses 2D suffix tree.
* Slower than linear time.
* Linear space.

20 2D Dictionary Matching
Idury, Schaffer (1995)
* For rectangular patterns of different sizes.
* Multidimensional range searching and dictionary matching.
* Slower than linear text processing.
* Linear space.

21 Dynamic 2D Dictionary Matching
For square patterns:
Update Time | Search Time | Ref.
$O(p^2 \log |D|)$ | $O((n^2 + occ) \log |D|)$ | AFIPS (1995)
$O(p^2 \log^2 |D|)$ | $O((n^2 \log d + occ) \log |D|)$ | Giancarlo95
$O(p^2 + p \log d)$ | $O((n^2 + occ) \log d)$ | CL96
Pattern $P_i$ is a square of size $p_i \times p_i$.
$|D| = \sum_{i=1}^{d} p_i^2$ and $\|D\| = \sum_{i=1}^{d} p_i$.
$P$, of size $p \times p$, is added / removed.

22 Dynamic 2D Dictionary Matching
For rectangular patterns of different sizes, Idury, Schaffer (1995):
* Groups patterns by size.
* Multidimensional range searching and dictionary matching.
* $O(\log^4 |D|)$ slowdown in all stages:
  * Preprocess dictionary
  * Scan text
  * Update dictionary

23 Research Plan
Thesis Goals:
1. Succinct 2D dictionary matching
* Develop first efficient algorithm
* Linear Time
* Small-Space
2. Succinct dynamic 2D dictionary matching
3. for succinct 1D dictionary matching

24 Problem Definition: 2D Dictionary Matching
Input: Dictionary of $d$ patterns, each $m \times m$ in size. Text $T$ of size $n \times n$.
Output: All positions in the text at which a dictionary pattern occurs.

25 2D Dictionary Matching
Bird / Baker:
* Convert 2D data to a 1D representation.
* Name pattern rows.
* Name text positions.
* Use 1D dictionary matching to find pattern occurrences.

26 2D Dictionary Matching
Bird / Baker:
* Convert 2D data to a 1D representation.
* Name pattern rows.
* Name text positions.
* Use 1D dictionary matching to find pattern occurrences.
Our work: mimic the Bird/Baker algorithm in small space.
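The four Bird/Baker steps can be sketched in Python for patterns that are all $m \times m$. This is a simplified illustration, not the linear-time algorithm and not the small-space variant proposed in the talk: hash-table lookups stand in for the Aho-Corasick and 1D matching machinery, so the running time is roughly $O(n^2 m)$. The function name `bird_baker_sketch` and the input layout (lists of row strings) are assumptions.

```python
def bird_baker_sketch(patterns, text):
    """patterns: list of m x m patterns, each a list of m row strings.
    text: list of row strings. Returns (row, col, pattern index) for each occurrence."""
    m = len(patterns[0])

    # Step 1: name the distinct pattern rows.
    row_name = {}
    for pat in patterns:
        for row in pat:
            row_name.setdefault(row, len(row_name))

    # Step 2: each pattern becomes a 1D sequence of m row names.
    by_names = {}
    for idx, pat in enumerate(patterns):
        key = tuple(row_name[row] for row in pat)
        by_names.setdefault(key, []).append(idx)

    # Step 3: name text positions (None where no pattern row starts).
    rows, cols = len(text), len(text[0])
    labels = [[row_name.get(text[r][c:c + m]) for c in range(cols - m + 1)]
              for r in range(rows)]

    # Step 4: 1D dictionary matching down each column of names.
    hits = []
    for c in range(cols - m + 1):
        column = [labels[r][c] for r in range(rows)]
        for r in range(rows - m + 1):
            for idx in by_names.get(tuple(column[r:r + m]), ()):
                hits.append((r, c, idx))
    return hits
```

The `labels` array is as large as the text itself, which is exactly the working space the small-space algorithm has to avoid.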

27 1D Periodicity
Definition: A string $p$ is periodic in $u$ if $p = u^k u'$, where $u'$ is a proper prefix of $u$, $u$ is primitive, and $k \ge 2$.
Example: aabcaabcaabcaa (here $u$ = aabc, $k = 3$, $u'$ = aa).

28 1D Periodicity
Definition: A string $p$ is periodic in $u$ if $p = u^k u'$, where $u'$ is a proper prefix of $u$, $u$ is primitive, and $k \ge 2$.
We divide patterns into 2 groups based on 1D periodicity. Each case presents different difficulties to overcome.
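One standard way to test this definition is through the longest border of the string, computed with the KMP failure function: the smallest period is the length minus the longest border, and the string is periodic when that period fits at least twice. The sketch below is an illustration under that approach; the function names are hypothetical.

```python
def smallest_period(p):
    """Length of the smallest period of p, via the KMP border table."""
    n = len(p)
    border = [0] * (n + 1)  # border[i] = length of the longest proper border of p[:i]
    k = 0
    for i in range(1, n):
        while k and p[i] != p[k]:
            k = border[k]
        if p[i] == p[k]:
            k += 1
        border[i + 1] = k
    return n - border[n]

def periodic_decomposition(p):
    """Return (u, k, u_prime) with p = u^k u_prime and k >= 2, or None if p is not periodic."""
    per = smallest_period(p)
    if per == 0 or 2 * per > len(p):
        return None
    k, rem = divmod(len(p), per)
    return p[:per], k, p[:rem]
```

On the slide's example, `periodic_decomposition("aabcaabcaabcaa")` yields `("aabc", 3, "aa")`.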

29 Types of Patterns
Case I: Patterns with ALL rows periodic, period $\le m/4$.
Problem: can have more candidates than the space we allow.
Case II: Patterns contain an aperiodic row or a row with period $> m/4$.
Problem: several patterns can overlap in both directions.

30 Types of Patterns
Case I: Patterns with ALL rows periodic, period $\le m/4$.
Problem: can have more candidates than the space we allow.

31 Lyndon Words
Definition: Two words $x$, $y$ are conjugate if $x = uv$, $y = vu$ for some $u$, $v$.
Definition: A Lyndon word is a primitive string which is lexicographically smaller than any of its conjugates.
Canonization computes the least conjugate of a word.
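Canonization can be illustrated with the well-known two-pointer least-rotation algorithm, which runs in linear time and constant extra space. This is a sketch of one possible canonization routine, not necessarily the one used in the thesis; the names `least_rotation_start`, `canonize`, and `are_conjugate` are chosen here for illustration.

```python
def least_rotation_start(s):
    """Index at which the lexicographically least rotation (conjugate) of s begins."""
    n = len(s)
    i, j, k = 0, 1, 0  # two candidate start positions and the length matched so far
    while i < n and j < n and k < n:
        a, b = s[(i + k) % n], s[(j + k) % n]
        if a == b:
            k += 1
            continue
        if a > b:
            i += k + 1  # no start in i..i+k can be the least rotation
        else:
            j += k + 1
        if i == j:
            j += 1
        k = 0
    return min(i, j)

def canonize(s):
    """Least conjugate of s."""
    start = least_rotation_start(s)
    return s[start:] + s[:start]

def are_conjugate(x, y):
    """Two words are conjugate exactly when they have the same least conjugate."""
    return len(x) == len(y) and canonize(x) == canonize(y)
```

Because conjugate words share one canonical form, comparing canonical forms is enough to decide whether two periods should receive the same name.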

32 Naming
New technique for naming rows: same name if periods are conjugate.

33 Naming
New technique for naming rows: same name if periods are conjugate.
How is this done in linear time, yet small space? With a witness tree.
* The witness tree stores a distinction between two names.
* To name a new row, it is compared to only one other row.

34 Types of Patterns
Case II: Patterns contain an aperiodic row or a row with period $> m/4$.
Problem: several patterns can overlap in both directions.

35 Types of Patterns
Case II: Patterns contain an aperiodic row or a row with period $> m/4$.
Problem: several patterns can overlap in both directions.
* Many 1D names can overlap in a text block row.
* Identification of candidates is simpler.
* Identify candidates with the aperiodic row of each pattern.

36 Types of Patterns
Case II: Patterns contain an aperiodic row or a row with period $> m/4$.
Problem: several patterns can overlap in both directions.
Difficulty: single-pass verification.
We introduce dynamic dueling.
Eliminate vertically inconsistent candidates:
1. Use 1D representations in each column.
2. Use the witness tree to compare row names in constant time.
Then verify consistent candidates against text rows.

37 Research Plan
Thesis Goals:
1. Succinct 2D dictionary matching
2. Succinct dynamic 2D dictionary matching
* Develop dynamic algorithm
* Adapts to modifications of dictionary
  * Efficiently insert pattern
  * Efficiently delete pattern
* Linear Time
* Small-Space
* Dynamic succinct version of Bird/Baker algorithm.
3. for succinct 1D dictionary matching

38 Small-Space 1D
1D dictionary matching in small space:
Space (bits) | Search Time | Reference
$O(l \log l)$ | $O(n + occ)$ | Aho-Corasick (1975)
$O(l)$ | $O((n + occ) \log^2 l)$ | Chan et al. (2007)
$l H_k(D) + o(l \log \sigma) + O(d \log l)$ | $O(n (\log^{\epsilon} l + \log d) + occ)$ | Hon et al. (2008)
$l (H_0 + O(1)) + O(d \log(l/d))$ | $O(n + occ)$ | Belazzougui (2010)
$l H_k(D) + O(l)$ | $O(n + occ)$ | Hon et al. (2010)
$d$ is the number of patterns in $D$. $l$ is the total size of the dictionary.

39 1D Dictionary Matching
For 1D data, time- and space-efficient dictionary matching has been achieved.
* Only in the theoretical realm.
* Implementations lag behind.
* Rely on complex data structures that have not been implemented.
Our goal: efficient software for succinct dictionary matching that relies on popular succinct data structures.

40 Suffix Tree
[Figure: suffix tree for the string Mississippi$]

41 Suffix Tree
Applications of Suffix Tree:
* Single pattern matching
* Longest common substring of suffixes
* Longest repeated substring
* Longest palindrome

42 Suffix Array
Sort suffixes alphabetically.
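A minimal sketch of the suffix array as "sorted suffixes" follows. It uses a naive comparison sort of the suffixes, which is far from the linear-time constructions used in practice but makes the definition concrete. The example uses the lowercase string "mississippi$" (an assumption, since the slides capitalize the M) so that `$` is the only character smaller than the letters.

```python
def suffix_array(text):
    """Start positions (0-indexed) of the suffixes of text in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])

text = "mississippi$"
sa = suffix_array(text)
for start in sa:
    print(start, text[start:])
```

Reading the printed suffixes top to bottom gives the same left-to-right leaf order as a suffix tree whose child edges are sorted by first character.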

43 Suffix Tree
[Figure: suffix tree for Mississippi$ with leaves labeled by the suffixes s1 through s12]

44 Suffix Tree
Dynamic Suffix Tree, Choi, Lam (1997):
* Strings can be inserted or deleted efficiently.
* Update time: $O(p)$ to insert/delete a string of size $p$.
* No edge is labeled by a deleted string.
* Additional storage: two-way pointer for each edge.

45 Compressed Suffix Tree
Compressed suffix tree (CST):
* Compressed self-index
* Replaces input data and answers queries
* No more space than underlying data
* Minor slowdown in compressed suffix array as well

46 Compressed Suffix Tree
Compressed suffix tree (CST) is composed of several compressed data structures:
* Compressed Suffix Array
* Compressed LCP Array
* Navigation between nodes

47 Compressed Suffix Tree
Uncompressed Suffix Tree: $O(n \log n)$ bits of space.
Compressed Suffix Tree:
1. Sadakane (2007): $O(n \log \sigma)$ bits, $O(\mathrm{polylog}(n))$ slowdown.
2. Russo et al. (2011): kth-order empirical entropy, $O(\log n)$ slowdown.
3. Fischer et al. (2009): kth-order empirical entropy, sub-logarithmic slowdown.
4. Ohlebusch, Fischer, Gog (2011): some queries in $O(1)$ time.
Have all been implemented.

48 Compressed Suffix Tree
Dynamic Compressed Suffix Tree:
1. Chan et al. (2007): adaptation of Sadakane (2007). $O(n)$ bits, $O(\log^2 n)$ slowdown.
2. Russo et al. (2011): $n H_k + o(n \log \sigma)$ bits, $O(\log^2 n)$ slowdown.
Have not been implemented.

49 Suffix Tree
Suffix Tree for a set of patterns:
1. Concatenate patterns with a unique delimiter between them.
* Problem: indexes artificial suffixes.
OR
2. Generalized suffix tree
* Combines suffixes of each pattern in a single data structure.
* Construction: Ukkonen's online suffix tree construction algorithm.

50 Suffix Links
Definition: A suffix link is a pointer from an internal node labeled $xS$ to another internal node labeled $S$, where $x$ is an arbitrary character and $S$ is a possibly empty substring.
Suffix links facilitate traversal of the suffix tree during and after construction.

51 Suffix Links
[Figure: suffix tree for Mississippi$ with leaves s1 through s12, illustrating suffix links]

52 Algorithm for 1D dictionary matching on suffix tree:
* Generalized suffix tree: index of several strings
* Ukkonen can insert one string at a time
Our algorithm: modeled after Ukkonen's suffix tree construction algorithm
* Online processing of text
* Linear time: as if inserting new pattern
* Skip-count trick uses suffix links

53 Algorithm for 1D dictionary matching on suffix tree:
* Generalized suffix tree: index of several strings
* Ukkonen can insert one string at a time
Our algorithm: modeled after Ukkonen's suffix tree construction algorithm
* Online processing of text
* Linear time: as if inserting new pattern
* Skip-count trick uses suffix links
* Small-Space: compressed suffix tree

54 Skip-Count Trick
[Figure: suffix tree for Mississippi$ with leaves s1 through s12, illustrating the skip-count trick]

55 Research Plan
Thesis Goals:
1. Succinct 2D dictionary matching
2. Succinct dynamic 2D dictionary matching
3. for succinct 1D dictionary matching
* Uses compressed suffix tree
* Space meets entropy-compressed bounds
* Linear time, with slowdown to query compressed self-index

56 Evaluation
Data Sets:
Biological
* Fly sequences: FlyBase
* Flu sequences: Influenza Virus Sequence Database
Security
* Network intrusion detection signatures: Snort
Virus detection
* Virus signatures: ClamAV

57 Timeline
Thesis Goals:
1. Succinct 2D dictionary matching
2. Succinct dynamic 2D dictionary matching
3. for succinct 1D dictionary matching

58 Timeline
Thesis Goals:
1. Succinct 2D dictionary matching
* Case I: presented at CPM 2010
* Case II: presented at WADS 2011
  * With no slowdown!
  * Complete algorithm accepted to Algorithmica
2. Succinct dynamic 2D dictionary matching
3. for succinct 1D dictionary matching

59 Timeline
Thesis Goals:
1. Succinct 2D dictionary matching
2. Succinct dynamic 2D dictionary matching
* Algorithm has been developed
* Complexity has been proven
* Manuscript is in final editing stages
3. for succinct 1D dictionary matching

60 Timeline
Thesis Goals:
1. Succinct 2D dictionary matching
2. Succinct dynamic 2D dictionary matching
3. for succinct 1D dictionary matching
* Dictionary matching over generalized Ukkonen suffix tree.
* Ported code to run over compressed suffix tree, using SDSL.
* Assumption: no pattern is a substring of another.
* Add lowest marked ancestor framework to overcome this assumption.
* Evaluate time/space trade-offs of CST implementations.
* Run over large, realistic data sets.

61 Thank you!

Succinct 2D Dictionary Matching with No Slowdown

Succinct 2D Dictionary Matching with No Slowdown Succinct 2D Dictionary Matching with No Slowdown Shoshana Neuburger and Dina Sokol City University of New York Problem Definition Dictionary Matching Input: Dictionary D = P 1,P 2,...,P d containing d

More information

Compressed Index for Dynamic Text

Compressed Index for Dynamic Text Compressed Index for Dynamic Text Wing-Kai Hon Tak-Wah Lam Kunihiko Sadakane Wing-Kin Sung Siu-Ming Yiu Abstract This paper investigates how to index a text which is subject to updates. The best solution

More information

Smaller and Faster Lempel-Ziv Indices

Smaller and Faster Lempel-Ziv Indices Smaller and Faster Lempel-Ziv Indices Diego Arroyuelo and Gonzalo Navarro Dept. of Computer Science, Universidad de Chile, Chile. {darroyue,gnavarro}@dcc.uchile.cl Abstract. Given a text T[1..u] over an

More information

Alphabet Friendly FM Index

Alphabet Friendly FM Index Alphabet Friendly FM Index Author: Rodrigo González Santiago, November 8 th, 2005 Departamento de Ciencias de la Computación Universidad de Chile Outline Motivations Basics Burrows Wheeler Transform FM

More information

Lecture 18 April 26, 2012

Lecture 18 April 26, 2012 6.851: Advanced Data Structures Spring 2012 Prof. Erik Demaine Lecture 18 April 26, 2012 1 Overview In the last lecture we introduced the concept of implicit, succinct, and compact data structures, and

More information

Alphabet-Independent Compressed Text Indexing

Alphabet-Independent Compressed Text Indexing Alphabet-Independent Compressed Text Indexing DJAMAL BELAZZOUGUI Université Paris Diderot GONZALO NAVARRO University of Chile Self-indexes are able to represent a text within asymptotically the information-theoretic

More information

Theoretical Computer Science. Dynamic rank/select structures with applications to run-length encoded texts

Theoretical Computer Science. Dynamic rank/select structures with applications to run-length encoded texts Theoretical Computer Science 410 (2009) 4402 4413 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Dynamic rank/select structures with

More information

arxiv: v1 [cs.ds] 15 Feb 2012

arxiv: v1 [cs.ds] 15 Feb 2012 Linear-Space Substring Range Counting over Polylogarithmic Alphabets Travis Gagie 1 and Pawe l Gawrychowski 2 1 Aalto University, Finland travis.gagie@aalto.fi 2 Max Planck Institute, Germany gawry@cs.uni.wroc.pl

More information

Space-Efficient Construction Algorithm for Circular Suffix Tree

Space-Efficient Construction Algorithm for Circular Suffix Tree Space-Efficient Construction Algorithm for Circular Suffix Tree Wing-Kai Hon, Tsung-Han Ku, Rahul Shah and Sharma Thankachan CPM2013 1 Outline Preliminaries and Motivation Circular Suffix Tree Our Indexes

More information

Text matching of strings in terms of straight line program by compressed aleshin type automata

Text matching of strings in terms of straight line program by compressed aleshin type automata Text matching of strings in terms of straight line program by compressed aleshin type automata 1 A.Jeyanthi, 2 B.Stalin 1 Faculty, 2 Assistant Professor 1 Department of Mathematics, 2 Department of Mechanical

More information

Data Structure for Dynamic Patterns

Data Structure for Dynamic Patterns Data Structure for Dynamic Patterns Chouvalit Khancome and Veera Booning Member IAENG Abstract String matching and dynamic dictionary matching are significant principles in computer science. These principles

More information

String Range Matching

String Range Matching String Range Matching Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi Department of Computer Science, University of Helsinki Helsinki, Finland firstname.lastname@cs.helsinki.fi Abstract. Given strings

More information

Compressed Representations of Sequences and Full-Text Indexes

Compressed Representations of Sequences and Full-Text Indexes Compressed Representations of Sequences and Full-Text Indexes PAOLO FERRAGINA Università di Pisa GIOVANNI MANZINI Università del Piemonte Orientale VELI MÄKINEN University of Helsinki AND GONZALO NAVARRO

More information

Longest Gapped Repeats and Palindromes

Longest Gapped Repeats and Palindromes Discrete Mathematics and Theoretical Computer Science DMTCS vol. 19:4, 2017, #4 Longest Gapped Repeats and Palindromes Marius Dumitran 1 Paweł Gawrychowski 2 Florin Manea 3 arxiv:1511.07180v4 [cs.ds] 11

More information

arxiv: v2 [cs.ds] 8 Apr 2016

arxiv: v2 [cs.ds] 8 Apr 2016 Optimal Dynamic Strings Paweł Gawrychowski 1, Adam Karczmarz 1, Tomasz Kociumaka 1, Jakub Łącki 2, and Piotr Sankowski 1 1 Institute of Informatics, University of Warsaw, Poland [gawry,a.karczmarz,kociumaka,sank]@mimuw.edu.pl

More information

String Matching with Variable Length Gaps

String Matching with Variable Length Gaps String Matching with Variable Length Gaps Philip Bille, Inge Li Gørtz, Hjalte Wedel Vildhøj, and David Kofoed Wind Technical University of Denmark Abstract. We consider string matching with variable length

More information

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts Philip Bille 1, Rolf Fagerberg 2, and Inge Li Gørtz 3 1 IT University of Copenhagen. Rued Langgaards

More information

Indexing LZ77: The Next Step in Self-Indexing. Gonzalo Navarro Department of Computer Science, University of Chile

Indexing LZ77: The Next Step in Self-Indexing. Gonzalo Navarro Department of Computer Science, University of Chile Indexing LZ77: The Next Step in Self-Indexing Gonzalo Navarro Department of Computer Science, University of Chile gnavarro@dcc.uchile.cl Part I: Why Jumping off the Cliff The Past Century Self-Indexing:

More information

String Regularities and Degenerate Strings

String Regularities and Degenerate Strings M. Sc. Thesis Defense Md. Faizul Bari (100705050P) Supervisor: Dr. M. Sohel Rahman String Regularities and Degenerate Strings Department of Computer Science and Engineering Bangladesh University of Engineering

More information

Stronger Lempel-Ziv Based Compressed Text Indexing

Stronger Lempel-Ziv Based Compressed Text Indexing Stronger Lempel-Ziv Based Compressed Text Indexing Diego Arroyuelo 1, Gonzalo Navarro 1, and Kunihiko Sadakane 2 1 Dept. of Computer Science, Universidad de Chile, Blanco Encalada 2120, Santiago, Chile.

More information

On-line String Matching in Highly Similar DNA Sequences

On-line String Matching in Highly Similar DNA Sequences On-line String Matching in Highly Similar DNA Sequences Nadia Ben Nsira 1,2,ThierryLecroq 1,,MouradElloumi 2 1 LITIS EA 4108, Normastic FR3638, University of Rouen, France 2 LaTICE, University of Tunis

More information

A Unifying Framework for Compressed Pattern Matching

A Unifying Framework for Compressed Pattern Matching A Unifying Framework for Compressed Pattern Matching Takuya Kida Yusuke Shibata Masayuki Takeda Ayumi Shinohara Setsuo Arikawa Department of Informatics, Kyushu University 33 Fukuoka 812-8581, Japan {

More information

Finding all covers of an indeterminate string in O(n) time on average

Finding all covers of an indeterminate string in O(n) time on average Finding all covers of an indeterminate string in O(n) time on average Md. Faizul Bari, M. Sohel Rahman, and Rifat Shahriyar Department of Computer Science and Engineering Bangladesh University of Engineering

More information

Shift-And Approach to Pattern Matching in LZW Compressed Text

Shift-And Approach to Pattern Matching in LZW Compressed Text Shift-And Approach to Pattern Matching in LZW Compressed Text Takuya Kida, Masayuki Takeda, Ayumi Shinohara, and Setsuo Arikawa Department of Informatics, Kyushu University 33 Fukuoka 812-8581, Japan {kida,

More information

COMPRESSED INDEXING DATA STRUCTURES FOR BIOLOGICAL SEQUENCES

COMPRESSED INDEXING DATA STRUCTURES FOR BIOLOGICAL SEQUENCES COMPRESSED INDEXING DATA STRUCTURES FOR BIOLOGICAL SEQUENCES DO HUY HOANG (B.C.S. (Hons), NUS) A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN COMPUTER SCIENCE SCHOOL OF COMPUTING NATIONAL

More information

arxiv: v1 [cs.ds] 19 Apr 2011

arxiv: v1 [cs.ds] 19 Apr 2011 Fixed Block Compression Boosting in FM-Indexes Juha Kärkkäinen 1 and Simon J. Puglisi 2 1 Department of Computer Science, University of Helsinki, Finland juha.karkkainen@cs.helsinki.fi 2 Department of

More information

Fast Fully-Compressed Suffix Trees

Fast Fully-Compressed Suffix Trees Fast Fully-Compressed Suffix Trees Gonzalo Navarro Department of Computer Science University of Chile, Chile gnavarro@dcc.uchile.cl Luís M. S. Russo INESC-ID / Instituto Superior Técnico Technical University

More information

CSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182

CSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182 CSE182-L7 Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding 10-07 CSE182 Bell Labs Honors Pattern matching 10-07 CSE182 Just the Facts Consider the set of all substrings

More information

Define M to be a binary n by m matrix such that:

Define M to be a binary n by m matrix such that: The Shift-And Method Define M to be a binary n by m matrix such that: M(i,j) = iff the first i characters of P exactly match the i characters of T ending at character j. M(i,j) = iff P[.. i] T[j-i+.. j]

More information

A Faster Grammar-Based Self-Index

A Faster Grammar-Based Self-Index A Faster Grammar-Based Self-Index Travis Gagie 1 Pawe l Gawrychowski 2 Juha Kärkkäinen 3 Yakov Nekrich 4 Simon Puglisi 5 Aalto University Max-Planck-Institute für Informatik University of Helsinki University

More information

On-Line Construction of Suffix Trees I. E. Ukkonen z

On-Line Construction of Suffix Trees I. E. Ukkonen z .. t Algorithmica (1995) 14:249-260 Algorithmica 9 1995 Springer-Verlag New York Inc. On-Line Construction of Suffix Trees I E. Ukkonen z Abstract. An on-line algorithm is presented for constructing the

More information

arxiv: v1 [cs.ds] 25 Nov 2009

arxiv: v1 [cs.ds] 25 Nov 2009 Alphabet Partitioning for Compressed Rank/Select with Applications Jérémy Barbay 1, Travis Gagie 1, Gonzalo Navarro 1 and Yakov Nekrich 2 1 Department of Computer Science University of Chile {jbarbay,

More information

Theoretical aspects of ERa, the fastest practical suffix tree construction algorithm

Theoretical aspects of ERa, the fastest practical suffix tree construction algorithm Theoretical aspects of ERa, the fastest practical suffix tree construction algorithm Matevž Jekovec University of Ljubljana Faculty of Computer and Information Science Oct 10, 2013 Text indexing problem

More information

arxiv: v2 [cs.ds] 5 Mar 2014

arxiv: v2 [cs.ds] 5 Mar 2014 Order-preserving pattern matching with k mismatches Pawe l Gawrychowski 1 and Przemys law Uznański 2 1 Max-Planck-Institut für Informatik, Saarbrücken, Germany 2 LIF, CNRS and Aix-Marseille Université,

More information

Fast String Kernels. Alexander J. Smola Machine Learning Group, RSISE The Australian National University Canberra, ACT 0200

Fast String Kernels. Alexander J. Smola Machine Learning Group, RSISE The Australian National University Canberra, ACT 0200 Fast String Kernels Alexander J. Smola Machine Learning Group, RSISE The Australian National University Canberra, ACT 0200 Alex.Smola@anu.edu.au joint work with S.V.N. Vishwanathan Slides (soon) available

More information

Opportunistic Data Structures with Applications

Opportunistic Data Structures with Applications Opportunistic Data Structures with Applications Paolo Ferragina Giovanni Manzini Abstract There is an upsurging interest in designing succinct data structures for basic searching problems (see [23] and

More information

Succinct Suffix Arrays based on Run-Length Encoding

Succinct Suffix Arrays based on Run-Length Encoding Succinct Suffix Arrays based on Run-Length Encoding Veli Mäkinen Gonzalo Navarro Abstract A succinct full-text self-index is a data structure built on a text T = t 1 t 2...t n, which takes little space

More information

Breaking a Time-and-Space Barrier in Constructing Full-Text Indices

Breaking a Time-and-Space Barrier in Constructing Full-Text Indices Breaking a Time-and-Space Barrier in Constructing Full-Text Indices Wing-Kai Hon Kunihiko Sadakane Wing-Kin Sung Abstract Suffix trees and suffix arrays are the most prominent full-text indices, and their

More information

Space-Efficient Re-Pair Compression

Space-Efficient Re-Pair Compression Space-Efficient Re-Pair Compression Philip Bille, Inge Li Gørtz, and Nicola Prezza Technical University of Denmark, DTU Compute {phbi,inge,npre}@dtu.dk Abstract Re-Pair [5] is an effective grammar-based

More information

Data Compression Techniques

Data Compression Techniques Data Compression Techniques Part 2: Text Compression Lecture 5: Context-Based Compression Juha Kärkkäinen 14.11.2017 1 / 19 Text Compression We will now look at techniques for text compression. These techniques

More information

Computing Techniques for Parallel and Distributed Systems with an Application to Data Compression. Sergio De Agostino Sapienza University di Rome

Computing Techniques for Parallel and Distributed Systems with an Application to Data Compression. Sergio De Agostino Sapienza University di Rome Computing Techniques for Parallel and Distributed Systems with an Application to Data Compression Sergio De Agostino Sapienza University di Rome Parallel Systems A parallel random access machine (PRAM)

More information

Succincter text indexing with wildcards

Succincter text indexing with wildcards University of British Columbia CPM 2011 June 27, 2011 Problem overview Problem overview Problem overview Problem overview Problem overview Problem overview Problem overview Problem overview Problem overview

More information

An on{line algorithm is presented for constructing the sux tree for a. the desirable property of processing the string symbol by symbol from left to

An on{line algorithm is presented for constructing the sux tree for a. the desirable property of processing the string symbol by symbol from left to (To appear in ALGORITHMICA) On{line construction of sux trees 1 Esko Ukkonen Department of Computer Science, University of Helsinki, P. O. Box 26 (Teollisuuskatu 23), FIN{00014 University of Helsinki,

More information

On the Number of Distinct Squares

On the Number of Distinct Squares Frantisek (Franya) Franek Advanced Optimization Laboratory Department of Computing and Software McMaster University, Hamilton, Ontario, Canada Invited talk - Prague Stringology Conference 2014 Outline

More information

Faster Compact On-Line Lempel-Ziv Factorization

Faster Compact On-Line Lempel-Ziv Factorization Faster Compact On-Line Lempel-Ziv Factorization Jun ichi Yamamoto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda Department of Informatics, Kyushu University, Nishiku, Fukuoka, Japan

More information

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts Philip Bille IT University of Copenhagen Rolf Fagerberg University of Southern Denmark Inge Li Gørtz

More information

String Indexing for Patterns with Wildcards

String Indexing for Patterns with Wildcards MASTER S THESIS String Indexing for Patterns with Wildcards Hjalte Wedel Vildhøj and Søren Vind Technical University of Denmark August 8, 2011 Abstract We consider the problem of indexing a string t of

More information

Compressed Representations of Sequences and Full-Text Indexes

PAOLO FERRAGINA Dipartimento di Informatica, Università di Pisa, Italy GIOVANNI MANZINI Dipartimento di Informatica, Università del Piemonte

Faster Compact Top-k Document Retrieval

Roberto Konow and Gonzalo Navarro Department of Computer Science, University of Chile {rkonow,gnavarro}@dcc.uchile.cl arXiv:1211.5353v1 [cs.DS] 22 Nov 2012 Abstract:

Space-Efficient Frameworks for Top-k String Retrieval

Wing-Kai Hon, National Tsing Hua University Rahul Shah, Louisiana State University Sharma V. Thankachan, Louisiana State University Jeffrey Scott

Compact Indexes for Flexible Top-k Retrieval

Simon Gog Matthias Petri Institute of Theoretical Informatics, Karlsruhe Institute of Technology Computing and Information Systems, The University of Melbourne

SUFFIX TREE. SYNONYMS Compact suffix trie

SUFFIX TREE Maxime Crochemore King's College London and Université Paris-Est, http://www.dcs.kcl.ac.uk/staff/mac/ Thierry Lecroq Université de Rouen, http://monge.univ-mlv.fr/~lecroq SYNONYMS Compact suffix

Self-Indexed Grammar-Based Compression

Fundamenta Informaticae XXI (2001) 1001-1025 IOS Press Francisco Claude David R. Cheriton School of Computer Science University of Waterloo fclaude@cs.uwaterloo.ca

On Compressing and Indexing Repetitive Sequences

Sebastian Kreft a,1,2, Gonzalo Navarro a,2 a Department of Computer Science, University of Chile Abstract We introduce LZ-End, a new member of the Lempel-Ziv

Efficient High-Similarity String Comparison: The Waterfall Algorithm

Alexander Tiskin Department of Computer Science University of Warwick http://go.warwick.ac.uk/alextiskin Alexander Tiskin (Warwick)

A Fully Compressed Pattern Matching Algorithm for Simple Collage Systems

Shunsuke Inenaga 1, Ayumi Shinohara 2,3 and Masayuki Takeda 2,3 1 Department of Computer Science, P.O. Box 26 (Teollisuuskatu 23)

A GREEDY APPROXIMATION ALGORITHM FOR CONSTRUCTING SHORTEST COMMON SUPERSTRINGS *

Jorma Tarhio and Esko Ukkonen Department of Computer Science, University of Helsinki Tukholmankatu 2, SF-00250 Helsinki,

Number of occurrences of powers in strings

Author manuscript, published in "International Journal of Foundations of Computer Science 21, 4 (2010) 535--547" DOI : 10.1142/S0129054110007416 Maxime Crochemore

Dynamic Entropy-Compressed Sequences and Full-Text Indexes

VELI MÄKINEN University of Helsinki and GONZALO NAVARRO University of Chile First author funded by the Academy of Finland under grant 108219.

Bandwidth: Communicate large complex & highly detailed 3D models through low-bandwidth connection (e.g. VRML over the Internet)

Compression Motivation Bandwidth: Communicate large complex & highly detailed 3D models through low-bandwidth connection (e.g. VRML over the Internet) Storage: Store large & complex 3D models (e.g. 3D scanner

Hierarchical Overlap Graph

B. Cazaux and E. Rivals LIRMM & IBC, Montpellier 8. Feb. 2018 arXiv:1802.04632 Overlap Graph for a set of words Consider the set P := {abaa,

Pattern Matching (Exact Matching) Overview

CSI/BINF 5330 Pattern Matching (Exact Matching) Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Pattern Matching Exhaustive Search DFA Algorithm KMP Algorithm

Internal Pattern Matching Queries in a Text and Applications

Tomasz Kociumaka Jakub Radoszewski Wojciech Rytter Tomasz Waleń Abstract We consider several types of internal queries: questions about subwords

Converting SLP to LZ78 in almost Linear Time

CPM 2013 Hideo Bannai 1, Paweł Gawrychowski 2, Shunsuke Inenaga 1, Masayuki Takeda 1 1. Kyushu University 2. Max-Planck-Institut für Informatik Recompress SLP

BLAST: Basic Local Alignment Search Tool

CSC 448 Bioinformatics Algorithms Alexander Dekhtyar (Rapid) Local Sequence Alignment BLAST BLAST is a family of rapid approximate local alignment algorithms[2].

Source Coding. Master Universitario en Ingeniería de Telecomunicación. I. Santamaría Universidad de Cantabria

Contents Introduction Asymptotic Equipartition Property Optimal Codes (Huffman Coding) Universal

Motif Extraction from Weighted Sequences

C. Iliopoulos 1, K. Perdikuri 2,3, E. Theodoridis 2,3, A. Tsakalidis 2,3 and K. Tsichlas 1 1 Department of Computer Science, King's College London, London WC2R

Optimal Dynamic Sequence Representations

Gonzalo Navarro Yakov Nekrich Abstract We describe a data structure that supports access, rank and select queries, as well as symbol insertions and deletions, on

Implementing Approximate Regularities

Manolis Christodoulakis Costas S. Iliopoulos Department of Computer Science King's College London Kunsoo Park School of Computer Science and Engineering, Seoul National

String Regularities and Degenerate Strings

M.Sc. Engg. Thesis String Regularities and Degenerate Strings by Md. Faizul Bari Submitted to Department of Computer Science and Engineering in partial fulfilment of the requirements for the degree of Master

Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VCF Files On-line

MatBio 18 Solon P. Pissis and Ahmad Retha King's College London 02-Aug-2018 Solon P. Pissis and Ahmad Retha

Suffix Array of Alignment: A Practical Index for Similar Data

Joong Chae Na 1, Heejin Park 2, Sunho Lee 3, Minsung Hong 3, Thierry Lecroq 4, Laurent Mouchard 4, and Kunsoo Park 3 1 Department of Computer

Preview: Text Indexing

Simon Gog gog@ira.uka.de KIT University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association www.kit.edu Text Indexing Motivation Problems Given a text

Average Complexity of Exact and Approximate Multiple String Matching

Gonzalo Navarro Department of Computer Science University of Chile gnavarro@dcc.uchile.cl Kimmo Fredriksson Department of Computer Science

A Simple Alphabet-Independent FM-Index

Szymon Grabowski 1, Veli Mäkinen 2, Gonzalo Navarro 3, Alejandro Salinger 3 1 Computer Engineering Dept., Tech. Univ. of Łódź, Poland. e-mail: sgrabow@zly.kis.p.lodz.pl

Simple Real-Time Constant-Space String Matching

Dany Breslauer 1, Roberto Grossi 2, and Filippo Mignosi 3 1 Caesarea Rothschild Institute, University of Haifa, Haifa, Israel 2 Dipartimento di Informatica,

How many double squares can a string contain?

F. Franek, joint work with A. Deza and A. Thierry Algorithms Research Group Department of Computing and Software McMaster University, Hamilton, Ontario, Canada

Burrows-Wheeler Transforms in Linear Time and Linear Bits

Russ Cox (following Hon, Sadakane, Sung, Farach, and others) 18.417 Final Project BWT in Linear Time and Linear Bits Three main parts to the result.

Skriptum VL Text Indexing Sommersemester 2012 Johannes Fischer (KIT)

Disclaimer Students attending my lectures are often astonished that I present the material in a much livelier form than in this script.

Computing Longest Common Substrings Using Suffix Arrays

Maxim A. Babenko, Tatiana A. Starikovskaya Moscow State University Computer Science in Russia, 2008

Forbidden Patterns

Johannes Fischer 1, Travis Gagie 2, Tsvi Kopelowitz 3, Moshe Lewenstein 4, Veli Mäkinen 5, Leena Salmela 5, and Niko Välimäki 5 1 KIT, Karlsruhe, Germany, johannes.fischer@kit.edu

Longest Lyndon Substring After Edit

Yuki Urabe Department of Electrical Engineering and Computer Science, Kyushu University, Japan yuki.urabe@inf.kyushu-u.ac.jp Yuto Nakashima Department of Informatics,

Text Indexing: Lecture 6

Simon Gog gog@kit.edu KIT The Research University in the Helmholtz Association www.kit.edu Reviewing the last two lectures: We have seen two top-k document retrieval frameworks. Question

Sequence comparison by compression

Motivation: similarity as a marker for homology. And homology is used to infer function. Sometimes, we are only interested in a numerical distance between two sequences.

Computing a Longest Common Palindromic Subsequence

Fundamenta Informaticae 129 (2014) 1-12 DOI 10.3233/FI-2014-860 IOS Press Shihabur Rahman Chowdhury, Md. Mahbubul Hasan, Sumaiya Iqbal, M. Sohel Rahman

Optimal-Time Text Indexing in BWT-runs Bounded Space

Travis Gagie Gonzalo Navarro Nicola Prezza Abstract Indexing highly repetitive texts such as genomic databases, software repositories and versioned

Fully-Functional Suffix Trees and Optimal Text Searching in BWT-runs Bounded Space

arxiv: v1 [cs.ds] 8 Sep 2018 Travis Gagie 1,2, Gonzalo Navarro 2,3, and Nicola Prezza 4 1 EIT, Diego Portales University, Chile 2 Center for Biotechnology

Skriptum VL Text-Indexierung Sommersemester 2010 Johannes Fischer (KIT)

1 Recommended Reading D. Gusfield: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997. M. Crochemore,

Text Compression

Jayadev Misra The University of Texas at Austin December 5, 2003 Contents 1 Introduction 2 A Very Incomplete Introduction to Information Theory 3 Huffman Coding 3.1 Uniquely Decodable

Fast profile matching algorithms A survey

Theoretical Computer Science 395 (2008) 137-157 www.elsevier.com/locate/tcs Cinzia Pizzi a,1, Esko Ukkonen b a Department of Computer Science, University of Helsinki,

String Search. 6th September 2018

Search for a given (short) string in a long string. Search problems have become more important lately. The amount of stored digital information grows steadily (rapidly?)

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

3 PHILIP BILLE IT University of Copenhagen ROLF FAGERBERG University of Southern Denmark AND INGE LI

ON THE BIT-COMPLEXITY OF LEMPEL-ZIV COMPRESSION

PAOLO FERRAGINA, IGOR NITTO, AND ROSSANO VENTURINI Abstract. One of the most famous and investigated lossless data-compression schemes is the one introduced

Text Searching. Thierry Lecroq Laboratoire d'Informatique, du Traitement de l'Information et des

Thierry Lecroq Thierry.Lecroq@univ-rouen.fr Laboratoire d'Informatique, du Traitement de l'Information et des Systèmes. International PhD School in Formal Languages and Applications Tarragona,

Intrusion Detection and Malware Analysis

IDS feature extraction Pavel Laskov Wilhelm Schickard Institute for Computer Science Metric embedding of byte sequences Sequences 1. blabla blubla blablabu aa 2.

Faster Attractor-Based Indexes

arxiv: v1 [cs.ds] 30 Nov 2018 Gonzalo Navarro 1,2 and Nicola Prezza 3 1 CeBiB Center for Biotechnology and Bioengineering 2 Dept. of Computer Science, University of Chile, Chile. gnavarro@dcc.uchile.cl

OPTIMAL PARALLEL SUPERPRIMITIVITY TESTING FOR SQUARE ARRAYS

Parallel Processing Letters, © World Scientific Publishing Company COSTAS S. ILIOPOULOS Department of Computer Science, King's College London,

Rank and Select Operations on Binary Strings (1974; Elias)

Naila Rahman, University of Leicester, www.cs.le.ac.uk/ nyr1 Rajeev Raman, University of Leicester, www.cs.le.ac.uk/ rraman entry editor: Paolo

Theoretical Computer Science

Theoretical Computer Science 443 (2012) 25-34 Contents lists available at SciVerse ScienceDirect journal homepage: www.elsevier.com/locate/tcs String matching with variable