OBLIVIOUS STRING EMBEDDINGS AND EDIT DISTANCE APPROXIMATIONS
1 OBLIVIOUS STRING EMBEDDINGS AND EDIT DISTANCE APPROXIMATIONS
Tuğkan Batu (London School of Economics), Funda Ergun and Cenk Sahinalp (Simon Fraser University)
LSE CDAM Seminar: Oblivious String Embeddings and Edit Distance Approximations, October 19, 2006
2 EDIT DISTANCE
Let S and T be strings over an alphabet Σ.
Edit distance D(S, T): the minimum number of character insertions, deletions, and substitutions required to transform S into T.
Many string similarity problems are based on edit distance. It is used in text processing, analysis of genomic sequences, ...
Variants: non-uniform costs, block operations, ...
Exact computation: [Masek Paterson 80] gave an $O(n^2/\log n)$-time algorithm for constant-size alphabets.
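The definition of D(S, T) translates directly into the textbook quadratic dynamic program (Wagner–Fischer), the baseline that the Masek–Paterson bound improves by a logarithmic factor. A minimal Python sketch; the function name is hypothetical:

```python
def edit_distance(s, t):
    """Unit-cost edit distance D(s, t) via the textbook O(|s| * |t|) dynamic program.

    Only the previous row of the DP table is kept, so extra memory is O(|t|).
    """
    prev = list(range(len(t) + 1))        # row 0: D("", t[:j]) = j insertions
    for i, a in enumerate(s, start=1):
        cur = [i] + [0] * len(t)          # column 0: D(s[:i], "") = i deletions
        for j, b in enumerate(t, start=1):
            cur[j] = min(prev[j] + 1,             # delete a
                         cur[j - 1] + 1,          # insert b
                         prev[j - 1] + (a != b))  # substitute a -> b (free if equal)
        prev = cur
    return prev[-1]


print(edit_distance("kitten", "sitting"))                             # 3
print(edit_distance("adebccebaaadebcbadec", "dadebccebaaadebcbade"))  # 2 (strings S, T of slide 6)
```

The second pair is the example of slide 6: T is S with d inserted in front and the final c deleted.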
3 EDIT DISTANCE APPROXIMATION
γ-approximation: given strings S, T and γ > 1, output a value d such that $D(S,T) \le d \le \gamma \cdot D(S,T)$.
[Bar-Yossef et al. 04]: $n^{3/7}$-approximation in near-linear ($\tilde{O}(n)$) time.
Our results: an $n^{1/3+o(1)}$-approximation in near-linear time, and an $n^{(1-\epsilon)/3+o(1)}$-approximation in $\tilde{O}(n^{1+\epsilon})$ time.
Block edit distance variants: almost logarithmic approximation factors [Muthukrishnan Sahinalp 00], [Cormode Muthukrishnan 02].
4 STRING (TO STRING) EMBEDDINGS
Def. Let Σ and Γ be two alphabets. A string embedding with distortion d is a mapping $\varphi: \Sigma^* \to \Gamma^*$ such that $D(S,T)/d_1 \le D(\varphi(S),\varphi(T)) \le d_2 \cdot D(S,T)$ for all S, T, where $d_1 \cdot d_2 \le d$.
This is interesting if φ reduces the string length (dimensionality reduction):
Def. A string embedding with reduction r > 1 maps a string of length n to a string of length at most n/r.
Warning: this is not compression. We require $\Sigma \subseteq \Gamma$.
5 STRING EMBEDDINGS (CONT.)
Lemma. A string embedding with reduction r has distortion at least r.
"Proof." The maximum edit distance shrinks by a factor of r: two strings of length n can be at distance n, but their images have length at most n/r, so $d_1 \ge r$. The minimum edit distance stays at 1: by the lower bound, strings at distance 1 have distinct images, which are at distance at least 1, so $d_2 \ge 1$. Hence $d_1 d_2 \ge r$.
Our result: a string embedding with reduction r and distortion $r^{1+o(1)}$ can be computed in $\tilde{O}(n)$ time.
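The two factors $d_1, d_2$ of the definition can be measured on a finite sample of string pairs; a sample only certifies lower bounds on the true factors, which must hold for all S, T. In the sketch below, `empirical_distortion` and the toy map `keep_every_other` (reduction 2 on even-length strings) are hypothetical names for illustration. On the two sample pairs the toy map already has $d_1 d_2 = 2 = r$, matching the lemma: the far-apart pair forces $d_1 \ge 2$ and the close pair forces $d_2 \ge 1$.

```python
import math


def edit_distance(x, y):
    """Unit-cost edit distance between two sequences (textbook quadratic DP)."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        cur = [i] + [0] * len(y)
        for j, b in enumerate(y, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b))
        prev = cur
    return prev[-1]


def empirical_distortion(phi, pairs):
    """Smallest d1, d2 with D(S,T)/d1 <= D(phi(S),phi(T)) <= d2 * D(S,T) on the sample.

    Returns (d1, d2, d1 * d2). A finite sample only gives lower bounds on the factors
    that the definition requires for all S, T.
    """
    d1 = d2 = 0.0
    for s, t in pairs:
        d = edit_distance(s, t)
        if d == 0:
            continue                              # identical inputs constrain neither factor
        e = edit_distance(phi(s), phi(t))
        d1 = max(d1, d / e if e else math.inf)    # e == 0: phi merged two distinct strings
        d2 = max(d2, e / d)
    return d1, d2, (math.inf if math.isinf(d1) else d1 * d2)


def keep_every_other(s):
    """Toy map with reduction 2 on even-length strings (hypothetical example)."""
    return s[::2]


print(empirical_distortion(keep_every_other, [("aaaa", "bbbb"), ("ab", "bb")]))  # (2.0, 1.0, 2.0)
```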
6 STRING EMBEDDINGS VIA PARTITIONING
S: adebccebaaadebcbadec → adebc cebaa adebc badec → α β α δ
T: dadebccebaaadebcbade → dadeb cceba aadeb cbade → ε γ κ λ
T is obtained from S by inserting d in front and deleting the final c, so D(S, T) = 2, yet the fixed-length partitions share no block.
We want consistency:
S:   adebc cebaa adebc badec
T: d adebc cebaa adebc bade
Maybe too strong to ask! We need to use the content of the string: locality.
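A quick check of the fixed-length partition above, with integers standing in for the labels α, β, ...; `fixed_blocks` and `label` are hypothetical helper names. A shared table gives equal blocks equal labels across both strings, yet no label of S reappears in T.

```python
def fixed_blocks(s, b):
    """Cut s into consecutive blocks of length b (the last block may be shorter)."""
    return [s[i:i + b] for i in range(0, len(s), b)]


def label(blocks, table):
    """Replace each block by a symbol; the shared table gives equal blocks equal symbols."""
    return [table.setdefault(block, len(table)) for block in blocks]


S = "adebccebaaadebcbadec"
T = "dadebccebaaadebcbade"        # S with 'd' inserted in front and the last 'c' deleted
table = {}
phi_S = label(fixed_blocks(S, 5), table)
phi_T = label(fixed_blocks(T, 5), table)
print(phi_S, phi_T)               # [0, 1, 0, 2] [3, 4, 5, 6]
print(set(phi_S) & set(phi_T))    # set(): no block of S survives in T
```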
7 LOCALLY CONSISTENT PARSING (LCP) [SAHINALP VISHKIN 95, 96]
Partitions depend only on local string content.
Consistency condition: if a sufficiently long substring w occurs more than once in S, then most blocks in w are set identically in each occurrence of w (in S or in any other string). Here "most blocks" means all but a few boundary blocks.
8 PARSING CONSISTENTLY
Convention: a > b > c > d > e > ... > z. Assume no character is repeated consecutively.
S: ... dcadecbgabcdefghkmopstuadecbhc ...
We will partition S into blocks of size 2 or 3.
Primary markers: mark every local maximum in a sliding window of 3, i.e., every character larger than both of its immediate neighbors. Two markers are never adjacent, since each would have to be larger than the other.
S: ... dc adec bg abcdefghkmopstu adec bhc ...
Secondary markers: partition each segment between primary markers into blocks of size 2 (or 3 if necessary).
S: ... dc ad ec bg ab cd ef gh km op stu ad ec bhc ...
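A minimal Python sketch of this two-stage parse (blocks of size 2 or 3) on the example string, treating the two ends of the example as boundaries. The function names are hypothetical, and a boundary segment of length 1 is simply kept as a short block.

```python
def larger(x, y):
    """Slide convention a > b > c > ... > z: an earlier letter is the larger character."""
    return x < y


def primary_markers(s):
    """Interior positions holding a local maximum of a window of 3."""
    return [i for i in range(1, len(s) - 1)
            if larger(s[i], s[i - 1]) and larger(s[i], s[i + 1])]


def split_segment(seg):
    """Cut a segment into blocks of size 2, ending with one block of size 3 if the length is odd."""
    blocks, i = [], 0
    while len(seg) - i > 3:
        blocks.append(seg[i:i + 2])
        i += 2
    if i < len(seg):
        blocks.append(seg[i:])
    return blocks


def parse(s):
    """Blocks of s: cut at every primary marker (secondary markers come from split_segment)."""
    assert all(a != b for a, b in zip(s, s[1:])), "no character may repeat consecutively"
    cuts = [0] + primary_markers(s) + [len(s)]
    return [block for lo, hi in zip(cuts, cuts[1:]) for block in split_segment(s[lo:hi])]


S = "dcadecbgabcdefghkmopstuadecbhc"
print(primary_markers(S))   # [2, 6, 8, 23, 27]
print(" ".join(parse(S)))   # dc ad ec bg ab cd ef gh km op stu ad ec bhc
```

Each block starts at a primary marker or two characters after the previous block, so the output reproduces the slide's partition.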
9 CONSISTENT, BUT IS IT LOCAL?
dcadecbgabcdefghkmopstuadecbhc
dc adec bg abcdefghkmopstu adec bhc
dc ad ec bg ab cd ef gh km op stu ad ec bhc
Problem: primary markers can be far apart (as far as twice the alphabet size). Hence, marker locations can depend on far-away characters: the secondary markers inside a long segment are placed relative to the start of the segment.
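The non-locality can be seen directly by rerunning the parse of slide 8 (restated so the snippet runs on its own). Deleting the b at position 9 (0-based) leaves the primary markers on the same characters, yet every block of the long segment starting at the preceding a changes, including blocks more than ten characters to the right of the edit. The name `parse` is hypothetical.

```python
def parse(s):
    """Slide-8 parse: cut at local maxima, then blocks of 2 or 3.

    Convention a > b > ... > z, so a smaller code point is a larger character.
    """
    marks = [i for i in range(1, len(s) - 1) if s[i] < s[i - 1] and s[i] < s[i + 1]]
    cuts = [0] + marks + [len(s)]
    blocks = []
    for lo, hi in zip(cuts, cuts[1:]):
        i = lo
        while hi - i > 3:        # blocks of size 2 ...
            blocks.append(s[i:i + 2])
            i += 2
        if i < hi:               # ... closed by one block of size 2 or 3
            blocks.append(s[i:hi])
    return blocks


S = "dcadecbgabcdefghkmopstuadecbhc"
S2 = S[:9] + S[10:]              # delete the 'b' at position 9 (0-based)
print(" ".join(parse(S)))        # dc ad ec bg ab cd ef gh km op stu ad ec bhc
print(" ".join(parse(S2)))       # dc ad ec bg ac de fg hk mo ps tu ad ec bhc
```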
10 ALPHABET REDUCTION
Note that all we need for parsing is that there are no repetitions and that the characters are totally ordered.
We will reduce the alphabet size to avoid far-apart primary markers.
Σ = {a = 1111, b = 1110, c = 1101, d = 1011, e = 1010, ...}
Assign a tag (with a shorter bit representation) to each character.
Tag of the i-th character: the rightmost bit position where S[i] and S[i−1] differ, concatenated with the value of that bit in S[i].
S : a e b
tag :
Remark. There are still no repetitions in the tags: if S[i] and S[i+1] had the same tag (p, v), both would have value v at bit p, although S[i+1] differs from S[i] at bit p.
Reduction in alphabet: k bits to $\lceil \log k \rceil + 1$ bits.
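A small sketch of one reduction round, using the codes listed on the slide. Bit positions are counted from 0 at the right, the tag (position, bit) is packed as the integer 2·position + bit, and the first character gets no tag; these conventions, and the names `tag` and `reduce_alphabet`, are assumptions of the sketch. For k-bit codes the position is below k, so every tag fits in ⌈log k⌉ + 1 bits.

```python
def tag(prev, cur):
    """Alphabet-reduction tag of code `cur` given its left neighbour `prev` (prev != cur).

    p = index of the rightmost bit where prev and cur differ (counting from 0 at the right);
    the tag is p's bits followed by cur's bit at position p, i.e. the integer 2 * p + bit.
    """
    if prev == cur:
        raise ValueError("alphabet reduction needs neighbouring characters to differ")
    diff = prev ^ cur
    p = (diff & -diff).bit_length() - 1    # position of the lowest set bit of diff
    return 2 * p + ((cur >> p) & 1)


def reduce_alphabet(codes):
    """Tags for positions 1..n-1; position 0 has no left neighbour and gets no tag here."""
    return [tag(a, b) for a, b in zip(codes, codes[1:])]


code = {"a": 0b1111, "b": 0b1110, "c": 0b1101, "d": 0b1011, "e": 0b1010}  # codes from the slide
tags = reduce_alphabet([code[ch] for ch in "aeb"])
print([(t >> 1, t & 1) for t in tags])   # (bit position, bit value) for e and b: [(0, 0), (2, 1)]
```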
11 ITERATIVE ALPHABET REDUCTION
Apply alphabet reduction $\log^* k$ times (k: initial alphabet size); the tags then come from a constant-size alphabet.
Use the LCP procedure (on the tags) to partition the string into blocks of size 2 and 3.
Each marker is set based on $O(\log^* k)$ locations in the string.
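Iterating the reduction until the tags stop shrinking illustrates the slide. Starting from 64-bit codes, the tag width goes 64 → 7 → 4 → 3 bits and the values end in {0, ..., 5} after at most four rounds. Because a strictly monotone run of values from {0, ..., 5} has at most five steps, two consecutive local maxima of the final tags are at most 10 positions apart, whatever the original alphabet. The input sequence and all function names are hypothetical.

```python
def tag(prev, cur):
    """2 * (rightmost differing bit position) + (cur's bit there); requires prev != cur."""
    if prev == cur:
        raise ValueError("neighbouring codes must differ")
    diff = prev ^ cur
    p = (diff & -diff).bit_length() - 1
    return 2 * p + ((cur >> p) & 1)


def iterated_reduction(codes):
    """Repeat alphabet reduction until every tag lies in {0, ..., 5}.

    Each round drops the first position (it has no left neighbour), so tag j of the
    result belongs to original position j + rounds. Returns (tags, rounds).
    """
    rounds = 0
    while max(codes) >= 6:
        codes = [tag(a, b) for a, b in zip(codes, codes[1:])]
        rounds += 1
    return codes, rounds


def local_maxima(seq):
    """Interior positions larger than both neighbours (the primary markers on tags)."""
    return [i for i in range(1, len(seq) - 1) if seq[i - 1] < seq[i] > seq[i + 1]]


codes = [(i * 0x9E3779B97F4A7C15) % 2**64 for i in range(1, 1001)]  # 64-bit codes, neighbours differ
tags, rounds = iterated_reduction(codes)
marks = local_maxima(tags)
print(rounds, max(tags), max(b - a for a, b in zip(marks, marks[1:])))
```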
12 OBTAINING LONGER BLOCKS
Problem: blocks are of size 2 and 3. What if we need longer blocks?
1. Label the blocks (with a new alphabet), using the same character for different occurrences of the same block.
2. Partition this new string into blocks of size 2 and 3.
Step 1: ad ec bg ab cd ef gh ko mp stu ad ec → α β χ δ ε φ ν ß κ λ α β
Step 2: αβ χδε φν ßκλ αβ → adec bgabcd efgh kompstu adec
Block sizes are between $2^2 = 4$ and $3^2 = 9$. After t repetitions: block sizes between $2^t$ and $3^t$.
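The two steps can be iterated in code. This is an illustration under stated assumptions: the slide does not say how the new labels are ordered, so labels are numbered by first occurrence and compared numerically (a hypothetical choice), and a sequence with equal neighbours is rejected rather than handled as a periodic run. Run on the string of slide 8, two levels give blocks of length 4 or 5, inside the $2^2$ to $3^2$ range; the grouping differs from the slide's illustration because the label order differs. All names are hypothetical.

```python
def letter_larger(x, y):
    """Slide convention a > b > ... > z: an earlier letter is larger."""
    return x < y


def label_larger(x, y):
    """Hypothetical order on the new block labels: plain numeric order."""
    return x > y


def parse(seq, larger):
    """One round: cut at local maxima, then into blocks of 2 (or 3 at the end of a segment).
    Returns (start, end) index pairs into seq."""
    if any(x == y for x, y in zip(seq, seq[1:])):
        raise ValueError("equal neighbours: the full LCP treats periodic runs separately")
    marks = [i for i in range(1, len(seq) - 1)
             if larger(seq[i], seq[i - 1]) and larger(seq[i], seq[i + 1])]
    cuts = [0] + marks + [len(seq)]
    spans = []
    for lo, hi in zip(cuts, cuts[1:]):
        i = lo
        while hi - i > 3:
            spans.append((i, i + 2))
            i += 2
        if i < hi:
            spans.append((i, hi))
    return spans


def hierarchical_blocks(s, levels):
    """Blocks of s (as substrings) after `levels` rounds of parsing and relabelling."""
    intervals = [(i, i + 1) for i in range(len(s))]  # characters of s covered by each item
    seq, larger = list(s), letter_larger
    labels = {}                                      # equal blocks always get equal labels
    for _ in range(levels):
        spans = parse(seq, larger)
        intervals = [(intervals[a][0], intervals[b - 1][1]) for a, b in spans]
        seq = [labels.setdefault(s[lo:hi], len(labels)) for lo, hi in intervals]
        larger = label_larger
    return [s[lo:hi] for lo, hi in intervals]


S = "dcadecbgabcdefghkmopstuadecbhc"
print(" ".join(hierarchical_blocks(S, 1)))  # dc ad ec bg ab cd ef gh km op stu ad ec bhc
print(" ".join(hierarchical_blocks(S, 2)))  # dcad ecbg abcd efgh kmop stuad ecbhc
```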
13 LCP(C)
We generalize the above technique to obtain blocks of size between c and 2c − 1.
Same underlying ideas; various periodicities are handled separately.
Alphabet reduction is achieved by comparing substrings of length 2c − 3 instead of single characters.
Computed in $O(c^2 n)$ time. Markers depend on roughly $O(c^{2c})$ locations.
Lemma. One edit operation to a string can change at most $O(c^{2c})$ markers in LCP(c).
14 ITERATIVE APPLICATION OF LCP(C)
1. Label the blocks (with a new alphabet), using the same character for different occurrences of the same block.
2. Partition this new string into blocks of size between c and 2c − 1.
Step 1: ad ec bg ab cd ef gh ko mp stu ad ec → α β χ δ ε φ ν ß κ λ α β
Step 2: αβ χδε φν ßκλ αβ → adec bgabcd efgh kompstu adec
Block sizes are between $c^2$ and $(2c-1)^2$. After t repetitions: block sizes between $c^t$ and $(2c-1)^t$.
15 STRING EMBEDDINGS VIA LCP(C)
String embedding φ(S) with reduction $r = \Omega(\log n)$:
1. Choose $c = \log\log n / \log\log\log n$ and t such that $c^t = r$.
2. Apply LCP(c) t times to partition S; block sizes are between $c^t = r$ and $(2c-1)^t = r^{1+o(1)}$.
3. Label the blocks with a new alphabet Γ to obtain the string φ(S).
Lemma. $D(S,T)/(2c-1)^t \le D(\varphi(S),\varphi(T)) \le O(c^{2c}) \cdot D(S,T)$.
Corollary. Embedding φ has distortion $r^{1+o(1)}$.
16 LCP(2) VS LCP(C)
When we use LCP(c) to get reduction r, we set $c^t = r$.
The distortion grows with the largest block size: $(2c-1)^t \le (2c)^t = 2^t r$.
Overhead of LCP(c) on the distortion: $2^t = r^{\log_c 2}$. For c = 2 this bound is r itself; as c grows it becomes $r^{o(1)}$.
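The overhead can be tabulated for a fixed reduction, here $r = 2^{16}$ (a hypothetical value chosen so that r is an exact power of each c tried). The sketch prints the actual ratio $(2c-1)^t / r$ of the largest block size to r next to the bound $2^t = r^{\log_c 2}$; `lcp_overhead` is a hypothetical name.

```python
import math


def lcp_overhead(r, c):
    """For LCP(c) applied t times with c**t == r (assumed exact here), return
    (t, (2c-1)**t / r, 2**t): the actual ratio of the largest block size to r,
    and the slide's upper bound 2**t = r**log_c(2) on that ratio."""
    t = round(math.log(r, c))
    if c ** t != r:
        raise ValueError("this sketch assumes r is an exact power of c")
    return t, (2 * c - 1) ** t / r, 2 ** t


r = 2 ** 16
for c in (2, 4, 16, 256):
    t, ratio, bound = lcp_overhead(r, c)
    print(f"c={c:<4} t={t:<3} (2c-1)^t/r={ratio:10.2f}  2^t={bound}")
```

The ratio drops from about 657 for c = 2 to about 4 for c = 256, which is why the construction of slide 15 lets c grow with n.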
17 APPROXIMATING EDIT DISTANCE
A naive idea: given S, T, and γ > 1,
1. Let S' = φ(S) and T' = φ(T) for a suitable reduction r (see below).
2. Calculate D(S', T') using dynamic programming (in $O((n/r)^2)$ time).
3. Translate this into an approximation to D(S, T) using the distortion of φ.
Setting $(n/r)^2 = n$ gives $r = \sqrt{n}$. Since the distortion is at least r, this does not yield better than a $\sqrt{n}$-approximation in linear time!
18 HOW TO CALCULATE D(S', T') MORE EFFICIENTLY
We can exploit properties of S' and T' to compute D(S', T').
If $D(S,T) \le k$, then the number of insertions and deletions (from S' to T') is less than k/r.
Observation. During the computation of D(S', T'), we do not have to compare far-away locations in S' and T'. Hence, we can restrict the algorithm to a "narrow" band along the diagonal of the DP table.
Hence, we can make S' and T' longer (read: smaller r) for better accuracy. Say, use $r = n^{1/3}$.
Result: $n^{(1-\epsilon)/3+o(1)}$-approximation in $\tilde{O}(n^{1+\epsilon})$ time. (We can do even better if $D(S,T) < n^{2/3}$.)
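The band restriction can be sketched as a standard banded dynamic program. Here it runs on plain strings; in the talk the DP would run on the embedded strings S' and T', and the function accepts any sequences (for example lists of block labels). The band width `w` is a hypothetical parameter standing in for the bound on insertions and deletions, and `banded_edit_distance` is a hypothetical name. With band width w only O(n·w) cells are filled instead of all n·m.

```python
import math


def banded_edit_distance(a, b, w):
    """Edit distance computed only on DP cells (i, j) with |i - j| <= w.

    If the true distance is at most w, every optimal alignment stays inside the band
    (|i - j| never exceeds the number of insertions/deletions used), so the result is
    exact. Otherwise the result is larger than w and is only an upper bound.
    This sketch allocates full rows for clarity; only O(len(a) * w) cells are filled.
    """
    n, m = len(a), len(b)
    if abs(n - m) > w:
        return math.inf
    prev = [j if j <= w else math.inf for j in range(m + 1)]   # row i = 0
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        if i <= w:
            cur[0] = i
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cur[j] = min(prev[j] + 1,                           # delete a[i-1]
                         cur[j - 1] + 1,                        # insert b[j-1]
                         prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitute or match
        prev = cur
    return prev[m]


S = "adebccebaaadebcbadec"
T = "dadebccebaaadebcbade"
print(banded_edit_distance(S, T, 2))   # 2: exact, since D(S, T) = 2 <= w
print(banded_edit_distance(S, T, 0))   # 17: the Hamming distance, only an upper bound
```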
19 FUTURE DIRECTIONS
Better edit distance approximation.
Low-distortion $\ell_1$ embeddings for strings: $\varphi: \Sigma^n \to \ell_1^d$.
$\Omega(\log n)$ distortion lower bound [Krauthgamer Ostrovsky 06]; $2^{O(\sqrt{\log n \log\log n})}$ best known [Ostrovsky Rabani 05].
Other string-similarity problems.
More informationarxiv: v2 [cs.ds] 28 Jan 2009
Minimax Trees in Linear Time Pawe l Gawrychowski 1 and Travis Gagie 2, arxiv:0812.2868v2 [cs.ds] 28 Jan 2009 1 Institute of Computer Science University of Wroclaw, Poland gawry1@gmail.com 2 Research Group
More informationProofs of Proximity for Context-Free Languages and Read-Once Branching Programs
Proofs of Proximity for Context-Free Languages and Read-Once Branching Programs Oded Goldreich Weizmann Institute of Science oded.goldreich@weizmann.ac.il Ron D. Rothblum Weizmann Institute of Science
More informationarxiv:cs/ v1 [cs.dm] 7 May 2006
arxiv:cs/0605026v1 [cs.dm] 7 May 2006 Strongly Almost Periodic Sequences under Finite Automata Mappings Yuri Pritykin April 11, 2017 Abstract The notion of almost periodicity nontrivially generalizes the
More informationTheoretical Computer Science. Dynamic rank/select structures with applications to run-length encoded texts
Theoretical Computer Science 410 (2009) 4402 4413 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Dynamic rank/select structures with
More informationFinite State Automata Design
Finite State Automata Design Nicholas Mainardi 1 Dipartimento di Elettronica e Informazione Politecnico di Milano nicholas.mainardi@polimi.it March 14, 2017 1 Mostly based on Alessandro Barenghi s material,
More informationExam 1 CSU 390 Theory of Computation Fall 2007
Exam 1 CSU 390 Theory of Computation Fall 2007 Solutions Problem 1 [10 points] Construct a state transition diagram for a DFA that recognizes the following language over the alphabet Σ = {a, b}: L 1 =
More informationthe subset partial order Paul Pritchard Technical Report CIT School of Computing and Information Technology
A simple sub-quadratic algorithm for computing the subset partial order Paul Pritchard P.Pritchard@cit.gu.edu.au Technical Report CIT-95-04 School of Computing and Information Technology Grith University
More informationCONWAY S COSMOLOGICAL THEOREM
CONWAY S COSMOLOGICAL THEOREM R.A. LITHERLAND. Introduction In [C], Conway introduced an operator on strings (finite sequences) of positive integers, the audioactive (or look and say ) operator. Usually,
More informationEfficient (δ, γ)-pattern-matching with Don t Cares
fficient (δ, γ)-pattern-matching with Don t Cares Yoan José Pinzón Ardila Costas S. Iliopoulos Manolis Christodoulakis Manal Mohamed King s College London, Department of Computer Science, London WC2R 2LS,
More informationHow do regular expressions work? CMSC 330: Organization of Programming Languages
How do regular expressions work? CMSC 330: Organization of Programming Languages Regular Expressions and Finite Automata What we ve learned What regular expressions are What they can express, and cannot
More information