Streaming and communication complexity of Hamming distance
1 Streaming and communication complexity of Hamming distance. Tatiana Starikovskaya, IRIF, Université Paris-Diderot (joint work with Raphaël Clifford, ICALP '16)
3 Approximate pattern matching. Problem: given a pattern P of length n and a text T, find the Hamming distance between P and each n-length substring of T. Big Data applications: computational biology, signal processing, text retrieval. Standard algorithms: Ω(n) space.
13 Model of computation. Problem: given a pattern P of length n and a text T, find the Hamming distance between P and each n-length substring of T. Model: T is a stream of characters; the length of the text and the size of the universe are extremely large, so we can't store a copy of T or P. Space = total space used; time = time per character of T. [Figure: the text T = c a a b c a a a c a arrives one character at a time, and the pattern P = b c a a a c is aligned with the most recent n characters.]
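For contrast with the streaming model, here is a minimal sketch of the baseline that the model rules out: it keeps the last n characters of the stream, so it uses Ω(n) space and O(n) time per character. The function name `hamming_distances_naive` is an illustrative assumption, not something from the talk.

```python
from collections import deque

def hamming_distances_naive(pattern, stream):
    """Yield (end_index, HD) for every n-length window of the stream.

    Baseline only: it stores the whole pattern and the last n characters,
    which is exactly what the streaming model forbids.
    """
    n = len(pattern)
    window = deque(maxlen=n)  # Omega(n) space: a copy of the current window
    for i, ch in enumerate(stream):
        window.append(ch)
        if len(window) == n:
            # O(n) time per arriving character
            yield i, sum(p != t for p, t in zip(pattern, window))
```

Calling `list(hamming_distances_naive("aab", "caab"))` compares the windows "caa" and "aab" against the pattern.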
15 What is known: Hamming distance. All distances: space Ω(n) [Folklore], time $O(\log^2 n)$ [Clifford et al., CPM '11]. Only distances ≤ k [Clifford et al., SODA '16]: exact values in space $O(k^2\,\mathrm{polylog}\,n)$ and time $O(\sqrt{k}\log k + \mathrm{polylog}\,n)$; (1 + ε)-approximation in space $O(\varepsilon^{-2}k^2\,\mathrm{polylog}\,n)$ and time $O(\varepsilon^{-2}\,\mathrm{polylog}\,n)$.
17 This work: (1+ε)-Approximate HDs problem. Lower bounds: reduction to a communication complexity (CC) problem. Upper bounds: show a streaming algorithm. Let's discuss that!
18 Lower bound for all HDs, approximate. [Figure: Bob holds T[1, n] = b a a b a b, Charlie holds T[n + 1, 2n] = a a a a a a, Alice holds P = a a a a a a.] 3-parties CC problem: Alice holds the pattern, Bob holds T[1, n], Charlie holds T[n + 1, 2n]. Charlie's output: a (1 + ε)-approximate HD for each alignment of P and T. What is the minimum communication between Alice, Bob, and Charlie?
19 Lower bound for all HDs, approximate. [Figure: Bob holds T[1, n] = b a a b a b, Charlie holds T[n + 1, 2n] = a a a a a a, Alice holds P = a a a a a a.] Streaming algorithm: T is a stream, the algorithm is not allowed to store a copy of P or T, and its output is the (1 + ε)-approximate HDs. At time n it stores all the information needed to compute the (1 + ε)-approximate HDs. Communication protocol: send this information from Alice and Bob to Charlie. Hence a lower bound for the CC problem implies a streaming lower bound.
22 This work: (1+ε)-Approximate HDs problem. Lower bounds: reduction to a CC problem. Upper bounds: show a streaming algorithm. Two CC problems, with upper bounds for each: the 3-parties CC problem, and a simpler CC problem in which B and C know the pattern.
25 Communication complexity
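26 Simpler CC problem: B and C know the pattern. Lower bound: $\Omega(\varepsilon^{-1}\log^2(\varepsilon^{-1}n))$. [Figure: Bob holds b a a b a b, Charlie holds a a a a a a.] Window counting: a (1 + ε)-approximation of #(b) in a sliding window of width n is the same as a (1 + ε)-approximation of the HD between P = aa...a and T, and it requires $\Omega(\varepsilon^{-1}\log^2(\varepsilon^{-1}n))$ bits [Datar et al., 2013].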
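The equivalence behind the window-counting reduction can be checked directly on small inputs. The following is a minimal sketch, assuming a binary alphabet {a, b}; the helper names `count_b_in_windows` and `hd_to_all_a` are illustrative assumptions.

```python
def count_b_in_windows(text, n):
    """Number of 'b' characters in each sliding window of width n."""
    count, out = 0, []
    for i, ch in enumerate(text):
        count += ch == "b"
        if i >= n:
            count -= text[i - n] == "b"  # character leaving the window
        if i >= n - 1:
            out.append(count)
    return out

def hd_to_all_a(text, n):
    """Hamming distance between P = 'a' * n and each n-length substring of text."""
    pattern = "a" * n
    return [
        sum(p != t for p, t in zip(pattern, text[i:i + n]))
        for i in range(len(text) - n + 1)
    ]
```

On any text over {a, b} the two functions return the same list, so an approximation of one quantity is an approximation of the other.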
27 3-parties CC problem. Lower bound: $\Omega(\varepsilon^{-1}\log^2(\varepsilon^{-1}n) + \varepsilon^{-2}\log n)$. [Figure: Bob holds b a a b a b, Charlie holds a a a a a a.] Output = (1 + ε)-approximate HD between T[1, n] and T[n + 1, 2n] = (1 + ε)-approximation of the HD between T = T[1, n] (Bob and Charlie) and P = T[n + 1, 2n] (Alice). This requires $\Omega(\varepsilon^{-2}\log n)$ bits [Jayram & Woodruff, 2013].
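29 Important notion: (1 + ε)-approximate sketch for HD
Intuition: the sketch of a string is a very short vector, and the $L_2$-distance between sketches approximates the HD between strings.
Formal definition (binary alphabets): $Y$ is a $(1/\varepsilon^2) \times n$ matrix of IID unbiased ±1 random variables, and $\mathrm{sketch}(S) = YS$ is a vector of length $1/\varepsilon^2$.
[Figure: the matrix Y of ±1 entries multiplied by the column vector S = (S[1], S[2], ...).]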
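33 Important notion: (1 + ε)-approximate sketch for HD
Formal definition (binary alphabets): $Y$ is a $(1/\varepsilon^2) \times n$ matrix of IID unbiased ±1 random variables, and $\mathrm{sketch}(S) = YS$ has length $1/\varepsilon^2$.
Lemma: $(1-\varepsilon)\,\mathrm{HD}(S_1,S_2) \le \varepsilon^2\,\|\mathrm{sketch}(S_1)-\mathrm{sketch}(S_2)\|_2^2 \le (1+\varepsilon)\,\mathrm{HD}(S_1,S_2)$.
Proof (expectation), where $Y_j$ denotes the $j$-th row of $Y$:
$$E\big[\varepsilon^2\|\mathrm{sketch}(S_1)-\mathrm{sketch}(S_2)\|_2^2\big] = E\big[\varepsilon^2\|Y(S_1-S_2)\|_2^2\big] = \varepsilon^2 E\big[\|Y(S_1-S_2)\|_2^2\big] = \varepsilon^2 E\Big[\sum_{j=1}^{1/\varepsilon^2}\big(Y_j(S_1-S_2)\big)^2\Big] = E\big[\big(Y_1(S_1-S_2)\big)^2\big] = \|S_1-S_2\|_2^2$$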
36 Important notion: (1 + ε)-approximate sketch for HD
Proof (variance): the expectation is $E\big[\varepsilon^2\|\mathrm{sketch}(S_1)-\mathrm{sketch}(S_2)\|_2^2\big] = \|S_1-S_2\|_2^2$, and for some constant $C$:
$$\mathrm{Var}\big[\varepsilon^2\|\mathrm{sketch}(S_1)-\mathrm{sketch}(S_2)\|_2^2\big] = \varepsilon^2\,\mathrm{Var}\big[\big(Y_1(S_1-S_2)\big)^2\big] \le \varepsilon^2 E\big[\big(Y_1(S_1-S_2)\big)^4\big] \le \varepsilon^2 C\,E\big[\big(Y_1(S_1-S_2)\big)^2\big]^2 = \varepsilon^2 C\,\|S_1-S_2\|_2^4$$
38 Important notion: (1 + ε)-approximate sketch for HD
Proof (conclusion): we have $E\big[\varepsilon^2\|\mathrm{sketch}(S_1)-\mathrm{sketch}(S_2)\|_2^2\big] = \|S_1-S_2\|_2^2$ and $\mathrm{Var}\big[\varepsilon^2\|\mathrm{sketch}(S_1)-\mathrm{sketch}(S_2)\|_2^2\big] \le \varepsilon^2 C\,\|S_1-S_2\|_2^4$. By Chebyshev's inequality, with constant probability:
$$(1-\varepsilon)\,\|S_1-S_2\|_2^2 \le \varepsilon^2\,\|\mathrm{sketch}(S_1)-\mathrm{sketch}(S_2)\|_2^2 \le (1+\varepsilon)\,\|S_1-S_2\|_2^2$$
For binary strings $\|S_1-S_2\|_2^2 = \mathrm{HD}(S_1,S_2)$, which gives the lemma.
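The sketch definition can be tried out on short binary strings. This is a minimal illustration, assuming fully random ±1 entries drawn with Python's `random` module and strings given as lists of 0/1; the function names and the `seed` parameter are assumptions made for the example, not part of the talk.

```python
import random

def make_sketch_matrix(n, eps, seed=0):
    """Y: a (1/eps^2) x n matrix of independent unbiased +-1 entries."""
    rng = random.Random(seed)
    k = max(1, round(1 / eps ** 2))  # number of rows = sketch length
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]

def sketch(Y, s):
    """sketch(S) = Y S for a binary string S given as a list of 0/1."""
    return [sum(y * x for y, x in zip(row, s)) for row in Y]

def estimate_hd(Y, s1, s2):
    """(1/k) * ||sketch(S1) - sketch(S2)||_2^2, where 1/k plays the role of eps^2."""
    k = len(Y)
    a, b = sketch(Y, s1), sketch(Y, s2)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / k
```

Two sanity checks hold for every choice of Y: identical strings give an estimate of 0, and strings that differ in exactly one position give exactly 1, because each row then contributes (±1)² = 1. For larger distances the estimate is only concentrated around the true HD, as the lemma states.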
40 Important notion: (1 + ε)-approximate sketch for HD. One more trick: Y can be generated from O(log n) random bits (random → pseudorandom). Summary: the sketch of a string is a vector that takes $O(\varepsilon^{-2}\log n)$ bits, and sketches give a (1 + ε)-approximation of the HD.
41 Simpler CC problem: B and C know the pattern. [Figure: Bob holds T[1, n] = b a a b a b, Charlie holds T[n + 1, 2n] = a a a a a a, and both know P = a a a a a a.] B knows T[1, n], C knows T[n + 1, 2n], and both B and C know P. Observation: C doesn't need any information to compute the HDs between suffixes of P and T[n + 1, 2n].
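42 Simpler CC problem: B and C know the pattern. [Figure: Bob's text with nested prefixes of the pattern, labelled with HD thresholds that are powers of 1/ε.] Select $O(\log_{1/\varepsilon} n)$ prefixes of the pattern. First prefix: the prefix of maximal length $\ell_1$ with HD ≤ $(1/\varepsilon)^2$. Second prefix: the prefix of maximal length $\ell_2 \ge \ell_1$ with HD ≤ $(1/\varepsilon)^3$. And so on.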
45 Simpler CC problem: B and C know the pattern. [Figure: Bob's text split into blocks, each with its sketch.] Divide prefix j into $1/\varepsilon^2$ blocks with HD ≤ $(1/\varepsilon)^{j-1}$ each. Compute $O(1/\varepsilon^2)$ sketches for the text. Send the block borders and the sketches to Charlie.
50 Simpler CC problem: B and C know the pattern. [Figure: Bob's text split into blocks with sketches; the pattern is split at a block border into P_1 and P_2.] Find the shortest prefix containing P. HD(P_2, T): use sketches, which gives a (1 + ε)-approximation. HD(P_1, T): use the prefix's block, which gives an additive error of at most ε · HD(P, T). CC = $O(\varepsilon^{-4}\log^2 n)$ [lower bound: $\Omega(\varepsilon^{-1}\log^2(\varepsilon^{-1}n))$].
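Bob's prefix selection can be illustrated with a brute-force sketch. It rests on an assumption that the slides do not spell out: the HD of a length-ℓ prefix of P is measured against the length-ℓ suffix of Bob's text, which is the part of Bob's text that the pattern overlaps in that alignment. The name `select_prefix_lengths` and the quadratic-time search are illustrative only.

```python
def hd(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def select_prefix_lengths(pattern, bob_text, eps):
    """For thresholds (1/eps)^2, (1/eps)^3, ..., return the maximal prefix
    length whose HD to the equally long suffix of bob_text is within the
    threshold. Assumes 0 < eps < 1 and len(bob_text) >= len(pattern)."""
    if not 0 < eps < 1:
        raise ValueError("eps must lie strictly between 0 and 1")
    n = len(pattern)
    lengths, j = [], 2
    while True:
        threshold = (1 / eps) ** j
        best = max(
            (l for l in range(1, n + 1)
             if hd(pattern[:l], bob_text[len(bob_text) - l:]) <= threshold),
            default=0,
        )
        lengths.append(best)
        if threshold >= n:  # every prefix qualifies from here on
            break
        j += 1
    return lengths
```

The thresholds grow geometrically, so the loop runs a logarithmic number of times before the threshold reaches n; the selected lengths are non-decreasing, matching the nested prefixes in the slide.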
51 This work: (1+ε)-Approximate HDs problem. Lower bounds: reduction to a CC problem. Upper bounds: show a streaming algorithm. Two CC problems, with upper bounds for each: the 3-parties CC problem, and a simpler CC problem in which B and C know the pattern.
52 3-parties CC problem. [Figure: Bob holds c a a b a b, Charlie holds a a a a a a, Alice holds P = a a a a a a.] B knows T[1, n], C knows T[n + 1, 2n], and only A knows P. Observation: C doesn't need any information to compute the HDs between suffixes of P and his part of the text. We can't use prefixes of P to approximate T, because C doesn't know P.
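54 3-parties CC problem
[Figure: Bob's text divided into blocks of length B, each with its sketch; the pattern is split into P_1 and P_2.]
Divide the text T into blocks of length $B = \sqrt{n}$ and compute a sketch of each block.
Large Hamming distance: HD(prefix of P, T) ≥ B/ε. HD(P_2, T): use sketches to compute a (1 + ε)-approximation H. HD(P_1, T): ignore.
Lemma: H is a good approximation of HD.
Proof:
1. $H \le (1+\varepsilon)\,\mathrm{HD}(P_2,T) \le (1+\varepsilon)\,\mathrm{HD}$.
2. $H \ge (1-\varepsilon)\,\mathrm{HD}(P_2,T) \ge (1-\varepsilon)\big(\mathrm{HD}-\mathrm{HD}(P_1,T)\big) \ge (1-2\varepsilon)\,\mathrm{HD}$, since P_1 is shorter than a block and therefore $\mathrm{HD}(P_1,T) \le B \le \varepsilon\,\mathrm{HD}$.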
55 3-parties CC problem. [Figure: Bob's text divided into blocks of length B; the pattern is split into P_1 and P_2.] Small Hamming distance: HD(prefix of P, T) < B/ε. If #( ) in a block ≤ 1, B sends it to C. Starting from the first block where #( ) ≥ 2, T and P can be encoded in small space (periodicity). C can restore P and T from the encoding and compute the HDs. CC = $O(\varepsilon^{-2}\sqrt{n}\log n)$ [lower bound: $\Omega(\varepsilon^{-2}\log n + \varepsilon^{-1}\log^2(\varepsilon^{-1}n))$].
56 This work: (1+ε)-Approximate HDs problem. Lower bounds: reduction to a CC problem. Upper bounds: show a streaming algorithm. Two CC problems, with upper bounds for each: the 3-parties CC problem, and a simpler CC problem in which B and C know the pattern.
57 Streaming algorithm
58 Streaming algorithm. [Figure: the text T and the pattern P, each with its sketch.] Reminder: $Y$ is a $(1/\varepsilon^2) \times n$ matrix of IID unbiased ±1 random variables, and $\mathrm{sketch}(S) = YS$. Problem: how do we maintain the sketch of T? We don't have random access to T and we can't store many of its characters.
59 Streaming algorithm. [Figure: blocks of length B, each with its sketch, combined into one super-sketch.] Reminder: $Y$ is a $(1/\varepsilon^2) \times n$ matrix of IID unbiased ±1 random variables, and $\mathrm{sketch}(S) = YS$. New notion: super-sketch. Let $\sigma_i$ be IID unbiased ±1 variables; then super-sketch $= \sum_i \sigma_i \cdot \mathrm{sketch}_i$. Analysis: similar to sketches.
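A super-sketch can be assembled from per-block sketches in a few lines. This is a minimal sketch under stated assumptions: every block has the same length and is sketched with the same matrix Y (here with one column per block position), and blocks are lists of 0/1. The function names are illustrative.

```python
def block_sketch(Y, block):
    """Sketch of one block: Y times the block, viewed as a 0/1 vector."""
    return [sum(y * x for y, x in zip(row, block)) for row in Y]

def super_sketch(Y, sigma, blocks):
    """Sum over blocks of sigma_i * sketch_i, with sigma_i in {-1, +1}."""
    total = [0] * len(Y)
    for s, block in zip(sigma, blocks):
        sk = block_sketch(Y, block)
        total = [t + s * v for t, v in zip(total, sk)]
    return total
```

Because the construction is linear, the difference of two super-sketches is the super-sketch of the block-wise differences, which is why the analysis mirrors the one for plain sketches. In particular, two block sequences that differ in a single position have super-sketches whose squared distance, divided by the number of rows of Y, is exactly 1.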
60 Streaming algorithm. [Figure: the text divided into blocks of length B with sketches and a super-sketch; the pattern is split into P[1, B − i], P[B − i + 1, n − i], and P[n − i + 1, n].] HD between P[B − i + 1, n − i] and T: super-sketch. Store a super-sketch for each (n − B)-length substring of P: $B = \sqrt{n}/\varepsilon$ super-sketches in total. At each block border, compute a super-sketch of the last n/B blocks from their sketches: $O(n/B) = O(\varepsilon\sqrt{n})$ time, which can be de-amortized.
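The parameter arithmetic on this slide can be checked numerically. The snippet below is a small sketch that assumes the block length is B = √n/ε, rounded up; that reading is what makes n/B equal to ε√n. The function name `block_parameters` is an illustrative assumption.

```python
import math

def block_parameters(n, eps):
    """Return (B, number of stored super-sketches, blocks per window)."""
    B = math.ceil(math.sqrt(n) / eps)      # assumed block length sqrt(n)/eps
    num_super_sketches = B                 # one per (n - B)-length substring of P
    blocks_per_window = math.ceil(n / B)   # about eps * sqrt(n)
    return B, num_super_sketches, blocks_per_window
```

For n = 10000 and ε = 0.5 this gives B = 200 and 50 blocks per window, matching ε√n = 50.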
62 Streaming algorithm. [Figure: the pattern is split into P[1, B − i], handled as in the simpler CC problem; P[B − i + 1, n − i], handled by a super-sketch; and P[n − i + 1, n], handled by a sketch.] HD between the suffix of P and T: sketch. HD between the prefix of P and T: similar to the simpler CC problem for the pattern P[1, B]. Complexity: $O(\varepsilon^{-3}\sqrt{n}\log^2 n)$ bits of space, $O(\varepsilon^{-2}\log^2 n)$ time.
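63 This work: (1+ε)-Approximate HDs problem
Lower bounds: reduction to a CC problem. Upper bounds (streaming): $O(\varepsilon^{-3}\sqrt{n}\log^{1.5} n)$.
3-parties CC problem: upper bound $O(\varepsilon^{-2}\sqrt{n}\log^2 n)$, lower bound $\Omega(\varepsilon^{-2}\log n + \varepsilon^{-1}\log^2(\varepsilon^{-1}n))$.
Simpler CC problem (B and C know the pattern): upper bound $O(\varepsilon^{-4}\log^2 n)$, lower bound $\Omega(\varepsilon^{-1}\log^2(\varepsilon^{-1}n))$.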
More informationSpace Complexity vs. Query Complexity
Space Complexity vs. Query Complexity Oded Lachish Ilan Newman Asaf Shapira Abstract Combinatorial property testing deals with the following relaxation of decision problems: Given a fixed property and
More informationKolmogorov complexity and its applications
CS860, Winter, 2010 Kolmogorov complexity and its applications Ming Li School of Computer Science University of Waterloo http://www.cs.uwaterloo.ca/~mli/cs860.html We live in an information society. Information
More informationGeometric Problems in Moderate Dimensions
Geometric Problems in Moderate Dimensions Timothy Chan UIUC Basic Problems in Comp. Geometry Orthogonal range search preprocess n points in R d s.t. we can detect, or count, or report points inside a query
More informationAlgorithm Design and Analysis
Algorithm Design and Analysis LECTURE 8 Greedy Algorithms V Huffman Codes Adam Smith Review Questions Let G be a connected undirected graph with distinct edge weights. Answer true or false: Let e be the
More informationToday: Linear Programming (con t.)
Today: Linear Programming (con t.) COSC 581, Algorithms April 10, 2014 Many of these slides are adapted from several online sources Reading Assignments Today s class: Chapter 29.4 Reading assignment for
More informationCMPSCI 311: Introduction to Algorithms Second Midterm Exam
CMPSCI 311: Introduction to Algorithms Second Midterm Exam April 11, 2018. Name: ID: Instructions: Answer the questions directly on the exam pages. Show all your work for each question. Providing more
More informationAnnouncements. CompSci 102 Discrete Math for Computer Science. Chap. 3.1 Algorithms. Specifying Algorithms
CompSci 102 Discrete Math for Computer Science Announcements Read for next time Chap. 3.1-3.3 Homework 3 due Tuesday We ll finish Chapter 2 first today February 7, 2012 Prof. Rodger Chap. 3.1 Algorithms
More information1 Estimating Frequency Moments in Streams
CS 598CSC: Algorithms for Big Data Lecture date: August 28, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri 1 Estimating Frequency Moments in Streams A significant fraction of streaming literature
More informationAn Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems
An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems Arnab Bhattacharyya Indian Institute of Science, Bangalore arnabb@csa.iisc.ernet.in Palash Dey Indian Institute of
More informationA Universal Turing Machine
A Universal Turing Machine A limitation of Turing Machines: Turing Machines are hardwired they execute only one program Real Computers are re-programmable Solution: Universal Turing Machine Attributes:
More informationAlgorithms and Data S tructures Structures Complexity Complexit of Algorithms Ulf Leser
Algorithms and Data Structures Complexity of Algorithms Ulf Leser Content of this Lecture Efficiency of Algorithms Machine Model Complexity Examples Multiplication of two binary numbers (unit cost?) Exact
More informationLecture 01 August 31, 2017
Sketching Algorithms for Big Data Fall 2017 Prof. Jelani Nelson Lecture 01 August 31, 2017 Scribe: Vinh-Kha Le 1 Overview In this lecture, we overviewed the six main topics covered in the course, reviewed
More informationKernelization Lower Bounds: A Brief History
Kernelization Lower Bounds: A Brief History G Philip Max Planck Institute for Informatics, Saarbrücken, Germany New Developments in Exact Algorithms and Lower Bounds. Pre-FSTTCS 2014 Workshop, IIT Delhi
More informationQuantum Error Correcting Codes and Quantum Cryptography. Peter Shor M.I.T. Cambridge, MA 02139
Quantum Error Correcting Codes and Quantum Cryptography Peter Shor M.I.T. Cambridge, MA 02139 1 We start out with two processes which are fundamentally quantum: superdense coding and teleportation. Superdense
More information2. Exact String Matching
2. Exact String Matching Let T = T [0..n) be the text and P = P [0..m) the pattern. We say that P occurs in T at position j if T [j..j + m) = P. Example: P = aine occurs at position 6 in T = karjalainen.
More informationCompressing Kinetic Data From Sensor Networks. Sorelle A. Friedler (Swat 04) Joint work with David Mount University of Maryland, College Park
Compressing Kinetic Data From Sensor Networks Sorelle A. Friedler (Swat 04) Joint work with David Mount University of Maryland, College Park Motivation Motivation Computer Science Graphics: Image and video
More informationPart 4 out of 5 DFA NFA REX. Automata & languages. A primer on the Theory of Computation. Last week, we showed the equivalence of DFA, NFA and REX
Automata & languages A primer on the Theory of Computation Laurent Vanbever www.vanbever.eu Part 4 out of 5 ETH Zürich (D-ITET) October, 12 2017 Last week, we showed the equivalence of DFA, NFA and REX
More informationTesting Cluster Structure of Graphs. Artur Czumaj
Testing Cluster Structure of Graphs Artur Czumaj DIMAP and Department of Computer Science University of Warwick Joint work with Pan Peng and Christian Sohler (TU Dortmund) Dealing with BigData in Graphs
More informationTheoretical Computer Science. Dynamic rank/select structures with applications to run-length encoded texts
Theoretical Computer Science 410 (2009) 4402 4413 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Dynamic rank/select structures with
More informationTwo Query PCP with Sub-Constant Error
Electronic Colloquium on Computational Complexity, Report No 71 (2008) Two Query PCP with Sub-Constant Error Dana Moshkovitz Ran Raz July 28, 2008 Abstract We show that the N P-Complete language 3SAT has
More informationLecture 21: Quantum communication complexity
CPSC 519/619: Quantum Computation John Watrous, University of Calgary Lecture 21: Quantum communication complexity April 6, 2006 In this lecture we will discuss how quantum information can allow for a
More informationCS Foundations of Communication Complexity
CS 2429 - Foundations of Communication Complexity Lecturer: Sergey Gorbunov 1 Introduction In this lecture we will see how to use methods of (conditional) information complexity to prove lower bounds for
More informationProbabilistically Checkable Arguments
Probabilistically Checkable Arguments Yael Tauman Kalai Microsoft Research yael@microsoft.com Ran Raz Weizmann Institute of Science ran.raz@weizmann.ac.il Abstract We give a general reduction that converts
More informationAd Placement Strategies
Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox 2014 Emily Fox January
More informationEE5139R: Problem Set 4 Assigned: 31/08/16, Due: 07/09/16
EE539R: Problem Set 4 Assigned: 3/08/6, Due: 07/09/6. Cover and Thomas: Problem 3.5 Sets defined by probabilities: Define the set C n (t = {x n : P X n(x n 2 nt } (a We have = P X n(x n P X n(x n 2 nt
More informationHilbert Space, Entanglement, Quantum Gates, Bell States, Superdense Coding.
CS 94- Bell States Bell Inequalities 9//04 Fall 004 Lecture Hilbert Space Entanglement Quantum Gates Bell States Superdense Coding 1 One qubit: Recall that the state of a single qubit can be written as
More informationStream Computation and Arthur- Merlin Communication
Stream Computation and Arthur- Merlin Communication Justin Thaler, Yahoo! Labs Joint Work with: Amit Chakrabarti, Dartmouth Graham Cormode, University of Warwick Andrew McGregor, Umass Amherst Suresh Venkatasubramanian,
More information3F1 Information Theory, Lecture 3
3F1 Information Theory, Lecture 3 Jossy Sayir Department of Engineering Michaelmas 2013, 29 November 2013 Memoryless Sources Arithmetic Coding Sources with Memory Markov Example 2 / 21 Encoding the output
More information15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018
15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 Today we ll talk about a topic that is both very old (as far as computer science
More informationGraph & Geometry Problems in Data Streams
Graph & Geometry Problems in Data Streams 2009 Barbados Workshop on Computational Complexity Andrew McGregor Introduction Models: Graph Streams: Stream of edges E = {e 1, e 2,..., e m } describe a graph
More informationSparse Johnson-Lindenstrauss Transforms
Sparse Johnson-Lindenstrauss Transforms Jelani Nelson MIT May 24, 211 joint work with Daniel Kane (Harvard) Metric Johnson-Lindenstrauss lemma Metric JL (MJL) Lemma, 1984 Every set of n points in Euclidean
More informationCommunications Theory and Engineering
Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 AEP Asymptotic Equipartition Property AEP In information theory, the analog of
More informationDeterministic Finite Automata (DFAs)
Algorithms & Models of Computation CS/ECE 374, Fall 27 Deterministic Finite Automata (DFAs) Lecture 3 Tuesday, September 5, 27 Sariel Har-Peled (UIUC) CS374 Fall 27 / 36 Part I DFA Introduction Sariel
More informationLecture 3 Sept. 4, 2014
CS 395T: Sublinear Algorithms Fall 2014 Prof. Eric Price Lecture 3 Sept. 4, 2014 Scribe: Zhao Song In today s lecture, we will discuss the following problems: 1. Distinct elements 2. Turnstile model 3.
More informationDeterministic Finite Automata (DFAs)
CS/ECE 374: Algorithms & Models of Computation, Fall 28 Deterministic Finite Automata (DFAs) Lecture 3 September 4, 28 Chandra Chekuri (UIUC) CS/ECE 374 Fall 28 / 33 Part I DFA Introduction Chandra Chekuri
More informationChapter 2: The Basics. slides 2017, David Doty ECS 220: Theory of Computation based on The Nature of Computation by Moore and Mertens
Chapter 2: The Basics slides 2017, David Doty ECS 220: Theory of Computation based on The Nature of Computation by Moore and Mertens Problem instances vs. decision problems vs. search problems Decision
More informationIntroduction to Cryptography
B504 / I538: Introduction to Cryptography Spring 2017 Lecture 12 Recall: MAC existential forgery game 1 n Challenger (C) k Gen(1 n ) Forger (A) 1 n m 1 m 1 M {m} t 1 MAC k (m 1 ) t 1 m 2 m 2 M {m} t 2
More informationComputing Longest Common Substrings Using Suffix Arrays
Computing Longest Common Substrings Using Suffix Arrays Maxim A. Babenko, Tatiana A. Starikovskaya Moscow State University Computer Science in Russia, 2008 Maxim A. Babenko, Tatiana A. Starikovskaya (MSU)Computing
More information