CSE 522: Sublinear (and Streaming) Algorithms, Spring 2014
Lecture 1: Introduction to Sublinear Algorithms
March 31, 2014
Lecturer: Paul Beame    Scribe: Paul Beame

Too much data, too little time or space for it all.

1 Sublinear Time

Why? It may be that not all of the data can be accessed, or the accessible data may simply be too large to spend time reading. Since the algorithms can't access all the data, the answers can no longer be exact. Some problems are easy to approximate, such as the average or the median.

What kinds of approximation?

- Produce an answer that is close to the true answer. For example, for an optimization problem, produce a value that is close to the optimum value OPT, e.g. OPT/c ≤ answer ≤ c · OPT. Note: there is too little time to write down a full solution, so we settle for its value.

- Produce an answer that is correct for an input that is close to the given input. For example, property testing for decision problems: determine whether
  1. the input has the property, or
  2. the input is far from every input having the property.
  We don't care which answer is given otherwise.

The algorithm may make random choices, with the probability of producing an incorrect answer at most δ. In general, randomization will be essential to the algorithms we consider.
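To make "close to the true answer" concrete, here is a minimal illustrative sketch (not part of the lecture; the function name and parameters are chosen for illustration) that estimates the average of a large array from a small random sample, with a Hoeffding-style sample size of the kind discussed later in these notes.

```python
import math
import random

def approx_average(x, eps=0.05, delta=0.01):
    """Illustrative sketch: estimate the average of x (values assumed in [0, 1])
    to within additive error eps with probability >= 1 - delta, by reading only
    O(log(1/delta) / eps^2) random positions instead of all of x."""
    s = math.ceil(math.log(2 / delta) / (2 * eps ** 2))  # Hoeffding-style sample size
    sample = [x[random.randrange(len(x))] for _ in range(s)]
    return sum(sample) / len(sample)

# Example: a million values, but the estimate reads only about a thousand of them.
data = [random.random() for _ in range(10**6)]
print(approx_average(data))
```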

Sublinear Time Models

- Random access queries: can access any specific word of the input in one step.
- Samples: can get a word of the input from a random position in one step (along with that position), or a random sample from an input distribution in one step.

2 Sublinear Space/Streaming

Here we can view all the data, but it is going by so fast that the algorithm cannot store it all. This is particularly useful for analyzing:
- Network traffic
- Database transactions
- Sensor feeds
- Scientific data

The relevant question is how to keep track of a summary of the data seen so far in order to answer queries, since there is no space to store it all. In general, space complexity is at most the time complexity, so sublinear space is more general than sublinear time, but we only deal with restricted versions of sublinear space.

Data Stream Model

Single pass through the data: a stream x_1, x_2, ..., x_n ∈ [M] arriving in order, with fast per-element processing. The amount of storage will be the main parameter. The goal will be to have space at most n^α for some α < 1, or even better, log^c n.

There is also a variant of the data stream model that is useful in another context: the data either resides on disk or has been streamed onto disk first, and the storage is the amount of main memory needed to process it using fast sequential access. Multiple passes in the same order may make sense for this variant.

Sketching for Data Streams

Store a short sketch, or synopsis, of the data for later use. Of particular use will be linear sketching, which for input x = (x_1, ..., x_n) stores the sketch Ax where A is an m × n matrix with m ≪ n. Linear sketching is also useful for compressed sensing / sublinear measurement, where the sketch happens during input collection. We will probably not have time to cover this, however.
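For intuition (an illustration, not part of the lecture), a linear sketch Ax can be maintained incrementally: when element i arrives, add the i-th column of A to the current sketch. A minimal sketch, assuming the stream consists of indices in [n] and using an explicit random ±1 matrix:

```python
import random

class LinearSketch:
    """Maintain Ax for a random m x n matrix A with +/-1 entries, where
    x is the frequency vector of the stream (x_i = # of times i appears)."""
    def __init__(self, n, m, seed=0):
        rng = random.Random(seed)
        # A is stored explicitly here for clarity; in practice it would be
        # generated on the fly from a hash function to save space.
        self.A = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]
        self.sketch = [0] * m

    def update(self, i, delta=1):
        # Stream element i arrives: x_i += delta, so Ax += delta * (column i of A).
        for r in range(len(self.sketch)):
            self.sketch[r] += delta * self.A[r][i]

sk = LinearSketch(n=1000, m=10)
for i in [3, 7, 3, 42]:   # a tiny example stream of indices
    sk.update(i)
print(sk.sketch)
```

Because the map x ↦ Ax is linear, the sketches of two streams can simply be added to obtain the sketch of their concatenation.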

Course Details

In general, this course will focus on common approaches and tools. The course website is http://www.cs.washington.edu/522. There is no text, but the website http://sublinear.info, which is linked on the course website, has many resources, ranging from links to surveys and papers in the area and other courses on sublinear algorithms to a list of open problems. The course requirements will consist of problem sets (roughly 2) as well as a project presentation, which consists of either presenting a paper from a list that I will make available or presenting the results of an experimental application of sublinear algorithm techniques.

3 Property Testing Example

We consider the Element Distinctness problem: given x_1, ..., x_n ∈ [M] for M ≥ n, are all x_i distinct? The property testing version of this problem is to distinguish between the cases:
1. All x_i are distinct.
2. The number of distinct elements in x is < (1 − ε)n.
Either answer is OK if neither case holds. To see how this fits into the usual property testing framework, we use the following definition:

Definition 3.1. x, y ∈ [M]^n are ε-far iff |{i : x_i ≠ y_i}| > εn.

Observe that the set of inputs that are ε-far from every input with all-distinct elements is precisely the set of inputs with < (1 − ε)n distinct elements.

Obvious algorithm: Take s independent samples (remembering the positions they came from). If there is a duplicate in the sample, output FAIL; else output PASS. This algorithm always answers correctly on distinct inputs. If the input is not distinct, how many samples are needed to detect this with probability at least 1/2? (A small code sketch of this tester follows; the analysis continues after it.)
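The following sketch (illustrative; not part of the original notes) implements the obvious algorithm, assuming the input is given as a list that we can index at random positions:

```python
import random

def distinctness_tester(x, s):
    """Take s uniformly random samples of x, remembering positions so that
    sampling the same position twice does not count as a duplicate.
    Output FAIL if two distinct sampled positions hold equal values."""
    first_pos = {}                      # value -> first position where it was sampled
    for _ in range(s):
        pos = random.randrange(len(x))
        val = x[pos]
        if val in first_pos and first_pos[val] != pos:
            return "FAIL"
        first_pos.setdefault(val, pos)
    return "PASS"

# A distinct input always PASSes; on an input where every value appears twice,
# a few multiples of sqrt(n) samples find a duplicate with high probability.
n = 10**4
distinct_input = list(range(n))
paired_input = [i // 2 for i in range(n)]
random.shuffle(paired_input)
print(distinctness_tester(distinct_input, 400), distinctness_tester(paired_input, 400))
```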

Consider input examples with ε = 1/2. If the input is 1, 1, ..., 1, 2, ..., n/2 in random order, only a constant number of samples suffice. However, if the input is 1, 1, 2, 2, 3, 3, ..., n/2, n/2 in random order, what is the probability that a duplicate will be found?

For each i ≠ j, the probability that sample j is the unique match for sample i is precisely 1/n. Therefore

    P[duplicate found] ≤ E[# duplicates found] = (s choose 2)/n < s²/(2n),

so we need s ≥ √n to find a duplicate with error probability < 1/2.

Probability Tools: Tail Bounds

Let µ = E(X). Recall that Var(X) = E[(X − µ)²] = E(X²) − E(X)². We will use the following tail bounds repeatedly throughout the course.

Linear Tails: Markov's Inequality. If X is a random variable with X ≥ 0 always, then for k > 0,

    P[X ≥ k] ≤ E(X)/k.

Quadratic Tails: Chebyshev's Inequality. For any random variable X,

    P[|X − E(X)| ≥ k] ≤ Var(X)/k².

This follows easily by applying Markov's inequality to (X − µ)² for µ = E(X). We will find it particularly useful in the case of pairwise independent random variables.

Exponential Tails: Bernstein/Chernoff/Hoeffding Bounds. These are used for sums of independent bounded random variables. In particular, we will use the so-called Chernoff bounds for X = X_1 + ... + X_n where the X_i are independent {0, 1} random variables. (The X_i are not required to have the same distribution, so this is the more general case of Poisson trials rather than just the usual Bernoulli trials.)

    P[X ≥ (1 + δ)E(X)] ≤ e^{−δ² E(X)/3}   and   P[X ≤ (1 − δ)E(X)] ≤ e^{−δ² E(X)/2}.

If we don't worry too much about the constants in the exponent, we can summarize these in a single inequality as

    P[|X − E(X)| ≥ δE(X)] < e^{−δ² E(X)/4}.
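As an illustrative sanity check (not from the lecture; the parameters are arbitrary), the following compares the empirical upper tail of a sum of independent 0/1 variables with the Chernoff bound e^{−δ² E(X)/3}:

```python
import math
import random

def chernoff_check(n=200, p=0.3, delta=0.2, trials=20000, seed=1):
    """Empirically estimate P[X >= (1 + delta) E[X]] for X a sum of n
    independent Bernoulli(p) indicators, and compare with the Chernoff bound."""
    rng = random.Random(seed)
    mean = n * p
    threshold = (1 + delta) * mean
    hits = sum(
        sum(1 for _ in range(n) if rng.random() < p) >= threshold
        for _ in range(trials)
    )
    empirical = hits / trials
    bound = math.exp(-delta ** 2 * mean / 3)
    print(f"empirical tail ~ {empirical:.4f}  vs  Chernoff bound {bound:.4f}")

chernoff_check()
```

The empirical tail comes out well below the bound, as expected, since the bound is not tight for these parameters.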

Analyzing the Obvious Algorithm

Since we know the algorithm is correct when the input is distinct, we consider the case when there are < (1 − ε)n distinct elements. We target correctness with probability at least 2/3.

MAIN IDEA: Pair off the duplicates and argue that roughly √n samples are likely to hit both members of some pair.

For convenience of analysis we consider 2s elements chosen in two phases, with s samples chosen in each phase, and let S_1 and S_2 denote the samples from the respective phases. We will show that:
- S_1 hits the 1st element from many pairs with probability at least 5/6.
- Given that the first phase succeeds, S_2 hits the 2nd element from some pair whose 1st element is hit by S_1, with probability at least 5/6.

We can assume without loss of generality that s ≤ n, and for convenience we assume that s ≤ εn/48.

If the number of distinct elements is < (1 − ε)n, and we pair up duplicate values (ignoring one copy of any value that occurs an odd number of times), then we get > εn/2 pairs of duplicates. (If we list one copy of each of the < (1 − ε)n distinct values first, then a value that occurs k times among the remaining > εn elements is part of ⌈k/2⌉ ≥ k/2 pairs, one possibly involving the occurrence of that value in the first list; summing over all values gives > εn/2 pairs.) Therefore

    P[a single sample hits the 1st element of a pair] > ε/2.

Let the indicator variable Y_i = 1 if the i-th sample in S_1 hits the 1st element of a pair, and Y_i = 0 otherwise. Then

    E[# samples in S_1 that hit the 1st element of a pair] = E[Σ_{i=1}^{s} Y_i] > εs/2.

The Y_i are independent, so by the Chernoff bound

    P[Σ_{i=1}^{s} Y_i < E(Σ_{i=1}^{s} Y_i)/2] ≤ e^{−E(Σ_{i=1}^{s} Y_i)/8} < e^{−εs/16} ≤ 1/12.

Though we have P[Σ_{i=1}^{s} Y_i < εs/4] ≤ 1/12, this is not quite enough to count the pairs hit by S_1, since the sample may hit the same pair more than once. We bound this by subtracting off the contribution to Σ_{i=1}^{s} Y_i from such doubly hit pairs.

Let the indicator variable Z_ij = 1 if sample i and sample j of S_1 collide, and Z_ij = 0 otherwise. As we saw earlier with the lower bound on s, we have E(Z_ij) = P[Z_ij = 1] = 1/n, and hence the expected total number of colliding pairs from S_1 is at most

    E(Σ_{i≠j} Z_ij) = (s choose 2)/n ≤ s²/(2n) ≤ (s/(2n)) · (εn/48) = εs/96.

Therefore by Markov's inequality we have

    P[Σ_{i≠j} Z_ij > 12 · εs/96 = εs/8] ≤ 1/12.

Since each of the two parts fails with probability at most 1/12, except with probability at most 1/6 the total number of pairs of duplicates whose first element is hit by S_1 is > εs/4 − εs/8 = εs/8.

Phase 2

The analysis of the second phase is now much easier, assuming that the first phase succeeds; i.e., that there are > εs/8 pairs of duplicates whose 1st element is hit by some sample in S_1. Let H_1 be the set of pairs of duplicates whose 1st element is hit by some sample from S_1. Then

    P[the j-th sample of S_2 hits the 2nd element of a pair in H_1] > εs/(8n).

It follows that

    P[no sample in S_2 hits the 2nd element of a pair in H_1] < (1 − εs/(8n))^s ≤ (e^{−εs/(8n)})^s = e^{−εs²/(8n)} ≤ 1/6   for εs²/(8n) ≥ 2,

where the second inequality follows from the fact that 1 − x ≤ e^{−x} for all real x. Observing that εs²/(8n) ≥ 2 is equivalent to s ≥ 4√(n/ε), we see that choosing s = 4√(n/ε) suffices for the total probability that no duplicate is found to be at most 1/3. Therefore O(√(n/ε)) samples suffice to test for Element Distinctness. (A small simulation of this two-phase argument appears at the end of these notes.)

First problem for Problem Set #1: Generalize the lower bound argument to show that any algorithm that always accepts distinct inputs, but with probability at least 1/2 rejects all inputs with < (1 − ε)n distinct values, requires Ω(√(n/ε)) samples.
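As an illustrative aside (not part of the lecture; the function name and parameters are chosen for illustration), here is a small simulation of the two-phase experiment above: on the hard instance where every value appears exactly twice, 2s samples with s = 4√(n/ε) are drawn and we check how often a duplicate is found.

```python
import math
import random

def two_phase_experiment(n, eps, trials=200, seed=0):
    """Simulate the two-phase analysis on the hard instance 1,1,2,2,...,n/2,n/2
    (in random order), where every value appears exactly twice. Draw 2s samples
    with s = 4*sqrt(n/eps) and report the fraction of trials finding a duplicate."""
    rng = random.Random(seed)
    x = [i // 2 for i in range(n)]        # each value appears exactly twice
    rng.shuffle(x)
    s = math.ceil(4 * math.sqrt(n / eps))
    found = 0
    for _ in range(trials):
        positions = [rng.randrange(n) for _ in range(2 * s)]
        first_pos = {}
        dup = False
        for p in positions:
            if x[p] in first_pos and first_pos[x[p]] != p:
                dup = True
                break
            first_pos.setdefault(x[p], p)
        found += dup
    return found / trials

# The analysis guarantees success probability >= 2/3; empirically it is much higher.
print(two_phase_experiment(n=10_000, eps=0.5))
```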