COMS 6998-3: Sub-Linear Algorithms in Learning and Testing                Spring 2014

Lecture 1: 01/22/2014
Lecturer: Rocco Servedio                Scribes: Clément Canonne and Richard Stark

1 Today

- High-level overview
- Administrative details
- Some content

Relevant readings:

- Ergün, Kannan, Kumar, Rubinfeld and Viswanathan, 1998: Spot-checkers. [EKK+98]
- Ron, 2008: Property Testing: A Learning Theory Perspective. [Ron08]

2 High-level overview

2.1 What is this course about?

Goal: Extract information from some massive data object, one so humongous that we cannot possibly look at the whole thing. Instead, we must use sublinear-time algorithms to have any hope of getting anything done. This immediately raises the first natural question: is it even possible to do anything? As we shall see, the answer is, perhaps surprisingly, yes.

What kind of objects? We will mainly be interested in three different sorts of gigantic objects: Boolean functions, graphs, and probability distributions.

Example 1. The object is a Boolean function f: {0,1}^n -> {0,1}, of size 2^n, to which we have query access:

on query x in {0,1}^n, the oracle returns f(x).

Example 2. The object is a graph G = (V, E) where V = [N] = {1, ..., N}, for N = 2^n; we have query access to its adjacency matrix: on query (i, j) in V x V, the oracle answers "yes" iff (i, j) in E.

Example 3. The object is a probability distribution D over [N], and we have access to independent samples: each press of the "sample-delivering button" returns one i drawn from D.

What can we hope for? For most questions, it is impossible to get an exact answer in sublinear time. Think, for instance, of deciding whether a function is identically zero: until all 2^n points have been queried, there is no way to be certain of the answer. However, by accepting approximate answers, we can do a lot. Similarly, it is not hard to see that it is paramount that our algorithms be randomized, since deterministic ones are easy to fool. So we will seek algorithms that give good approximations with high probability:

    randomization + approximation

2.2 Kinds of algorithmic problems

We will consider two main families of problems over the three types of objects introduced above: learning and property testing.
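The three access models above can be made concrete with a small sketch. The class and method names below (`FunctionOracle`, `GraphOracle`, `SampleOracle`) are illustrative choices, not anything defined in the course:

```python
import random

class FunctionOracle:
    """Query access to a Boolean function f: {0,1}^n -> {0,1}.
    Inputs are encoded as n-bit integers."""
    def __init__(self, f):
        self._f = f
        self.queries = 0  # tracks the query complexity used so far

    def query(self, x):
        self.queries += 1
        return self._f(x)

class GraphOracle:
    """Adjacency-matrix query access: answers whether (i, j) is an edge."""
    def __init__(self, edges):
        self._edges = {frozenset(e) for e in edges}

    def query(self, i, j):
        return frozenset((i, j)) in self._edges

class SampleOracle:
    """Sample access to a distribution D over [N]: each call to sample()
    is one independent draw (the 'sample-delivering button')."""
    def __init__(self, weights, seed=0):
        self._rng = random.Random(seed)
        self._support = list(range(1, len(weights) + 1))  # [N] = {1, ..., N}
        self._weights = weights

    def sample(self):
        return self._rng.choices(self._support, weights=self._weights, k=1)[0]
```

A sublinear algorithm interacts with the object only through such oracles, and its cost is the number of queries or samples it makes, not the size of the object.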

2.2.1 Learning problems

The goal is to output some high-quality approximation of the unknown object O. Note that we can only hope to do this efficiently if we know O is highly structured. For instance, learning a completely random Boolean function f, defined by tossing a coin to generate a truth table (for each x in {0,1}^n, f(x) is chosen uniformly and independently at random in {0,1}), clearly cannot be done with an efficient number of queries.

Example 4 (Learning decision trees). Assume f: {0,1}^n -> {0,1} is computed by a poly(n)-size decision tree (DT): f(x) is the value at the leaf reached by going down the tree, branching left on x_i = 0 and right on x_i = 1 at each internal node labeled x_i.

[Figure: an example decision tree on x = (x_1, ..., x_8); for this tree, f(11001100) = 1.]

The distance measure used here is the (normalized) Hamming distance: for f, g: {0,1}^n -> {0,1},

    d(f, g) := |{ x in {0,1}^n : f(x) != g(x) }| / 2^n = Pr_{x ~ U({0,1}^n)}[ f(x) != g(x) ]    (1)

Question: Given black-box access to f (promised to be computed by a poly(n)-size DT), can we run in poly(n, 1/ε) time and output some hypothesis h: {0,1}^n -> {0,1} such that d(f, h) <= ε?

Spoiler: yes; we will cover this in future lectures.
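The Hamming distance of Equation (1) is easy to compute exactly for small n, and, more in the spirit of this course, to estimate with few black-box queries by sampling uniform inputs. A minimal sketch (the function names and the sample count are illustrative):

```python
import random

def hamming_distance_exact(f, g, n):
    """Exact normalized Hamming distance d(f, g): the fraction of the 2^n
    inputs on which f and g disagree (feasible only for small n).
    Inputs are encoded as n-bit integers."""
    disagreements = sum(1 for x in range(2 ** n) if f(x) != g(x))
    return disagreements / 2 ** n

def hamming_distance_estimate(f, g, n, samples=2000, seed=0):
    """Estimate d(f, g) using only black-box queries at uniform random
    inputs; O(1/eps^2) samples give additive error eps with high
    probability, by a standard Chernoff bound."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x = rng.randrange(2 ** n)  # uniform point of {0,1}^n
        if f(x) != g(x):
            hits += 1
    return hits / samples
```

For instance, f(x) = x_1 and g(x) = x_2 disagree on exactly half the cube, so both routines return values near 1/2. Note that the estimator's query count is independent of n, a first taste of sublinearity.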

Example 5 (Learning distributions). The object is a distribution D over [N]. Again, assume D has structure;[1] more particularly, for this example, assume D is monotone (non-increasing):

    D(1) >= D(2) >= ... >= D(N).

[Figure 1: Example of a monotone distribution D.]

The distance measure considered here is the total variation distance (TV):[2] for D_1, D_2 distributions over [N],

    d_TV(D_1, D_2) := max_{S ⊆ [N]} ( D_1(S) - D_2(S) ) = (1/2) Σ_{i in [N]} | D_1(i) - D_2(i) |    (2)

where the second equality (known as Scheffé's identity) is left as an exercise. (Hint: it can be shown by considering the set S = { i in [N] : D_1(i) > D_2(i) }.) It is not hard to see that d_TV(D_1, D_2) in [0, 1], where the upper bound is achieved, for instance, by D_1, D_2 with disjoint supports.

[1] As we will show later in the class, if D is arbitrary, the number of samples needed for learning is linear in N; specifically, Θ(N/ε²).
[2] A (very stringent) metric also referred to as statistical distance, or half the L_1 distance.
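Scheffé's identity from Equation (2) can be checked numerically: the maximum of D_1(S) - D_2(S) over all sets S is attained at S = { i : D_1(i) > D_2(i) }, and equals half the L_1 distance. A small sketch (distributions are given as lists of probabilities; the function names are mine):

```python
def tv_distance_l1(p, q):
    """d_TV(p, q) as half the L1 distance: (1/2) * sum_i |p(i) - q(i)|."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def tv_distance_scheffe(p, q):
    """d_TV(p, q) via the maximizing set S = { i : p(i) > q(i) }:
    sum of (p(i) - q(i)) over that set."""
    return sum(pi - qi for pi, qi in zip(p, q) if pi > qi)
```

On p = (0.5, 0.3, 0.2) and q = (0.2, 0.3, 0.5) both give 0.3, and on distributions with disjoint supports both give 1, matching the claimed range [0, 1].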

Question: Given access to independent samples from an unknown monotone distribution D over [N], what sample and time complexity are required to output (a succinct representation of) a hypothesis distribution D' such that d_TV(D, D') <= ε?

Good news: this can be done with O(log N / ε³) samples and runtime, and this is optimal.

2.2.2 Property testing problems

The object O is now arbitrary, and we are interested in some property P on the set of all objects. The goal is to distinguish whether: (a) O has the property P, or (b) O is far from every O' with property P. We do not care about the in-between cases where O is close to having property P; in such cases we may give whatever answer we want. Equivalently, this setting can be seen as a promise problem, where we are promised that every input object either has property P or is far from having property P.

[Figure: the set P of objects with the property, surrounded by the objects far from P.]

For instance, for the property P (of Boolean functions) of being the identically-zero function, f is ε-far from P if it takes value 1 on at least an ε fraction of the inputs.

Example 6 (Testing f: {0,1}^n -> {0,1} for monotonicity).

Definition 7. A Boolean function f is monotone (non-decreasing) if x ⪯ y implies f(x) <= f(y), where x ⪯ y means x_i <= y_i for all i in [n].

For instance, f defined by f(x) = x_1 ∨ x_17 is monotone; f(x) = ¬x_1 is not.

Taking our property P to be monotonicity (equivalently, P = { f: {0,1}^n -> {0,1} : f is monotone }, the set of all functions with the property), we define the distance of f from P as

    dist(f, P) := min_{g in P} d(f, g)    (3)

(where d(f, g) is the Hamming distance defined in Equation (1)).

We define a testing algorithm T for monotonicity as follows: given a parameter ε in (0, 1] and query access to any f,

- if f is monotone, T should accept (with probability at least 9/10);
- if dist(f, P) > ε, T should reject (with probability at least 9/10).

As it turns out, testing monotonicity of Boolean functions can be done with O(n/ε) queries and runtime.

Example 8 (Bipartiteness testing for graphs). Given an arbitrary graph G = ([N], E), we would like to design a testing algorithm T which, given oracle access to the adjacency matrix of G,

- if G is bipartite, accepts (with probability at least 9/10);
- if dist(G, Bip) > ε, rejects (with probability at least 9/10),

where Bip is the set of all bipartite graphs over vertex set [N], dist(G, Bip) = min_{G' in Bip} d(G, G'), and d(G, G') := |E(G) Δ E(G')| / (N choose 2) is the (normalized) edit distance between G and G'.

Somewhat surprisingly, testing bipartiteness can be done with poly(1/ε) queries and runtime, independent of N.

2.3 Summary

Most of the topics covered in the class will fit into this 2 x 3 table:

                Boolean functions     Graphs                 Probability distributions
    Learning    e.g., DTs                                    e.g., monotone
    Testing     e.g., monotonicity    e.g., bipartiteness

Flavor of the course: algorithms and lower bounds; Fourier analysis over {-1,1}^n; probability; graph theory...
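Returning to Example 6, one natural way to get an O(n/ε)-query monotonicity tester is the "edge tester": sample random edges (x, x ⊕ e_i) of the hypercube and reject on any violated edge. The sketch below is illustrative (the constant 10 in the trial count is a placeholder, and this is not necessarily the exact algorithm analyzed later in the course):

```python
import random

def monotonicity_edge_tester(f, n, eps, seed=0):
    """One-sided 'edge tester' sketch: pick a uniform point x and a uniform
    coordinate i, and check the hypercube edge between x-with-bit-i-cleared
    and x-with-bit-i-set. Reject as soon as a violated edge is found.
    Inputs to f are encoded as n-bit integers."""
    rng = random.Random(seed)
    num_trials = max(1, int(10 * n / eps))  # O(n/eps) trials; constant illustrative
    for _ in range(num_trials):
        x = rng.randrange(2 ** n)
        i = rng.randrange(n)
        lower = x & ~(1 << i)   # x with bit i cleared
        upper = x | (1 << i)    # x with bit i set
        if f(lower) > f(upper):
            return False  # REJECT: found a monotonicity violation
    return True  # ACCEPT
```

Since a monotone f has no violated edge, the tester accepts monotone functions with probability 1 (it is one-sided); the nontrivial part of the analysis, showing that ε-far functions have many violated edges, is deferred.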

3 Administrative details

See the course webpage: http://www.cs.columbia.edu/~rocco/teaching/s14/

4 Some content: testing sortedness

The first topic covered will be the leftmost top cell of the table, learning Boolean functions. Before doing so, however, we will get a taste of property testing with the example of testing sortedness of a list (an example slightly outside the above summary table, yet one which captures the spirit of many property testing results).

Problem: Given access to a list ā = (a_1, ..., a_N) in Z^N, figure out whether it is sorted, that is, whether a_1 <= a_2 <= ... <= a_N. More precisely, we want to distinguish sorted lists from lists which are far from sorted.

Definition 9. For ε in [0, 1], we say a list of N integers ā = (a_1, ..., a_N) is ε-approximately sorted if there exists b̄ = (b_1, ..., b_N) in Z^N such that (i) b̄ is sorted; and (ii) |{ i in [N] : a_i != b_i }| <= εN.

Remark 1. This definition is equivalent to saying that ā is ε-approximately sorted iff it has a sorted subsequence of length at least (1 - ε)N. (Suggestion: rewrite this in terms of some distance dist(ā, b̄).)

Goal: Design an algorithm which queries few elements a_i (here a query means providing an index i to an oracle and being given a_i as response) and

- if ā is sorted, accepts with high probability;
- if ā is not ε-approximately sorted, rejects with high probability.

Remark 2. Deterministic algorithms will not work here; indeed, consider any deterministic algorithm which reads at most N/2 of the a_i's. It is possible to change any sorted input on the unchecked spots to make it Θ(1)-far from sorted, and the algorithm cannot distinguish between the two cases.

Theorem 10. There exists an O(log N / ε)-query algorithm for ε-testing sortedness. Further, this tester is one-sided: if ā is sorted, it accepts with probability 1; if ā is not ε-approximately sorted, it rejects with probability at least 2/3.
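Remark 1's characterization gives a direct (linear-time, so not sublinear) way to check ε-approximate sortedness: compute the longest nondecreasing subsequence via patience sorting and compare its length to (1 - ε)N. A sketch, with function names of my choosing:

```python
from bisect import bisect_right

def longest_sorted_subsequence_length(a):
    """Length of the longest nondecreasing subsequence (patience sorting,
    O(N log N)): tails[k] holds the smallest possible last element of a
    nondecreasing subsequence of length k + 1."""
    tails = []
    for v in a:
        k = bisect_right(tails, v)  # bisect_right allows repeated values
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(tails)

def is_eps_approximately_sorted(a, eps):
    """Definition 9 via Remark 1: a sorted subsequence of length >= (1-eps)N."""
    return longest_sorted_subsequence_length(a) >= (1 - eps) * len(a)
```

For example, (11, 10, 21, 20, 31, 30) has longest sorted subsequence (10, 20, 30) of length 3 = N/2, so it is 1/2-approximately sorted but not 0.4-approximately sorted. This exact check is what the sublinear tester of Theorem 10 approximates with exponentially fewer queries.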

4.1 First naive attempt

Natural idea: read (log N)/ε random spots, and accept iff the induced sublist is sorted.

Why does it fail? Consider the (1/2-far from sorted) list

    ā = (11, 10, 21, 20, 31, 30, ..., 10^100 + 1, 10^100).

A violation will only be detected if certain consecutive elements, e.g. (21, 20), are both queried, which happens with probability o(1) if we make O((1/ε) log N) queries. (Think of ε as being a small constant like 1/100.)

[Figure: plot of this "sawtooth" list, a_i against i.]
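The failure mode is easy to reproduce. Below is a sketch of the naive spot-tester together with a small sawtooth instance; a violation is visible only when both elements of some descending pair (2m-1, 2m) are drawn, so for k ≪ √N samples the tester almost always accepts this 1/2-far list (the names `spot_tester` and `sawtooth` are mine):

```python
import random

def spot_tester(a, k, seed=0):
    """Naive tester: read k uniformly random positions and accept iff the
    induced sublist (in index order) is sorted."""
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(a)), k))
    vals = [a[i] for i in idx]
    return all(vals[j] <= vals[j + 1] for j in range(len(vals) - 1))

def violating_adjacent_pairs(a):
    """Indices i with a[i] > a[i+1]: the only 'local' evidence of unsortedness."""
    return [i for i in range(len(a) - 1) if a[i] > a[i + 1]]

# Sawtooth instance: (11, 10, 21, 20, 31, 30, ...), which is 1/2-far from
# sorted, yet every non-consecutive sample of it looks sorted.
N = 1000
sawtooth = [10 * (i // 2 + 1) + (1 - i % 2) for i in range(N)]
```

Every second adjacent pair of the sawtooth is descending (N/2 local violations in total), but each violation is confined to one consecutive pair, which a small random sample is unlikely to hit.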

4.2 Second naive attempt

Since the previous approach failed at finding local violations, let us focus on such violations: draw pairs of consecutive elements, check whether each pair is sorted, and accept iff all pairs drawn pass the test.

Why does it fail? Consider the list ā = (1, 2, 3, ..., N/2, 1, 2, 3, ..., N/2). Once again, ā is 1/2-far from sorted, yet there is only one pair of consecutive elements, the one straddling the middle, that may show a violation.

[Figure: plot of this two-run list, a_i against i.]

4.3 Third (and right) attempt

Some preliminary setup. First, observe that we can assume without loss of generality that all a_i's are distinct: we can always ensure this by replacing, on the fly, a_i by b_i := N·a_i + i. It is not hard to see that the b_i's are distinct, and moreover that a_i <= a_j iff b_i < b_j (for i < j). Second, recall that in a sorted list of N distinct integers, one can use binary search to check with log N queries whether a given value x is in the list.

Definition 11. Given a (not necessarily sorted) list ā, we say that a_i is well-positioned if a binary search for the value a_i ends up at the i-th location (where it successfully finds a_i).

In particular, if ā is sorted, all its elements are well-positioned. For instance, in ā = (1, 2, 100, 4, 5, 6, ..., 98, 99), the element a_3 = 100 is not well-positioned, while a_75 = 75 is.

Note that with 1 + log N queries, one can query location i to get a_i and then use binary search on a_i to determine whether a_i is well-positioned.

The algorithm:

    for 10/ε iterations do
        Pick i in [N] uniformly at random.
        Query and get a_i.
        Do binary search on a_i; if a_i is not well-positioned, halt and return REJECT.
    end for
    return ACCEPT

This makes (10/ε)(1 + log N) = O(log N / ε) queries, as claimed.
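The binary-search tester above can be sketched directly. This is an illustrative implementation assuming distinct values (apply the b_i = N·a_i + i trick first otherwise); it uses 0-based indexing, so the element the notes call a_3 is index 2 here:

```python
import random

def binary_search_landing(a, target):
    """Run a standard binary search for `target` over the index range of `a`
    (as if `a` were sorted) and return the index where it lands, or -1 if
    the search falls off without finding `target`."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def is_well_positioned(a, i):
    """Definition 11: does a binary search for the value a[i] end at i?"""
    return binary_search_landing(a, a[i]) == i

def sortedness_tester(a, eps, seed=0):
    """One-sided O(log N / eps)-query tester: repeat 10/eps times, picking a
    uniform index and checking whether it is well-positioned; reject on any
    failure. Assumes the values of `a` are distinct."""
    rng = random.Random(seed)
    for _ in range(int(10 / eps)):
        i = rng.randrange(len(a))
        if not is_well_positioned(a, i):
            return False  # REJECT
    return True  # ACCEPT
```

On a sorted list every index is well-positioned, so the tester accepts with probability 1; on the sawtooth list (11, 10, 21, 20, ...), at least half the indices are not well-positioned, so the tester rejects except with exponentially small probability.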

Completeness: if ā is sorted, all elements are well-positioned and the tester accepts with probability 1.

Soundness: it remains to prove that if ā is ε-far from sorted, the algorithm rejects with probability at least 2/3. Equivalently, we show the contrapositive: if ā is such that Pr[ACCEPT] >= 1/3, then ā is ε-approximately sorted (i.e., has a sorted subsequence of length at least (1 - ε)N).

Suppose Pr[ACCEPT on ā] >= 1/3, and define W ⊆ [N] to be the set of all well-positioned indices. If we had |W| < (1 - ε)N, then each iteration would pick a badly-positioned index with probability more than ε, so the probability that the algorithm accepts would be at most

    Pr[ACCEPT on ā] <= (1 - ε)^{10/ε} < 0.01,

a contradiction; so it must be the case that |W| >= (1 - ε)N. It is therefore sufficient to prove that the restriction of ā to W is sorted, that is, for all i, j in W with i < j, a_i < a_j.

Fix any two such i, j. If there were an index k such that (1) i <= k <= j, (2) the binary search for a_i visits k, and (3) the binary search for a_j visits k, then this would entail a_i <= a_k <= a_j (the search for a_i stops at k or continues to the left, and the search for a_j stops at k or continues to the right). So it is enough to argue that such a k exists. To see why, note that both binary searches, for a_i and for a_j, start at the same index (roughly N/2), and there must be a last index common to both searches, since they end at the different indices i and j. Take k to be this last common index: as the two binary searches diverge at k, it has to lie between i and j.

References

[EKK+98] Funda Ergün, Sampath Kannan, S. Ravi Kumar, Ronitt Rubinfeld, and Mahesh Viswanathan. Spot-checkers. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC '98, pages 259-268, New York, NY, USA, 1998. ACM.

[Ron08] Dana Ron. Property testing: A learning theory perspective. Foundations and Trends in Machine Learning, 1(3):307-402, 2008.