Lecture 5: February 16, 2012
COMS 6253: Advanced Computational Learning Theory          Spring 2012
Lecturer: Rocco Servedio                                   Scribe: Igor Carboni Oliveira

1 Last time and today

Previously:
- Finished the first unit on PAC learning using PTF degree upper bounds.
- Started PAC learning under the uniform distribution.
- Introduced Fourier analysis over the Boolean hypercube.

Today:
- The connection between Fourier analysis and learning under the uniform distribution.
- The Low-Degree Algorithm (LDA) and Fourier concentration for some classes of Boolean functions.
- Applications of the LDA: learning decision trees and DNFs.

Relevant Readings:
- Y. Mansour. An O(n^{\log \log n}) Learning Algorithm for DNF under the Uniform Distribution. In Proceedings of COLT (1992). [Journal of Computer and System Sciences 50(3), 1995.]
- Y. Mansour. Learning Boolean Functions via the Fourier Transform. In Theoretical Advances in Neural Computation and Learning (V. P. Roychowdhury, K.-Y. Siu, and A. Orlitsky, eds.), 1994.
- J. Köbler, W. Lindner. Learning Boolean Functions under the Uniform Distribution via the Fourier Transform. In The Computational Complexity Column.
2 Review of Fourier Analysis

Remember that any function $f : \{-1,1\}^n \to \mathbb{R}$ has a unique representation of the form^1

    f(x) = \sum_{S \subseteq [n]} \hat{f}(S) \chi_S(x),                          (1)

where

    \chi_S(x) = \prod_{i \in S} x_i.                                             (2)

In particular, we have:

    \hat{f}(S) = E_x[f(x) \chi_S(x)].                                            (3)

Therefore, if $f : \{-1,1\}^n \to \{-1,1\}$ is Boolean, it follows that $|\hat{f}(S)| \le 1$ for every $S \subseteq [n]$. Actually, using the next proposition we can say something much stronger in this case:

Proposition 1. [Plancherel's Identity] For any $f, g : \{-1,1\}^n \to \mathbb{R}$, we have:

    E_x[f(x) g(x)] = \sum_{S \subseteq [n]} \hat{f}(S) \hat{g}(S).               (4)

Proof. Using the Fourier expansions of f and g, it follows that:

    E_x[f(x) g(x)] = E_x[ ( \sum_{S \subseteq [n]} \hat{f}(S) \chi_S(x) ) ( \sum_{T \subseteq [n]} \hat{g}(T) \chi_T(x) ) ]   (5)
                   = \sum_{S, T \subseteq [n]} \hat{f}(S) \hat{g}(T) \, E_x[\chi_S(x) \chi_T(x)]                              (6)
                   = \sum_{S \subseteq [n]} \hat{f}(S) \hat{g}(S),                                                            (7)

where the last equality follows from the fact that $\chi_S(x) \chi_T(x) = \chi_{S \triangle T}(x)$ and $E_x[\chi_{S \triangle T}(x)]$ is 1 if $S = T$ and 0 otherwise.

Corollary 2. [Parseval's Identity] For any Boolean $f : \{-1,1\}^n \to \{-1,1\}$, we have:

    \sum_{S \subseteq [n]} \hat{f}(S)^2 = 1.                                     (8)

Proof. Applying Plancherel's identity with $f = g$ we get that

    \sum_{S \subseteq [n]} \hat{f}(S)^2 = E_x[f(x)^2] = E_x[1] = 1.              (9)

^1 In contrast, it is easy to see that the representation of a function as a PTF is not unique.
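Everything in this review is finite, so it can be checked directly on small examples. The sketch below (my own Python illustration, not part of the original notes) computes each coefficient $\hat{f}(S)$ via equation (3) by averaging over all $2^n$ inputs, and verifies Parseval's identity numerically for a small Boolean function.

```python
# A minimal brute-force sketch, assuming f maps tuples over {-1,1} to a number.
from itertools import combinations, product

def chi(S, x):
    """Parity chi_S(x) = prod_{i in S} x_i."""
    out = 1
    for i in S:
        out *= x[i]
    return out

def fourier_coefficient(f, S, n):
    """hat{f}(S) = E_x[f(x) chi_S(x)], an exact average over all 2^n inputs."""
    return sum(f(x) * chi(S, x) for x in product([-1, 1], repeat=n)) / 2**n

def fourier_expansion(f, n):
    """All coefficients, indexed by subsets S of {0, ..., n-1}."""
    return {S: fourier_coefficient(f, S, n)
            for k in range(n + 1) for S in combinations(range(n), k)}

if __name__ == "__main__":
    n = 3
    maj = lambda x: 1 if sum(x) > 0 else -1          # majority of 3 bits
    coeffs = fourier_expansion(maj, n)
    print(sum(c**2 for c in coeffs.values()))        # Parseval: prints 1.0
```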
Definition 3. The Fourier degree deg(f) of a function $f : \{-1,1\}^n \to \mathbb{R}$ is the smallest integer $d \ge 0$ such that $\hat{f}(S) = 0$ for all $S \subseteq [n]$ with $|S| > d$.

For example, let's consider the Fourier degree of the AND function over n variables (under the convention that $-1$ plays the role of "true"):

    AND(x_1, ..., x_n) = 1 if every x_i = -1, and 0 if some x_i = 1.

It is easy to see that:

    AND(x_1, ..., x_n) = \frac{(1 - x_1)(1 - x_2) \cdots (1 - x_n)}{2^n}         (10)
                       = \sum_{S \subseteq [n]} \frac{(-1)^{|S|}}{2^n} \chi_S(x).   (11)

Thus we have deg(AND) = n. In other words, the AND function has maximum Fourier degree.

The following lemma will be useful later.

Lemma 4. [Fourier degree of depth-d Decision Trees] Let $f : \{-1,1\}^n \to \{-1,1\}$ be a Boolean function computed by some decision tree of depth d. Then $\deg(f) \le d$.

Proof. Given a decision tree T of depth d representing f, it is easy to see that:

    f(x) = \sum_{\text{paths } P \text{ in } T} 1_P(x) \cdot f(P),               (12)

where $f(P)$ is the value of the function f at the end of path P and $1_P(x)$ is a function of at most d variables that is 1 if x follows P and 0 otherwise. Note that $1_P(x)$ has Fourier degree at most d, since it is a function that depends on at most d variables. By the linearity of the Fourier transform, it follows that $\deg(f) \le d$.
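Reusing `fourier_coefficient` from the previous sketch, one can confirm the expansion (10)-(11) numerically; this check (my own illustration) also makes it visible that every coefficient of AND is nonzero, so deg(AND) = n.

```python
# Check hat{AND}(S) = (-1)^{|S|} / 2^n for every S (here n = 4).
from itertools import combinations

n = 4
AND = lambda x: 1 if all(xi == -1 for xi in x) else 0   # -1 plays "true"
for k in range(n + 1):
    for S in combinations(range(n), k):
        assert abs(fourier_coefficient(AND, S, n) - (-1)**k / 2**n) < 1e-12
print("deg(AND) =", n)   # all 2^n coefficients are nonzero
```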
3 Fourier Analysis and Learning Theory

Intuitively, if we could find most of the heavy Fourier coefficients of an unknown Boolean function f, then we should be able to come up with a good approximation of f. In this section we formalize this idea. First, we show how to use the example oracle EX(f) to approximate $\hat{f}(S)$ for any $S \subseteq [n]$. We emphasize that the learning theory results discussed in these notes hold with respect to the uniform distribution.

Lemma 5. There is an algorithm A that, given $\gamma > 0$, $\delta > 0$, $S \subseteq [n]$ and oracle access to EX(f), where $f : \{-1,1\}^n \to \{-1,1\}$, outputs with probability at least $1 - \delta$ a value $c_S$ such that $|c_S - \hat{f}(S)| \le \gamma$. The number of oracle calls made by A is $O(\frac{1}{\gamma^2} \log \frac{1}{\delta})$ and its running time is $\mathrm{poly}(n, \frac{1}{\gamma^2}, \log \frac{1}{\delta})$.

Proof. For X uniform over $\{-1,1\}^n$, let $Z = f(X) \chi_S(X)$. Then it is clear that $Z \in \{-1,1\}$ and $E[Z] = \hat{f}(S)$. In addition, note that an arbitrary number of independent pairs $(X, f(X))$ can be sampled using the oracle EX(f), and that $\chi_S(X)$ is easily computed given any input X. It follows from the Chernoff-Hoeffding bound that the empirical estimate $c_S$ of $E[Z]$ obtained using $O(\frac{1}{\gamma^2} \log \frac{1}{\delta})$ draws from EX(f) is $\gamma$-close to $\hat{f}(S)$ with probability at least $1 - \delta$.

Although the previous lemma can be used to approximate a small number of Fourier coefficients, an efficient learning algorithm cannot afford to approximate all the coefficients of an unknown Boolean function. Actually, even if we are promised that there is some special Fourier coefficient $\hat{f}(T)$ such that $|\hat{f}(T)| > 0.99$, it is not clear how to find such a coefficient efficiently using queries to the random oracle EX(f).^2

We can still make use of Lemma 5 if we restrict the concept class C to contain only functions that have most of their Fourier weight concentrated on a (fixed) small number of coefficients. For example, for many interesting Boolean functions f, at least a $1 - \epsilon$ fraction of the Fourier weight lives on low-degree coefficients, i.e., there exists a small d such that $\sum_{|S| > d} \hat{f}(S)^2 \le \epsilon$. If this is the case, we only need to estimate $O(n^d)$ coefficients to obtain a good description of the function.

Definition 6. [Fourier Concentration] Let $f : \{-1,1\}^n \to \mathbb{R}$. We say f has $\alpha(\epsilon, n)$-Fourier concentration if

    \sum_{|S| > \alpha(\epsilon, n)} \hat{f}(S)^2 \le \epsilon.                  (13)

^2 However, we will see in a few lectures that if we can query the unknown function f at arbitrary input positions, then we can actually find such a coefficient efficiently.
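The estimator of Lemma 5 is a plain Monte Carlo average. Below is a sketch under my own assumptions: `EX` simulates the uniform-distribution example oracle, the sample size follows Hoeffding's inequality (the constants are illustrative, not taken from the lecture), and `chi` is reused from the first sketch.

```python
import math
import random

def EX(f, n):
    """Simulated example oracle: an endless stream of uniform pairs (x, f(x))."""
    while True:
        x = tuple(random.choice([-1, 1]) for _ in range(n))
        yield x, f(x)

def estimate_coefficient(examples, S, gamma, delta):
    """Empirical mean of Z = f(x) chi_S(x). Since Z is {-1,1}-valued, Hoeffding
    gives |c_S - hat{f}(S)| <= gamma with probability >= 1 - delta using
    m = O((1/gamma^2) log(1/delta)) draws."""
    m = math.ceil((2 / gamma**2) * math.log(2 / delta))
    total = 0.0
    for _ in range(m):
        x, fx = next(examples)
        total += fx * chi(S, x)          # chi from the earlier sketch
    return total / m
```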
For convenience, we associate to each function $f : \{-1,1\}^n \to \mathbb{R}$ its Fourier concentration function $\alpha_f(\epsilon, n)$ in the natural way: for every $\epsilon$ we set $\alpha_f(\epsilon, n)$ to be the smallest integer $k \ge 0$ such that

    \sum_{|S| > k} \hat{f}(S)^2 \le \epsilon.                                    (14)

Note that for Boolean f we have

    \sum_{|S| \le \alpha_f(\epsilon, n)} \hat{f}(S)^2 \ge 1 - \epsilon.          (15)

The next algorithm demonstrates the connection between learning under the uniform distribution and estimating Fourier coefficients.

Low-Degree Algorithm (LDA). This algorithm is used to approximate an unknown Boolean function f with $d = \alpha(\epsilon, n)$-Fourier concentration. The LDA is given parameters $\tau > 0$ (accuracy) and $\delta > 0$ (confidence) and has access to the random oracle EX(f). It computes as follows:

1. The algorithm draws $m = O(\frac{n^d}{\tau} \log \frac{n^d}{\delta})$ random examples $(x, f(x))$ from EX(f).
2. It uses these examples to find estimates $c_S$ of $\hat{f}(S)$ for each subset $S \subseteq [n]$ with $|S| \le d$, as done in the proof of Lemma 5.
3. It sets $h(x) := \sum_{|S| \le d} c_S \chi_S(x)$. Note that this may not be a binary function.
4. It outputs the hypothesis $\mathrm{sign}(h(x))$.
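Putting the pieces together gives the following sketch of the LDA, reusing `EX`, `chi`, and `estimate_coefficient` from above. One simplification to flag: step 1 of the algorithm draws a single sample of size m and reuses it for every S, while this sketch draws fresh examples per coefficient, which only increases the number of oracle calls and keeps Lemma 5 directly applicable.

```python
from itertools import combinations
from math import comb, sqrt

def low_degree_algorithm(f, n, d, tau, delta):
    """Learn f, assumed to have alpha(eps, n) = d Fourier concentration."""
    num_sets = sum(comb(n, k) for k in range(d + 1))   # number of S with |S| <= d
    gamma = sqrt(tau / num_sets)                        # per-coefficient accuracy
    examples = EX(f, n)
    coeffs = {S: estimate_coefficient(examples, S, gamma, delta / num_sets)
              for k in range(d + 1) for S in combinations(range(n), k)}

    def hypothesis(x):                                  # sign(h(x))
        h = sum(c * chi(S, x) for S, c in coeffs.items())
        return 1 if h >= 0 else -1

    return hypothesis
```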
Lemma 7. If $f : \{-1,1\}^n \to \{-1,1\}$ has $\alpha(\epsilon, n)$-Fourier concentration, then with probability at least $1 - \delta$ the Low-Degree Algorithm constructs a function h such that

    E_x[(h(x) - f(x))^2] \le \epsilon + \tau.                                    (16)

Proof. Fix an $S \subseteq [n]$ with $|S| \le d$ and let $\gamma' = \sqrt{\tau / n^d}$ and $\delta' = \delta / n^d$. It follows from the proof of Lemma 5 that with failure probability at most $\delta'$ the estimate $c_S$ obtained during step 2 satisfies $|c_S - \hat{f}(S)| \le \gamma'$. Therefore, applying a union bound over the at most $n^d$ such sets, we get that with probability at least $1 - \delta$, for every $|S| \le d$ we have $|c_S - \hat{f}(S)| \le \gamma'$. Let $g(x) = f(x) - h(x)$. Clearly, for every S we have $\hat{g}(S) = \hat{f}(S) - \hat{h}(S)$. It follows using Plancherel's identity that

    E_x[(h(x) - f(x))^2] = E_x[g(x)^2]                                                            (17)
                         = \sum_{S \subseteq [n]} \hat{g}(S)^2                                    (18)
                         = \sum_{S \subseteq [n]} (\hat{f}(S) - \hat{h}(S))^2                     (19)
    (by definition of h)  = \sum_{|S| \le d} (\hat{f}(S) - c_S)^2 + \sum_{|S| > d} \hat{f}(S)^2   (20)
    (with probability \ge 1 - \delta)  \le n^d (\gamma')^2 + \sum_{|S| > d} \hat{f}(S)^2          (21)
                         \le \tau + \epsilon,                                                     (22)

where the last inequality uses the Fourier concentration assumption about f (recall that $d = \alpha(\epsilon, n)$).

Lemma 8. The hypothesis $\mathrm{sign}(h(x))$ output by the Low-Degree Algorithm is $(\tau + \epsilon)$-close to f with probability at least $1 - \delta$.

Proof. By the previous lemma it is enough to prove that $\Pr_x[f(x) \ne \mathrm{sign}(h(x))] \le E_x[(h(x) - f(x))^2]$. But since $f(x) \in \{-1,1\}$, whenever $f(x) \ne \mathrm{sign}(h(x))$ we have $(f(x) - h(x))^2 \ge 1$, so

    \Pr_x[f(x) \ne \mathrm{sign}(h(x))] = \frac{1}{2^n} \sum_{x \in \{-1,1\}^n} 1_{f(x) \ne \mathrm{sign}(h(x))}   (23)
                                        \le \frac{1}{2^n} \sum_{x \in \{-1,1\}^n} (f(x) - h(x))^2                  (24)
                                        = E_x[(h(x) - f(x))^2],                                                    (25)

which completes the argument.

Altogether, we have shown:

Theorem 9. Let C be a class of n-variable Boolean functions such that every $f \in C$ has $d = \alpha(\epsilon, n)$-Fourier concentration. Then there is a $\mathrm{poly}(n^d, \frac{1}{\epsilon}, \log \frac{1}{\delta})$-time uniform distribution PAC algorithm that learns any $f \in C$ to accuracy $2\epsilon$.

Proof. Run the LDA with $\tau = \epsilon$. The result follows from Lemmas 7 and 8.
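As a usage sketch (again my own toy example): by Lemma 4 a depth-2 decision tree has all of its Fourier weight on sets of size at most 2, so Theorem 9 applies with d = 2 and $\tau = \epsilon$.

```python
from itertools import product

target = lambda x: x[1] if x[0] == 1 else x[2]    # a depth-2 decision tree
n, eps, delta = 5, 0.1, 0.1
h = low_degree_algorithm(target, n, d=2, tau=eps, delta=delta)
err = sum(h(x) != target(x) for x in product([-1, 1], repeat=n)) / 2**n
print("error:", err)    # at most 2*eps with probability >= 1 - delta
```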
4 Applications of the Low-Degree Algorithm

4.1 Learning depth-d Decision Trees

It follows from Lemma 4 that if f is an n-variable Boolean function computed by a depth-d decision tree, then $\alpha_f(0, n) \le d$. Combining this observation with Theorem 9, we immediately obtain:

Proposition 10. Let C be the class of depth-d decision trees over n variables. Then there is a $\mathrm{poly}(n^d, \frac{1}{\epsilon}, \log \frac{1}{\delta})$-time uniform distribution PAC algorithm that learns any $f \in C$ to accuracy $\epsilon$ with probability at least $1 - \delta$.

4.2 Learning s-term DNFs

In this subsection we present a Fourier concentration result for DNFs.

Theorem 11. If $f : \{-1,1\}^n \to \{-1,1\}$ is an s-term DNF, then $\alpha_f(\epsilon, n) = O(\log \frac{s}{\epsilon} \cdot \log \frac{1}{\epsilon})$.

Using the Low-Degree Algorithm we get:

Corollary 12. Let C be the class of s-term DNFs over n variables. Then there is a $\mathrm{poly}(n^{\log \frac{s}{\epsilon} \log \frac{1}{\epsilon}}, \log \frac{1}{\delta})$-time uniform distribution PAC algorithm that learns any $f \in C$ to accuracy $2\epsilon$ with probability at least $1 - \delta$.

To obtain Theorem 11, we first argue that instead of proving a Fourier concentration result for s-term DNFs, it is enough to get a concentration result for small-width DNFs.

Lemma 13. Every s-term DNF can be $\epsilon$-approximated by a $(\log \frac{s}{\epsilon})$-width DNF.

Proof. Note that removing a term with more than $\log \frac{s}{\epsilon}$ literals from the original DNF can only change an $\frac{\epsilon}{s}$ fraction of its output bits. Since there are at most s such terms, the result follows by a union bound.
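The truncation in Lemma 13 is simple enough to write down. In the sketch below (my own encoding, not from the notes) a DNF is a list of terms and each term is a list of literals; taking the logarithm base 2, a term with more than $\log_2 \frac{s}{\epsilon}$ literals is satisfied by fewer than an $\frac{\epsilon}{s}$ fraction of uniform inputs, so dropping all such terms changes the function on at most an $\epsilon$ fraction of inputs.

```python
import math

def truncate_to_small_width(terms, epsilon):
    """Drop every term with more than log2(s/epsilon) literals; the result is
    an epsilon-approximator of the original s-term DNF under the uniform
    distribution. A term is a list of literals (i, b), read as "x_i = b"."""
    s = len(terms)
    w = math.ceil(math.log2(s / epsilon))
    return [t for t in terms if len(t) <= w]
```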
Lemma 14. Let $f, f' : \{-1,1\}^n \to \{-1,1\}$ be $\epsilon$-close Boolean functions. If $f'$ has $\alpha(\epsilon, n)$-Fourier concentration, we have:

    \sum_{|S| > \alpha(\epsilon, n)} \hat{f}(S)^2 \le 9\epsilon.                 (26)

Proof. Since f and $f'$ are $\epsilon$-close Boolean functions, we have

    E_x[(f(x) - f'(x))^2] = 4 \Pr_x[f(x) \ne f'(x)] \le 4\epsilon.               (27)

On the other hand, using Plancherel's identity:

    E_x[(f(x) - f'(x))^2] = \sum_{S \subseteq [n]} (\hat{f}(S) - \hat{f'}(S))^2                   (28)
                          \ge \sum_{|S| > \alpha(\epsilon, n)} (\hat{f}(S) - \hat{f'}(S))^2       (29)
         (by definition)  = \| v_f - v_{f'} \|_2^2,                                               (30)

where $v_f$ is a real-valued vector with coordinates indexed by the sets S with $|S| > \alpha(\epsilon, n)$, such that $v_f(S) = \hat{f}(S)$, and $v_{f'}$ is defined similarly. Using inequality (27), we get $\| v_f - v_{f'} \|_2 \le \sqrt{4\epsilon} = 2\sqrt{\epsilon}$. In addition, since $f'$ has $\alpha(\epsilon, n)$-Fourier concentration, we have $\| v_{f'} \|_2 \le \sqrt{\epsilon}$. Now using the fact that for any pair of vectors u, w we have $\| u - w \|_2 \ge \| u \|_2 - \| w \|_2$, it follows that $2\sqrt{\epsilon} + \sqrt{\epsilon} \ge \| v_f - v_{f'} \|_2 + \| v_{f'} \|_2 \ge \| v_f \|_2$, which is equivalent to $\| v_f \|_2^2 \le 9\epsilon$. In other words:

    \sum_{|S| > \alpha(\epsilon, n)} \hat{f}(S)^2 \le 9\epsilon.                 (31)

Therefore, it follows from Lemmas 13 and 14 that to get Theorem 11 it is enough to prove the following proposition:

Proposition 15. Let $f : \{-1,1\}^n \to \{-1,1\}$ be a Boolean function and assume that f is computed by a width-w DNF. Then $\alpha_f(\epsilon, n) = O(w \log \frac{1}{\epsilon})$.

To achieve that, we first study how a width-w DNF simplifies under a random restriction of its input bits.

Definition 16. A restriction is a pair (I, Z), with $I \subseteq [n]$ and $Z \in \{-1,1\}^{\bar{I}}$, where $\bar{I} = [n] \setminus I$.

Definition 17. For $f : \{-1,1\}^n \to \mathbb{R}$, we write $f_{I|Z}$ to denote the (I, Z)-restricted version of f, that is, the function $f_{I|Z} : \{-1,1\}^I \to \mathbb{R}$ obtained from f by setting the variables in $\bar{I}$ according to Z. For example, if $f(x_1, x_2, ..., x_5) : \{-1,1\}^5 \to \mathbb{R}$ is a real-valued function and (I, Z) is a restriction with $I = \{2, 3\}$ and $Z = (-1, 1, 1)$, then $f_{I|Z}$ represents the function $f(-1, x_2, x_3, 1, 1)$.
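Restrictions are easy to mirror in code. The following sketch (my encoding; the notes index variables from 1, the code from 0) builds $f_{I|Z}$ as a function of the live variables listed in sorted order, and reproduces the example from Definition 17.

```python
def restrict(f, n, I, Z):
    """f_{I|Z}: the variables in I stay live (in sorted order); Z maps each
    index in I-bar = [n] \\ I to its fixed value in {-1, 1}."""
    live = sorted(I)

    def f_restricted(y):
        x = [None] * n
        for pos, i in enumerate(live):
            x[i] = y[pos]
        for i, z in Z.items():
            x[i] = z
        return f(tuple(x))

    return f_restricted

# The example from Definition 17, with 0-based indices: I = {2, 3} becomes
# {1, 2}, and Z fixes x_1 = -1, x_4 = 1, x_5 = 1.
f = lambda x: x[0] * x[1] * x[4]
g = restrict(f, 5, I={1, 2}, Z={0: -1, 3: 1, 4: 1})
print(g((1, -1)))   # f(-1, 1, -1, 1, 1) = -1
```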
Definition 18. A random restriction with $*$-probability $\rho$ (also called a $\rho$-random restriction) is a pair (I, Z) chosen as follows:
- For each $i \in [n]$, put i in I independently with probability $\rho$.
- Pick $Z \in \{-1,1\}^{\bar{I}}$ uniformly at random.

It is clear that functions simplify under a restriction. For a width-w DNF, we can actually show that the function simplifies a lot.

Theorem 19. [Håstad Switching Lemma] Let $f : \{-1,1\}^n \to \{-1,1\}$ be a width-w DNF. Then for (I, Z) a $\rho$-random restriction:

    \Pr_{(I,Z)}[\mathrm{DT\text{-}depth}(f_{I|Z}) > d] \le (5 \rho w)^d.         (32)

For example, if $\rho = \frac{1}{10w}$, we get that

    \Pr_{(I,Z)}[\mathrm{DT\text{-}depth}(f_{I|Z}) > d] \le 2^{-d}.               (33)

The proof of this theorem is beyond the scope of these notes. The Switching Lemma is a powerful tool for our purposes because we understand the Fourier concentration of decision trees quite well. In particular, we know by Lemma 4 that depth-d decision trees are concentrated on coefficients with $|S| \le d$. Intuitively, we want to argue that since the function resulting from a random restriction has good concentration, the original width-w DNF must have some concentration as well. To formalize this argument we introduce additional notation.

Definition 20. Let $f : \{-1,1\}^n \to \mathbb{R}$ be any function and let $S, I \subseteq [n]$. We define $F_{S,I} : \{-1,1\}^{\bar{I}} \to \mathbb{R}$ as follows:

    F_{S,I}(Z) = \widehat{f_{I|Z}}(S).                                           (34)
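Before turning to the proof, here is a sketch of sampling a $\rho$-random restriction from Definition 18, in the same encoding as the `restrict` sketch above.

```python
import random

def random_restriction(n, rho):
    """Definition 18: each coordinate stays live with probability rho,
    independently; the fixed coordinates get uniform {-1,1} values."""
    I = {i for i in range(n) if random.random() < rho}
    Z = {i: random.choice([-1, 1]) for i in range(n) if i not in I}
    return I, Z

# By the Switching Lemma, with rho = 1/(10w) a width-w DNF restricted this way
# has decision-tree depth greater than d with probability at most 2^{-d}.
```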
Proposition 21. Let $f : \{-1,1\}^n \to \mathbb{R}$ and suppose $S \subseteq I \subseteq [n]$. Then

    \widehat{F_{S,I}}(T) = \hat{f}(S \cup T)                                     (35)

for any set $T \subseteq \bar{I}$.

Proof.

    \widehat{F_{S,I}}(T) = E_{Z \sim \{-1,1\}^{\bar{I}}}[F_{S,I}(Z) \, \chi_T(Z)]                                  (36)
                         = E_{Z \sim \{-1,1\}^{\bar{I}}}[\widehat{f_{I|Z}}(S) \, \chi_T(Z)]                        (37)
                         = E_{Z \sim \{-1,1\}^{\bar{I}}}[ E_{Y \sim \{-1,1\}^{I}}[f_{I|Z}(Y) \, \chi_S(Y)] \, \chi_T(Z) ]   (38)
                         = E_{X \sim \{-1,1\}^n}[f(X) \, \chi_S(X) \, \chi_T(X)]                                   (39)
    (since $S \cap T = \emptyset$)  = E_{X \sim \{-1,1\}^n}[f(X) \, \chi_{S \cup T}(X)]                            (40)
                         = \hat{f}(S \cup T).                                                                      (41)

Here (39) uses the fact that a uniform Y over the coordinates in I together with an independent uniform Z over the coordinates in $\bar{I}$ form a uniform X over $\{-1,1\}^n$.

The proof of Proposition 15 will be completed next class.
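Proposition 21 can be sanity-checked numerically by combining the earlier sketches (`restrict`, `fourier_coefficient`); the toy function f and the choices of I, S, T below are my own. Take $I = \{0, 2\}$ (so $\bar{I} = \{1, 3\}$), $S = \{0\} \subseteq I$, and $T = \{3\} \subseteq \bar{I}$, and compare $\widehat{F_{S,I}}(T)$ against $\hat{f}(S \cup T)$.

```python
from itertools import product

n = 4
f = lambda x: 1 if (x[0] == -1 and x[3] == -1) or x[2] == 1 else -1

def F(z):
    """F_{S,I}(Z) = hat{f_{I|Z}}(S) for S = {0}, I = {0, 2}; z = (x_1, x_3)."""
    g = restrict(f, n, I={0, 2}, Z={1: z[0], 3: z[1]})
    return fourier_coefficient(g, (0,), 2)   # x_0 is live coordinate 0 of g

# hat{F}(T) for T = {3}; x_3 is coordinate 1 of z, so chi_T(z) = z[1].
lhs = sum(F(z) * z[1] for z in product([-1, 1], repeat=2)) / 4
rhs = fourier_coefficient(f, (0, 3), n)
print(abs(lhs - rhs) < 1e-12)   # True
```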