Agnostic Learning of Disjunctions on Symmetric Distributions


Vitaly Feldman    Pravesh Kothari

May 26, 2014

Abstract

We consider the problem of approximating and learning disjunctions (or equivalently, conjunctions) on symmetric distributions over $\{0,1\}^n$. Symmetric distributions are distributions whose PDF is invariant under any permutation of the variables. We give a simple proof that for every symmetric distribution $D$, there exists a set $S$ of $n^{O(\log(1/\epsilon))}$ functions such that for every disjunction $c$, there is a function $p$, expressible as a linear combination of functions in $S$, such that $p$ $\epsilon$-approximates $c$ in $\ell_1$ distance on $D$, that is, $\mathbf{E}_{x \sim D}[|c(x) - p(x)|] \leq \epsilon$. This directly gives an agnostic learning algorithm for disjunctions on symmetric distributions that runs in time $n^{O(\log(1/\epsilon))}$. The best previously known bound is $n^{O(1/\epsilon^4)}$ and follows from approximation of the more general class of halfspaces [Wimmer, 2010]. We also show that there exists a symmetric distribution $D$ such that the minimum degree of a polynomial that $1/3$-approximates the disjunction of all $n$ variables in $\ell_1$ distance on $D$ is $\Omega(\sqrt{n})$. Therefore the learning result above cannot be achieved using the commonly used approximation by polynomials. Our technique also gives a simple proof that for any product distribution $D$ and every disjunction $c$, there exists a polynomial $p$ of degree $O(\log(1/\epsilon))$ such that $p$ $\epsilon$-approximates $c$ in $\ell_1$ distance on $D$. This was first proved by Blais et al. [2008] via a more involved and general argument.

1 Introduction

The goal of an agnostic learning algorithm for a concept class $C$ is to produce, for any distribution on examples, a hypothesis $h$ whose error on a random example from the distribution is close to the best achievable by a concept from $C$. This model reflects a common empirical approach to learning, where few or no assumptions are made on the process that generates the examples and a limited space of candidate hypothesis functions is searched in an attempt to find the best approximation to the given data.

Agnostic learning of disjunctions (or, equivalently, conjunctions) is a fundamental question in learning theory and a key step in learning algorithms for other concept classes such as DNF formulas and decision trees. There is no known polynomial-time algorithm for the problem; in fact, the fastest algorithm that does not make any distributional assumptions runs in time $2^{\tilde{O}(\sqrt{n})}$ [Kalai et al., 2008]. While the problem appears to be hard, strong hardness results are known only if the hypothesis is restricted to be a disjunction or a linear threshold function [Feldman et al., 2009, 2012]. Weaker, quasi-polynomial lower bounds are known assuming hardness of learning sparse parities with noise (see Sec. 5). In this note we consider this problem with the additional assumption that example points are distributed according to a symmetric or a product distribution. Symmetric and product distributions are two incomparable classes of distributions that generalize the well-studied uniform distribution.

1.1 Our Results

We prove that disjunctions (and conjunctions) are learnable agnostically over any symmetric distribution in time $n^{O(\log(1/\epsilon))}$. This matches the well-known upper bound for the uniform distribution. Our proof is based on $\ell_1$-approximation of any disjunction by a linear combination of functions from a fixed set of functions. Such approximation directly gives an agnostic learning algorithm via the $\ell_1$-regression-based approach introduced by Kalai et al. [2008].

A natural and commonly used set of basis functions is the set of all monomials on $\{0,1\}^n$ of some bounded degree. It is easy to see that on product distributions with constant bias, disjunctions longer than some constant multiple of $\log(1/\epsilon)$ are $\epsilon$-close to the constant function $1$. Therefore, polynomials of degree $O(\log(1/\epsilon))$ suffice for $\ell_1$ (or $\ell_2$) approximation on such distributions. This simple argument does not work for general product distributions. However, it was shown by Blais et al. [2008] that the same degree (up to a constant factor) still suffices in this case. Their argument is based on the analysis of noise sensitivity under product distributions and implies additional interesting results. Interestingly, it turns out that polynomials cannot be used to obtain the same result for all symmetric distributions: there exists a symmetric distribution on which disjunctions are no longer $\ell_1$-approximated by low-degree polynomials.

Theorem 1.1. There exists a symmetric distribution $D$ such that for $c = x_1 \vee x_2 \vee \cdots \vee x_n$, any polynomial $p$ that satisfies $\mathbf{E}_{x \sim D}[|c(x) - p(x)|] \leq 1/3$ is of degree $\Omega(\sqrt{n})$.

To prove this we consider the standard linear program [see Klivans and Sherstov, 2007] for finding the coefficients of a degree-$r$ polynomial that minimizes the pointwise error with respect to the disjunction $c$. The key idea is to observe that an optimal point for the dual can be used to obtain a distribution on which the $\ell_1$ error of the best-fitting polynomial $p$ for $c$ is the same as the minimum pointwise error of any degree-$r$ polynomial with respect to $c$. When $c$ is a symmetric function, one can further observe that the distribution so obtained is in fact symmetric. Combined with the degree lower bound for uniform approximation by polynomials by Klivans and Sherstov [2007], we obtain the result. The details of the proof appear in Sec. 3.1.

Our approximation for general symmetric distributions is based on a proof that for the special case of the uniform distribution on $S_r$ (the points in $\{0,1\}^n$ with Hamming weight $r$), low-degree polynomials still work; namely, for any disjunction $c$, there is a polynomial $p$ of degree at most $O(\log(1/\epsilon))$ such that the $\ell_1$ error $\mathbf{E}_{x \sim S_r}[|c(x) - p(x)|] \leq \epsilon$.

Lemma 1.2. For $r \in \{0, \ldots, n\}$, let $S_r$ denote the set of points in $\{0,1\}^n$ that have exactly $r$ 1's and let $D_r$ denote the uniform distribution on $S_r$. For every disjunction $c$ and $\epsilon > 0$, there exists a polynomial $p$ of degree at most $O(\log(1/\epsilon))$ such that $\mathbf{E}_{D_r}[|c(x) - p(x)|] \leq \epsilon$.

This result can easily be converted to a basis for approximating disjunctions over arbitrary symmetric distributions. All we need is to partition the domain $\{0,1\}^n$ into layers as $\bigcup_{0 \leq r \leq n} S_r$ and use a (different) polynomial for each layer. Formally, the basis now contains functions of the form $\mathrm{IND}(r) \cdot \chi$, where $\mathrm{IND}(r)$ is the indicator function of being in the layer of Hamming weight $r$ and $\chi$ is a monomial of degree $O(\log(1/\epsilon))$.
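To make this basis concrete, here is a minimal sketch (our own illustration, not code from the paper; the function name is ours and numpy is assumed to be available) that builds the feature map $x \mapsto (\mathrm{IND}(r)(x) \cdot \chi_S(x))_{r \leq n,\, |S| \leq d}$ for a sample of points, with $d = \Theta(\log(1/\epsilon))$.

```python
# Sketch (ours): the layered monomial basis IND(r) * chi_S for symmetric distributions.
# IND(r)(x) indicates that x lies in the layer of Hamming weight r, and chi_S(x) is
# the monomial prod_{i in S} x_i of degree at most d = O(log(1/eps)).
from itertools import combinations
import numpy as np

def basis_features(X, eps):
    """X: (t, n) 0/1 array of examples. Returns a (t, (n+1)*m) feature matrix,
    where m is the number of monomials of degree at most d."""
    t, n = X.shape
    d = max(1, int(np.ceil(np.log(1.0 / eps))))            # degree O(log(1/eps))
    monom_sets = [S for deg in range(d + 1) for S in combinations(range(n), deg)]
    monoms = np.column_stack([X[:, list(S)].prod(axis=1)   # chi_S(x); empty S gives 1
                              for S in monom_sets])
    weights = X.sum(axis=1)                                 # Hamming weight of each x
    feats = []
    for r in range(n + 1):
        ind_r = (weights == r).astype(float)                # IND(r)(x)
        feats.append(ind_r[:, None] * monoms)               # IND(r)(x) * chi_S(x)
    return np.hstack(feats)                                 # n^{O(log(1/eps))} columns
```

Every disjunction is then $\epsilon$-approximated in $\ell_1(D)$ by some linear combination of these columns, which is exactly the kind of approximation that the $\ell_1$-regression algorithm of Section 2 exploits.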
We note that a related strategy, of constructing a collection of functions, one for each layer of the cube, was used by Wimmer [2010] to give an $n^{O(1/\epsilon^4)}$-time agnostic learning algorithm for the class of halfspaces on symmetric distributions. However, his proof technique is based on an involved use of the representation theory of the symmetric group and is not related to ours. This result, together with a standard application of $\ell_1$ regression, yields an agnostic learning algorithm for the class of disjunctions running in time $n^{O(\log(1/\epsilon))}$.

Corollary 1.3. There is an algorithm that agnostically learns the class of disjunctions on arbitrary symmetric distributions on $\{0,1\}^n$ in time $n^{O(\log(1/\epsilon))}$.

This learning algorithm was extended to the class of all coverage functions in [Feldman and Kothari, 2014], and then applied to the problem of privately releasing answers to all conjunction queries with low average error. As a corollary of this agnostic learning algorithm and distribution-specific agnostic boosting [Kalai and Kanade, 2009, Feldman, 2010] we also obtain learning algorithms for DNF formulas and decision trees.

Corollary 1.4.
1. DNF formulas with $s$ terms are PAC learnable with error $\epsilon$ in time $n^{O(\log(s/\epsilon))}$ over all symmetric distributions;
2. Decision trees with $s$ leaves are agnostically learnable with excess error $\epsilon$ in time $n^{O(\log(s/\epsilon))}$ over all symmetric distributions.

We remark that any algorithm that agnostically learns the class of disjunctions on the uniform distribution in time $n^{o(\log(1/\epsilon))}$ would yield a faster algorithm for the notoriously hard problem of Learning Sparse Parities with Noise. This is implicit in prior work [Kalai et al., 2008, Feldman, 2012] and we provide additional details in Section 5. Dachman-Soled et al. [2014] recently showed that $\ell_1$ approximation by polynomials is necessary for agnostic learning over any product distribution (at least in the statistical query framework of Kearns [1998]). Our agnostic learner demonstrates that the restriction to product distributions is necessary in their result and that it cannot be extended to symmetric distributions. Finally, our proof technique also gives a simpler proof of the result of Blais et al. [2008] that implies approximation of disjunctions by low-degree polynomials on all product distributions.

Theorem 1.5. For any disjunction $c$ and product distribution $D$ on $\{0,1\}^n$, there is a polynomial $p$ of degree $O(\log(1/\epsilon))$ such that $\mathbf{E}_{x \sim D}[|c(x) - p(x)|] \leq \epsilon$.

2 Preliminaries

We use $\{0,1\}^n$ to denote the $n$-dimensional Boolean hypercube. Let $[n]$ denote the set $\{1, 2, \ldots, n\}$. For $S \subseteq [n]$, we denote by $\mathsf{OR}_S : \{0,1\}^n \to \{0,1\}$ the monotone Boolean disjunction on the variables with indices in $S$; that is, for any $x \in \{0,1\}^n$, $\mathsf{OR}_S(x) = 0$ if and only if $x_i = 0$ for every $i \in S$.

One can define norms and errors with respect to any distribution $D$ on $\{0,1\}^n$. Thus, for $f : \{0,1\}^n \to \mathbb{R}$, we write the $\ell_1$ and $\ell_2$ norms of $f$ as $\|f\|_1 = \mathbf{E}_{x \sim D}[|f(x)|]$ and $\|f\|_2 = \sqrt{\mathbf{E}[f(x)^2]}$, respectively. The $\ell_1$ and $\ell_2$ errors of $f$ with respect to $g$ are given by $\|f - g\|_1$ and $\|f - g\|_2$, respectively.

2.1 Agnostic Learning

The agnostic learning model is formally defined as follows [Haussler, 1992, Kearns et al., 1994].

Definition 2.1. Let $F$ be a class of Boolean functions and let $D$ be any fixed distribution on $\{0,1\}^n$. For any distribution $P$ over $\{0,1\}^n \times \{0,1\}$, let $\mathrm{opt}(P, F)$ be defined as
$$\mathrm{opt}(P, F) = \inf_{f \in F} \mathbf{E}_{(x,y) \sim P}[|y - f(x)|].$$
An algorithm $A$ is said to agnostically learn $F$ on $D$ if for every excess error $\epsilon > 0$ and any distribution $P$ on $\{0,1\}^n \times \{0,1\}$ such that the marginal of $P$ on $\{0,1\}^n$ is $D$, given access to random independent examples drawn from $P$, with probability at least $2/3$, $A$ outputs a hypothesis $h : \{0,1\}^n \to [0,1]$ such that $\mathbf{E}_{(x,y) \sim P}[|h(x) - y|] \leq \mathrm{opt}(P, F) + \epsilon$.
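As a small illustration of Definition 2.1 (a sketch of ours, not from the paper; the brute-force search over all monotone disjunctions is meant only for very small $n$), the following code computes the empirical error of a hypothesis and the empirical analogue of $\mathrm{opt}(P, F)$ for the class of monotone disjunctions, so that the excess error is simply their difference.

```python
# Sketch (ours): empirical versions of the quantities in Definition 2.1 for the
# class of monotone disjunctions OR_S, by brute force over all S (tiny n only).
from itertools import chain, combinations
import numpy as np

def empirical_error(h_vals, y):
    """Empirical E[|h(x) - y|] over a labeled sample."""
    return float(np.mean(np.abs(h_vals - y)))

def empirical_opt_disjunctions(X, y):
    """min_S E[|y - OR_S(x)|] over all 2^n monotone disjunctions."""
    n = X.shape[1]
    best = 1.0
    for S in chain.from_iterable(combinations(range(n), j) for j in range(n + 1)):
        or_S = (X[:, list(S)].sum(axis=1) > 0).astype(float)   # OR_S(x)
        best = min(best, empirical_error(or_S, y))
    return best

# Excess error of a real-valued hypothesis h on the sample:
#   empirical_error(h(X), y) - empirical_opt_disjunctions(X, y)
```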

It is easy to see that given a set of $t$ examples $\{(x^i, y^i)\}_{i \leq t}$ and a set of $m$ functions $\phi_1, \phi_2, \ldots, \phi_m$, finding coefficients $\alpha_1, \ldots, \alpha_m$ which minimize
$$\sum_{i \leq t} \Big| \sum_{j \leq m} \alpha_j \phi_j(x^i) - y^i \Big|$$
can be formulated as a linear program. This LP is referred to as the Least-Absolute-Error (LAE) LP, the Least-Absolute-Deviation LP, or $\ell_1$ linear regression. As observed in Kalai et al. [2008], $\ell_1$ linear regression gives a general technique for agnostic learning of Boolean functions.

Theorem 2.2. Let $C$ be a class of Boolean functions, $D$ be a distribution on $\{0,1\}^n$ and $\phi_1, \phi_2, \ldots, \phi_m : \{0,1\}^n \to \mathbb{R}$ be a set of functions that can be evaluated in time polynomial in $n$. Assume that there exists $\Delta$ such that for each $f \in C$, there exist reals $\alpha_1, \alpha_2, \ldots, \alpha_m$ such that
$$\mathbf{E}_{x \sim D}\Big[\Big|\sum_{i \leq m} \alpha_i \phi_i(x) - f(x)\Big|\Big] \leq \Delta.$$
Then there is an algorithm that for every $\epsilon > 0$ and any distribution $P$ on $\{0,1\}^n \times \{0,1\}$ such that the marginal of $P$ on $\{0,1\}^n$ is $D$, given access to random independent examples drawn from $P$, with probability at least $2/3$, outputs a function $h$ such that
$$\mathbf{E}_{(x,y) \sim P}[|h(x) - y|] \leq \mathrm{opt}(P, C) + \Delta + \epsilon.$$
The algorithm uses $O(m/\epsilon^2)$ examples, runs in time polynomial in $n$, $m$, $1/\epsilon$ and returns a linear combination of the $\phi_i$'s.

The output of this LP is not necessarily a Boolean function, but it can be converted to a Boolean function with disagreement error of at most $\mathrm{opt}(P, C) + \Delta + 2\epsilon$ by using the function "$h(x) \geq \theta$" as a hypothesis for an appropriately chosen $\theta$ [Kalai et al., 2008].
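The LAE LP is easy to set up explicitly. The sketch below is our own illustration (not the paper's code): it assumes scipy is available, takes a precomputed feature matrix $\Phi$ with $\Phi_{ij} = \phi_j(x^i)$ (for example, the layered basis sketched in Section 1.1), and minimizes $\sum_i |\sum_j \alpha_j \phi_j(x^i) - y^i|$ by introducing one slack variable per example.

```python
# Sketch (ours): l1 (least-absolute-error) regression as a linear program.
# Variables: alpha in R^m and slacks e in R^t with e_i >= |<Phi_i, alpha> - y_i|.
import numpy as np
from scipy.optimize import linprog

def l1_regression(Phi, y):
    """Phi: (t, m) matrix with Phi[i, j] = phi_j(x^i); y: (t,) labels in {0, 1}.
    Returns coefficients alpha minimizing sum_i |(Phi @ alpha - y)_i|."""
    t, m = Phi.shape
    c = np.concatenate([np.zeros(m), np.ones(t)])        # minimize the sum of slacks
    I = np.eye(t)
    A_ub = np.block([[Phi, -I], [-Phi, -I]])              #  Phi a - e <= y
    b_ub = np.concatenate([y, -y])                        # -Phi a - e <= -y
    bounds = [(None, None)] * m + [(0, None)] * t         # alpha free, slacks e >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:m]

def threshold_hypothesis(Phi, alpha, theta):
    """The Boolean hypothesis "h(x) >= theta"; theta should be chosen appropriately,
    e.g. to minimize the empirical disagreement error."""
    return (Phi @ alpha >= theta).astype(int)
```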

3 $\ell_1$ Approximation on Symmetric Distributions

In this section, we show how to approximate the class of all disjunctions on any symmetric distribution by a linear combination of a small set of basis functions. As discussed above, polynomials of degree $O(\log(1/\epsilon))$ can $\epsilon$-approximate any disjunction in $\ell_1$ distance on any product distribution. This is equivalent to using low-degree monomials as basis functions. We first show that this basis does not suffice for approximating disjunctions on symmetric distributions. Indeed, we construct a symmetric distribution on $\{0,1\}^n$ on which any polynomial that approximates the monotone disjunction $c = x_1 \vee x_2 \vee \cdots \vee x_n$ within $\ell_1$ error $1/3$ must have degree $\Omega(\sqrt{n})$.

3.1 Lower Bound on $\ell_1$ Approximation by Polynomials

In this section we give the proof of Theorem 1.1.

Proof of Theorem 1.1. Let $d : \{0, \ldots, n\} \to \{0,1\}$ be the predicate corresponding to the disjunction $x_1 \vee x_2 \vee \cdots \vee x_n$, that is, $d(0) = 0$ and $d(i) = 1$ for each $i > 0$. Consider a natural linear program for finding a univariate polynomial $f$ of degree at most $r$ such that $\|d - f\|_\infty = \max_{0 \leq i \leq n} |d(i) - f(i)|$ is minimized. This program (and its dual) often comes up in proving polynomial degree lower bounds for various function classes (see [Klivans and Sherstov, 2007], for example).
$$\min \ \epsilon \quad \text{s.t.} \quad \Big|d(m) - \sum_{i=0}^{r} \alpha_i m^i\Big| \leq \epsilon \ \ \forall m \in \{0, \ldots, n\}, \qquad \alpha_i \in \mathbb{R} \ \ \forall i \in \{0, \ldots, r\}.$$
If $\{\alpha_0, \alpha_1, \ldots, \alpha_r\}$ is a solution for the program above that has value $\epsilon$, then $f(m) = \sum_{i=0}^{r} \alpha_i m^i$ is a degree-$r$ polynomial that approximates $d$ within an error of at most $\epsilon$ at every point in $\{0, \ldots, n\}$. Klivans and Sherstov [2007] show that there exists an $r^* = \Theta(\sqrt{n})$ such that the optimal value $\epsilon^*$ of the program above for $r = r^*$ satisfies $\epsilon^* \geq 1/3$. Standard manipulations (see [Klivans and Sherstov, 2007]) can be used to produce the dual of the program:
$$\max \ \sum_{m=0}^{n} \beta_m d(m) \quad \text{s.t.} \quad \sum_{m=0}^{n} \beta_m m^i = 0 \ \ \forall i \in \{0, \ldots, r\}, \qquad \sum_{m=0}^{n} |\beta_m| \leq 1, \qquad \beta_m \in \mathbb{R} \ \ \forall m \in \{0, \ldots, n\}.$$
Let $\beta^* = \{\beta^*_m\}_{m \in \{0, \ldots, n\}}$ denote an optimal solution for the dual program with $r = r^*$. Then, by strong duality, the value of the dual is also $\epsilon^*$. Observe that $\sum_{m=0}^{n} |\beta^*_m| = 1$, since otherwise we could scale up all the $\beta^*_m$ by the same factor and increase the value of the program while still satisfying the constraints. Let $\rho : \{0, \ldots, n\} \to [0,1]$ be defined by $\rho(m) = |\beta^*_m|$. Then $\rho$ can be viewed as the density function of a distribution on $\{0, \ldots, n\}$ and we use it to define a symmetric distribution $D$ on $\{0,1\}^n$ as follows: $D(x) = \rho(w(x))/\binom{n}{w(x)}$, where $w(x) = \sum_{i=1}^{n} x_i$ is the Hamming weight of the point $x$. We now show that any polynomial $p$ of degree at most $r^*$ satisfies $\mathbf{E}_{x \sim D}[|c(x) - p(x)|] \geq 1/3$.

We now extract from $p$ a univariate polynomial $f_p$ that approximates $d$ on the distribution with density function $\rho$. Let $p_{\mathrm{avg}} : \{0,1\}^n \to \mathbb{R}$ be obtained by averaging $p$ over every layer, that is, $p_{\mathrm{avg}}(x) = \mathbf{E}_{z \sim D_{w(x)}}[p(z)]$, where $w(x)$ denotes the Hamming weight of $x$. It is easy to check that since $c$ is symmetric, $p_{\mathrm{avg}}$ is at least as close to $c$ as $p$ in $\ell_1$ distance. Further, $p_{\mathrm{avg}}$ is a symmetric function computed by a multivariate polynomial of degree at most $r^*$ on $\{0,1\}^n$. Thus, the function $f_p(m)$ that gives the value of $p_{\mathrm{avg}}$ on points of Hamming weight $m$ can be computed by a univariate polynomial of degree at most $r^*$. Further,
$$\mathbf{E}_{x \sim D}[|c(x) - p(x)|] \geq \mathbf{E}_{x \sim D}[|c(x) - p_{\mathrm{avg}}(x)|] = \mathbf{E}_{m \sim \rho}[|d(m) - f_p(m)|].$$

Let us now estimate the error of $f_p$ with respect to $d$ on the distribution $\rho$. Using the fact that $f_p$ is of degree at most $r^*$ and thus $\sum_{m=0}^{n} f_p(m)\beta^*_m = 0$ (enforced by the dual constraints), we have
$$\mathbf{E}_{m \sim \rho}[|d(m) - f_p(m)|] \geq \mathbf{E}_{m \sim \rho}\big[(d(m) - f_p(m)) \cdot \mathrm{sign}(\beta^*_m)\big] = \sum_{m=0}^{n} d(m)\beta^*_m - \sum_{m=0}^{n} f_p(m)\beta^*_m = \epsilon^* - 0 = \epsilon^* \geq 1/3.$$
Thus, the degree of any polynomial that approximates $c$ on the distribution $D$ with error of at most $1/3$ is $\Omega(\sqrt{n})$.
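For small $n$ the construction in this proof can be carried out numerically. The sketch below is our own illustration (the function name is ours): it assumes scipy is available, uses the raw monomial basis $1, m, \ldots, m^r$ (numerically reasonable only for small $n$ and $r$), solves the dual LP directly, and turns its optimal solution $\beta^*$ into the hard symmetric distribution $D$.

```python
# Sketch (ours): solve the dual LP from the proof of Theorem 1.1 and build the
# hard symmetric distribution. We write beta_m = u_m - v_m with u, v >= 0 so that
#   maximize sum_m beta_m d(m)   s.t.  sum_m beta_m m^i = 0 for 0 <= i <= r,
#                                       sum_m (u_m + v_m) <= 1.
import numpy as np
from scipy.optimize import linprog
from scipy.special import comb

def hard_symmetric_distribution(n, r):
    d = np.array([0.0] + [1.0] * n)                      # predicate of the disjunction
    M = np.vander(np.arange(n + 1.0), r + 1, increasing=True).T   # M[i, m] = m^i
    c = np.concatenate([-d, d])                          # maximize d . (u - v)
    A_eq = np.hstack([M, -M])                            # sum_m (u_m - v_m) m^i = 0
    b_eq = np.zeros(r + 1)
    A_ub = np.ones((1, 2 * (n + 1)))                     # sum_m (u_m + v_m) <= 1
    b_ub = np.array([1.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (2 * (n + 1)), method="highs")
    u, v = res.x[: n + 1], res.x[n + 1:]
    rho = (u + v) / (u + v).sum()                        # rho(m) = |beta*_m|, normalized
    eps_star = -res.fun                                  # by LP duality: best l_inf error
    point_mass = rho / comb(n, np.arange(n + 1))         # D spreads rho(m) over layer m
    return rho, point_mass, eps_star

# By the argument above, every polynomial of degree at most r has l1 error at least
# eps_star with respect to the disjunction of all n variables on this distribution.
```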

3.2 Upper Bound

In this section, we describe how to approximate disjunctions on any symmetric distribution by a linear combination of functions from a set of small size. Recall that $S_r$ denotes the set of all points in $\{0,1\}^n$ with Hamming weight $r$. As we have seen above, symmetric distributions can behave very differently from (constant-bounded) product distributions. However, for the special case of the uniform distribution on $S_r$, denoted by $D_r$, we show that for every disjunction $c$, there is a polynomial of degree $O(\log(1/\epsilon))$ that $\epsilon$-approximates it in $\ell_1$ distance on $D_r$. As described in Section 1.1, one can stitch together polynomial approximations on each $S_r$ to build a set of basis functions $S$ such that every disjunction is well approximated by some linear combination of functions in $S$. Thus, our goal is now reduced to constructing approximating polynomials on $D_r$.

Proof of Lemma 1.2. We first assume that $c$ is monotone and, without loss of generality, $c = x_1 \vee \cdots \vee x_k$. In this case we will also prove the slightly stronger claim that $\mathbf{E}_{D_r}[|c(x) - p(x)|] \leq \mathbf{E}_{D_r}[(c(x) - p(x))^2] \leq \epsilon$. Let $d : \{0, \ldots, k\} \to \{0,1\}$ be the predicate associated with the disjunction, that is, $d(i) = 1$ whenever $i \geq 1$. Note that $c(x) = d\big(\sum_{i \in [k]} x_i\big)$. Therefore our goal is to find a univariate polynomial $f$ that approximates $d$ and then substitute $p_f(x) = f\big(\sum_{i \in [k]} x_i\big)$. Such a substitution preserves the total degree of the polynomial. We break our construction into several cases based on the relative magnitudes of $r$, $k$ and $\epsilon$.

If $k \leq 2\ln(1/\epsilon)$, then the univariate polynomial that exactly computes the predicate $d$ satisfies the requirements. Thus assume that $k > 2\ln(1/\epsilon)$. If $r > n - k$, then $c$ always takes the value $1$ on $S_r$ and thus the constant polynomial $1$ achieves zero error. If, on the other hand, $r \geq (n/k)\ln(1/\epsilon)$, then
$$\Pr_{x \sim D_r}[c(x) = 0] = \frac{\binom{n-k}{r}}{\binom{n}{r}} = \prod_{i=0}^{r-1}\Big(1 - \frac{k}{n-i}\Big) \leq (1 - k/n)^r \leq e^{-kr/n} \leq \epsilon.$$
In this case, the constant polynomial $1$ achieves an $\ell_2^2$ error of at most $\Pr_{x \sim D_r}[c(x) = 0] \leq \epsilon$. Finally, observe that $r < (n/k)\ln(1/\epsilon)$ and $k > 2\ln(1/\epsilon)$ imply $r < n/2$. Thus, for the remaining part of the proof, assume that $r < \min\{n - k,\ (n/k)\ln(1/\epsilon),\ n/2\}$.

Consider the univariate polynomial $f : \{0, \ldots, k\} \to \mathbb{R}$ of degree $t$ (for some $t$ to be chosen later) that computes the predicate $d$ exactly on $\{0, \ldots, t\}$. This polynomial is given by
$$f(w) = 1 - \frac{1}{t!}\prod_{i=1}^{t}(i - w) = \begin{cases} 1 - (-1)^t\binom{w-1}{t} & \text{for } w > t, \\ 1 & \text{for } 0 < w \leq t, \\ 0 & \text{for } w = 0. \end{cases}$$
Let $\delta_j = \Pr_{x \sim D_r}\big[|\{i \leq k \mid x_i = 1\}| = j\big] = \binom{n-k}{r-j}\binom{k}{j}\big/\binom{n}{r}$. The $\ell_2^2$ error of $p_f(x)$ with respect to $c$ satisfies
$$\|p_f - c\|_2^2 = \mathbf{E}_{x \sim D_r}[(c(x) - p_f(x))^2] \leq \sum_{j > t} \delta_j \binom{j}{t}^2.$$
We denote the right-hand side of this inequality by $\|d - f\|_2^2$. We first upper bound $\delta_j$ as follows:
$$\delta_j = \frac{\binom{n-k}{r-j}\binom{k}{j}}{\binom{n}{r}} = \frac{(n-k)!}{(n-k-r+j)!\,(r-j)!} \cdot \frac{k!}{(k-j)!\,j!} \cdot \frac{(n-r)!\,r!}{n!} = \frac{1}{j!} \cdot \frac{r!}{(r-j)!} \cdot \frac{k!}{(k-j)!} \cdot \frac{(n-r)!}{n!} \cdot \frac{(n-k)!}{(n-k-r+j)!}$$
$$\leq \frac{1}{j!} \cdot \frac{(n-k)(n-k-1)\cdots(n-k-r+j+1)\,(rk)^j}{n(n-1)\cdots(n-r+1)} \leq \frac{1}{j!} \cdot \frac{(n\ln(1/\epsilon))^j}{(n-r+j)(n-r+j-1)\cdots(n-r+1)},$$
where in the second-to-last inequality we used that $r < (n/k)\ln(1/\epsilon)$ to conclude that $rk \leq n\ln(1/\epsilon)$. Now, $r < n/2$ and thus $n - r + 1 > n/2$. Therefore,
$$\delta_j \leq \frac{2^j (n\ln(1/\epsilon))^j}{n^j\, j!} = \frac{(2\ln(1/\epsilon))^j}{j!},$$
and thus
$$\|d - f\|_2^2 \leq \sum_{j > t} \binom{j}{t}^2 \frac{(2\ln(1/\epsilon))^j}{j!}.$$
Set $t = 8e^2\ln(1/\epsilon)$. Using $\binom{j}{t} \leq 2^j$ and $j! > (j/e)^j > (t/e)^j$ for every $j \geq t + 1$, we obtain
$$\|d - f\|_2^2 \leq \sum_{j > t} \frac{2^{2j}(2\ln(1/\epsilon))^j}{(t/e)^j} = \sum_{j > 8e^2\ln(1/\epsilon)} (1/e)^j \leq \epsilon. \tag{1}$$
To see that $\mathbf{E}_{D_r}[|c(x) - p(x)|] \leq \mathbf{E}_{D_r}[(c(x) - p(x))^2]$, we note that in all cases and for all $x$, $|p(x) - c(x)|$ is either $0$ or at least $1$. This completes the proof of the monotone case.
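The bound for the monotone case can also be checked numerically. The sketch below is our own illustration (the function name is ours); it uses exact rational arithmetic to evaluate the $\ell_2^2$ error of $p_f$ on $D_r$ directly from the hypergeometric weights $\delta_j$, with $f$ evaluated from its defining product formula.

```python
# Sketch (ours): exact l2^2 error of p_f on D_r for the monotone disjunction of the
# first k variables (Lemma 1.2, monotone case): sum_j delta_j * (d(j) - f(j))^2.
from fractions import Fraction
from math import comb, factorial

def monotone_case_error(n, k, r, t):
    def f(w):                                  # f(w) = 1 - (1/t!) * prod_{i=1..t} (i - w)
        prod = 1
        for i in range(1, t + 1):
            prod *= (i - w)
        return 1 - Fraction(prod, factorial(t))
    total = Fraction(0)
    for j in range(0, min(k, r) + 1):
        delta_j = Fraction(comb(n - k, r - j) * comb(k, j), comb(n, r))
        d_j = 0 if j == 0 else 1               # the predicate of the disjunction
        total += delta_j * (d_j - f(j)) ** 2
    return total                               # exact value of E_{D_r}[(c - p_f)^2]

# With t chosen as in the proof (t = ceil(8 * e^2 * ln(1/eps))) the proof guarantees
# that this error is at most eps whenever r < min(n - k, (n/k) * ln(1/eps), n/2).
```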

We next consider the more general case when $c = x_1 \vee x_2 \vee \cdots \vee x_{k_1} \vee \bar{x}_{k_1+1} \vee \cdots \vee \bar{x}_{k_1+k_2}$. Let $c_1 = x_1 \vee x_2 \vee \cdots \vee x_{k_1}$, $c_2 = \bar{x}_{k_1+1} \vee \cdots \vee \bar{x}_{k_1+k_2}$ and $k = k_1 + k_2$. Observe that $c = 1 - (1 - c_1)(1 - c_2) = c_1 + c_2 - c_1 c_2$. Let $p_1$ be a polynomial of degree $O(\log(1/\epsilon))$ such that $\|c_1 - p_1\|_1 \leq \|c_1 - p_1\|_2^2 \leq \epsilon/3$. Note that if we swap $0$ and $1$ in $\{0,1\}^n$, then $c_2$ becomes equal to the monotone disjunction $c_2' = x_{k_1+1} \vee \cdots \vee x_{k_1+k_2}$ and $D_r$ becomes $D_{n-r}$. Therefore, by the argument for the monotone case, there exists a polynomial $p_2'$ of degree $O(\log(1/\epsilon))$ such that $\|c_2' - p_2'\|_1 \leq \epsilon/3$. By renaming the variables back we obtain a polynomial $p_2$ of degree $O(\log(1/\epsilon))$ such that $\|c_2 - p_2\|_1 \leq \|c_2 - p_2\|_2^2 \leq \epsilon/3$. Now let $p = p_1 + p_2 - p_1 p_2$. Clearly the degree of $p$ is $O(\log(1/\epsilon))$. We now show that $\|c - p\|_1 \leq \epsilon$:
$$\mathbf{E}_{x \sim D_r}[|c(x) - p(x)|] = \mathbf{E}_{x \sim D_r}[|(1 - c(x)) - (1 - p(x))|] = \mathbf{E}_{x \sim D_r}[|(1 - c_1)(1 - c_2) - (1 - p_1)(1 - p_2)|]$$
$$= \mathbf{E}_{x \sim D_r}[|(1 - c_1)(p_2 - c_2) + (1 - c_2)(p_1 - c_1) - (c_1 - p_1)(c_2 - p_2)|]$$
$$\leq \mathbf{E}_{x \sim D_r}[|(1 - c_1)(p_2 - c_2)|] + \mathbf{E}_{x \sim D_r}[|(1 - c_2)(p_1 - c_1)|] + \mathbf{E}_{x \sim D_r}[|(c_1 - p_1)(c_2 - p_2)|]$$
$$\leq \mathbf{E}_{x \sim D_r}[|p_2 - c_2|] + \mathbf{E}_{x \sim D_r}[|p_1 - c_1|] + \sqrt{\mathbf{E}_{x \sim D_r}[(c_1 - p_1)^2] \cdot \mathbf{E}_{x \sim D_r}[(c_2 - p_2)^2]} \leq \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon.$$

4 Polynomial Approximation on Product Distributions

In this section, we show that for every product distribution $D = \prod_{i \in [n]} D_i$, every $\epsilon > 0$ and every disjunction (or conjunction) $c$ of length $k$, there exists a polynomial $p : \{0,1\}^n \to \mathbb{R}$ of degree $O(\log(1/\epsilon))$ such that $p$ $\epsilon$-approximates $c$ in $\ell_1$ distance on $D$.

Proof of Theorem 1.5. First we note that, without loss of generality, we can assume that the disjunction $c$ is equal to $x_1 \vee x_2 \vee \cdots \vee x_k$ for some $k \in [n]$. We can assume monotonicity since we can convert negated variables to un-negated variables by swapping the roles of $0$ and $1$ for that variable; the obtained distribution remains a product distribution after this operation. Further, we can assume that $k = n$ since variables with indices $i > k$ affect neither the probabilities of variables with indices at most $k$ nor the value of $c(x)$.

We first note that we can assume that $\Pr_{x \sim D}[x = 0^k] > \epsilon$, since otherwise the constant polynomial $1$ gives the desired approximation. Let $\mu_i = \Pr_{x_i \sim D_i}[x_i = 1]$. Since $c$ is a symmetric function, its value at any $x \in \{0,1\}^k$ depends only on the Hamming weight of $x$, which we denote by $w(x)$. Thus, we can equivalently work with the univariate predicate $d : \{0, \ldots, k\} \to \{0,1\}$, where $d(i) = 1$ for $i > 0$ and $d(0) = 0$. As in the proof of Lemma 1.2, we will approximate $d$ by a univariate polynomial $f$ and then use the polynomial $p_f(x) = f(w(x))$ to approximate $c$. Let $f : \{0, \ldots, k\} \to \mathbb{R}$ be the univariate polynomial of degree $t$ that matches $d$ on all points in $\{0, 1, \ldots, t\}$. Thus,
$$f(w) = 1 - \frac{1}{t!}\prod_{i=1}^{t}(i - w) = \begin{cases} 1 - (-1)^t\binom{w-1}{t} & \text{for } w > t, \\ 1 & \text{for } 0 < w \leq t, \\ 0 & \text{for } w = 0. \end{cases}$$
We have
$$\mathbf{E}_{x \sim D}[|c(x) - p_f(x)|] = \sum_{j=0}^{k} \Pr_{x \sim D}[w(x) = j]\,|d(j) - f(j)|,$$
and we denote the right-hand side of this equation by $\|d - f\|_1$.

Then
$$\|d - f\|_1 = \sum_{j > t} \Pr_{D}[w(x) = j]\,|1 - f(j)| \leq \sum_{j > t} \Pr_{D}[w(x) = j]\binom{j}{t}. \tag{2}$$
Let us now estimate $\Pr_D[w(x) = j]$:
$$\Pr_D[w(x) = j] = \sum_{S \subseteq [n],\, |S| = j}\ \prod_{i \in S}\mu_i \prod_{i \notin S}(1 - \mu_i) \leq \sum_{S \subseteq [n],\, |S| = j}\ \prod_{i \in S}\mu_i.$$
Observe that in the expansion of $\big(\sum_{i=1}^{k}\mu_i\big)^j$, the term $\prod_{i \in S}\mu_i$ occurs exactly $j!$ times. Thus,
$$\sum_{S \subseteq [n],\, |S| = j}\ \prod_{i \in S}\mu_i \leq \frac{\big(\sum_{i=1}^{k}\mu_i\big)^j}{j!}.$$
Set $\mu_{\mathrm{avg}} = \frac{1}{k}\sum_{i=1}^{k}\mu_i$. We have
$$\epsilon \leq \Pr_{x \sim D}[x = 0^k] = \prod_{i=1}^{k}(1 - \mu_i) \leq \Big(1 - \frac{1}{k}\sum_{i=1}^{k}\mu_i\Big)^k = (1 - \mu_{\mathrm{avg}})^k.$$
Thus, $\mu_{\mathrm{avg}} = a/k$ for some $a \leq 2\ln(1/\epsilon)$ whenever $k \geq k_0$, where $k_0$ is some universal constant. In what follows, assume that $k \geq k_0$ (otherwise, we can use the polynomial of degree equal to $k$ that exactly computes the predicate $d$ on all points). We are now ready to upper bound the error $\|d - f\|_1$. From Equation (2), we have
$$\|d - f\|_1 \leq \sum_{j > t} \binom{j}{t}\frac{\big(\sum_{i=1}^{k}\mu_i\big)^j}{j!} \leq \sum_{j > t} \binom{j}{t}\frac{(2\ln(1/\epsilon))^j}{j!}.$$
Setting $t = 4e^2\ln(1/\epsilon)$ and using the calculation from Equation (1) in the proof of Lemma 1.2, we obtain that the error $\|d - f\|_1 \leq \epsilon$.

5 Agnostic Learning of Disjunctions

Combining Theorem 2.2 with the results of the previous sections (and the discussion in Section 1.1), we obtain an agnostic learning algorithm for the class of all disjunctions on product and symmetric distributions running in time $n^{O(\log(1/\epsilon))}$.
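Putting the pieces together, a minimal end-to-end sketch of the learner for the symmetric case (ours; it reuses the basis_features and l1_regression helpers sketched above, whose names are our own, and makes no attempt at optimization) looks as follows.

```python
# Sketch (ours): the pipeline behind the symmetric-distribution case of Corollary 5.1:
# build the layered monomial basis, run l1 regression, and threshold the output.
import numpy as np

def learn_disjunctions_symmetric(X, y, eps):
    """X: (t, n) 0/1 examples drawn from a symmetric distribution; y: (t,) 0/1 labels.
    Returns a hypothesis h: {0,1}^n -> {0,1}."""
    Phi = basis_features(X, eps)                 # IND(r) * chi_S features (Section 1.1)
    alpha = l1_regression(Phi, y)                # LAE linear program (Theorem 2.2)
    scores = Phi @ alpha
    # Choose the threshold theta minimizing the empirical disagreement error.
    thetas = np.unique(scores)
    errors = [np.mean((scores >= th) != (y > 0.5)) for th in thetas]
    theta = thetas[int(np.argmin(errors))]
    def h(x):
        phi_x = basis_features(np.asarray(x).reshape(1, -1), eps)
        return int((phi_x @ alpha)[0] >= theta)
    return h
```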

Corollary 5.1. There is an algorithm that agnostically learns the class of disjunctions on any product or symmetric distribution on $\{0,1\}^n$ with excess error of at most $\epsilon$ in time $n^{O(\log(1/\epsilon))}$.

We now remark that any algorithm that agnostically learns the class of disjunctions on $n$ inputs on the uniform distribution on $\{0,1\}^n$ in time $n^{o(\log(1/\epsilon))}$ would yield a faster algorithm for the notoriously hard problem of Learning Sparse Parities with Noise (SLPN). The reduction is based on the technique implicit in the work of Kalai et al. [2008] and Feldman [2012]. For $S \subseteq [n]$, we use $\chi_S$ to denote the parity of the inputs with indices in $S$. Let $U$ denote the uniform distribution on $\{0,1\}^n$. We say that random examples of a Boolean function $f$ have noise of rate $\eta$ if the label of a random example equals $f(x)$ with probability $1 - \eta$ and $1 - f(x)$ with probability $\eta$.

Problem (Learning Sparse Parities with Noise). For $\eta \in (0, 1/2)$ and $k \leq n$, the problem of learning $k$-sparse parities with noise $\eta$ is the problem of finding (with probability at least $2/3$) the set $S \subseteq [n]$, $|S| \leq k$, given access to random examples with noise of rate $\eta$ of the parity function $\chi_S$.

The fastest known algorithm for learning $k$-sparse parities with noise $\eta$ is a recent breakthrough result of Valiant which runs in time $O\big(n^{0.8k} \cdot \mathrm{poly}\big(\frac{1}{1-2\eta}\big)\big)$ [Valiant, 2012]. Kalai et al. [2008] and Feldman [2012] prove hardness of agnostic learning of majorities and conjunctions, respectively, based on the correlation of concepts in these classes with parities. We state below this general relationship between correlation with parities and reduction to SLPN, a simple proof of which appears in [Feldman et al., 2013].

Lemma 5.2. Let $C$ be a class of Boolean functions on $\{0,1\}^n$. Suppose there exist $\gamma > 0$ and $k \in \mathbb{N}$ such that for every $S \subseteq [n]$, $|S| \leq k$, there exists a function $f_S \in C$ such that $|\mathbf{E}_{x \sim U}[f_S(x)\chi_S(x)]| \geq \gamma(k)$. If there exists an algorithm $A$ that learns the class $C$ agnostically to accuracy $\epsilon$ in time $T(n, \frac{1}{\epsilon})$, then there exists an algorithm $A_1$ that learns $k$-sparse parities with noise $\eta < 1/2$ in time $\mathrm{poly}\big(n, \frac{1}{(1-2\eta)\gamma(k)}\big) + 2T\big(n, \frac{2}{(1-2\eta)\gamma(k)}\big)$.

The correlation between a disjunction and a parity is easy to estimate.

Fact 5.3. For any $S \subseteq [n]$, $\mathbf{E}_{x \sim U}[\mathsf{OR}_S(x)\chi_S(x)] = \frac{1}{2^{|S|-1}}$.

We thus immediately obtain the following simple corollary.

Theorem 5.4. Suppose there exists an algorithm that learns the class of Boolean disjunctions over the uniform distribution agnostically to an accuracy of $\epsilon > 0$ in time $T(n, \frac{1}{\epsilon})$. Then there exists an algorithm that learns $k$-sparse parities with noise $\eta < 1/2$ in time $\mathrm{poly}\big(n, \frac{2^{k-1}}{1-2\eta}\big) + 2T\big(n, \frac{2^k}{1-2\eta}\big)$. In particular, if $T(n, \frac{1}{\epsilon}) = n^{o(\log(1/\epsilon))}$, then there exists an algorithm to solve $k$-SLPN in time $n^{o(k)}$.

Thus, any algorithm that is asymptotically faster than the one from Corollary 5.1 yields a faster algorithm for $k$-SLPN.
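Fact 5.3 is easy to verify by direct enumeration. The short sanity check below is ours; it assumes the $\pm 1$ convention for both the disjunction and the parity, under which the stated value $\frac{1}{2^{|S|-1}}$ comes out.

```python
# Sketch (ours): check Fact 5.3 by exhaustive enumeration over {0,1}^n, viewing
# OR_S and chi_S as +/-1-valued functions (a convention we assume here).
from itertools import product

def correlation(n, S):
    total = 0
    for x in product((0, 1), repeat=n):
        or_s = -1 if any(x[i] for i in S) else 1        # +1 only when all of S is 0
        chi_s = -1 if sum(x[i] for i in S) % 2 else 1    # parity of the bits in S
        total += or_s * chi_s
    return total / 2 ** n

# correlation(4, (0, 1, 2)) == 1 / 2 ** (3 - 1) == 0.25
```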

References

E. Blais, R. O'Donnell, and K. Wimmer. Polynomial regression under arbitrary product distributions. In COLT, 2008.

D. Dachman-Soled, V. Feldman, L.-Y. Tan, A. Wan, and K. Wimmer. Approximate resilience, monotonicity, and the complexity of agnostic learning. CoRR, arXiv, 2014.

V. Feldman. Distribution-specific agnostic boosting. In Proceedings of Innovations in Computer Science, 2010.

V. Feldman. A complete characterization of statistical query learning with applications to evolvability. Journal of Computer and System Sciences, 78(5), 2012.

V. Feldman, P. Gopalan, S. Khot, and A. Ponuswami. On agnostic learning of parities, monomials and halfspaces. SIAM Journal on Computing, 39(2), 2009.

V. Feldman, P. Kothari, and J. Vondrák. Representation, approximation and learning of submodular functions using low-rank decision trees. In COLT, 2013.

V. Feldman and P. Kothari. Learning coverage functions and private release of marginals. In COLT, 2014.

V. Feldman, V. Guruswami, P. Raghavendra, and Y. Wu. Agnostic learning of monomials by halfspaces is hard. SIAM Journal on Computing, 41(6), 2012.

D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78-150, 1992.

A. Kalai and V. Kanade. Potential-based agnostic boosting. In Proceedings of NIPS, 2009.

A. Kalai, A. Klivans, Y. Mansour, and R. Servedio. Agnostically learning halfspaces. SIAM Journal on Computing, 37(6), 2008.

M. Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM, 45(6), 1998.

M. Kearns, R. Schapire, and L. Sellie. Toward efficient agnostic learning. Machine Learning, 17(2-3), 1994.

A. Klivans and A. Sherstov. A lower bound for agnostically learning disjunctions. In COLT, 2007.

G. Valiant. Finding correlations in subquadratic time, with applications to learning parities and juntas. In FOCS, 2012.

K. Wimmer. Agnostically learning under permutation invariant distributions. In FOCS, 2010.
