
Simple Learning Algorithms for Decision Trees and Multivariate Polynomials

Nader H. Bshouty, Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
Yishay Mansour, Department of Computer Science, Tel-Aviv University, Tel-Aviv, Israel

Abstract

In this paper we develop a new approach for learning decision trees and multivariate polynomials via interpolation of multivariate polynomials. This new approach yields simple learning algorithms for multivariate polynomials and decision trees over finite fields under any constant bounded product distribution. The output hypothesis is a (single) multivariate polynomial that is an $\epsilon$-approximation of the target under any constant bounded product distribution. The new approach demonstrates the learnability of many classes under any constant bounded product distribution using membership queries, such as j-disjoint DNF and multivariate polynomials with bounded degree over any field. The technique shows how to interpolate multivariate polynomials with bounded term size from membership queries only. This in particular gives a learning algorithm for O(log n)-depth decision trees from membership queries only, and a new learning algorithm for any multivariate polynomial over sufficiently large fields from membership queries only. We show that our results for learning from membership queries only are the best possible.

Introduction

Two techniques were used in the literature for PAC-learning decision trees with membership queries: the Fourier transform technique and the Lattice based technique. Kushilevitz and Mansour [KM93] gave a technique for learning decision trees under the uniform distribution via the Fourier Spectrum. Jackson [J94] extended the result to learning DNF under the uniform distribution; the output hypothesis is a majority of parities. Jackson [J95] generalized his DNF learning algorithm from the uniform distribution to any fixed constant bounded product distribution.

Definition 1. A product distribution is fixed constant bounded if there is a constant $0 < c < 1/2$, independent of the number of variables n, such that for any variable $x_i$, $c \le \Pr[x_i = 1] \le 1 - c$.

Bshouty [Bs93] gave a technique for learning decision trees under any distribution via the Monotone Theory. Schapire and Sellie [SS93] gave a Lattice based algorithm for learning multivariate polynomials over the binary field under any distribution. In the former, the output hypothesis for a decision tree is a depth-3 formula. Both techniques, the Fourier Spectrum and the Lattice based algorithms, also give learnability of many other classes, such as learning decision trees over parities (nodes contain parities) under constant bounded product distributions, and learning CDNF (poly size DNF that has poly size CNF) under any distribution.

In this paper we develop a new approach to learning decision trees and multivariate polynomials via interpolation of multivariate polynomials over GF(2). This new approach leads to simple learning algorithms for decision trees over the uniform and constant bounded product distributions, where the output hypothesis is a multivariate polynomial (a parity of monotone terms). The algorithm we develop gives a single hypothesis that approximates the target with respect to any constant bounded product distribution. In fact the hypothesis is a good hypothesis under any distribution that supports small terms.

Definition 2. A distribution D supports small terms of a class of terms $\mathcal{T}$ if every $T \in \mathcal{T}$ of size $\omega(\log n)$ satisfies $\Pr_D[T = 1] = 1/\omega(\mathrm{poly}(n))$, where n is the number of variables.
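For intuition (a quick check of our own, not part of the paper), a fixed constant bounded product distribution in the sense of Definition 1 supports small terms in the sense of Definition 2: every literal of a term T is satisfied with probability at most $1-c$, and the variables are independent, so for $|T| = \omega(\log n)$

$$\Pr_D[T = 1] \;\le\; (1-c)^{|T|} \;=\; 2^{-\omega(\log n)\,\log\frac{1}{1-c}} \;=\; n^{-\omega(1)}.$$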

The new approach also solves the learnability of other problems, some of which were not known to be learnable. These problems include:

(1) PAC-learning with membership queries of multivariate polynomials over the binary field with nonmonotone terms, under any distribution that supports small terms. In particular, disjoint DNF (the conjunction of every two terms is 0), j-disjoint DNF (the conjunction of any j terms is 0) for constant j, decision trees, and any polynomial number of Xors of them are PAC-learnable with membership queries under any distribution that supports small terms. The output hypothesis of the learning algorithm is a multivariate polynomial. Learning multivariate polynomials (with monotone terms) with membership and equivalence queries is shown in [SS93]; thus multivariate polynomials are PAC-learnable under any distribution. Our contribution is to show the learnability when the terms are not monotone. It is also known that any DNF is PAC-learnable with membership queries under constant bounded product distributions [J95], where the output hypothesis is a majority of parities. Our contribution for j-disjoint DNF is to use an output hypothesis that is a parity of terms, and to show that the output hypothesis is an approximation of the target against any constant bounded distribution. Our technique demonstrates the learnability of a Xor of these classes, which was not achieved using previous approaches.

We also have an extension of (1) to any field, showing:

(2) PAC-learning with membership queries of decision trees with leaves from some field F, and any polynomial sum of them, under any distribution that supports small terms.

We also study the learnability of multivariate polynomials from membership queries only. We show:

(3) Learning of multivariate polynomials over n variables with maximal degree $d \le c|F|$ for each variable, where $c < 1$ is a constant, and with terms of size $k = O\!\left(\frac{|F|}{d}(\log n + \log d)\right)$, from membership queries only. This result implies learning of decision trees of depth O(log n) with leaves from a field F from membership queries only. We also show that the above term size is tight, i.e., there is no membership query algorithm that learns multivariate polynomials over n variables with maximal degree d that contain terms of size greater than k. The first result in (3) is a generalization of the result in [B95b]; in [B95b] the learning algorithm uses membership and equivalence queries. The second result is a generalization of the result in [KM93] for learning boolean decision trees from membership queries.

Result (3) also gives:

(4) An algorithm for learning any multivariate polynomial over fields of size $q = \Omega\!\left(\frac{nd}{\log n + \log d}\right)$ from membership queries only. We also show that the above field size q is tight, i.e., if $nd/(\log n + \log d) = \omega(q)$ then no polynomial time learning algorithm exists. This result is a generalization of the results in [BT88, CDG+91, Z90] for learning multivariate polynomials over any field. Previous algorithms for learning multivariate polynomials over finite fields F require asking membership queries with assignments in some extension of the field F [CDG+91]. In [CDG+91] it is shown that an extension of degree n of the field is sufficient to interpolate any multivariate polynomial (when membership queries with assignments from an extension field are allowed). Our result in (4) improves this extension bound to $O(\log n)$.

As in many other interpolation problems, testing for zero polynomials plays a crucial role in our algorithms.
We show that:

(5) The following problems for multivariate polynomials are equivalent: (1) distinguishing a multivariate polynomial from zero; (2) distinguishing any two multivariate polynomials; (3) deciding whether a multivariate polynomial depends on some variable; (4) learning a multivariate polynomial.

2 Simple Algorithm for the Boolean Domain

Let MUL(n, t, k) be the set of all multivariate polynomials over the binary field over n variables with t terms, where each term is of size at most k; we will assume that $k = O(\log n)$. It is not hard to see that a decision tree of depth d can be represented in $\mathrm{MUL}(n, 2^d, d)$, and that a j-disjoint d-DNF can be represented in $\mathrm{MUL}(n, n^j, d)$; so for constant j and $d = O(\log n)$ the number of terms is polynomial.
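For concreteness, the GF(2) expansion behind the decision tree claim can be sketched in a few lines of Python (a hypothetical toy encoding of trees, not code from the paper): a node labeled $x_i$ with subtrees $f_0, f_1$ computes $(1 + x_i)f_0 + x_i f_1$ over GF(2), so a depth-d tree yields monotone terms of size at most d.

    from itertools import product

    # A tree is a leaf 0 or 1, or a tuple (i, left, right): follow left when x_i = 0
    # and right when x_i = 1.  Over GF(2) such a node computes
    # (1 + x_i)*left + x_i*right = left + x_i*(left + right).

    def mul_by_var(p, i):
        """Multiply a GF(2) polynomial (a set of monomials, each a frozenset of
        variable indices) by x_i, letting equal monomials cancel."""
        out = set()
        for mono in p:
            out ^= {mono | {i}}
        return out

    def tree_to_poly(tree):
        """Expand a decision tree into its GF(2) multivariate polynomial."""
        if tree in (0, 1):
            return {frozenset()} if tree == 1 else set()
        i, left, right = tree
        l, r = tree_to_poly(left), tree_to_poly(right)
        return l ^ mul_by_var(l ^ r, i)      # (1 + x_i)*l + x_i*r over GF(2)

    def eval_poly(p, a):
        return sum(all(a[i] for i in mono) for mono in p) % 2

    def eval_tree(tree, a):
        while tree not in (0, 1):
            i, left, right = tree
            tree = right if a[i] else left
        return tree

    # Depth-2 example: f = (1 + x0)*x1 + x0*(1 + x2) expands into the four monotone
    # terms x1, x0*x1, x0, x0*x2, each of size at most 2, as the claim predicts.
    tree = (0, (1, 0, 1), (2, 1, 0))
    poly = tree_to_poly(tree)
    assert all(eval_poly(poly, a) == eval_tree(tree, a) for a in product([0, 1], repeat=3))
    print(sorted(tuple(sorted(m)) for m in poly))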

We first show how to zero-test elements of MUL(n, t, k). Let $f \in \mathrm{MUL}(n,t,k)$. Choose a term $T = x_{i_1} \cdots x_{i_k}$ of maximal size in f. Randomly and uniformly choose values from $\{0,1\}$ for the variables not in T. The projection will not be the zero function, because the term T stays alive in the projection. Since the projection is a nonzero function over $k = O(\log n)$ variables, there is at least one assignment for $x_{i_1}, \ldots, x_{i_k}$ that gives the value 1 for the function. This shows that for a random and uniform assignment a, $f(a) = 1$ with probability at least $1/2^k = 1/\mathrm{poly}(n)$ for $k = O(\log n)$. So to zero-test a function $f \in \mathrm{MUL}(n,t,k)$, randomly and uniformly choose a polynomial number of assignments $a_i$. If $f(a_i)$ is zero for all the assignments, then with high probability we have $f \equiv 0$.

Claim 3. For $f \in \mathrm{MUL}(n,t,O(\log n))$, there is a polynomial time probabilistic zero-testing algorithm that succeeds with high probability.

We now show how to reduce zero-testing to learning. Let $f_0 = f$. Since we can zero-test, we can find the minimal $i_1$ such that $f_0|_{x_1 \leftarrow 0, \ldots, x_{i_1} \leftarrow 0} \equiv 0$. This implies that $f|_{x_1 \leftarrow 0, \ldots, x_{i_1-1} \leftarrow 0} = x_{i_1} f_1(x_{i_1+1}, \ldots, x_n)$ for some multivariate polynomial $f_1$. We continue recursively with $f_1 = f|_{x_1 \leftarrow 0, \ldots, x_{i_1-1} \leftarrow 0,\, x_{i_1} \leftarrow 1}$ until $f_k \equiv 1$; in this case $x_{i_1} \cdots x_{i_k}$ is a term in f. Now define $\hat{f} = f + x_{i_1} \cdots x_{i_k}$. This removes a term from f, and thus $\hat{f} \in \mathrm{MUL}(n, t-1, k)$. We continue recursively with $\hat{f}$ until we recover all the terms of f.

Claim 4. For $f \in \mathrm{MUL}(n,t,O(\log n))$, there is a polynomial time probabilistic interpolation algorithm that succeeds with high probability.

Now let MUL_NEG(n, t) be the set of all boolean multivariate polynomials with t nonmonotone terms. (This class includes the class of decision trees and j-disjoint DNF.) Let $f \in \mathrm{MUL\_NEG}(n,t)$. To PAC-learn f we randomly choose an assignment a and define $f'(x) = f(x + a)$. A term in f of size k will have on average $k/2$ positive literals in $f'$, and terms with $k = \Omega(\log n)$ variables will with high probability have $\Omega(k)$ positive literals. We perform a zero-restriction, i.e., for each i, with probability 1/2 we substitute $x_i \leftarrow 0$ in $f'$. This ensures that with high probability the projection $f''$ is in $\mathrm{MUL}(n, \mathrm{poly}(n), O(\log n))$, i.e., a polynomial with poly(n) monotone terms. We can use the previous algorithm to learn $f''$. Since we performed a zero-restriction, we only deleted monotone terms from $f'$; therefore the terms of $f''$ are terms of $f'$. We continue to take zero-restrictions and collect terms of $f'$ until the set of terms we have defines a polynomial which is a good approximation of $f'$. We get a good approximation of $f'$ since we collect all the small (i.e. $O(\log n)$ size) terms. A complete analysis of this algorithm is given in Section 6.

Claim 5. For $f \in \mathrm{MUL\_NEG}(n,t)$, there is a polynomial time probabilistic learning algorithm that succeeds with high probability.
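The boolean-domain procedures behind Claims 3 and 4 are simple enough to sketch directly (our own illustration; the membership oracle and the trial counts are placeholders — for term size k the zero test should use on the order of $\mathrm{poly}(n)\cdot 2^k$ samples):

    import random

    def zero_test(query, fixed, n, trials=2000):
        """Claim 3: probabilistic zero test of f restricted by `fixed`
        (a dict variable -> 0/1); unrestricted variables are sampled uniformly."""
        for _ in range(trials):
            a = [random.randint(0, 1) for _ in range(n)]
            for i, v in fixed.items():
                a[i] = v
            if query(a):
                return False
        return True

    def find_one_term(query, n):
        """Peel off one monotone term of a nonzero f (the Claim 4 step): if fixing
        x_i = 0 kills the restricted function, then x_i is in the term; fix it to 1."""
        fixed, term = {}, []
        for i in range(n):
            fixed[i] = 0
            if zero_test(query, fixed, n):
                fixed[i] = 1
                term.append(i)
        return frozenset(term)

    def interpolate(query, n):
        """Claim 4: recover all terms of f in MUL(n, t, O(log n)) from membership
        queries, repeatedly finding a term and adding it back (mod 2) to f."""
        terms = set()
        def residual(a):
            return query(a) ^ (sum(all(a[i] for i in t) for t in terms) % 2)
        while not zero_test(residual, {}, n):
            terms.add(find_one_term(residual, n))
        return terms

    # Demo on a hypothetical target with terms x0*x2, x1 and the constant term 1.
    target = {frozenset({0, 2}), frozenset({1}), frozenset()}
    oracle = lambda a: sum(all(a[i] for i in t) for t in target) % 2
    print(interpolate(oracle, 6) == target)   # True with high probability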
3 Multivariate Interpolation

Let $f = \sum_{\alpha \in I} a_\alpha x_1^{\alpha_1} \cdots x_n^{\alpha_n}$ be a multivariate polynomial over the field F, where $a_\alpha \in F$ and $\alpha_1, \ldots, \alpha_n$ are integers. We will denote the class of all multivariate polynomials over the field F and over the variables $x_1, \ldots, x_n$ by $F[x_1, \ldots, x_n]$. The number of terms of f is denoted by $|f|$. We have $|f| = |I|$ when all the $a_\alpha$ are nonzero. When $f = 0$ then $|f| = 0$, and when $f = c \in F \setminus \{0\}$ then $|f| = 1$. Let d be the maximal degree of the variables in f, i.e., $I \subseteq [d]^n$ where $[d] = \{0, 1, \ldots, d\}$. Suppose $F_0 = \{\beta_0, \ldots, \beta_d\} \subseteq F$ are $d+1$ distinct field constants, where $\beta_0 = 0$ is the zero of the field.

A univariate polynomial $f(x_1) \in F[x_1]$ over the field F of degree at most d can be interpolated from membership queries as follows. Suppose

$$f(x_1) = \sigma^{(d)}(f)\, x_1^d + \cdots + \sigma^{(1)}(f)\, x_1 + \sigma^{(0)}(f),$$

where $\sigma^{(i)}(f)$ is the coefficient of $x_1^i$ in the polynomial representation of f. Then

$$\begin{cases} f(\beta_0) = \sigma^{(d)}(f)\beta_0^d + \cdots + \sigma^{(1)}(f)\beta_0 + \sigma^{(0)}(f) \\ f(\beta_1) = \sigma^{(d)}(f)\beta_1^d + \cdots + \sigma^{(1)}(f)\beta_1 + \sigma^{(0)}(f) \\ \qquad\vdots \\ f(\beta_d) = \sigma^{(d)}(f)\beta_d^d + \cdots + \sigma^{(1)}(f)\beta_d + \sigma^{(0)}(f). \end{cases}$$

This is a linear system of equations and can be solved for the $\sigma^{(i)}(f)$ (by Cramer's rule) as follows:

$$\sigma^{(i)}(f) \;=\; \frac{\det\begin{pmatrix} \beta_0^d & \cdots & \beta_0^{i+1} & f(\beta_0) & \beta_0^{i-1} & \cdots & 1 \\ \beta_1^d & \cdots & \beta_1^{i+1} & f(\beta_1) & \beta_1^{i-1} & \cdots & 1 \\ \vdots & & & \vdots & & & \vdots \\ \beta_d^d & \cdots & \beta_d^{i+1} & f(\beta_d) & \beta_d^{i-1} & \cdots & 1 \end{pmatrix}}{\det V(\beta_0, \ldots, \beta_d)} \qquad (1)$$

where $V(\beta_0, \ldots, \beta_d)$ is the Vandermonde matrix.
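As a concrete instance (not spelled out in the paper), take $d = 1$ and evaluation points $\beta_0 = 0$ and $\beta_1 \ne 0$. The system collapses to

$$f(0) = \sigma^{(0)}(f), \qquad f(\beta_1) = \beta_1\,\sigma^{(1)}(f) + \sigma^{(0)}(f), \qquad\text{so}\qquad \sigma^{(1)}(f) = \beta_1^{-1}\bigl(f(\beta_1) - f(0)\bigr),$$

so one value of $\sigma^{(1)}(f)$ costs $d+1 = 2$ membership queries to f; the same formula applies coordinate-wise when f also depends on $x_2, \ldots, x_n$, as used below.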

If f is a multivariate polynomial then f can be written as

$$f(x_1, \ldots, x_n) = \sigma^{(d)}(f)\, x_1^d + \cdots + \sigma^{(1)}(f)\, x_1 + \sigma^{(0)}(f),$$

where $\sigma^{(i)}(f)$ is a multivariate polynomial over the variables $x_2, \ldots, x_n$. We can still use (1) to find $\sigma^{(i)}(f)$: just replace each $f(\beta_i)$ with $f(\beta_i, x_2, \ldots, x_n)$. Notice that from the first equation in the system, since $\beta_0 = 0$, we have

$$\sigma^{(0)}(f) = f(0, x_2, \ldots, x_n). \qquad (2)$$

From (1), a membership query for $\sigma^{(i)}$ can be simulated using $d+1$ membership queries to f. From (2), a membership query to $\sigma^{(0)}$ can be simulated using one membership query to f.

We now extend the operators as follows: for $\mathbf{i} = (i_1, \ldots, i_k) \in [d]^k$,

$$\sigma_{\mathbf{i}} = \sigma^{(i_k)} \sigma^{(i_{k-1})} \cdots \sigma^{(i_1)}.$$

Here $\sigma$ always operates on the variable with the smallest index. So $\sigma^{(i_1)}$ operates on $x_1$ in f to give a function $f'$ that depends on $x_2, \ldots, x_n$; then $\sigma^{(i_2)}$ operates on $x_2$ in $f'$, and so on. We will also write $x^{\mathbf{i}}$ for the term $x_1^{i_1} x_2^{i_2} \cdots x_k^{i_k}$. The weight of $\mathbf{i}$, denoted by $wt(\mathbf{i})$, is the number of nonzero entries in $\mathbf{i}$. The operator $\sigma^{(i)}(f)$ gives the coefficient of $x_1^i$ when f is represented in $F[x_2, \ldots, x_n][x_1]$; the operator $\sigma_{\mathbf{i}}(f)$ gives the coefficient of $x^{\mathbf{i}}$ when f is represented in $F[x_{k+1}, \ldots, x_n][x_1, \ldots, x_k]$.

Suppose $I \subseteq [d]^k$ is such that $\sigma_{\mathbf{i}} f \ne 0$ for all $\mathbf{i} \in I$ and $\sigma_{\mathbf{i}} f = 0$ for all $\mathbf{i} \notin I$; that is, the $x^{\mathbf{i}}$ for $\mathbf{i} \in I$ are the k-suffixes of all terms of f. Here the k-suffix of a term $x_1^{i_1} \cdots x_n^{i_n}$ is $x_1^{i_1} \cdots x_k^{i_k}$. Since $\mathbf{i} \in I$ if and only if $x^{\mathbf{i}}$ is a k-suffix of some term in f, it is clear that $|I| \le |f|$, and we must have

$$f = \sum_{\mathbf{i} \in I} (\sigma_{\mathbf{i}} f)\, x^{\mathbf{i}}.$$

We now show how to simulate membership queries for $(\sigma_{\mathbf{i}} f)(x_{k+1}, \ldots, x_n)$, $\mathbf{i} \in I$, using a polynomial number (in n and $|f|$) of membership queries to f. Suppose we want to find $(\sigma_{\mathbf{i}} f)(c)$ for some $c \in F^{n-k}$ using membership queries to f. We take r assignments $\hat{\alpha}_1, \ldots, \hat{\alpha}_r \in F^k$ and ask membership queries for $(\hat{\alpha}_i, c)$ for all $i = 1, \ldots, r$. If $f(\hat{\alpha}_i, c) = \omega_i$ then

$$\begin{cases} \sum_{\mathbf{i} \in I} (\sigma_{\mathbf{i}} f)(c)\, \hat{\alpha}_1^{\mathbf{i}} = \omega_1 \\ \qquad\vdots \\ \sum_{\mathbf{i} \in I} (\sigma_{\mathbf{i}} f)(c)\, \hat{\alpha}_r^{\mathbf{i}} = \omega_r. \end{cases}$$

Now if $I = \{\mathbf{i}_1, \ldots, \mathbf{i}_r\}$ and $\det M[\hat{\alpha}_j, \mathbf{i}_l] \ne 0$ for

$$M[\hat{\alpha}_j, \mathbf{i}_l] = \begin{pmatrix} \hat{\alpha}_1^{\mathbf{i}_1} & \cdots & \hat{\alpha}_1^{\mathbf{i}_r} \\ \vdots & & \vdots \\ \hat{\alpha}_r^{\mathbf{i}_1} & \cdots & \hat{\alpha}_r^{\mathbf{i}_r} \end{pmatrix},$$

then the above linear system of equations can be solved in time $\mathrm{poly}(r) = \mathrm{poly}(|I|) \le \mathrm{poly}(|f|)$. The solution gives $(\sigma_{\mathbf{i}} f)(c)$. The existence of $\hat{\alpha}_i$ for which the above determinant is not zero will be proven in the next section.

4 From Zero-testing to Learning for any Field

In this section we show how to use the results from the previous section to learn multivariate polynomials. Let $\mathrm{MUL}_F(n, k, t, d)$ be the set of all multivariate polynomials over the field F over n variables with t terms, where each term is of size at most k and the maximal degree of each variable is at most d. We would like to answer the following questions for $f \in \mathrm{MUL}_F(n, k, t, d)$.

1. Is there a polynomial time algorithm that uses membership queries to f and decides whether $f \equiv 0$?

2. Given $1 \le i \le n$, is there a polynomial time algorithm that uses membership queries to f and decides whether f depends on $x_i$?

3. Given $\{\mathbf{i}_1, \ldots, \mathbf{i}_r\} \subseteq [d]^n$ where $wt(\mathbf{i}_j) \le k$ for all j and $r \le t$, is there an algorithm that runs in polynomial time and finds $\hat{\alpha}_1, \ldots, \hat{\alpha}_r \in F^n$ such that

$$\det\begin{pmatrix} \hat{\alpha}_1^{\mathbf{i}_1} & \cdots & \hat{\alpha}_1^{\mathbf{i}_r} \\ \vdots & & \vdots \\ \hat{\alpha}_r^{\mathbf{i}_1} & \cdots & \hat{\alpha}_r^{\mathbf{i}_r} \end{pmatrix} \ne 0\,?$$

4. Is there a polynomial time algorithm that uses membership queries to f and identifies f?

When we say polynomial time we usually mean polynomial time in n, k, t and d, but all the results of this section are also true for any time complexity T, except that to solve 4 we get a blow up of poly(n, t) in the complexity. We show that 1, 2 and 4 are equivalent and that 1 implies 3. Obviously $2 \Rightarrow 1$, $4 \Rightarrow 1$ and $4 \Rightarrow 2$. We will show $1 \Rightarrow 2$, $1 \Rightarrow 3$, and $1 \Rightarrow 4$.
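The membership-query simulation above can be sketched in Python over a prime field (a minimal sketch with hypothetical names; it assumes the suffix set I and points for which the matrix M is invertible are already known — their existence is exactly what question 3 below provides):

    def solve_mod_p(M, b, p):
        """Solve M x = b over GF(p) by Gauss-Jordan elimination (M assumed invertible)."""
        n = len(M)
        A = [row[:] + [bi] for row, bi in zip(M, b)]
        for col in range(n):
            piv = next(r for r in range(col, n) if A[r][col] % p)
            A[col], A[piv] = A[piv], A[col]
            inv = pow(A[col][col], p - 2, p)
            A[col] = [v * inv % p for v in A[col]]
            for r in range(n):
                if r != col and A[r][col]:
                    A[r] = [(a - A[r][col] * c) % p for a, c in zip(A[r], A[col])]
        return [A[r][n] for r in range(n)]

    def monomial(alpha, i, p):
        """alpha^i = prod_j alpha_j^{i_j} mod p."""
        out = 1
        for a_j, i_j in zip(alpha, i):
            out = out * pow(a_j, i_j, p) % p
        return out

    def sigma_query(f, I, alphas, c, p):
        """Simulate membership queries to (sigma_i f)(c) for every k-suffix exponent
        vector i in I, using |I| membership queries to f at the points (alpha_j, c)."""
        omega = [f(list(a) + list(c)) % p for a in alphas]
        M = [[monomial(a, i, p) for i in I] for a in alphas]
        return dict(zip(I, solve_mod_p(M, omega, p)))

    # Demo over GF(7): f = 3*x1^2*x2*x3 + x2 + 5; its 2-suffixes are x1^2*x2, x2 and 1,
    # with (sigma_i f) equal to 3*x3, 1 and 5 respectively.
    p = 7
    f = lambda x: (3 * x[0] ** 2 * x[1] * x[2] + x[1] + 5) % p
    I = [(2, 1), (0, 1), (0, 0)]
    alphas = [(1, 1), (2, 1), (3, 2)]          # chosen so the matrix M is invertible mod 7
    print(sigma_query(f, I, alphas, c=[4], p=p))   # {(2, 1): 5, (0, 1): 1, (0, 0): 5}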

To prove $1 \Rightarrow 2$, notice that $f \in \mathrm{MUL}_F(n,k,t,d)$ is independent of $x_i$ if and only if $g = f - f|_{x_i \leftarrow 0} \equiv 0$. Since g consists of the terms of f that contain $x_i$, we have $g \in \mathrm{MUL}_F(n,k,t,d)$. Therefore we can zero-test g in polynomial time.

To prove $1 \Rightarrow 3$, let $\hat{\alpha}_1, \ldots, \hat{\alpha}_s$ be a zero-test for functions in $\mathrm{MUL}_F(n,k,t,d)$; that is, run the zero-testing algorithm on the input 0 and take all the membership queries $\hat{\alpha}_1, \ldots, \hat{\alpha}_s$ asked by the algorithm. We now have that $f \in \mathrm{MUL}_F(n,k,t,d)$ is 0 if and only if $f(\hat{\alpha}_i) = 0$ for all $i = 1, \ldots, s$. Consider the $s \times r$ matrix with rows $[\hat{\alpha}_j^{\mathbf{i}_1}, \ldots, \hat{\alpha}_j^{\mathbf{i}_r}]$. If this matrix has rank r then we choose r linearly independent rows. If the rank is less than r then its columns are dependent, and therefore there are constants $c_i$, $i = 1, \ldots, r$, such that

$$\sum_{i=1}^{r} c_i\, \hat{\alpha}_j^{\mathbf{i}_i} = 0 \quad \text{for } j = 1, \ldots, s.$$

This shows that the multivariate polynomial $\sum_{i=1}^{r} c_i x^{\mathbf{i}_i}$ is 0 on all of $\hat{\alpha}_1, \ldots, \hat{\alpha}_s$. Since $\sum_{i=1}^{r} c_i x^{\mathbf{i}_i} \in \mathrm{MUL}_F(n,k,t,d)$, we get a contradiction.

Now we show that $1 \Rightarrow 4$. This will use results from the previous section. The algorithm first checks whether f depends on $x_1$; if yes, it generates a tree whose root is labeled with $x_1$ and has $d+1$ children, where the i-th child is the tree for $\sigma^{(i)}(f)$. If the function is independent of $x_1$, it builds a tree with one child for the root; the child is $\sigma^{(0)}(f)$. We then recursively build the trees for the children. The previous section shows how to simulate membership queries at each level in polynomial time. This algorithm obviously works; its correctness follows immediately from the previous section and (1)-(3). The complexity of the algorithm is the size of the tree times the cost of the membership query simulation. Since the number of nodes at each level is bounded by the number of terms in f, and since the depth of the tree is bounded by n, the tree has at most O(nt) nonzero nodes. The total number of nodes is at most a factor of d larger than the number of nonzero nodes. Thus the algorithm has the same complexity as zero testing, with a blow up of poly(n, t, d) in queries and time.

Now that we have reduced the problem to zero testing, we will investigate in the next section the complexity of zero testing of $\mathrm{MUL}_F(n,k,t,d)$.

5 Zero-test of $\mathrm{MUL}_F(n,k,t,d)$

In this section we study the zero testing of $\mathrm{MUL}_F(n,k,*,d)$ when the number of terms is unknown and might be exponentially large. The time complexity of the zero testing should be polynomial in n and d (we have $k \le n$, so it is also polynomial in k). We will show the following theorem.

Theorem 1. The class $\mathrm{MUL}_F(n,k,*,d)$, where $d \le c|F|$ for some constant $c < 1$, is zero-testable in randomized polynomial time in n and d if and only if $k = O\!\left(\frac{|F|}{d}(\log n + \log d)\right)$.

The algorithm for the zero testing is simply to randomly and uniformly choose poly(n, d) points $a_i$ from $F^n$ and find $f(a_i)$. If f is zero on all the points $a_i$, then with high probability $f \equiv 0$. The condition $d \le c|F|$ for some constant $c < 1$ is a necessary condition for efficient learning: if d is close to $|F|$ (say $d = |F| - 1$) then $k = O(\log n / \log|F|)$, which is less than 1 for $|F| = \omega(n)$. This bound is also tight. This theorem implies:

Theorem 2. The class $\mathrm{MUL}_F(n,k,t,d)$, where $d \le c|F|$ for some constant $c < 1$, is learnable in randomized polynomial time (in n, d and t) from membership queries only if and only if $k = O\!\left(\frac{|F|}{d}(\log n + \log d)\right)$.

The proofs are given in the Appendix.
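The zero test itself is a few lines (a toy version of our own over a prime field, with a placeholder trial count; the Appendix analysis, as we read it, is what fixes the actual polynomial number of trials):

    import random

    def zero_test_field(f, n, p, d, trials=2000):
        """Evaluate f at uniformly random points of GF(p)^n.  A nonzero f whose
        terms have size at most k is nonzero at a random point with probability at
        least (1 - d/p)^k, so poly(n, d) trials suffice when
        k = O((p/d)(log n + log d))."""
        for _ in range(trials):
            a = [random.randrange(p) for _ in range(n)]
            if f(a) % p:
                return False        # witness found: f is not the zero polynomial
        return True                 # every evaluation was 0: declare f == 0

    # Example over GF(11): f = (x0 - 3)(x1 - 5)^2 is nonzero, so this prints False w.h.p.
    print(zero_test_field(lambda x: (x[0] - 3) * (x[1] - 5) ** 2, n=2, p=11, d=2))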
6 PAC-learning multivariate polynomials with membership queries

In this section we give an algorithm that PAC-learns with membership queries any multivariate polynomial with nonmonotone terms, under distributions that support small terms. We will first consider boolean multivariate polynomials, and later in this section show how to generalize the algorithm to any field. For the analysis of the correctness of the algorithm we first need to formalize the notion of distributions that support small terms. The following is one way to define this notion.

Definition. Let $\mathcal{D}_{c,t,\epsilon}$ be the set of distributions that satisfy the following: for every $D \in \mathcal{D}_{c,t,\epsilon}$ and any DNF h with t terms, each of size greater than $c\log(t/\epsilon)$, we have $\Pr_D[h = 1] \le \epsilon$.

Notice that all the constant bounded product distributions D with $c_0 \le \Pr_D[x_i = 1] \le 1 - c_0$ for all i are in $\mathcal{D}_{1/\log(1/c_0),\,t,\,\epsilon}$.
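A sketch of the calculation behind this remark (ours; the exact constant in the subscript depends on how the per-literal bound is stated): under such a product distribution every literal is satisfied with probability at most $1-c_0$, so a term $T_i$ of size $s > c\log(t/\epsilon)$ with $c = 1/\log\frac{1}{1-c_0}$ has $\Pr_D[T_i = 1] \le (1-c_0)^s < \epsilon/t$, and by a union bound over the t terms of h,

$$\Pr_D[h = 1] \;\le\; \sum_{i=1}^{t} \Pr_D[T_i = 1] \;<\; t \cdot \frac{\epsilon}{t} \;=\; \epsilon .$$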

A very rough analysis is given here to show that a polynomial time algorithm exists; in the full paper a more careful analysis will be done to get the best possible constants. Let $f = T_1 + \cdots + T_t$ be a multivariate polynomial and suppose $|T_1| \le |T_2| \le \cdots \le |T_t|$. Our algorithm starts by choosing a random assignment a and defining $f'(x) = f(x + a)$. All terms of size s (in $f'$) will contain on average $s/2$ positive literals. Therefore, by the Chernoff bound, with high probability all the terms of size more than $64c\log(t/\epsilon)$ will contain at least $16c\log(t/\epsilon)$ positive literals, and all terms of size at least $4c\log(t/\epsilon)$ will with high probability contain at least $c\log(t/\epsilon)$ positive literals.

Now we split the function $f'$ into three functions $f_1$, $f_2$ and $f_3$. The function $f_1 = T_1 + \cdots + T_{t_1}$ contains all terms of size at most $4c\log(t/\epsilon)$; the function $f_2 = T_{t_1+1} + \cdots + T_{t_2}$ contains all terms of size between $4c\log(t/\epsilon)$ and $64c\log(t/\epsilon)$; and the function $f_3 = T_{t_2+1} + \cdots + T_t$ contains all terms of size more than $64c\log(t/\epsilon)$. Now change $f_1$ to a multivariate polynomial with monotone terms. Since the size of each term in $f_1$ is at most $4c\log(t/\epsilon)$, the number of monotone terms in $f_1$ will be at most $t(t/\epsilon)^{4c}$. Doing the same for $f_2$ gives a multivariate polynomial with at most $t(t/\epsilon)^{64c}$ terms. Our algorithm will find all the terms in $f_1$, some of the terms in $f_2$, and none of the terms in $f_3$. Therefore we will need the following claim.

Claim. Let $g = f_1 + h$ where h is a multivariate polynomial that contains some of the terms in $f_2$. Then for any $D \in \mathcal{D}_{c,t,\epsilon}$ we have $\Pr_D[g \ne f] \le \epsilon$.

Proof. The error is

$$\Pr_D[(f_1 + h) + f = 1] = \Pr_D[h + f_2 + f_3 = 1].$$

Let

$$f_2 = \hat{T}_{t_1+1}\tilde{T}_{t_1+1} + \cdots + \hat{T}_{t_2}\tilde{T}_{t_2},$$

where $T_i = \hat{T}_i \tilde{T}_i$, $\hat{T}_i$ is the part of the term that contains the positive literals and $\tilde{T}_i$ is the part that contains the negative literals. When we change the terms of $f_2$ to monotone terms, every monotone term in $f_2$ will contain one of the terms $\hat{T}_i$, $t_1+1 \le i \le t_2$. Therefore we can write $f_2 = \hat{T}_{t_1+1} f_{2,1} + \cdots + \hat{T}_{t_2} f_{2,t_2-t_1}$, where the $f_{2,i}$ are multivariate polynomials with monotone terms. Since h is a multivariate polynomial that contains some of the terms in $f_2$, we have $f_2 + h = \hat{T}_{t_1+1} h_{2,1} + \cdots + \hat{T}_{t_2} h_{2,t_2-t_1}$. Since $|\hat{T}_i| \ge c\log(t/\epsilon)$ for $t_1+1 \le i \le t_2$ and $|T_i| \ge c\log(t/\epsilon)$ for $i \ge t_2+1$, by the definition of $\mathcal{D}_{c,t,\epsilon}$ we have

$$\Pr_D[(h + f_2) + f_3 = 1] \;\le\; \Pr_D[\hat{T}_{t_1+1} \vee \cdots \vee \hat{T}_{t_2} \vee T_{t_2+1} \vee \cdots \vee T_t = 1] \;\le\; \epsilon.$$

The algorithm proceeds as follows. We randomly and uniformly choose a zero-restriction p of $f'$; that is, we substitute $x_i \leftarrow 0$ with probability 1/2 for each $x_i$. This will on average leave $n/2$ variables alive in $f'$. Since terms in $f_3$ have at least $16c\log(t/\epsilon)$ positive literals, with high probability we will have $f_3(p) = 0$. In $f_1 + f_2$ some of the terms will vanish and some will stay alive. Since the projection $f'(p)$ is in $\mathrm{MUL}(n, (t/\epsilon)^{64c+1}, 64c\log(t/\epsilon))$, we can use the algorithm from the previous sections to learn its terms from membership queries only in time $(t/\epsilon)^{O(c)}$. After we learn $g_1 = f'(p)$ we define the function $f'' = f' + g_1$. This function has fewer of the terms of $f_1$. Now we do the same for $f''$; that is, we take another zero-restriction and collect the terms $g_2$ that stay alive after the projection. We continue until $f' + g_1 + g_2 + \cdots + g_r$ is $\epsilon$-close to 0. Notice that if $M_i$ is a term in $f_1$, then a zero-restriction keeps $M_i$ alive with probability at least $1/(t/\epsilon)^{4c}$; so on average we need $(t/\epsilon)^{4c}$ projections to catch this term, $t(t/\epsilon)^{8c}$ projections to catch all the $t(t/\epsilon)^{4c}$ terms in $f_1$, and $(t/\epsilon)^{10c}$ projections to catch them with high probability.
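The outer loop of this learner can be sketched as follows (our own reading of it, not the paper's code; it reuses the `interpolate` routine from the Claim 4 sketch above, `rounds` is a placeholder for the $(t/\epsilon)^{O(c)}$ projections the analysis calls for, and the stopping rule "until the residual is $\epsilon$-close to 0" is omitted for brevity):

    import random

    def learn_nonmonotone(query, n, rounds=200):
        a = [random.randint(0, 1) for _ in range(n)]
        shifted = lambda x: query([xi ^ ai for xi, ai in zip(x, a)])   # f'(x) = f(x + a)
        collected = set()
        def residual(x):            # f' plus the terms collected so far, over GF(2)
            return shifted(x) ^ (sum(all(x[i] for i in t) for t in collected) % 2)
        for _ in range(rounds):
            # zero-restriction: force each variable to 0 independently with probability 1/2
            forced = {i for i in range(n) if random.random() < 0.5}
            restricted = lambda x, forced=forced: residual(
                [0 if i in forced else xi for i, xi in enumerate(x)])
            # terms that survive the restriction are monotone terms of f'
            collected |= interpolate(restricted, n)
        def hypothesis(x):          # undo the shift, so h(x) approximates f(x)
            y = [xi ^ ai for xi, ai in zip(x, a)]
            return sum(all(y[i] for i in t) for t in collected) % 2
        return hypothesis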
Notice also that since each term in $f_3$ has at least $16c\log(t/\epsilon)$ positive literals, each projection makes $f_3 = 0$ with probability at least $1 - t/(t/\epsilon)^{16c}$, and therefore the probability that $f_3 = 0$ in all the projections is at least $1 - 1/(t/\epsilon)^{5c}$. Therefore we can make the probability of success for collecting all the terms of $f_1$ greater than $3/4$. This completes the description and the correctness of the algorithm.

The above algorithm and analysis can also be used to learn functions of the form $f = \lambda_1 T_1 + \cdots + \lambda_t T_t$, where $\lambda_i \in F$, the $T_i$ are boolean terms, and $+$ is the addition of a field F. This gives the learnability of decision trees with leaves that contain elements from the field F.

References

[A88] D. Angluin. Queries and concept learning. Machine Learning, 2(4):319-342, 1988.

[BT88] M. Ben-Or, P. Tiwari. A deterministic algorithm for sparse multivariate polynomial interpolation. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, pages 301-309, May 1988.

[Bl92] A. Blum. Learning boolean functions in an infinite attribute space. Machine Learning, 9(4):373-386, 1992.

[Bs93] N. H. Bshouty. Exact learning via the monotone theory. In Proceedings of the 34th Symposium on Foundations of Computer Science, pages 302-311, November 1993.

[B95a] N. H. Bshouty. Simple learning algorithms using divide and conquer. In Proceedings of the Annual ACM Workshop on Computational Learning Theory, 1995.

[B95b] N. H. Bshouty. A note on learning multivariate polynomials under the uniform distribution. In Proceedings of the Annual ACM Workshop on Computational Learning Theory, 1995.

[CDG+91] M. Clausen, A. Dress, J. Grabmeier, M. Karpinski. On zero-testing and interpolation of k-sparse multivariate polynomials over finite fields. Theoretical Computer Science, 84:151-164, 1991.

[GKS90] D. Yu. Grigoriev, M. Karpinski, M. F. Singer. Fast parallel algorithms for sparse multivariate polynomial interpolation over finite fields. SIAM J. of Computing, 19(6):1059-1063, 1990.

[J94] J. Jackson. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, 1994.

[J95] J. Jackson. On learning DNF and related circuit classes from helpful and not-so-helpful teachers. Ph.D. thesis, CMU, 1995.

[KM93] E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spectrum. SIAM J. Computing, 22(6):1331-1348, 1993.

[Ma92] Y. Mansour. Randomized interpolation and approximation of sparse polynomials. In Automata, Languages and Programming: 19th International Colloquium, pages 261-272, July 1992. (Also: SIAM J. on Computing, vol. 24, num. 2, 1995.)

[RB89] R. M. Roth and G. M. Benedek. Interpolation and approximation of sparse multivariate polynomials over GF(2). SIAM J. Computing, 20(2):291-314, 1991.

[SS93] R. E. Schapire, L. M. Sellie. Learning sparse multivariate polynomials over a field with queries and counterexamples. In Proceedings of the Sixth Annual ACM Workshop on Computational Learning Theory, July 1993.

[Val84] L. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, November 1984.

[Z90] R. Zippel. Interpolating polynomials from their values. Journal of Symbolic Computation, 9:375-403, 1990.

Appendix

Proof of Theorem 1, Upper Bound. Let $\beta_1, \ldots, \beta_d \in F$ and define the function

$$f_{k,d} = \prod_{j=1}^{k} \prod_{i=1}^{d} (x_j - \beta_i).$$

Denote by $Z(k,d)$ the number of zeros of $f_{k,d}$. It is easy to see that Z satisfies the recursive equation

$$Z(k,d) = d\,|F|^{k-1} + (|F| - d)\,Z(k-1,d), \qquad Z(1,d) = d,$$

and it is also easy to see that

$$Z(k,d) = |F|^k - (|F| - d)^k.$$

Now let $\Phi(n,k,d)$ be the maximal possible number of roots of a multivariate polynomial in $\mathrm{MUL}_F(n,k,*,d)$. We will show the following facts:

1. $\Phi(n,k,d) \le |F|^{n-k}\,\Phi(k,k,d)$.

2. $\Phi(k,k,d) \le Z(k,d)$.

Both facts imply that if $f \not\equiv 0$ and we randomly and uniformly choose an assignment $a \in F^n$, we get

$$\Pr_a[f(a) \ne 0] \;\ge\; 1 - \frac{\Phi(n,k,d)}{|F|^n} \;\ge\; 1 - \frac{\Phi(k,k,d)}{|F|^k} \;\ge\; 1 - \frac{Z(k,d)}{|F|^k} \;=\; \left(1 - \frac{d}{|F|}\right)^{k} \;\ge\; e^{-O(dk/|F|)} \;\ge\; \frac{1}{\mathrm{poly}(n,d)},$$

using $d \le c|F|$ and $k = O\!\left(\frac{|F|}{d}(\log n + \log d)\right)$.
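A small brute-force check of the closed form for $Z(k,d)$ (ours; F is taken as the integers modulo q and $\beta_1, \ldots, \beta_d$ as $0, \ldots, d-1$ for concreteness):

    from itertools import product

    def brute_zeros(q, k, d):
        """Count the zeros in F^k of f_{k,d} = prod_j prod_i (x_j - beta_i):
        a point is a zero exactly when some coordinate hits one of the d betas."""
        betas = set(range(d))
        return sum(any(x in betas for x in point) for point in product(range(q), repeat=k))

    for q, k, d in [(5, 3, 2), (7, 2, 4), (3, 4, 1)]:
        assert brute_zeros(q, k, d) == q ** k - (q - d) ** k
    print("Z(k, d) = q^k - (q - d)^k confirmed on small cases")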

Therefore the expected running time to detect that f is not 0 is poly(n, d). It remains to prove facts (1) and (2).

To prove (1), let $f \in \mathrm{MUL}_F(n,k,*,d)$ with a maximal number of roots. Let m be a term in f with a maximal number of variables, and suppose, without loss of generality, that the variables of m are $x_1, \ldots, x_k$. For any substitution $a_{k+1}, \ldots, a_n$ for the variables $x_{k+1}, \ldots, x_n$, the term m stays alive in the projection $g = f|_{x_i \leftarrow a_i,\, i=k+1,\ldots,n}$ because it is maximal in f. Since each such g has at most $\Phi(k,k,d)$ roots, fact (1) follows.

To prove (2), let $f \in \mathrm{MUL}_F(k,k,*,d)$. Write f as a polynomial in $F[x_2, \ldots, x_k][x_1]$,

$$f = f_d x_1^d + f_{d-1} x_1^{d-1} + \cdots + f_0.$$

Let t be the number of roots of $f_d$. Since $f_d \in \mathrm{MUL}_F(k-1,k-1,*,d)$ we have $t \le \Phi(k-1,k-1,d)$. For $|F|^{k-1} - t$ assignments a for $x_2, \ldots, x_k$ we have $f_d(a) \ne 0$; for those assignments we get a polynomial in $x_1$ of degree d, which has at most d roots for $x_1$. For the remaining t assignments a for $x_2, \ldots, x_k$, $f_d$ is zero, and then the possible values of $x_1$ (to get a root of f) are bounded by $|F|$. This implies

$$\Phi(k,k,d) \;\le\; d(|F|^{k-1} - t) + |F|\,t \;=\; d\,|F|^{k-1} + (|F| - d)\,t \;\le\; d\,|F|^{k-1} + (|F| - d)\,\Phi(k-1,k-1,d).$$

Now the result follows by induction on k.

Proof of Theorem 2, Lower Bound. Let A be a randomized algorithm that zero-tests $f \in \mathrm{MUL}_F(n,k,*,d)$. Algorithm A asks membership queries to f, and if $f \not\equiv 0$ it returns the answer "NO" with probability at least 2/3. If all the membership queries in the algorithm return 0, the algorithm returns the answer "YES", indicating that $f \equiv 0$. We run the algorithm for $f \equiv 0$. Let $D_1, \ldots, D_l$ be the distributions according to which the membership assignments $a_1, \ldots, a_l$ are chosen to zero-test f, where l, the number of queries, is polynomial in n and d. Notice that, since all membership answers are 0, running the algorithm again for $f \equiv 0$ will again choose membership queries according to the distributions $D_1, \ldots, D_l$.

Now randomly and uniformly choose $\beta_{i,j} \in F$, $i = 1, \ldots, p$, $j = 1, \ldots, d$, and define

$$f^* = \prod_{i=1}^{p} \prod_{j=1}^{d} (x_i - \beta_{i,j}),$$

where $p = \frac{2|F|}{d}(\ln n + \ln d)$. Let $I[f^*(a_i)] = 1$ if $f^*(a_i) \ne 0$ and 0 otherwise. Then

$$\mathop{\mathbb{E}}_{f^*}\ \mathop{\mathbb{E}}_{(\forall i)\, a_i \sim D_i}\left[\bigvee_i I[f^*(a_i)]\right] \;\le\; l \cdot \mathop{\mathbb{E}}_{f^*}\ \mathop{\mathbb{E}}_{a_{i_0} \sim D_{i_0}}\bigl[I[f^*(a_{i_0})]\bigr] \;=\; l \cdot \mathop{\mathbb{E}}_{a_{i_0} \sim D_{i_0}}\ \mathop{\mathbb{E}}_{f^*}\bigl[I[f^*(a_{i_0})]\bigr] \;=\; l\left(1 - \frac{1}{|F|}\right)^{pd} \;\le\; \frac{1}{3}.$$

This shows that there exists an $f^* \not\equiv 0$ such that, running algorithm A on $f^*$, it gives the wrong answer "YES" with probability more than 2/3. This is a contradiction.


More information

arxiv: v2 [cs.ds] 3 Oct 2017

arxiv: v2 [cs.ds] 3 Oct 2017 Orthogonal Vectors Indexing Isaac Goldstein 1, Moshe Lewenstein 1, and Ely Porat 1 1 Bar-Ilan University, Ramat Gan, Israel {goldshi,moshe,porately}@cs.biu.ac.il arxiv:1710.00586v2 [cs.ds] 3 Oct 2017 Abstract

More information

2-LOCAL RANDOM REDUCTIONS TO 3-VALUED FUNCTIONS

2-LOCAL RANDOM REDUCTIONS TO 3-VALUED FUNCTIONS 2-LOCAL RANDOM REDUCTIONS TO 3-VALUED FUNCTIONS A. Pavan and N. V. Vinodchandran Abstract. Yao (in a lecture at DIMACS Workshop on structural complexity and cryptography, 1990) showed that if a language

More information

Lecture Introduction. 2 Formal Definition. CS CTT Current Topics in Theoretical CS Oct 30, 2012

Lecture Introduction. 2 Formal Definition. CS CTT Current Topics in Theoretical CS Oct 30, 2012 CS 59000 CTT Current Topics in Theoretical CS Oct 30, 0 Lecturer: Elena Grigorescu Lecture 9 Scribe: Vivek Patel Introduction In this lecture we study locally decodable codes. Locally decodable codes are

More information

On the Gap Between ess(f) and cnf size(f) (Extended Abstract)

On the Gap Between ess(f) and cnf size(f) (Extended Abstract) On the Gap Between and (Extended Abstract) Lisa Hellerstein and Devorah Kletenik Polytechnic Institute of NYU, 6 Metrotech Center, Brooklyn, N.Y., 11201 Abstract Given a Boolean function f, denotes the

More information

Online Learning versus Offline Learning*

Online Learning versus Offline Learning* Machine Learning, 29, 45 63 (1997) c 1997 Kluwer Academic Publishers. Manufactured in The Netherlands. Online Learning versus Offline Learning* SHAI BEN-DAVID Computer Science Dept., Technion, Israel.

More information

Lecture 3: Error Correcting Codes

Lecture 3: Error Correcting Codes CS 880: Pseudorandomness and Derandomization 1/30/2013 Lecture 3: Error Correcting Codes Instructors: Holger Dell and Dieter van Melkebeek Scribe: Xi Wu In this lecture we review some background on error

More information

Computational Learning Theory. Definitions

Computational Learning Theory. Definitions Computational Learning Theory Computational learning theory is interested in theoretical analyses of the following issues. What is needed to learn effectively? Sample complexity. How many examples? Computational

More information

Unconditional Lower Bounds for Learning Intersections of Halfspaces

Unconditional Lower Bounds for Learning Intersections of Halfspaces Unconditional Lower Bounds for Learning Intersections of Halfspaces Adam R. Klivans Alexander A. Sherstov The University of Texas at Austin Department of Computer Sciences Austin, TX 78712 USA {klivans,sherstov}@cs.utexas.edu

More information

A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997

A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997 A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997 Vasant Honavar Artificial Intelligence Research Laboratory Department of Computer Science

More information

Mansour s Conjecture is True for Random DNF Formulas

Mansour s Conjecture is True for Random DNF Formulas Mansour s Conjecture is True for Random DNF Formulas Adam Klivans University of Texas at Austin klivans@cs.utexas.edu Homin K. Lee University of Texas at Austin homin@cs.utexas.edu March 9, 2010 Andrew

More information

Web-Mining Agents Computational Learning Theory

Web-Mining Agents Computational Learning Theory Web-Mining Agents Computational Learning Theory Prof. Dr. Ralf Möller Dr. Özgür Özcep Universität zu Lübeck Institut für Informationssysteme Tanya Braun (Exercise Lab) Computational Learning Theory (Adapted)

More information

Being Taught can be Faster than Asking Questions

Being Taught can be Faster than Asking Questions Being Taught can be Faster than Asking Questions Ronald L. Rivest Yiqun Lisa Yin Abstract We explore the power of teaching by studying two on-line learning models: teacher-directed learning and self-directed

More information

Online Learning, Mistake Bounds, Perceptron Algorithm

Online Learning, Mistake Bounds, Perceptron Algorithm Online Learning, Mistake Bounds, Perceptron Algorithm 1 Online Learning So far the focus of the course has been on batch learning, where algorithms are presented with a sample of training data, from which

More information

EQUIVALENCES AND SEPARATIONS BETWEEN QUANTUM AND CLASSICAL LEARNABILITY

EQUIVALENCES AND SEPARATIONS BETWEEN QUANTUM AND CLASSICAL LEARNABILITY EQUIVALENCES AND SEPARATIONS BETWEEN QUANTUM AND CLASSICAL LEARNABILITY ROCCO A. SERVEDIO AND STEVEN J. GORTLER Abstract. We consider quantum versions of two well-studied models of learning Boolean functions:

More information