COMS 6998-3: Sub-Linear Algorithms in Learning and Testing (Spring 2014)
Lecture 5: 02/19/2014
Lecturer: Rocco Servedio
Scribes: Dimitris Paidarakis

1 Last time

Finished the KM algorithm; applications of the KM algorithm: learning decision trees; learning functions with sparse Fourier representation (in particular, k-juntas of parities); started learning monotone Boolean functions (via their influence).

2 Today

Finish learning monotone Boolean functions (using $\mathrm{Inf}_i[f]$) and learning $\mathrm{AC}^0$ circuits (via Fourier concentration on low-degree coefficients; no membership queries); lower bounds for learning monotone Boolean functions; learning $k$-juntas of halfspaces in $\mathrm{poly}((nk/\epsilon)^k)$ time. No Fourier here! (but membership queries).

Relevant readings:

- Mansour [Man94]: Learning Boolean Functions via the Fourier Transform.
- Gopalan, Klivans and Meka [GKM12]: Learning Functions of Halfspaces Using Prefix Covers.

3 Finish learning monotone Boolean functions

Recall:

- $f : \{-1,1\}^n \to \{-1,1\}$ is monotone if $x \preceq y$ implies $f(x) \le f(y)$.
- Influence: $\mathrm{Inf}_i[f] = \Pr_x[f(x^{i \to 1}) \ne f(x^{i \to -1})]$ for $i \in [n]$.
- Total influence: $\mathrm{Inf}[f] = \sum_{i=1}^n \mathrm{Inf}_i[f]$; an important example is $\mathrm{Inf}[\mathrm{MAJ}] \approx \sqrt{n}$ (for the majority function $\mathrm{MAJ}(x) = \mathrm{sign}(x_1 + \cdots + x_n)$).
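To make the definitions above concrete, here is a small brute-force sketch (in Python; the function names and the choice of $n$ are ours, not from the lecture) that computes $\mathrm{Inf}_i[f]$ and $\mathrm{Inf}[f]$ for the majority function by enumerating the hypercube.

```python
from itertools import product

def maj(x):
    # Majority on {-1,1}^n (take n odd so there are no ties).
    return 1 if sum(x) > 0 else -1

def influence(f, n, i):
    # Inf_i[f] = Pr_x[ f(x^{i->1}) != f(x^{i->-1}) ], computed exactly
    # by enumerating all 2^n points of the hypercube.
    flips = 0
    for x in product([-1, 1], repeat=n):
        if f(x[:i] + (1,) + x[i + 1:]) != f(x[:i] + (-1,) + x[i + 1:]):
            flips += 1
    return flips / 2 ** n

n = 5
infs = [influence(maj, n, i) for i in range(n)]
print("Inf_i[MAJ] =", infs)       # every coordinate is equally influential
print("Inf[MAJ]   =", sum(infs))  # grows like Theta(sqrt(n)) as n increases
```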

Also recall that for all Boolean functions $f$, $\mathrm{Inf}_i[f] = \sum_{S \ni i} \hat{f}(S)^2$, and thus $\mathrm{Inf}[f] = \sum_S |S| \hat{f}(S)^2$.

Claim 1. If $f$ is a monotone Boolean function, we have $\mathrm{Inf}_i[f] = \hat{f}(i)$ (where $\hat{f}(i)$ is short for $\hat{f}(\{i\})$).

Proof. Without loss of generality, we consider the case $i = 1$:
$$\mathrm{Inf}_1[f] = \Pr_{x'}\big[f(1x') \ne f(-1x')\big] = \frac{|\{x' \in \{-1,1\}^{n-1} : f(1x') = 1,\ f(-1x') = -1\}|}{2^{n-1}}$$
with the last equality holding because of the monotonicity of $f$. But on the other hand,
$$\hat{f}(1) = \mathbf{E}[f(x)x_1] = \frac{1}{2^n}\sum_{x' \in \{-1,1\}^{n-1}} \big(f(1x') - f(-1x')\big) = \frac{|\{x' \in \{-1,1\}^{n-1} : f(1x') = 1,\ f(-1x') = -1\}|}{2^{n-1}}$$
where the second equality again uses the monotonicity of $f$.

Lemma 2. For any monotone Boolean function $f$, $\mathrm{Inf}[f] \le \mathrm{Inf}[\mathrm{MAJ}] \le \sqrt{n}$.

Proof. A first approach: we can prove that $\mathrm{Inf}[f] \le \sqrt{n}$ using the Cauchy-Schwarz inequality:
$$\mathrm{Inf}[f] = \sum_{i=1}^n \hat{f}(i) \cdot 1 \le \sqrt{n} \cdot \sqrt{\sum_{i=1}^n \hat{f}(i)^2} \le \sqrt{n} \qquad \text{(Cauchy-Schwarz)}$$
since $\sum_{i=1}^n \hat{f}(i)^2 \le \sum_{S \subseteq [n]} \hat{f}(S)^2 = 1$. Or we can show the stronger statement that $\mathrm{Inf}[f] \le \mathrm{Inf}[\mathrm{MAJ}]$:
$$\mathrm{Inf}[f] = \sum_{i=1}^n \hat{f}(i) = \sum_{i=1}^n \mathbf{E}[f(x)x_i] = \mathbf{E}\Big[f(x)\sum_{i=1}^n x_i\Big] = \mathbf{E}[f(x)(x_1 + \cdots + x_n)] \le \mathbf{E}[\mathrm{MAJ}(x)(x_1 + \cdots + x_n)] = \mathrm{Inf}[\mathrm{MAJ}]$$
(where the inequality comes from observing that since $f(x) \in \{-1,1\}$, the quantity $f(x)(x_1 + \cdots + x_n)$ is at most $|x_1 + \cdots + x_n|$, with equality exactly when $f(x) = \mathrm{sign}(x_1 + \cdots + x_n)$, i.e. for $f = \mathrm{MAJ}$).
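Claim 1 is easy to check numerically on small examples. The sketch below (ours; the test function $(x_1 \wedge x_2) \vee x_3$ is just an arbitrary small monotone function) compares $\mathrm{Inf}_i[f]$ with $\hat{f}(\{i\}) = \mathbf{E}[f(x)x_i]$ coordinate by coordinate.

```python
from itertools import product

def monotone_example(x):
    # A small monotone function: (x1 AND x2) OR x3, in the +/-1 convention.
    return 1 if (x[0] == 1 and x[1] == 1) or x[2] == 1 else -1

def influence(f, n, i):
    # Inf_i[f] = Pr_x[ f(x^{i->1}) != f(x^{i->-1}) ].
    return sum(f(x[:i] + (1,) + x[i + 1:]) != f(x[:i] + (-1,) + x[i + 1:])
               for x in product([-1, 1], repeat=n)) / 2 ** n

def fourier_singleton(f, n, i):
    # \hat{f}({i}) = E_x[ f(x) * x_i ] under the uniform distribution.
    return sum(f(x) * x[i] for x in product([-1, 1], repeat=n)) / 2 ** n

n = 3
for i in range(n):
    print(i, influence(monotone_example, n, i),
          fourier_singleton(monotone_example, n, i))
    # The two columns agree, as Claim 1 predicts for monotone f.
```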

We can combine the facts above to obtain Fourier concentration for monotone Boolean functions; more generally:

Theorem 3. Let $f$ be any monotone Boolean function. Then
$$\sum_{|S| \ge \mathrm{Inf}[f]/\epsilon} \hat{f}(S)^2 \le \epsilon.$$

Proof. By contradiction, assume $\sum_{|S| \ge \mathrm{Inf}[f]/\epsilon} \hat{f}(S)^2 > \epsilon$; then
$$\mathrm{Inf}[f] = \sum_{S \subseteq [n]} |S| \hat{f}(S)^2 \ge \sum_{|S| \ge \mathrm{Inf}[f]/\epsilon} |S| \hat{f}(S)^2 \ge \frac{\mathrm{Inf}[f]}{\epsilon} \sum_{|S| \ge \mathrm{Inf}[f]/\epsilon} \hat{f}(S)^2 > \frac{\mathrm{Inf}[f]}{\epsilon} \cdot \epsilon = \mathrm{Inf}[f],$$
leading to a contradiction.

Remark 1. One can also see this proof as an application of Markov's inequality, by viewing the Fourier weights $\hat{f}(S)^2$ as the probability distribution $\hat{f}^2$ they induce over subsets of $[n]$. The theorem can then be rephrased as
$$\Pr_{S \sim \hat{f}^2}\Big[|S| \ge \frac{\mathrm{Inf}[f]}{\epsilon}\Big] \le \frac{\mathbf{E}_{S \sim \hat{f}^2}[|S|]}{\mathrm{Inf}[f]/\epsilon} = \epsilon,$$
since $\mathrm{Inf}[f] = \mathbf{E}_{S \sim \hat{f}^2}[|S|]$.

Corollary 4. Suppose $f : \{-1,1\}^n \to \{-1,1\}$ is monotone. Then $f$ is $\epsilon$-concentrated on $\mathcal{S} \stackrel{\mathrm{def}}{=} \{S \subseteq [n] : |S| \le \sqrt{n}/\epsilon\}$.

4 Lower bounds for learning monotone Boolean functions

As a direct consequence, the LMN algorithm will learn any monotone Boolean function in time $\mathrm{poly}(n^{\sqrt{n}/\epsilon}) = 2^{O(\sqrt{n}(\log n)/\epsilon)}$. While this constitutes a huge saving compared to the general $2^{\Omega(n)}$ bound, it is still a lot! Hence, an immediate question is: can we do better?
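Corollary 4 is exactly what the low-degree ("LMN") approach uses: estimate every Fourier coefficient of degree at most $\sqrt{n}/\epsilon$ from uniform random examples and predict with the sign of the resulting polynomial. A minimal sketch of that approach follows (ours; the sample size and the degree cutoff in the demo are placeholder values, not the constants from the analysis).

```python
import random
from itertools import combinations, product

def chi(S, x):
    # Fourier character chi_S(x) = prod_{i in S} x_i.
    out = 1
    for i in S:
        out *= x[i]
    return out

def low_degree_learn(f, n, degree, num_samples=20000):
    # Estimate \hat{f}(S) for every |S| <= degree from uniform random
    # examples, then output the sign of the estimated low-degree polynomial.
    sample = [tuple(random.choice([-1, 1]) for _ in range(n))
              for _ in range(num_samples)]
    labels = [f(x) for x in sample]
    subsets = [S for d in range(degree + 1)
               for S in combinations(range(n), d)]
    coeff = {S: sum(y * chi(S, x) for x, y in zip(sample, labels)) / num_samples
             for S in subsets}
    def hypothesis(x):
        return 1 if sum(c * chi(S, x) for S, c in coeff.items()) >= 0 else -1
    return hypothesis

# Demo: learn 5-variable majority using degree 3 (roughly sqrt(n)/eps here).
def maj(x): return 1 if sum(x) > 0 else -1
h = low_degree_learn(maj, 5, 3)
err = sum(h(x) != maj(x) for x in product([-1, 1], repeat=5)) / 2 ** 5
print("empirical error of the low-degree hypothesis:", err)
```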

One may first ask whether there is a better analysis of the LMN algorithm for monotone Boolean functions which would yield significantly better performance. However, the answer to this is negative: it is known that there exist monotone Boolean functions $f$ with $\sum_{|S| \ge \sqrt{n}/100} \hat{f}(S)^2 = \Omega(1)$, which implies that no low-degree learning algorithm such as LMN can do better than to deal with the $n^{\Omega(\sqrt{n})}$ Fourier coefficients of degree up to $\Omega(\sqrt{n})$.

Learning to high accuracy. Clearly, the $2^{O(\sqrt{n}(\log n)/\epsilon)}$ bound becomes trivial for $\epsilon \le 1/\sqrt{n}$; hence, this range of accuracy seems like a good regime to look for a lower bound. And indeed, one can show that we cannot efficiently learn monotone Boolean functions to high accuracy, which we do below.

Claim 5. There is a class $\mathcal{C}$ of monotone Boolean functions such that, if the target function $f$ is drawn uniformly from $\mathcal{C}$, then any learning algorithm $A$ making fewer than $\frac{1}{10} \cdot \frac{2^n}{\sqrt{n}}$ membership queries will output a hypothesis $h$ such that $\mathbf{E}[\mathrm{dist}(f, h)] \ge \frac{1}{5\sqrt{n}}$ (where the expectation is taken over the draw of $f$).

Proof. For simplicity consider $n$ even (the case of odd $n$ is similar, up to technicalities). Define $\mathcal{C}$ to be the class of monotone Boolean functions $f$ such that
$$f(x) = \begin{cases} +1 & \text{if } \sum_{i=1}^n x_i > 0 \\ -1 & \text{if } \sum_{i=1}^n x_i < 0 \\ \pm 1 & \text{if } \sum_{i=1}^n x_i = 0 \text{ (arbitrarily)} \end{cases}$$
(Any such $f$ is indeed monotone, since the points of the middle layer are pairwise incomparable.) Equivalently, drawing a function from this class amounts to tossing $\binom{n}{n/2}$ independent fair coins that specify the value of $f$ on the middle layer of the hypercube (where $\sum_i x_i = 0$). Yet, the learning algorithm makes at most $\frac{1}{10} \cdot \frac{2^n}{\sqrt{n}}$ membership queries in this middle layer, which contains between $\frac{1}{2} \cdot \frac{2^n}{\sqrt{n}}$ and $\frac{2^n}{\sqrt{n}}$ different points. So $A$ sees less than a $\frac{1}{5}$ fraction of the values of the inputs that define $f$, and misses at least a $\frac{4}{5}$ fraction of them. Each unseen point contributes in expectation $\frac{1}{2} \cdot \frac{1}{2^n}$ to the error of the hypothesis $h$. Therefore,
$$\mathbf{E}[\mathrm{error}(h)] \ge \frac{4}{5} \cdot \frac{2^n}{2\sqrt{n}} \cdot \frac{1}{2} \cdot \frac{1}{2^n} = \frac{1}{5\sqrt{n}}.$$

In fact, a stronger lower bound can be proven:
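The counting behind the proof of Claim 5 is easy to check numerically. The sketch below (ours) verifies, for a few even values of $n$, that the middle layer indeed has between $\frac{2^n}{2\sqrt{n}}$ and $\frac{2^n}{\sqrt{n}}$ points and that the resulting error lower bound stays above $\frac{1}{5\sqrt{n}}$.

```python
from math import comb, sqrt

# Numeric check of the counting used in the proof of Claim 5 (our own).
for n in [10, 20, 40, 60]:
    middle = comb(n, n // 2)                  # size of the middle layer
    in_range = 2**n / (2 * sqrt(n)) <= middle <= 2**n / sqrt(n)
    queries = 2**n / (10 * sqrt(n))           # the query budget in Claim 5
    unseen = middle - queries                 # middle-layer points never queried
    err_lb = unseen * 0.5 / 2**n              # each contributes (1/2) * 2^{-n}
    print(n, in_range, round(err_lb * sqrt(n), 3))
    # The last column is the error bound scaled by sqrt(n); it stays >= 1/5.
```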

Theorem 6 ([BBL98]). There is a (different) class $\mathcal{C}$ of monotone Boolean functions such that any algorithm that makes at most $2^{\sqrt{n}/100}$ membership queries outputs, when the target function $f$ is drawn uniformly from $\mathcal{C}$, a hypothesis $h$ such that $\mathbf{E}[\mathrm{dist}(f, h)] \ge 1/2 - o(1)$.

High-level sketch of proof. Each $f \in \mathcal{C}$ is a $2^{\sqrt{n}/50}$-term monotone DNF, $f = T_1 \vee \cdots \vee T_{2^{\sqrt{n}/50}}$, where each term $T_1, \ldots, T_{2^{\sqrt{n}/50}}$ is drawn independently from the set of all conjunctions of length $\frac{c\sqrt{n}}{50}$ (for an appropriately chosen constant $c$, so that the function is balanced with high probability). The argument then goes roughly as follows: every time a query $x$ satisfies one of the terms, the algorithm is given for free all the variables of that term. But even with this overly generous assumption, there are at most $2^{\sqrt{n}/100}$ positive examples among the queries, hence at most $2^{\sqrt{n}/100}$ terms out of the $2^{\sqrt{n}/50}$ total terms are shown to the algorithm. Intuitively, this means that the algorithm does not see anything about almost all of the terms (with high probability); furthermore, each negative example eliminates (again, with high probability) very few possible terms, so negative examples do not help either.

5 Main contribution of LMN: learning AC^0 circuits

[Figure: a size-6, depth-3 circuit over the inputs $x_1, x_3, x_4, x_7$.]

Above we see a size-6, depth-3 constant-depth circuit. Linial, Mansour and Nisan showed that if $f$ is computed by a size-$M$, depth-$d$ circuit then $f$ is $\epsilon$-concentrated on $\mathcal{S} = \{S : |S| \le (O(\log(M/\epsilon)))^d\}$. That is, we can learn the class $\mathrm{AC}^0$ of constant-depth, polynomial-size Boolean circuits in $n^{\mathrm{poly}(\log(n/\epsilon))}$ time.

(See the HW page for a related problem: there exists a depth-$d$, size-$M$ circuit with no Fourier weight on any set $S$ with $|S| \le \log^{d-1} M$, so the concentration bound above is close to best possible.)
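The HW problem just mentioned asks for circuits whose Fourier weight sits only at high degree; the textbook example of a function with such a spectrum is a parity (recall that a parity of polylogarithmically many variables can be computed by a small constant-depth circuit). The brute-force check below (ours) confirms that $\mathrm{PAR}_k$ has a single nonzero Fourier coefficient, on the full set of its $k$ variables, and hence no weight at any lower degree.

```python
from itertools import combinations, product
from math import prod

def parity(x):
    # PAR_k(x) = x_1 * x_2 * ... * x_k in the +/-1 convention.
    return prod(x)

k = 4
for d in range(k + 1):
    for S in combinations(range(k), d):
        coeff = sum(parity(x) * prod(x[i] for i in S)
                    for x in product([-1, 1], repeat=k)) / 2 ** k
        if coeff != 0:
            print(S, coeff)   # only the full set (0, 1, ..., k-1) is printed
```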

6 Learning halfspaces

Definition 7. A Boolean function $f : \{0,1\}^n \to \{-1,1\}$ is said to be a halfspace (or Linear Threshold Function (LTF)) if there exist weights $w_1, \ldots, w_n \in \mathbb{R}$ and a threshold $\theta \in \mathbb{R}$ such that $f(x) = \mathrm{sign}(w \cdot x - \theta)$ for all $x \in \{0,1\}^n$.

Fact 8 (PAC-learning halfspaces). There is an algorithm that can learn any unknown halfspace over $\{0,1\}^n$ in $\mathrm{poly}(n, \frac{1}{\epsilon}, \log\frac{1}{\delta})$ time, using only independent and identically distributed random examples drawn from an arbitrary distribution $D$ over $\{0,1\}^n$. The algorithm outputs with probability at least $1 - \delta$ a hypothesis $h$ such that $\Pr_{x \sim D}[f(x) \ne h(x)] \le \epsilon$.

This algorithm is based on polynomial-time linear programming. It works when $f$ is a halfspace, but breaks down completely if $f$ is a function of halfspaces, such as $f = h_1 \wedge h_2$. Indeed, in the arbitrary-distribution model, even if we allow membership queries, no algorithm faster than $2^n$ time is known for $f = h_1 \wedge h_2$. So we will restrict our attention (as usual) to the uniform-distribution setting.

One question to ask is whether one can use Fourier analysis to learn (under the uniform distribution) a single halfspace, or more ambitiously a function $g(h_1, \ldots, h_k)$ of halfspaces, where $g : \{0,1\}^k \to \{-1,1\}$. The following results are known here:

- Let $h = \mathrm{MAJ}$. If $\mathcal{S}$ is such that $\sum_{S \in \mathcal{S}} \hat{h}(S)^2 \ge 1 - \epsilon$, then it must be the case that $|\mathcal{S}| = n^{\Omega(1/\epsilon^2)}$. In particular, the KM algorithm will not work well.
- There exists $f = g(h_1, \ldots, h_k)$ such that if $\mathcal{S}$ satisfies $\sum_{S \in \mathcal{S}} \hat{f}(S)^2 \ge 1 - \epsilon$, then it must be the case that $|\mathcal{S}| = (n/k)^{\Omega(k^2/\epsilon^2)}$. For $k$ small compared to $n$ this is $n^{\Omega(k^2/\epsilon^2)}$.

These are bad-news results for Fourier concentration. The good news is that this is as bad as the bad news gets; it is known that any $f = g(h_1, \ldots, h_k)$ satisfies
$$\sum_{|S| \le O(k^2)/\epsilon^2} \hat{f}(S)^2 \ge 1 - \epsilon$$
and thus can be learned with LMN in $n^{O(k^2/\epsilon^2)}$ time (without membership queries). However, with membership queries it is possible to achieve a much better running time, namely polynomial in $n$ and $1/\epsilon$ for any fixed constant $k$:

Theorem 9 ([GKM12]). The class of $k$-juntas of halfspaces (functions of the form $f = g(h_1, \ldots, h_k)$ with the $h_i$'s being halfspaces) can be learned under the uniform distribution in $\mathrm{poly}((nk/\epsilon)^k)$ time, using membership queries.
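Fact 8 can be made concrete with a small linear-programming sketch: find any $(w, \theta)$ consistent with the labeled sample, here via scipy's LP solver. This is our own illustration, with an arbitrary margin normalization and a toy sample size, not the specific algorithm the lecture has in mind.

```python
import numpy as np
from scipy.optimize import linprog

def learn_halfspace(examples):
    # Find (w, theta) with y * (w.x - theta) >= 1 for every labeled example
    # (the margin of 1 is just a normalization).  This is a feasibility LP,
    # so the objective is identically zero.
    n = len(examples[0][0])
    A_ub = [list(-y * np.asarray(x, dtype=float)) + [float(y)]
            for x, y in examples]
    b_ub = [-1.0] * len(examples)
    res = linprog(c=np.zeros(n + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=(None, None), method="highs")
    assert res.success, "no consistent halfspace found"
    return res.x[:n], res.x[n]

# Toy run: examples labeled by the halfspace sign(2*x1 + x2 - 1.5) on {0,1}^4.
rng = np.random.default_rng(0)
target = lambda x: 1 if 2 * x[0] + x[1] - 1.5 > 0 else -1
data = [(x, target(x)) for x in rng.integers(0, 2, size=(200, 4))]
w, theta = learn_halfspace(data)
print("consistent on the sample:",
      all((1 if w @ x - theta > 0 else -1) == y for x, y in data))
```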

Idea of the proof. The algorithm will use a hypothesis that is a Read-Once Branching Program (ROBP).

Definition 10. A width-$W$ ROBP $M$ is a layered digraph with layers $0, 1, \ldots, n$ and at most $W$ nodes in each layer. $L(M, i)$ is the set of nodes in layer $i$, with $L(M, 0) = \{v_0\}$ ($v_0$ being the start node). Moreover, each node in $L(M, n)$ is labeled 0 or 1 (respectively REJECT or ACCEPT); for $i \in \{0, 1, \ldots, n-1\}$, each $v \in L(M, i)$ has two out-edges, one labeled 0 and the other labeled 1, both going to nodes in $L(M, i+1)$; for $z \in \{0,1\}^i$ and a node $v$, $M(v, z)$ denotes the node reached by starting from $v$ and following $i$ edges according to $z$.

We can view an ROBP as a Boolean function $M : \{0,1\}^n \to \{0,1\}$ by setting, for $z \in \{0,1\}^n$,
$$M(z) \stackrel{\mathrm{def}}{=} \begin{cases} 0 & \text{if } M(v_0, z) \text{ is labeled } 0 \\ 1 & \text{if } M(v_0, z) \text{ is labeled } 1. \end{cases}$$

Notation. We will write $U_i$ for the uniform distribution over $\{0,1\}^i$. For a prefix $x \in \{0,1\}^i$ (with $i \le n$), we define $f_x : \{0,1\}^{n-i} \to \{0,1\}$ by $f_x(z) = f(x \circ z)$, where $\circ$ stands for concatenation. Note that $\mathrm{dist}(f_x, f_y) = \Pr_{z \sim U_{n-i}}[f(x \circ z) \ne f(y \circ z)]$.

Definition 11. A function $f : \{0,1\}^n \to \{0,1\}$ is said to be $(\epsilon, W)$-prefix coverable if for all $i \in [n]$ there exists $S_i \subseteq \{0,1\}^i$ with $|S_i| \le W$ such that for every $y \in \{0,1\}^i$ there is some $x \in S_i$ with $\mathrm{dist}(f_x, f_y) \le \epsilon$. The collection $(S_1, \ldots, S_n)$ is then called an $(\epsilon, W)$-prefix cover of $f$.

The two building blocks of the proof will be the following lemmas:

Lemma 12. Every $k$-junta of LTFs $g(h_1, \ldots, h_k)$ is $(\epsilon, (4k/\epsilon)^k)$-prefix coverable.

Lemma 13. There is a membership-query algorithm which, given $\epsilon$, $W$, $\delta$, and $\mathrm{MQ}(f)$ for some $(\epsilon, W)$-prefix-coverable function $f$, outputs (as a width-$W$ ROBP) a hypothesis $h$ such that $\mathrm{dist}(h, f) \le 4n\epsilon$. Furthermore, the algorithm runs in time $\mathrm{poly}(n, W, \frac{1}{\epsilon}, \log\frac{1}{\delta})$.

Remark 2. The two lemmas above combine to yield the theorem: to learn a $k$-junta of LTFs to accuracy $\epsilon$, set $\epsilon' \stackrel{\mathrm{def}}{=} \frac{\epsilon}{4n}$ and $W \stackrel{\mathrm{def}}{=} (4k/\epsilon')^k = (16kn/\epsilon)^k$, and run the algorithm of Lemma 13.
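To make Definition 10 concrete, here is a tiny sketch (the encoding and the class are ours) of a width-2 ROBP that computes the parity of 3 bits: one node per layer records "even so far", the other "odd so far".

```python
class ROBP:
    # A width-W read-once branching program (Definition 10) in a toy encoding:
    # nodes of each layer are numbered 0..W-1, trans[i][v] = (node reached on
    # bit 0, node reached on bit 1) going from node v of layer i to layer i+1,
    # and accept is the set of layer-n nodes labeled 1 (ACCEPT).
    def __init__(self, trans, accept):
        self.trans = trans
        self.accept = accept

    def walk(self, layer, v, z):
        # M(v, z): start at node v of the given layer and follow the edges
        # dictated by the bits of z.
        for b in z:
            v = self.trans[layer][v][b]
            layer += 1
        return v

    def __call__(self, z):
        # View the ROBP as a Boolean function {0,1}^n -> {0,1}.
        return 1 if self.walk(0, 0, z) in self.accept else 0

# Node 0 = "parity even so far", node 1 = "parity odd so far"; accept node 1.
parity3 = ROBP(trans=[[(0, 1), (1, 0)]] * 3, accept={1})
print([parity3(z) for z in [(0, 0, 0), (1, 0, 1), (1, 1, 1)]])  # [0, 0, 1]
```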

7 Next time

Lemma 12 is a direct consequence of the following two claims, which we will prove next time.

Claim 14. If $h$ is any LTF, then $h$ is $(\epsilon, 2/\epsilon)$-prefix coverable.

Claim 15. Let $f_1, \ldots, f_k$ be any $(\epsilon, W)$-prefix-coverable functions, and fix any $g : \{0,1\}^k \to \{0,1\}$. Then $g(f_1, \ldots, f_k)$ is $(2k\epsilon, W^k)$-prefix coverable.

References

[BBL98] A. Blum, C. Burch, and J. Langford. On learning monotone Boolean functions. In Proceedings of the Thirty-Ninth Annual Symposium on Foundations of Computer Science (FOCS), 1998.

[GKM12] Parikshit Gopalan, Adam R. Klivans, and Raghu Meka. Learning functions of halfspaces using prefix covers. In Mannor et al. [MSW12], pages 15.1-15.10.

[Man94] Yishay Mansour. Learning Boolean functions via the Fourier transform. In Vwani Roychowdhury, Kai-Yeung Siu, and Alon Orlitsky, editors, Theoretical Advances in Neural Computation and Learning. Springer US, 1994.

[MSW12] Shie Mannor, Nathan Srebro, and Robert C. Williamson, editors. COLT 2012 - The 25th Annual Conference on Learning Theory, June 25-27, 2012, Edinburgh, Scotland, volume 23 of JMLR Proceedings. JMLR.org, 2012.
