Characterizing the Sample Complexity of Private Learners


Amos Beimel (Dept. of Computer Science, Ben-Gurion University), Kobbi Nissim (Dept. of Computer Science, Ben-Gurion University & Harvard University), Uri Stemmer (Dept. of Computer Science, Ben-Gurion University)

ABSTRACT

In 2008, Kasiviswanathan et al. defined private learning as a combination of PAC learning and differential privacy [16]. Informally, a private learner is applied to a collection of labeled individual information and outputs a hypothesis while preserving the privacy of each individual. Kasiviswanathan et al. gave a generic construction of private learners for (finite) concept classes, with sample complexity logarithmic in the size of the concept class. This sample complexity is higher than what is needed for non-private learners, hence leaving open the possibility that the sample complexity of private learning may be sometimes significantly higher than that of non-private learning.

We give a combinatorial characterization of the sample size sufficient and necessary to privately learn a class of concepts. This characterization is analogous to the well known characterization of the sample complexity of non-private learning in terms of the VC dimension of the concept class. We introduce the notion of probabilistic representation of a concept class, and our new complexity measure RepDim corresponds to the size of the smallest probabilistic representation of the concept class. We show that any private learning algorithm for a concept class C with sample complexity m implies RepDim(C) = O(m), and that there exists a private learning algorithm with sample complexity m = O(RepDim(C)). We further demonstrate that a similar characterization holds for the database size needed for privately computing a large class of optimization problems and also for the well studied problem of private data release.

Categories and Subject Descriptors: K.4.1 [Computers and Society]: Public Policy Issues (Privacy); F.2 [Analysis of Algorithms and Problem Complexity]: Miscellaneous

Research partially supported by the Israel Science Foundation (grants No. 938/09 and 2761/12) and by the Frankel Center for Computer Science.

ITCS'13, January 9-12, 2013, Berkeley, California, USA. Copyright 2013 ACM.

General Terms: Theory

Keywords: Differential privacy, PAC learning, Sample complexity, Probabilistic representation

1. INTRODUCTION

Motivated by the observation that learning generalizes many of the analyses applied to large collections of data, Kasiviswanathan et al. [16] defined in 2008 private learning as a combination of probably approximately correct (PAC) learning [19] and differential privacy [11]. A PAC learner is given a collection of labeled examples (sampled according to an unknown probability distribution and labeled according to an unknown concept) and generalizes the labeled examples into a hypothesis h that should predict with high accuracy the labeling of fresh examples taken from the same unknown distribution and labeled with the same unknown concept. The privacy requirement is that the choice of h preserves differential privacy of the sample points. Intuitively, this means that the choice should not be significantly affected by any particular sample. Differential privacy is increasingly accepted as a standard for rigorous privacy, and recent research has shown that differentially private variants exist for many analyses. We refer the reader to the surveys of Dwork [9, 10].

The sample complexity required for learning a concept class C determines the amount of labeled data needed for learning a concept c ∈ C. It is well known that the sample complexity of learning a concept class C (non-privately) is proportional to a complexity measure of the class C known as the VC-dimension [20, 6, 13]. Kasiviswanathan et al. [16] proved that a private learner exists for every finite concept class. The proof is via a generic construction that exhibits sample complexity logarithmic in the size of the concept class. The VC-dimension of a concept class is bounded by this quantity (and is significantly lower for some interesting concept classes), and hence the results of [16] left open the possibility that the sample complexity of private learning may be significantly higher than that of non-private learning.

In analogy to the characterization of the sample complexity of (non-private) PAC learners via the VC-dimension, we give a combinatorial characterization of the sample size sufficient and necessary for private PAC learners. Towards obtaining this characterization, we introduce the notion of probabilistic representation of a concept class.

We note that our characterization, like the VC-dimension characterization, ignores the computation required by the learner. Some of our algorithms are, however, computationally efficient.

1.1 Related Work

We start with a short description of prior work on the sample complexity of private learning. To simplify the exposition, we ignore dependencies on the error, confidence and privacy parameters by treating them as constants in this and the following section. The dependency on these parameters is made explicit in the later sections of the paper.

Recall that the sample complexity of non-private learners for a class of functions C is proportional to the VC-dimension of the class [6, 13], a combinatorial measure of the class that is equal to the size of the largest set of inputs that is shattered by the class. This characterization, like ours, ignores the computation required by the learner. Kasiviswanathan et al. [16] showed, informally, that every finite concept class C can be learned privately (ignoring computational complexity). Their construction is based on the exponential mechanism of McSherry and Talwar [17], and the O(ln |C|) bound on the sample complexity results from the union bound argument used in the analysis of the exponential mechanism. Computationally efficient learners were shown to exist by Blum et al. for all concept classes that can be efficiently learned in the statistical queries model. Kasiviswanathan et al. [16] showed an example of a concept class, the class of parity functions, that is not learnable in the statistical queries model but can be learned privately and efficiently. These positive results suggest that many natural computational learning tasks that are efficiently learned non-privately can be learned privately and efficiently.

Beimel et al. [3] studied the sample complexity of private learning. They examined the concept class of point functions POINT_d, where each concept evaluates to one on exactly one point of the domain and to zero otherwise. Note that the VC-dimension of POINT_d is one. Beimel et al. proved lower bounds on the sample complexity of properly and privately learning the class POINT_d (and related classes), implying that the VC-dimension of a class does not characterize the sample complexity of private proper learning. On the other hand, they observed that the sample complexity can be improved for improper private learners whenever there exists a smaller hypothesis class H that represents C in the sense that for every concept c ∈ C and for every distribution on the examples, there is a hypothesis h ∈ H that is close to c. Using the exponential mechanism to choose among the hypotheses in H instead of C, the sample complexity is reduced to ln |H| (this is why the size of the representation H is defined to be ln |H|). For some classes this can dramatically improve the sample complexity; e.g., for the class POINT_d (defined in Example 3.2), the sample complexity is improved from O(ln |POINT_d|) = O(d) to O(ln d). Using other techniques, Beimel et al. showed that the sample complexity of learning POINT_d can be reduced even further to O(1), hence showing the largest possible gap between proper and improper private learning. Such a gap does not exist for non-private learning.

Chaudhuri and Hsu [7] studied the sample complexity needed for privately learning infinite concept classes when the data is drawn from a continuous distribution. They showed that under these settings there exists a simple concept class for which any proper learner that uses a finite number of examples and guarantees differential privacy fails to satisfy the accuracy guarantee for at least one data distribution. This implies that the results of Kasiviswanathan et al. [16] do not extend to infinite hypothesis classes. Interestingly, our results imply an improper private algorithm for an infinite extension of the class POINT (that is, a class over the natural numbers of all boolean functions that return 1 on exactly one number). Chaudhuri and Hsu [7] also study learning algorithms that are only required to protect the privacy of the labels (and do not necessarily protect the privacy of the examples themselves). They prove upper bounds and lower bounds on the sample complexity of such algorithms. In particular, they prove a lower bound on the sample complexity using the doubling dimension of the disagreement metric of the hypothesis class with respect to the unlabeled data distribution. This result does not imply our characterization, as the privacy requirement of protecting only the labels is much weaker than protecting both the sample point and the label.

A line of research (started in [18]) that is very relevant to our paper is boosting learning algorithms, that is, taking learning algorithms that have a big classification error and producing a learning algorithm with small error. Dwork et al. [12] show how to privately boost accuracy, that is, given a private learning algorithm that has a big classification error, they produce a private learning algorithm with small error. In Lemma 3.14, we show how to boost the accuracy α for probabilistic representations. This gives an alternative private boosting, whose proof is simpler. However, as it uses the exponential mechanism, it is (generally) not computationally efficient.

1.2 Our Results

Beimel et al. [3] showed how to use a representation of a class to privately learn it. We make an additional step in improving the sample complexity by considering a probabilistic representation of a concept class C. Instead of one collection H representing C, we consider a list of collections H_1, ..., H_r such that for every c ∈ C and every distribution on the examples, if we sample a collection H_i from the list, then with high probability there is a hypothesis h ∈ H_i that is close to c. To privately learn C, the learning algorithm first samples i ∈ {1, ..., r} and then uses the exponential mechanism to select a hypothesis from H_i. This reduces the sample complexity to O(max_i ln |H_i|); the size of the probabilistic representation is hence defined to be max_i ln |H_i|.

We show that for POINT_d there exists a probabilistic representation of size O(1). This results in a private learning algorithm with sample complexity O(1), matching a different private algorithm for POINT_d presented in [3]. Our new algorithm offers some improvement in the sample complexity compared to the algorithm of [3] when considering the learning and privacy parameters. Furthermore, our algorithm can be made computationally efficient without making any computational hardness assumptions, while the efficient version in [3] assumes the existence of one-way functions. Finally, it is conceptually simpler and in particular it avoids the sub-sampling technique used in [3].

One can ask if there are private learning algorithms with smaller sample complexity than the size of the smallest probabilistic representation. We show that the answer is no:

the size of the smallest probabilistic representation is a lower bound on the sample complexity. Thus, the size of the smallest probabilistic representation of a class C, which we call the representation dimension and denote by RepDim(C), characterizes (up to constants) the sample size necessary and sufficient for privately learning the class C.

We also show that for concepts defined over a finite domain, the difference between the sizes of the best deterministic and probabilistic representations is bounded. Namely, if C is a concept class over the domain {0,1}^d, then there exists a deterministic representation of C of size O(RepDim(C) + ln d). Thus, for classes whose smallest deterministic representation is of size ω(ln d), the size of the smallest deterministic representation characterizes the sample complexity of private learning of the class.

The notion of probabilistic representation applies not only to private learning, but also to optimization problems. We consider a scenario where there is a domain X, a database S of m records, each taken from the domain X, a set of solutions F, and a quality function q : X^m × F → [0,1] that we wish to maximize. If the exponential mechanism is used for (approximately) solving the problem, then the size of the database should be Ω(ln |F|) in order to achieve a reasonable approximation. Using our notions of a representation of F and of a probabilistic representation of F, one can reduce the size of the minimal database without paying too much in the quality of the solution. Interestingly, a notion similar to representation, called solution list algorithms, was considered in [2] for constructing secure protocols for search problems while leaking only a few bits on the input. Curiously, their notion of leakage is very different from that of differential privacy.

We give two examples of such optimization problems. First, an example inspired by [2]: each record in the database is a clause with exactly 3 literals and we want to find an assignment satisfying at least a 7/8 fraction of the clauses while protecting the privacy of the clauses. A construction of [2] yields a deterministic representation for this problem where the size of the database can be much smaller. Using a probabilistic representation, we can give a good assignment even for databases of constant size. This example is a simple instance of a scenario where each individual has a preference on the solution and we want to choose a solution maximizing the number of individuals whose preferences are met, while protecting the privacy of the preferences. Another example of optimization is sanitization, where given a database we want to publish a synthetic database that gives similar utility as the original database while protecting the privacy of the individual records of the database. Using our techniques, we study the minimal database size for which sanitization gives reasonable performance with respect to a given family of queries.

Open Problem. We still do not know the relation between this dimension and the VC-dimension. By Sauer's Lemma, if C is a concept class over {0,1}^d, then the number of functions in C is at most exp(d * VC(C)). By [16], there is a private learning algorithm for C whose sample size is O(d * VC(C)); thus, the probabilistic representation dimension of C is O(d * VC(C)). We do not know if there is a class C such that RepDim(C) is significantly larger than VC(C). A candidate for such a separation appears in [1].

2. PRELIMINARIES

Notation. We use O_γ(g(n)) as a shorthand for O(h(γ) * g(n)) for some non-negative function h. Given a set B of cardinality r, and a distribution P on {1, 2, ..., r}, we use the notation b ∼_P B to denote a random element of B chosen according to P.

2.1 Preliminaries from Privacy

A database is a vector S = (z_1, ..., z_m) over a domain X, where each entry z_i in S represents information contributed by one individual. Databases S_1 and S_2 are called neighboring if they differ in exactly one entry. An algorithm preserves differential privacy if neighboring databases induce nearby distributions on its outcomes. Formally,

Definition 2.1 (Differential Privacy [11]). A randomized algorithm A is ε-differentially private if for all neighboring databases S_1, S_2, and for all sets F of outputs,

    Pr[A(S_1) ∈ F] ≤ exp(ε) * Pr[A(S_2) ∈ F].    (1)

The probability is taken over the random coins of A.

An immediate consequence of the definition is that for any two databases S_1, S_2 ∈ X^m, and for all sets F of outputs, Pr[A(S_1) ∈ F] ≥ exp(-εm) * Pr[A(S_2) ∈ F].

2.2 Preliminaries from Learning Theory

Let X_d = {0,1}^d. A concept c : X_d → {0,1} is a function that labels examples taken from the domain X_d by either 0 or 1. A concept class C over X_d is a class of concepts mapping X_d to {0,1}. PAC learning algorithms are given examples sampled according to an unknown probability distribution D over X_d, and labeled according to an unknown target concept c ∈ C. The generalization error of a hypothesis h : X_d → {0,1} is defined as

    error_D(c, h) = Pr_{x∼D}[h(x) ≠ c(x)].

For a labeled sample S = (x_i, y_i)_{i=1}^m, the empirical error of h is

    error_S(h) = (1/m) * |{i : h(x_i) ≠ y_i}|.

Definition 2.2. An α-good hypothesis for c and D is a hypothesis h such that error_D(c, h) ≤ α.

Definition 2.3 (PAC Learning [19]). Algorithm A is an (α, β)-PAC learner for a concept class C over X_d using hypothesis class H and sample size m if for all concepts c ∈ C and all distributions D on X_d, given an input of m samples S = (z_1, ..., z_m), where z_i = (x_i, c(x_i)) and the x_i are drawn i.i.d. from D, algorithm A outputs a hypothesis h ∈ H satisfying

    Pr[error_D(c, h) ≤ α] ≥ 1 - β.

The probability is taken over the random choice of the examples in S according to D and the coin tosses of the learner A.

Definition 2.4. An algorithm satisfying Definition 2.3 with H ⊆ C is called a proper PAC learner; otherwise it is called an improper PAC learner.
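For concreteness, the two error notions above can be written as a small Python sketch. The function names and the Monte-Carlo estimate of the generalization error are illustrative choices made here, not constructions from the paper.

```python
import random

def empirical_error(hypothesis, labeled_sample):
    """error_S(h) of Section 2.2: the fraction of sample points that h mislabels."""
    return sum(1 for x, y in labeled_sample if hypothesis(x) != y) / len(labeled_sample)

def estimate_generalization_error(hypothesis, concept, draw_example, trials=10000, rng=None):
    """Monte-Carlo estimate of error_D(c, h) = Pr_{x~D}[h(x) != c(x)],
    where draw_example(rng) samples a point x from the (unknown) distribution D."""
    rng = rng or random.Random()
    points = (draw_example(rng) for _ in range(trials))
    return sum(1 for x in points if hypothesis(x) != concept(x)) / trials
```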

2.3 Private Learning

As a private learner is a PAC learner, its outcome hypothesis should also be a good predictor of labels. Hence, the privacy requirement from a private learner is not that applying the hypothesis h to a new sample (pertaining to an individual) should leak no information about that sample; rather, the requirement is that the choice of h preserves differential privacy of the training samples.

Definition 2.5 (Private PAC Learning [16]). Let A be an algorithm that gets an input S = (z_1, ..., z_m). Algorithm A is an (α, β, ε)-PPAC learner for a concept class C over X_d using hypothesis class H and sample size m if

  Privacy. Algorithm A is ε-differentially private (as formulated in Definition 2.1);
  Utility. Algorithm A is an (α, β)-PAC learner for C using H and sample size m (as formulated in Definition 2.3).

2.4 The Exponential Mechanism

We next describe the exponential mechanism of McSherry and Talwar [17]. We present its private learning variant; however, it can be used in more general scenarios. The goal here is to choose a hypothesis h ∈ H approximately minimizing the empirical error. The choice is probabilistic, where the probability mass assigned to each hypothesis decreases exponentially with its empirical error.

Inputs: a privacy parameter ε, a hypothesis class H, and m labeled samples S = (x_i, y_i)_{i=1}^m.
  1. For every h ∈ H define q(S, h) = |{i : h(x_i) = y_i}|.
  2. Randomly choose h ∈ H with probability

        exp(ε * q(S, h) / 2) / Σ_{f∈H} exp(ε * q(S, f) / 2).

Proposition 2.6. Denote ê = min_{f∈H} {error_S(f)}. The probability that the exponential mechanism outputs a hypothesis h such that error_S(h) > ê + Δ is at most |H| * exp(-εΔm/2). Moreover, the exponential mechanism is ε-differentially private.
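The selection rule above can be made concrete with a short Python sketch; the function name, the representation of hypotheses as callables, and the use of numpy are illustrative assumptions of this sketch, not part of the paper.

```python
import numpy as np

def exponential_mechanism(sample, hypotheses, epsilon, rng=None):
    """Private-learning variant of the exponential mechanism (Section 2.4).

    sample: list of (x, y) pairs; hypotheses: list of callables h(x) -> {0, 1}.
    Utility q(S, h) = number of correctly labeled points; h is chosen with
    probability proportional to exp(epsilon * q(S, h) / 2).
    """
    rng = rng or np.random.default_rng()
    scores = np.array([sum(1 for x, y in sample if h(x) == y) for h in hypotheses],
                      dtype=float)
    # Subtracting the maximum score only rescales all weights, so the output
    # distribution is unchanged; it avoids numerical overflow in exp().
    weights = np.exp(epsilon * (scores - scores.max()) / 2.0)
    probs = weights / weights.sum()
    return hypotheses[rng.choice(len(hypotheses), p=probs)]
```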
2.5 Concentration Bounds

Let X_1, ..., X_n be independent random variables where Pr[X_i = 1] = p and Pr[X_i = 0] = 1 - p for some 0 < p < 1. Clearly, E[Σ_i X_i] = pn. Chernoff and Hoeffding bounds show that the sum is concentrated around this expected value:

    Pr[Σ_i X_i > (1 + δ)pn] ≤ exp(-pnδ²/3)   for δ > 0,
    Pr[Σ_i X_i < (1 - δ)pn] ≤ exp(-pnδ²/2)   for 0 < δ < 1,
    Pr[|Σ_i X_i - pn| > δ] ≤ 2 exp(-2δ²/n)   for δ ≥ 0.

The first two inequalities are known as the multiplicative Chernoff bounds [8], and the last inequality is known as the Hoeffding bound [15].

3. THE SAMPLE COMPLEXITY OF PRIVATE LEARNERS

In this section we present a combinatorial measure of a concept class C that characterizes the sample complexity necessary and sufficient for privately learning C. The measure is a probabilistic representation of the class C. We start with the notion of deterministic representation from [3].

Definition 3.1 ([3]). A hypothesis class H is an α-representation for a class C if for every c ∈ C and every distribution D on X_d there exists a hypothesis h ∈ H such that error_D(c, h) ≤ α.

Example 3.2 (POINT_d). For j ∈ X_d, define c_j : X_d → {0,1} as c_j(x) = 1 if x = j, and c_j(x) = 0 otherwise. Define POINT_d = {c_j}_{j∈X_d}. In [3] it was shown that for α < 1/2, every α-representation for POINT_d must be of cardinality at least d, and that an α-representation H_d for POINT_d exists where |H_d| = O(d/α²).

The above representation can be used for non-private learning, by taking a big enough sample and finding a hypothesis h ∈ H_d minimizing the empirical error. For private learning it was shown in [3] that a sample of size O_{α,β,ε}(log |H_d|) suffices, with a learner that employs the exponential mechanism to choose a hypothesis from H_d.

Definition 3.3. For a hypothesis class H we denote size(H) = ln |H|. We define the Deterministic Representation Dimension of a concept class C as

    DRepDim(C) = min { size(H) : H (1/4)-represents C }.

Example 3.4. By the results of [3], stated in the previous example, DRepDim(POINT_d) = Θ(ln d).

We are now ready to present the notion of a probabilistic representation. The idea behind this notion is that we have a list of hypothesis classes, such that for every concept c and distribution D, if we sample a hypothesis class from the list, then with high probability it contains a hypothesis that is close to c.

Definition 3.5. Let P be a distribution over {1, 2, ..., r}, and let H = {H_1, H_2, ..., H_r} be a family of hypothesis classes (every H_i ∈ H is a set of boolean functions). We say that (H, P) is an (α, β)-probabilistic representation for a class C if for every c ∈ C and every distribution D on X_d:

    Pr_P[ ∃ h ∈ H_i s.t. error_D(c, h) ≤ α ] ≥ 1 - β,

where the probability is over randomly choosing a set H_i ∼_P H.

Example 3.6 (POINT_d). In Section 7 we construct, for every α and every β, a pair (H, P) that (α, β)-probabilistically represents the class POINT_d, where H contains all the sets of at most (4/α)·ln(1/β) boolean functions.

Definition 3.7. Let H = {H_1, H_2, ..., H_r} be a family of hypothesis classes. We denote |H| = r and size(H) = max{ ln |H_i| : H_i ∈ H }. We define the Representation Dimension of a concept class C as

    RepDim(C) = min { size(H) : there exists P s.t. (H, P) is a (1/4, 1/4)-probabilistic representation for C }.

Example 3.8 (POINT_d). The size of the probabilistic representation mentioned in Example 3.6 is ln( (4/α)·ln(1/β) ). Setting α = β = 1/4, we see that the Representation Dimension of POINT_d is constant.

3.1 Equivalence of (α, β)-Probabilistic Representation and Private Learning

We now show that RepDim(C) characterizes the sample complexity of private learners. We start by showing in Lemma 3.9 that an (α, β)-probabilistic representation of C implies a private learning algorithm whose sample complexity is the size of the representation. We then show in Lemma 3.12 that if there is a private learning algorithm with sample complexity m, then there is a probabilistic representation of C of size O(m); this lemma implies that RepDim(C) is a lower bound on the sample complexity. Recall that RepDim(C) is the size of the smallest probabilistic representation for α = β = 1/4. Thus, to complete the proof we show in Lemma 3.14 that a probabilistic representation with α = β = 1/4 implies a probabilistic representation for arbitrary α and β.

Lemma 3.9. If there exists a pair (H, P) that (α, β)-probabilistically represents a class C, then for every ε there exists an algorithm A that (6α, 4β, ε)-PPAC learns C with a sample size m = O( (1/(αε)) * (size(H) + ln(1/β)) ).

Proof. Let (H, P) be an (α, β)-probabilistic representation for the class C, and consider the following algorithm A:

  Inputs: S = (x_i, y_i)_{i=1}^m, and a privacy parameter ε.
  1. Randomly choose H_i ∼_P H.
  2. Choose h ∈ H_i using the exponential mechanism with ε.

By the properties of the exponential mechanism, A is ε-differentially private. We will show that with sample size m = O( (1/(αε)) * (size(H) + ln(1/β)) ), algorithm A is a (6α, 4β)-PAC learner for C. Fix some c ∈ C and D, and define the following 3 good events:

  E_1: The set H_i chosen in step 1 contains at least one hypothesis h s.t. error_S(h) ≤ 2α.
  E_2: For every h ∈ H_i s.t. error_S(h) ≤ 3α, it holds that error_D(c, h) ≤ 6α.
  E_3: The exponential mechanism chooses an h such that error_S(h) ≤ α + min_{f∈H_i}{error_S(f)}.

We first show that if those 3 good events happen, algorithm A returns a 6α-good hypothesis. Event E_1 ensures the existence of a hypothesis f ∈ H_i with error_S(f) ≤ 2α. Thus, the event E_1 ∩ E_3 ensures that algorithm A chooses (using the exponential mechanism) a hypothesis h ∈ H_i with error_S(h) ≤ 3α. Event E_2 then ensures that this h obeys error_D(c, h) ≤ 6α.

We now show that those 3 events happen with high probability. As (H, P) is an (α, β)-probabilistic representation for the class C, the chosen H_i contains a hypothesis h with error_D(c, h) ≤ α with probability at least 1 - β; by the Chernoff bound, with probability at least 1 - exp(-αm/3) this hypothesis has empirical error at most 2α. Event E_1 happens with probability at least (1 - β)(1 - exp(-αm/3)) > 1 - (β + exp(-αm/3)), which is at least 1 - 2β for m ≥ (3/α) * ln(1/β).

Using the Chernoff bound, the probability that a hypothesis h with error_D(c, h) > 6α has empirical error at most 3α is less than exp(-3αm/4). Using the union bound, the probability that there is such a hypothesis in H_i is at most |H_i| * exp(-3αm/4). Therefore, Pr[E_2] ≥ 1 - |H_i| * exp(-3αm/4), which is at least 1 - β for m ≥ (4/(3α)) * ln(|H_i|/β).

The exponential mechanism ensures that the probability of event E_3 is at least 1 - |H_i| * exp(-εαm/2) (see Section 2.4), which is at least 1 - β for m ≥ (2/(αε)) * ln(|H_i|/β).

All in all, taking m large enough to satisfy the three conditions above, i.e., m = O( (1/(αε)) * (size(H) + ln(1/β)) ), ensures that the probability of A failing to output a 6α-good hypothesis is at most 4β.
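The two-step algorithm A from the proof above amounts to one sampling step that touches no data followed by one call to the exponential mechanism. A minimal sketch, reusing the hypothetical exponential_mechanism helper from the Section 2.4 sketch (all names and data-structure choices are illustrative):

```python
import numpy as np

def private_learner_from_representation(sample, hypothesis_classes, P, epsilon, rng=None):
    """Sketch of algorithm A of Lemma 3.9.

    hypothesis_classes is the family H = [H_1, ..., H_r] (each a list of callables)
    and P is a probability vector over the family. Step 1 samples H_i according to P
    and is independent of the data, so the whole algorithm inherits the
    epsilon-differential privacy of the exponential mechanism in step 2.
    """
    rng = rng or np.random.default_rng()
    H_i = hypothesis_classes[rng.choice(len(hypothesis_classes), p=P)]  # step 1
    return exponential_mechanism(sample, H_i, epsilon, rng=rng)         # step 2
```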
We demonstrate the above lemma with two examples.

Example 3.10 (Efficient learner for POINT_d). As described in Example 3.6, there exists an (H, P) that (α/6, β/4)-probabilistically represents the class POINT_d, where size(H) = O_{α,β,ε}(1). By Lemma 3.9, there exists an algorithm that (α, β, ε)-PPAC learns POINT_d with sample size m = O_{α,β,ε}(1). The existence of an algorithm with sample complexity O(1) was already proven in [3]. Moreover, assuming the existence of one-way functions, their learner is efficient. Our construction yields an efficient learner, without assumptions. To see this, consider again algorithm A presented in the above proof, and note that as size(H) is constant, step 2 can be done in constant time. Step 1 can be done efficiently as we can efficiently sample a set H_i ∼_P H. In Claim 7.1 we initially construct a probabilistic representation in which the description of every hypothesis is exponential in d. The representation is then revised using pairwise independence to yield a representation in which every hypothesis h has a short description, and given x the value h(x) can be computed efficiently.

Example 3.11 (POINT_N). Consider the class POINT_N, which is exactly like POINT_d, only over the natural numbers. By results of [7, 3], it is impossible to properly PPAC learn the class POINT_N. Our construction yields an (inefficient) improper private learner for POINT_N with O_{α,β,ε}(1) samples. The details are deferred to Section 7.

The next lemma shows that a private learning algorithm implies a probabilistic representation. This lemma can be used to lower bound the sample complexity of private learners.

Lemma 3.12. Let α ≤ 1/4. If there exists an algorithm A that (α, 1/2, ε)-PPAC learns a concept class C with a sample size m, then there exists a pair (H, P) that (1/4, 1/4)-probabilistically represents the class C such that size(H) = O(εαm).

Proof. Let A be an (α, 1/2, ε)-PPAC learner for the class C using hypothesis class F whose sample size is m. Without loss of generality, we can assume that m ≥ (3/α) * ln(4) (since A can ignore part of the sample). For a target concept c ∈ C and a distribution D on X_d, we define G^α_{c,D} = {h ∈ F : error_D(c, h) ≤ α}.

Fix some c ∈ C and a distribution D on X_d, and define the following distribution D' on X_d:

    Pr_{D'}[x] = 1 - 4α + 4α * Pr_D[x]   if x = 0^d,
    Pr_{D'}[x] = 4α * Pr_D[x]            if x ≠ 0^d.

Note that for every x ∈ X_d,

    Pr_{D'}[x] ≥ 4α * Pr_D[x].    (2)

As A is an (α, 1/2)-PAC learner, it holds that Pr_{D',A}[A(S) ∈ G^α_{c,D'}] ≥ 1/2, where the probability is over A's randomness and over sampling the examples in S according to D'. In addition, by inequality (2), every hypothesis h with error_D(c, h) > 1/4 has error strictly greater than α under D':

    error_{D'}(c, h) ≥ 4α * error_D(c, h) > α.

So, every α-good hypothesis for c and D' is a (1/4)-good hypothesis for c and D. That is, G^α_{c,D'} ⊆ G^{1/4}_{c,D}. Therefore, Pr_{D',A}[A(S) ∈ G^{1/4}_{c,D}] ≥ 1/2.

We say that a database S of m labeled examples is good if the unlabeled example 0^d appears in S at least (1 - 8α)m times. Let S be a database constructed by taking m i.i.d. samples from D', labeled by c. By the Chernoff bound, S is good with probability at least 1 - exp(-αm/3). Hence,

    Pr_{D',A}[ (A(S) ∈ G^{1/4}_{c,D}) and (S is good) ] ≥ 1/2 - exp(-αm/3) ≥ 1/4.

Therefore, there exists a database S_good of m samples that contains the unlabeled example 0^d at least (1 - 8α)m times, and Pr_A[A(S_good) ∈ G^{1/4}_{c,D}] ≥ 1/4, where the probability is only over the randomness of A. All of the examples in S_good (including the example 0^d) are labeled by c.

For σ ∈ {0,1}, denote by 0^m_σ a database containing m copies of the example 0^d labeled as σ. As A is ε-differentially private, and as the target concept c labels the example 0^d by either 0 or 1, for at least one σ ∈ {0,1} it holds that

    Pr_A[A(0^m_σ) ∈ G^{1/4}_{c,D}] ≥ exp(-8αεm) * Pr_A[A(S_good) ∈ G^{1/4}_{c,D}] ≥ exp(-8αεm)/4.    (3)

That is, Pr_A[A(0^m_σ) ∉ G^{1/4}_{c,D}] ≤ 1 - (1/4) * e^{-8αεm}.

Now, consider a set H containing the outcomes of 4·ln(4)·e^{8αεm} executions of A(0^m_0), and the outcomes of 4·ln(4)·e^{8αεm} executions of A(0^m_1). The probability that H does not contain a (1/4)-good hypothesis for c and D is at most

    (1 - (1/4) * e^{-8αεm})^{4·ln(4)·e^{8αεm}} ≤ 1/4.

Thus, H = { H ⊆ F : |H| ≤ 8·ln(4)·e^{8αεm} }, and P, the distribution induced by A(0^m_0) and A(0^m_1), are a (1/4, 1/4)-probabilistic representation for the class C. Note that the value c(0^d) is unknown, and can be either 0 or 1; therefore the construction uses the two possible values (one of them correct). It holds that

    size(H) = max{ ln |H| : H ∈ H } = ln(8·ln 4) + 8αεm = O(εαm).

Lemma 3.14 shows how to construct a probabilistic representation for arbitrary α and β from a probabilistic representation with α = β = 1/4; in other words, we boost α and β. The proof of this lemma is combinatorial. It allows us to start with a private learning algorithm with constant α and β, move to a representation, use the combinatorial boosting, and move back to a private algorithm with small α and β. This should be contrasted with the private boosting of [12], which is algorithmic and more complicated (however, the algorithm of Dwork et al. [12] is computationally efficient). We first show how to construct a probabilistic representation for arbitrary β from a probabilistic representation with β = 1/4.

Claim 3.13. For every concept class C and for every β, there exists a pair (H, P) that (1/4, β)-probabilistically represents C where size(H) ≤ RepDim(C) + ln ln(1/β).

Proof. Let β < 1/4, and let (H^0, P^0) be a (1/4, 1/4)-probabilistic representation for C with size(H^0) = RepDim(C), denoted k_0 (that is, for every H^0_i ∈ H^0 it holds that |H^0_i| ≤ e^{k_0}). Denote H^0 = {H^0_1, H^0_2, ..., H^0_r}, and consider the following family of hypothesis classes:

    H^1 = { H^0_{i_1} ∪ ... ∪ H^0_{i_{ln(1/β)}} : 1 ≤ i_1 ≤ ... ≤ i_{ln(1/β)} ≤ r }.

Note that for every H^1_i ∈ H^1 it holds that |H^1_i| ≤ ln(1/β)·e^{k_0}, and so size(H^1) ≤ k_1, where k_1 = k_0 + ln ln(1/β). We will now show an appropriate distribution P^1 on H^1 such that (H^1, P^1) is a (1/4, β)-probabilistic representation for C. To this end, consider the following process for randomly choosing an H^1 ∈ H^1:

  1. Denote M = ln(1/β).
  2. For i = 1, ..., M: randomly choose H^0_i ∼_{P^0} H^0.
  3. Return H^1 = ∪_{i=1}^M H^0_i.

The above process induces a distribution on H^1, denoted P^1. As (H^0, P^0) is a (1/4, 1/4)-probabilistic representation for C, we have that

    Pr_{P^1}[ ∃ h ∈ H^1 s.t. error_D(c, h) ≤ 1/4 ] = 1 - Pr[ for every i, H^0_i contains no such h ] ≥ 1 - (1/4)^M ≥ 1 - β.
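The random process in the proof above is easy to state in code: draw roughly ln(1/β) classes independently from (H^0, P^0) and return their union. A minimal Python sketch with illustrative names, assuming the same list-of-classes encoding as in the earlier sketches:

```python
import math
import numpy as np

def boost_confidence(hypothesis_classes, P, beta, rng=None):
    """Random process from the proof of Claim 3.13.

    Draws M = ceil(ln(1/beta)) classes independently according to P and returns
    their union; the union misses a 1/4-good hypothesis only if every draw does,
    which happens with probability at most (1/4)^M <= beta.
    """
    rng = rng or np.random.default_rng()
    M = max(1, math.ceil(math.log(1.0 / beta)))
    indices = rng.choice(len(hypothesis_classes), size=M, p=P)  # i.i.d. draws, with replacement
    union = []
    for i in indices:
        union.extend(hypothesis_classes[i])
    return union
```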
Lemma 3.14. For every concept class C, every α, and every β, there exists (H, P) that (α, β)-probabilistically represents C where

    size(H) = O( ln(1/α) * ( RepDim(C) + ln ln ln(1/α) + ln ln(1/β) ) ).

Proof. Let C be a concept class, and let (H^1, P^1) be a (1/4, β/T)-probabilistic representation for C (where T will be set later). By Claim 3.13, such a representation exists with size(H^1) ≤ k_1, where k_1 = RepDim(C) + ln ln(T/β). We use H^1 and P^1 to create an (α, β)-probabilistic representation for C. We begin with two notations:

  1. For T hypotheses h_1, ..., h_T we denote by maj_{h_1,...,h_T} the majority hypothesis. That is, maj_{h_1,...,h_T}(x) = 1 if and only if |{h_i : h_i(x) = 1}| ≥ T/2.
  2. For T hypothesis classes H_1, ..., H_T we denote MAJ(H_1, ..., H_T) = { maj_{h_1,...,h_T} : h_i ∈ H_i for every 1 ≤ i ≤ T }.

Consider the following family of hypothesis classes:

    H = { MAJ(H_{i_1}, ..., H_{i_T}) : H_{i_1}, ..., H_{i_T} ∈ H^1 }.

Moreover, denote by P the distribution on H induced by the following random process: for j = 1, ..., T, randomly choose H_{i_j} ∼_{P^1} H^1; return MAJ(H_{i_1}, ..., H_{i_T}).

Next we show that (H, P) is an (α, β)-probabilistic representation for C. For a fixed pair of a target concept c and a distribution D, randomly choose H_{i_1}, ..., H_{i_T} ∼_{P^1} H^1. We show that with probability at least 1 - β the set MAJ(H_{i_1}, ..., H_{i_T}) contains at least one α-good hypothesis for c, D. To this end, denote D_1 = D and consider the following thought experiment, inspired by the AdaBoost algorithm of [14]:

For t = 1, ..., T:
  1. Fail if H_{i_t} does not contain a (1/4)-good hypothesis for c, D_t.
  2. Denote by h_t ∈ H_{i_t} a (1/4)-good hypothesis for c, D_t.
  3. Define D_{t+1}(x) = 2 * D_t(x) if h_t(x) ≠ c(x), and
     D_{t+1}(x) = ( (1 - 2*error_{D_t}(c, h_t)) / (1 - error_{D_t}(c, h_t)) ) * D_t(x) otherwise.

Note that as D_1 is a probability distribution on X_d, the same is true for D_2, D_3, ..., D_T. As (H^1, P^1) is a (1/4, β/T)-probabilistic representation for C, the failure probability of every iteration is at most β/T. Thus (using the union bound), with probability at least 1 - β the whole thought experiment succeeds, and in this case we show that the error of h_fin = maj_{h_1,...,h_T} is at most α.

Consider the set R = {x : h_fin(x) ≠ c(x)} ⊆ X_d. This is the set of points on which at least T/2 of h_1, ..., h_T err. Next consider the partition of R into the following sets:

    R_t = { x ∈ R : h_t(x) ≠ c(x) and h_i(x) = c(x) for every i > t },

that is, R_t contains the points x ∈ R on which h_t is the last to err. Clearly D_t(R_t) ≤ 1/4, as R_t is a subset of the set of points on which h_t errs. Moreover, for every x ∈ R_t, during the first t - 1 rounds the weight of x was doubled at least T/2 - 1 times (once for every h_i with i < t that errs on x) and was multiplied by (1 - 2*error_{D_i}(c, h_i)) / (1 - error_{D_i}(c, h_i)) ≥ 2/3 in each of the at most T/2 remaining rounds. Hence

    D_t(R_t) ≥ D_1(R_t) * 2^{T/2 - 1} * (2/3)^{T/2},

so

    D_1(R_t) ≤ 2 * (3/4)^{T/2} * D_t(R_t) ≤ (1/2) * (3/4)^{T/2}.

Finally,

    error_D(c, h_fin) = D_1(R) = Σ_{t=T/2}^{T} D_1(R_t) ≤ (T/2 + 1) * (1/2) * (3/4)^{T/2}.

Choosing T = O(ln(1/α)), we get that error_D(c, h_fin) ≤ α. Hence, (H, P) is an (α, β)-probabilistic representation for C. Moreover, for every H_i ∈ H we have that |H_i| ≤ (e^{k_1})^T, and so

    size(H) ≤ k_1 * T = ( RepDim(C) + ln ln(T/β) ) * T = O( ln(1/α) * ( RepDim(C) + ln ln ln(1/α) + ln ln(1/β) ) ).

The next theorem states the main result of this section: RepDim characterizes the sample complexity of private learning.

Theorem 3.15. Let C be a concept class. Θ_β( RepDim(C) / (αε) ) samples are necessary and sufficient for the private learning of the class C.

Proof. Fix some α ≤ 1/4, β ≤ 1/2, and ε. By Lemma 3.14, there exists a pair (H, P) that (α/6, β/4)-probabilistically represents the class C, where size(H) = O( ln(1/α) * (RepDim(C) + ln ln ln(1/α) + ln ln(1/β)) ). Therefore, by Lemma 3.9, there exists an algorithm A that (α, β, ε)-PPAC learns the class C with a sample size

    m = O_β( (1/(αε)) * ln(1/α) * ( RepDim(C) + ln ln ln(1/α) ) ).

For the lower bound, let A be an (α, β, ε)-PPAC learner for the class C with a sample size m, where α ≤ 1/4 and β ≤ 1/2. By Lemma 3.12, there exists an (H, P) that (1/4, 1/4)-probabilistically represents the class C with size(H) = ln(8·ln 4) + 8αεm. Therefore, by definition, RepDim(C) ≤ ln(8·ln 4) + 8αεm. Thus,

    m ≥ (1/(8αε)) * ( RepDim(C) - ln(8·ln 4) ) = Ω( RepDim(C) / (αε) ).

4. FROM A PROBABILISTIC REPRESENTATION TO A DETERMINISTIC REPRESENTATION

In this section we establish a connection between the (probabilistic) representation dimension of a class and its deterministic representation dimension.

Observation 4.1. Let (H, P) be an (α, β)-probabilistic representation for a concept class C. Then B = ∪_{H_i∈H} H_i is an α-representation of C.

Proof. As (H, P) is an (α, β)-probabilistic representation for C, for every c and every D,

    Pr_P[ ∃ h ∈ H_i s.t. error_D(c, h) ≤ α ] ≥ 1 - β > 0,

where the probability is over choosing a set H_i ∼_P H. In particular, for every c and every D there exists an H_i ∈ H that contains an α-good hypothesis.
The simple construction in Observation 4.1 may result in a very large deterministic representation. For example, in Claim 7.1 we show an (H, P) that (α, β)-probabilistically represents the class POINT_d, where H contains all the sets of at most (4/α)·ln(1/β) boolean functions. While ∪_{H_i∈H} H_i, which is the set of all boolean functions over X_d, is indeed an α-representation for POINT_d, it is extremely over-sized. We will show that it is not necessary to take the union of all the H_i's in H in order to get an α-representation for C. As (H, P) is an (α, β)-probabilistic representation, for every c and every D, with probability at least 1 - β a randomly chosen H_i ∼_P H contains an α-good hypothesis. The straightforward strategy here is to first boost β as in Claim 3.13, and then use the union bound over all possible c ∈ C and over all possible distributions D on X_d. Unfortunately, there are infinitely many such distributions, and the proof is somewhat more complicated.

Definition 4.2. Let H = {H_1, H_2, ..., H_r} be a family of hypothesis classes, and P be a distribution over {1, ..., r}. We denote the following non-private algorithm as Learner(H, P, m, γ):

  Input: a sample S = (x_i, y_i)_{i=1}^m.
  1. Randomly choose H_i ∼_P H.
  2. If error_S(h) > γ for every h ∈ H_i, then fail.
  3. Return h ∈ H_i minimizing error_S(h).

We will say that Learner(H, P, m, γ) is β-successful for a class C over X_d if for every c ∈ C and every distribution D on X_d, given an input sample of size m drawn i.i.d. according to D and labeled by c, algorithm Learner fails with probability at most β.
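Definition 4.2 translates almost verbatim into Python. In this illustrative sketch (names are assumptions of the sketch), returning None plays the role of the "fail" outcome:

```python
import numpy as np

def learner(sample, hypothesis_classes, P, gamma, rng=None):
    """Non-private algorithm Learner(H, P, m, gamma) of Definition 4.2."""
    rng = rng or np.random.default_rng()
    H_i = hypothesis_classes[rng.choice(len(hypothesis_classes), p=P)]  # step 1

    def emp_error(h):
        return sum(1 for x, y in sample if h(x) != y) / len(sample)

    best = min(H_i, key=emp_error)   # step 3: empirical-error minimizer in H_i
    if emp_error(best) > gamma:      # step 2: fail if every h in H_i exceeds gamma
        return None
    return best
```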
Claim 4.3. If (H, P) is an (α, β)-probabilistic representation for a class C, then, for m ≥ (3/α)·ln(1/β), algorithm Learner(H, P, m, 2α) is 2β-successful for C.

Proof. We show that with probability at least 1 - 2β, the set H_i (chosen in step 1) contains at least one hypothesis h with error_S(h) ≤ 2α. As (H, P) is an (α, β)-probabilistic representation for the class C, the chosen H_i will contain a hypothesis h with error_D(c, h) ≤ α with probability at least 1 - β; by the Chernoff bound, with probability at least 1 - exp(-αm/3) this hypothesis has empirical error at most 2α. The set H_i therefore contains a hypothesis h with error_S(h) ≤ 2α with probability at least (1 - β)(1 - exp(-αm/3)) > 1 - (β + exp(-αm/3)), which is at least 1 - 2β for m ≥ (3/α)·ln(1/β).

Claim 4.4. Let H be a family of hypothesis classes, and P a distribution on it. Let γ, β and m be such that m ≥ (4/γ) * (size(H) + ln(1/β)). If Learner(H, P, m, γ) is β-successful for a class C over X_d, then there exists Ĥ ⊆ H and a distribution P̂ on it, such that Learner(Ĥ, P̂, m, γ) is a (2γ, 3β)-PAC learner for C and |Ĥ| = dm/β².

Proof. For every input S = (x_i, y_i)_{i=1}^m, denote by p_S the probability that Learner(H, P, m, γ) fails on step 2 (the probability is only over the choice of H_i ∼_P H in the first step). As Learner(H, P, m, γ) is β-successful,

    Pr_{P,D}[ Learner(H, P, m, γ) fails ] = Σ_S Pr_D[S] * p_S ≤ β.

Consider the following process, denoted Proc, for randomly choosing a multiset Ĥ of size t (t will be set later): for i = 1, ..., t, randomly choose H_i ∼_P H; return Ĥ = (H_1, H_2, ..., H_t).

Denote by U_t the uniform distribution on {1, 2, ..., t}. As before, for every input S = (x_i, y_i)_{i=1}^m, denote by p̂_S the probability that Learner(Ĥ, U_t, m, γ) fails on its second step (again, the probability is only over the choice of H_i ∼_{U_t} Ĥ in the first step). Using those notations:

    Pr_{U_t,D}[ Learner(Ĥ, U_t, m, γ) fails ] = Σ_S Pr_D[S] * p̂_S.

Fix a sample S. As the choice of H_i ∼_{U_t} Ĥ is uniform,

    p̂_S = |{ H_i ∈ Ĥ : error_S(h) > γ for every h ∈ H_i }| / t.

Using the Hoeffding bound,

    Pr_Proc[ |p̂_S - p_S| ≥ β ] ≤ 2e^{-2tβ²},

where the probability is over choosing the multiset Ĥ. There are at most 2^{(d+1)m} samples of size m (as every entry in the sample is an element of X_d concatenated with a label bit). Using the union bound over all possible samples S,

    Pr_Proc[ ∃ S s.t. |p̂_S - p_S| ≥ β ] ≤ 2^{(d+1)m} * 2e^{-2tβ²}.

For t ≥ dm/β² the above probability is strictly less than 1. This means that for t = dm/β² there exists a multiset Ĥ such that p̂_S ≤ p_S + β for every sample S. We will show that for this Ĥ, Learner(Ĥ, U_t, m, γ) is a (2γ, 3β)-PAC learner. Fix a target concept c ∈ C and a distribution D on X_d. Define the following two good events:

  E_1: Learner(Ĥ, U_t, m, γ) outputs a hypothesis h such that error_S(h) ≤ γ.
  E_2: For every h ∈ H_i with error_S(h) ≤ γ, it holds that error_D(c, h) ≤ 2γ.

Note that if those two events happen, Learner(Ĥ, U_t, m, γ) returns a 2γ-good hypothesis for c and D. We show that those two events happen with high probability. We start by bounding the failure probability of Learner(Ĥ, U_t, m, γ):

    Pr_{U_t,D}[ Learner(Ĥ, U_t, m, γ) fails ] = Σ_S Pr_D[S] * p̂_S ≤ Σ_S Pr_D[S] * (p_S + β) = Pr_{P,D}[ Learner(H, P, m, γ) fails ] + β ≤ 2β.

When Learner(Ĥ, U_t, m, γ) does not fail, it returns a hypothesis h with empirical error at most γ. Thus, Pr[E_1] ≥ 1 - 2β.

Using the Chernoff bound, the probability that a hypothesis h with error_D(c, h) > 2γ has empirical error at most γ is less than exp(-γm/4). Using the union bound, the probability that there is such a hypothesis in H_i is at most |H_i| * exp(-γm/4). Therefore, Pr[E_2] ≥ 1 - |H_i| * exp(-γm/4), which is at least 1 - β for m ≥ (4/γ) * ln(|H_i|/β).

All in all, the probability of Learner(Ĥ, U_t, m, γ) failing to output a 2γ-good hypothesis is at most 3β.

Theorem 4.5. If there exists a pair (H, P) that (α, β)-probabilistically represents a class C over X_d (where H might be very big), then there exists a pair (Ĥ, P̂) that (4α, 6β)-probabilistically represents C, where Ĥ ⊆ H and

    |Ĥ| = 3d * ( size(H) + ln(1/β) ) / (αβ²).

Proof. Let (H, P) be an (α, β)-probabilistic representation for a class C. Set m = (3/α) * (size(H) + ln(1/β)). By Claim 4.3, Learner(H, P, m, 2α) is 2β-successful for the class C. By Claim 4.4, there exists Ĥ ⊆ H and a distribution P̂ on it, such that Learner(Ĥ, P̂, m, 2α) is a (4α, 6β)-PAC learner for C and |Ĥ| ≤ dm/β² = 3d(size(H) + ln(1/β))/(αβ²).

Assume towards contradiction that (Ĥ, P̂) does not (4α, 6β)-represent C. Then there exist a concept c ∈ C and a distribution D such that, with probability strictly greater than 6β, a randomly chosen H_i ∼_{P̂} Ĥ does not contain a 4α-good hypothesis for c, D. Therefore, for those c and D, Learner(Ĥ, P̂, m, 2α) fails to return a 4α-good hypothesis with probability strictly greater than 6β, a contradiction.

Theorem 4.6. For every class C over X_d there exists a (1/4)-representation B such that size(B) = O(ln(d) + RepDim(C)).

Proof. By Lemma 3.14, there exists a pair (H, P) that (1/16, 1/12)-probabilistically represents C such that size(H) = O(RepDim(C)). Using Theorem 4.5, there exists a pair (Ĥ, P̂) that (1/4, 1/2)-probabilistically represents C, such that size(Ĥ) ≤ size(H) and |Ĥ| = O(d * size(H)). We can now use Observation 4.1 and construct the set B = ∪_{H_i∈Ĥ} H_i, which is a (1/4)-representation for the class C. In addition, |B| = O( |Ĥ| * e^{size(H)} ) = O( d * size(H) * e^{size(H)} ). Thus, size(B) = ln|B| = O( ln(d) + RepDim(C) ).

Corollary 4.7. For every concept class C over X_d, DRepDim(C) = O( ln(d) + RepDim(C) ).

Corollary 4.8. There exists a constant N such that for every concept class C over X_d where RepDim(C) ≥ N·log(d), the sample complexity that is necessary and sufficient for privately learning C is Θ_{α,β}(DRepDim(C)).

5. PROBABILISTIC REPRESENTATION FOR PRIVATELY SOLVING OPTIMIZATION PROBLEMS

The notion of probabilistic representation applies not only to private learning, but also to the broader task of optimization problems. We consider the following scenario:

Definition 5.1. An optimization problem OPT over a universe X and a set of solutions F is defined by a quality function q : X^m × F → [0, 1]. Given a database S, the task is to choose a solution f ∈ F such that q(S, f) is maximized.

Notation. We will refer to the optimization problem defined by a quality function q as OPT_q.

Definition 5.2. An α-good solution for a database S is a solution s such that q(S, s) ≥ max_{f∈F}{q(S, f)} - α.

Given an optimization problem OPT_q, one can use the exponential mechanism to choose a solution s ∈ F. In general, this method achieves a reasonable solution only for databases of size m = Ω(log|F| / ε). To see this, consider a case where there exists a database S of m records such that exactly one solution t ∈ F has quality q(S, t) = 1, and every other f ∈ F has quality q(S, f) = 1/2. The probability of the exponential mechanism choosing t is

    Pr[t is chosen] = exp(εm/2) / ( (|F| - 1) * exp(εm/4) + exp(εm/2) ).

Unless

    m ≥ (4/ε) * ln(|F| - 1) = Ω( (1/ε) * ln|F| ),    (4)

the above probability is strictly less than 1/2.
Using our notions of probabilistic representation, it might be possible to reduce the necessary database size. Consider using the exponential mechanism for choosing a solution s, not out of F, but rather from a smaller set of solutions B. Roughly speaking, the factor of ln|F| in requirement (4) will now be replaced with ln|B|, which corresponds to the size of the representation. Therefore, the database size m should be at least ln|B|/ε; that is, m needs to be bigger than the size of the representation by at least a factor of 1/ε. In the following analysis we will denote this required gap, i.e., m/ln|B|, by Δ. We will see that the existence of an ε-private approximation algorithm implies a probabilistic representation with Δ slightly below 1/ε, and that a probabilistic representation with Δ > 1 implies a private approximation algorithm. A bigger Δ corresponds to better privacy; however, it might be harder to achieve.

Definition 5.3. Let OPT_q be an optimization problem over a universe X and a set of solutions F. Let B be a set of solutions, and denote size(B) = ln|B|. We say that B is an α-deterministic representation of OPT_q for databases of m elements if for every S ∈ X^m there exists a solution s ∈ B such that q(S, s) ≥ max_{f∈F}{q(S, f)} - α.

Definition 5.4. Let B be an α-deterministic representation of OPT_q for databases of m elements, and denote Δ = m/size(B). If Δ > 1, then we say that the ratio of B is Δ.

An α-deterministic representation B with ratio Δ is required to support all the databases of m = Δ * size(B) elements. That is, for every S ∈ X^m, the set B is required to contain at least one α-good solution. Fix S ∈ X^m. Intuitively, Δ controls the relation between m and the number of bits needed to represent an α-good solution for S: as B contains an α-good solution for S, and assuming B is publicly known, this solution can be represented with ln|B| = size(B) = m/Δ bits.

Definition 5.5. Let OPT_q be an optimization problem over a universe X and a set of solutions F. Let P be a distribution over {1, 2, ..., r}, and let B = {B_1, B_2, ..., B_r} be a family of solution sets for OPT_q. We denote size(B) = max{ ln|B_i| : B_i ∈ B }. We say that (B, P) is an (α, β)-probabilistic representation of OPT_q for databases of m elements if for every S ∈ X^m:

    Pr_P[ ∃ s ∈ B_i s.t. q(S, s) ≥ max_{f∈F}{q(S, f)} - α ] ≥ 1 - β.

Definition 5.6. Let (B, P) be an (α, β)-probabilistic representation of OPT_q for databases of m elements, and denote Δ = m/size(B). If Δ > 1, then we say that the ratio of the representation is Δ.

Definition 5.7. An optimization problem OPT_q is bounded if |q(S_1, f) - q(S_2, f)| ≤ 1/m for every solution f and every two neighboring databases S_1, S_2.

We are interested in approximating bounded optimization problems while guaranteeing differential privacy:

Definition 5.8. Let OPT_q be a bounded optimization problem over a universe X and a set of solutions F. An algorithm A is an (α, β, ε)-private approximation algorithm for OPT_q with a database of m records if:

  1. Algorithm A is ε-differentially private (as formulated in Definition 2.1);
  2. For every S ∈ X^m, algorithm A outputs with probability at least 1 - β a solution s such that q(S, s) ≥ max_{f∈F}{q(S, f)} - α.

Example 5.9 (Sanitization). Consider a class of predicates C over X. A database S contains m points taken from X. A predicate query Q_c for c ∈ C is defined as

    Q_c(S) = |{x_i ∈ S : c(x_i) = 1}| / |S|.

Blum et al. [5] defined a sanitizer (or data release mechanism) as a differentially private algorithm that, on input a database S, outputs another database Ŝ with entries taken from X. A sanitizer A is (α, β)-useful for predicates in the class C if for every database S it holds that

    Pr_A[ |Q_c(S) - Q_c(Ŝ)| ≤ α for every c ∈ C ] ≥ 1 - β.

This scenario can be viewed as a bounded optimization problem: the solutions are sanitized databases. For an input database S and a sanitized database Ŝ, the quality function is

    q(S, Ŝ) = 1 - max_{c∈C} |Q_c(S) - Q_c(Ŝ)|.

To see that this optimization problem is bounded, note that for every two neighboring databases S_1, S_2 of m elements, and every c ∈ C, it holds that |Q_c(S_1) - Q_c(S_2)| ≤ 1/m. Therefore, for every sanitized database f,

    |q(S_1, f) - q(S_2, f)| = | max_{c∈C}{|Q_c(S_1) - Q_c(f)|} - max_{c∈C}{|Q_c(S_2) - Q_c(f)|} | ≤ 1/m.
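For concreteness, the predicate queries and the quality function of Example 5.9 can be written as the following sketch (names are illustrative; predicates are modeled as boolean callables):

```python
def predicate_query(database, predicate):
    """Q_c(S): the fraction of records in S on which the predicate evaluates to 1."""
    return sum(1 for x in database if predicate(x) == 1) / len(database)

def sanitization_quality(S, S_hat, predicates):
    """q(S, S_hat) = 1 - max_c |Q_c(S) - Q_c(S_hat)| from Example 5.9.

    A sanitizer is (alpha, beta)-useful if, with probability at least 1 - beta,
    its output keeps this quality above 1 - alpha.
    """
    return 1.0 - max(abs(predicate_query(S, c) - predicate_query(S_hat, c))
                     for c in predicates)
```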
The next two lemmas establish an equivalence between a private approximation algorithm and a probabilistic representation for a bounded optimization problem.

Lemma 5.10. Let OPT_q be a bounded optimization problem over a universe X. If there exists a pair (B, P) that (α, β)-probabilistically represents OPT_q for databases of m elements, such that the ratio of (B, P) is Δ > 1, then for every α̂, β̂, ε satisfying

    Δ ≥ (2/(εα̂)) * ( 1 + ln(1/β̂)/size(B) ),

there exists an (α + α̂, β + β̂, ε)-approximation algorithm for OPT_q with a database of size m.

Proof. Consider the following algorithm A:

  Inputs: a database S ∈ X^m, and a privacy parameter ε.
  1. Randomly choose B_i ∼_P B.
  2. Choose s ∈ B_i using the exponential mechanism, that is, with probability

        exp(εm * q(S, s)/2) / Σ_{f∈B_i} exp(εm * q(S, f)/2).

By the properties of the exponential mechanism, A is ε-differentially private. Fix a database S ∈ X^m, and define the following 2 bad events:

  E_1: The set B_i chosen in step 1 does not contain a solution s with q(S, s) ≥ max_{f∈F}{q(S, f)} - α.
  E_2: The solution s chosen in step 2 is such that q(S, s) < max_{t∈B_i}{q(S, t)} - α̂.

Note that if those two bad events do not occur, algorithm A outputs a solution s such that q(S, s) ≥ max_{f∈F}{q(S, f)} - α - α̂. As (B, P) is an (α, β)-probabilistic representation of OPT_q for databases of size m, event E_1 happens with probability at most β. By the properties of the exponential mechanism, the probability of event E_2 is bounded by |B_i| * exp(-εα̂m/2). As m = Δ * size(B) and |B_i| ≤ e^{size(B)}, this probability is at most

    Pr[E_2] ≤ e^{size(B)} * exp(-εα̂ * Δ * size(B)/2)
            ≤ e^{size(B)} * exp( -(1 + ln(1/β̂)/size(B)) * size(B) )
            = e^{size(B)} * e^{-size(B)} * e^{-ln(1/β̂)} = β̂.

Therefore, algorithm A outputs an (α + α̂)-good solution with probability at least 1 - β - β̂.

Lemma 5.11. Let OPT_q be an optimization problem. If there exists an (α, β, ε)-private approximation algorithm for OPT_q with a database of m records, then for every β̂ satisfying

    m / ( ln(1/(1 - β)) + ln ln(1/β̂) + εm ) > 1,

there exists a pair (B, P) that (α, β̂)-probabilistically represents OPT_q for databases of m elements, where the ratio of the representation is m / ( ln(1/(1 - β)) + ln ln(1/β̂) + εm ).

Proof. Let A be an (α, β, ε)-private approximation algorithm for OPT_q with a database size m. Fix an arbitrary input database S ∈ X^m. Define G_S as the set of all solutions s, possibly outputted by A, such that q(S, s) ≥ max_{f∈F}{q(S, f)} - α. As A is an (α, β, ε)-approximation algorithm, Pr_A[A(S) ∈ G_S] ≥ 1 - β. As A is ε-differentially private, Pr_A[A(0^m) ∈ G_S] ≥ (1 - β) * e^{-εm}, where 0^m is a database with m zeros. That is, Pr_A[A(0^m) ∉ G_S] ≤ 1 - (1 - β) * e^{-εm}.

Now, consider a set B containing the outcomes of Γ = (1/(1 - β)) * ln(1/β̂) * e^{εm} executions of A(0^m). The probability that B does not contain a solution s ∈ G_S is at most

    ( 1 - (1 - β) * e^{-εm} )^Γ ≤ β̂.

Thus, B = { B ⊆ support(A) : |B| ≤ Γ }, and P, the distribution induced by A(0^m), are an (α, β̂)-probabilistic representation of OPT_q for databases with m elements. Moreover, the ratio of the representation is

    m / size(B) = m / max{ ln|B| : B ∈ B } = m / ( ln(1/(1 - β)) + ln ln(1/β̂) + εm ).

5.1 Exact 3SAT

Consider the following bounded optimization problem, denoted OPT_E3SAT: the universe X is the set of all possible clauses with exactly 3 different literals over n variables, and the set of solutions F is the set of all possible 2^n assignments. Given a database S = (σ_1, σ_2, ..., σ_m) containing m E3CNF clauses, the quality of an assignment a ∈ F is q(S, a) = (1/m) * |{i : a(σ_i) = 1}|.

Aiming at the (very different) objective of secure protocols for search problems, Beimel et al. [2] defined the notion of solution-list algorithms, which corresponds to our notion of deterministic representation. We next rephrase their results using our notations.

(R1) For every α > 0 and every Δ > 1, there exists a set B that (α + 1/8)-deterministically represents OPT_E3SAT for databases of size m = O( Δ * (ln ln(n) + ln(1/α)) ), with ratio Δ.

(R2) Let α < 1/2 and Δ > 1. For every set B that α-deterministically represents OPT_E3SAT for databases of size m with ratio Δ, it holds that m = Ω( Δ * ln ln(n) ).

Using (R1) and a deterministic version of Lemma 5.10, for every α, β, ε > 0 there exists a (1/8 + α, β, ε)-approximation algorithm for OPT_E3SAT with a database of m = O_{α,β,ε}(ln ln(n)) clauses. By (R2), this is the best possible using a deterministic representation.

We can reduce the necessary database size using a probabilistic representation. Fix a clause with three different literals. If we pick an assignment at random, then with probability at least 7/8 it satisfies the clause. Now, fix any exact 3CNF formula. If we pick an assignment at random, then the expected fraction of satisfied clauses is at least 7/8. Moreover, for every 0 < α < 7/8, the fraction of satisfied clauses is at least 7/8 - α with probability at least α/(α + 1/8). So, if we pick t = ln(1/β) / ln( (α + 1/8)/(1/8) ) random assignments, the probability that none of them satisfies at least a (7/8 - α) fraction of the clauses is at most ( (1/8)/(α + 1/8) )^t = β.

So, for every Δ > 1, B = {B : B is a set of at most t assignments}, and P, the distribution induced on B by randomly picking t assignments, are a (1/8 + α, β)-probabilistic representation of OPT_E3SAT for databases of size Δ * ln(t), with ratio Δ. By Lemma 5.10, for every ε there exists a (1/8 + α, β, ε)-approximation algorithm for OPT_E3SAT with a database of m = O_{α,β,ε}(1) clauses.
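A small sketch of the probabilistic representation for OPT_E3SAT, namely t uniformly random assignments, together with the quality function q(S, a). The clause encoding (tuples of signed, non-zero variable indices) and all names are assumptions of this illustration:

```python
import random

def satisfied_fraction(clauses, assignment):
    """q(S, a): fraction of exact-3CNF clauses satisfied by the assignment.
    A literal v > 0 stands for x_v and v < 0 for its negation."""
    def satisfied(clause):
        return any((lit > 0) == assignment[abs(lit)] for lit in clause)
    return sum(1 for clause in clauses if satisfied(clause)) / len(clauses)

def random_assignment_representation(n_vars, t, rng=None):
    """t uniformly random assignments over variables 1..n_vars.

    For any fixed E3CNF database, at least one of them satisfies close to a 7/8
    fraction of the clauses, except with the small failure probability computed above."""
    rng = rng or random.Random()
    return [{v: rng.random() < 0.5 for v in range(1, n_vars + 1)} for _ in range(t)]
```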
6. EXTENSIONS

6.1 (ε, δ)-Differential Privacy

The notion of ε-differential privacy was generalized to (ε, δ)-differential privacy, where the requirement in inequality (1) is changed to

    Pr[A(S_1) ∈ F] ≤ exp(ε) * Pr[A(S_2) ∈ F] + δ.

The proof of Lemma 3.12 remains valid even if algorithm A is only (ε, δ)-differentially private for

    δ ≤ (1/8) * e^{-8αεm} * (1 - e^{-ε}).    (5)

To see this, note that inequality (3) changes to

    Pr_A[A(0^m_σ) ∈ G^{1/4}_{c,D}] ≥ e^{-8αεm} * Pr_A[A(S_good) ∈ G^{1/4}_{c,D}] - δ * Σ_{i=0}^{8αm - 1} e^{-iε}
                                   ≥ (1/4) * e^{-8αεm} - δ/(1 - e^{-ε})
                                   ≥ (1/8) * e^{-8αεm}.

The rest of the proof remains almost intact (only minor changes in the constants). With that in mind, we see that the lower bound shown in Theorem 3.15 for ε-differentially private (that is, with δ = 0) learners also applies to (ε, δ)-differentially private learners satisfying inequality (5). That is, every such learner for a class C must use Ω( RepDim(C)/(αε) ) samples. When using (ε, δ)-differential privacy, δ should be negligible in the security parameter, that is, in d, the representation length of elements in X_d. Therefore, using (ε, δ)-differential privacy instead of ε-differential privacy cannot reduce the sample complexity for PPAC learning a concept class C whenever RepDim(C) = O(log(d)).

6.2 Probabilistic Representation Using a Hypothesis Class

We now consider a generalization of our representation notions that can be useful when considering PPAC learners that use a specific hypothesis class. In particular, these notions can be useful when considering proper PPAC learners, that is, learners that learn a class C using a hypothesis class B ⊆ C.

Definition 6.1. We define the α-Deterministic Representation Dimension of a concept class C using a hypothesis class


More information

Block designs and statistics

Block designs and statistics Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent

More information

A Simple Regression Problem

A Simple Regression Problem A Siple Regression Proble R. M. Castro March 23, 2 In this brief note a siple regression proble will be introduced, illustrating clearly the bias-variance tradeoff. Let Y i f(x i ) + W i, i,..., n, where

More information

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13 CSE55: Randoied Algoriths and obabilistic Analysis May 6, Lecture Lecturer: Anna Karlin Scribe: Noah Siegel, Jonathan Shi Rando walks and Markov chains This lecture discusses Markov chains, which capture

More information

Improved Guarantees for Agnostic Learning of Disjunctions

Improved Guarantees for Agnostic Learning of Disjunctions Iproved Guarantees for Agnostic Learning of Disjunctions Pranjal Awasthi Carnegie Mellon University pawasthi@cs.cu.edu Avri Blu Carnegie Mellon University avri@cs.cu.edu Or Sheffet Carnegie Mellon University

More information

Handout 7. and Pr [M(x) = χ L (x) M(x) =? ] = 1.

Handout 7. and Pr [M(x) = χ L (x) M(x) =? ] = 1. Notes on Coplexity Theory Last updated: October, 2005 Jonathan Katz Handout 7 1 More on Randoized Coplexity Classes Reinder: so far we have seen RP,coRP, and BPP. We introduce two ore tie-bounded randoized

More information

Lecture 21. Interior Point Methods Setup and Algorithm

Lecture 21. Interior Point Methods Setup and Algorithm Lecture 21 Interior Point Methods In 1984, Kararkar introduced a new weakly polynoial tie algorith for solving LPs [Kar84a], [Kar84b]. His algorith was theoretically faster than the ellipsoid ethod and

More information

On the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs

On the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs On the Inapproxiability of Vertex Cover on k-partite k-unifor Hypergraphs Venkatesan Guruswai and Rishi Saket Coputer Science Departent Carnegie Mellon University Pittsburgh, PA 1513. Abstract. Coputing

More information

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness A Note on Scheduling Tall/Sall Multiprocessor Tasks with Unit Processing Tie to Miniize Maxiu Tardiness Philippe Baptiste and Baruch Schieber IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights,

More information

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science A Better Algorith For an Ancient Scheduling Proble David R. Karger Steven J. Phillips Eric Torng Departent of Coputer Science Stanford University Stanford, CA 9435-4 Abstract One of the oldest and siplest

More information

Learnability and Stability in the General Learning Setting

Learnability and Stability in the General Learning Setting Learnability and Stability in the General Learning Setting Shai Shalev-Shwartz TTI-Chicago shai@tti-c.org Ohad Shair The Hebrew University ohadsh@cs.huji.ac.il Nathan Srebro TTI-Chicago nati@uchicago.edu

More information

New Bounds for Learning Intervals with Implications for Semi-Supervised Learning

New Bounds for Learning Intervals with Implications for Semi-Supervised Learning JMLR: Workshop and Conference Proceedings vol (1) 1 15 New Bounds for Learning Intervals with Iplications for Sei-Supervised Learning David P. Helbold dph@soe.ucsc.edu Departent of Coputer Science, University

More information

Lecture October 23. Scribes: Ruixin Qiang and Alana Shine

Lecture October 23. Scribes: Ruixin Qiang and Alana Shine CSCI699: Topics in Learning and Gae Theory Lecture October 23 Lecturer: Ilias Scribes: Ruixin Qiang and Alana Shine Today s topic is auction with saples. 1 Introduction to auctions Definition 1. In a single

More information

Combining Classifiers

Combining Classifiers Cobining Classifiers Generic ethods of generating and cobining ultiple classifiers Bagging Boosting References: Duda, Hart & Stork, pg 475-480. Hastie, Tibsharini, Friedan, pg 246-256 and Chapter 10. http://www.boosting.org/

More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

e-companion ONLY AVAILABLE IN ELECTRONIC FORM

e-companion ONLY AVAILABLE IN ELECTRONIC FORM OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 018: Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee7c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee7c@berkeley.edu October 15,

More information

Randomized Recovery for Boolean Compressed Sensing

Randomized Recovery for Boolean Compressed Sensing Randoized Recovery for Boolean Copressed Sensing Mitra Fatei and Martin Vetterli Laboratory of Audiovisual Counication École Polytechnique Fédéral de Lausanne (EPFL) Eail: {itra.fatei, artin.vetterli}@epfl.ch

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks

Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks Bounds on the Miniax Rate for Estiating a Prior over a VC Class fro Independent Learning Tasks Liu Yang Steve Hanneke Jaie Carbonell Deceber 01 CMU-ML-1-11 School of Coputer Science Carnegie Mellon University

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

3.8 Three Types of Convergence

3.8 Three Types of Convergence 3.8 Three Types of Convergence 3.8 Three Types of Convergence 93 Suppose that we are given a sequence functions {f k } k N on a set X and another function f on X. What does it ean for f k to converge to

More information

16 Independence Definitions Potential Pitfall Alternative Formulation. mcs-ftl 2010/9/8 0:40 page 431 #437

16 Independence Definitions Potential Pitfall Alternative Formulation. mcs-ftl 2010/9/8 0:40 page 431 #437 cs-ftl 010/9/8 0:40 page 431 #437 16 Independence 16.1 efinitions Suppose that we flip two fair coins siultaneously on opposite sides of a roo. Intuitively, the way one coin lands does not affect the way

More information

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5,

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, 2015 31 11 Motif Finding Sources for this section: Rouchka, 1997, A Brief Overview of Gibbs Sapling. J. Buhler, M. Topa:

More information

Sharp Time Data Tradeoffs for Linear Inverse Problems

Sharp Time Data Tradeoffs for Linear Inverse Problems Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used

More information

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES Vol. 57, No. 3, 2009 Algoriths for parallel processor scheduling with distinct due windows and unit-tie obs A. JANIAK 1, W.A. JANIAK 2, and

More information

A Smoothed Boosting Algorithm Using Probabilistic Output Codes

A Smoothed Boosting Algorithm Using Probabilistic Output Codes A Soothed Boosting Algorith Using Probabilistic Output Codes Rong Jin rongjin@cse.su.edu Dept. of Coputer Science and Engineering, Michigan State University, MI 48824, USA Jian Zhang jian.zhang@cs.cu.edu

More information

arxiv: v3 [cs.lg] 7 Jan 2016

arxiv: v3 [cs.lg] 7 Jan 2016 Efficient and Parsionious Agnostic Active Learning Tzu-Kuo Huang Alekh Agarwal Daniel J. Hsu tkhuang@icrosoft.co alekha@icrosoft.co djhsu@cs.colubia.edu John Langford Robert E. Schapire jcl@icrosoft.co

More information

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

Chaotic Coupled Map Lattices

Chaotic Coupled Map Lattices Chaotic Coupled Map Lattices Author: Dustin Keys Advisors: Dr. Robert Indik, Dr. Kevin Lin 1 Introduction When a syste of chaotic aps is coupled in a way that allows the to share inforation about each

More information

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation journal of coplexity 6, 459473 (2000) doi:0.006jco.2000.0544, available online at http:www.idealibrary.co on On the Counication Coplexity of Lipschitzian Optiization for the Coordinated Model of Coputation

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley ENSIAG 2 / osig 1 Second Seester 2012/2013 Lesson 20 2 ay 2013 Kernel ethods and Support Vector achines Contents Kernel Functions...2 Quadratic

More information

Fairness via priority scheduling

Fairness via priority scheduling Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation

More information

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition

More information

arxiv: v1 [cs.ds] 29 Jan 2012

arxiv: v1 [cs.ds] 29 Jan 2012 A parallel approxiation algorith for ixed packing covering seidefinite progras arxiv:1201.6090v1 [cs.ds] 29 Jan 2012 Rahul Jain National U. Singapore January 28, 2012 Abstract Penghui Yao National U. Singapore

More information

Supplement to: Subsampling Methods for Persistent Homology

Supplement to: Subsampling Methods for Persistent Homology Suppleent to: Subsapling Methods for Persistent Hoology A. Technical results In this section, we present soe technical results that will be used to prove the ain theores. First, we expand the notation

More information

Bounds on the Sample Complexity for Private Learning and Private Data Release

Bounds on the Sample Complexity for Private Learning and Private Data Release Bounds on the Sample Complexity for Private Learning and Private Data Release Amos Beimel 1,, Shiva Prasad Kasiviswanathan 2, and Kobbi Nissim 1,3, 1 Dept. of Computer Science, Ben-Gurion University 2

More information

Feature Extraction Techniques

Feature Extraction Techniques Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that

More information

Tight Information-Theoretic Lower Bounds for Welfare Maximization in Combinatorial Auctions

Tight Information-Theoretic Lower Bounds for Welfare Maximization in Combinatorial Auctions Tight Inforation-Theoretic Lower Bounds for Welfare Maxiization in Cobinatorial Auctions Vahab Mirrokni Jan Vondrák Theory Group, Microsoft Dept of Matheatics Research Princeton University Redond, WA 9805

More information

Bipartite subgraphs and the smallest eigenvalue

Bipartite subgraphs and the smallest eigenvalue Bipartite subgraphs and the sallest eigenvalue Noga Alon Benny Sudaov Abstract Two results dealing with the relation between the sallest eigenvalue of a graph and its bipartite subgraphs are obtained.

More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths

More information

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography Tight Bounds for axial Identifiability of Failure Nodes in Boolean Network Toography Nicola Galesi Sapienza Università di Roa nicola.galesi@uniroa1.it Fariba Ranjbar Sapienza Università di Roa fariba.ranjbar@uniroa1.it

More information

List Scheduling and LPT Oliver Braun (09/05/2017)

List Scheduling and LPT Oliver Braun (09/05/2017) List Scheduling and LPT Oliver Braun (09/05/207) We investigate the classical scheduling proble P ax where a set of n independent jobs has to be processed on 2 parallel and identical processors (achines)

More information

1 Identical Parallel Machines

1 Identical Parallel Machines FB3: Matheatik/Inforatik Dr. Syaantak Das Winter 2017/18 Optiizing under Uncertainty Lecture Notes 3: Scheduling to Miniize Makespan In any standard scheduling proble, we are given a set of jobs J = {j

More information

Curious Bounds for Floor Function Sums

Curious Bounds for Floor Function Sums 1 47 6 11 Journal of Integer Sequences, Vol. 1 (018), Article 18.1.8 Curious Bounds for Floor Function Sus Thotsaporn Thanatipanonda and Elaine Wong 1 Science Division Mahidol University International

More information

Lower Bounds for Quantized Matrix Completion

Lower Bounds for Quantized Matrix Completion Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &

More information

A Theoretical Framework for Deep Transfer Learning

A Theoretical Framework for Deep Transfer Learning A Theoretical Fraewor for Deep Transfer Learning Toer Galanti The School of Coputer Science Tel Aviv University toer22g@gail.co Lior Wolf The School of Coputer Science Tel Aviv University wolf@cs.tau.ac.il

More information

Exact tensor completion with sum-of-squares

Exact tensor completion with sum-of-squares Proceedings of Machine Learning Research vol 65:1 54, 2017 30th Annual Conference on Learning Theory Exact tensor copletion with su-of-squares Aaron Potechin Institute for Advanced Study, Princeton David

More information

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes

More information

Graphical Models in Local, Asymmetric Multi-Agent Markov Decision Processes

Graphical Models in Local, Asymmetric Multi-Agent Markov Decision Processes Graphical Models in Local, Asyetric Multi-Agent Markov Decision Processes Ditri Dolgov and Edund Durfee Departent of Electrical Engineering and Coputer Science University of Michigan Ann Arbor, MI 48109

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a ournal published by Elsevier. The attached copy is furnished to the author for internal non-coercial research and education use, including for instruction at the authors institution

More information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub

More information

Computational Learning Theory

Computational Learning Theory CS 446 Machine Learning Fall 2016 OCT 11, 2016 Computational Learning Theory Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes 1 PAC Learning We want to develop a theory to relate the probability of successful

More information

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and

More information

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate The Siplex Method is Strongly Polynoial for the Markov Decision Proble with a Fixed Discount Rate Yinyu Ye April 20, 2010 Abstract In this note we prove that the classic siplex ethod with the ost-negativereduced-cost

More information

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40 On Poset Merging Peter Chen Guoli Ding Steve Seiden Abstract We consider the follow poset erging proble: Let X and Y be two subsets of a partially ordered set S. Given coplete inforation about the ordering

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

VC Dimension and Sauer s Lemma

VC Dimension and Sauer s Lemma CMSC 35900 (Spring 2008) Learning Theory Lecture: VC Diension and Sauer s Lea Instructors: Sha Kakade and Abuj Tewari Radeacher Averages and Growth Function Theore Let F be a class of ±-valued functions

More information

Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation

Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation Algorithic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation Michael Kearns AT&T Labs Research Murray Hill, New Jersey kearns@research.att.co Dana Ron MIT Cabridge, MA danar@theory.lcs.it.edu

More information

Solutions of some selected problems of Homework 4

Solutions of some selected problems of Homework 4 Solutions of soe selected probles of Hoework 4 Sangchul Lee May 7, 2018 Proble 1 Let there be light A professor has two light bulbs in his garage. When both are burned out, they are replaced, and the next

More information

PAC-Bayes Analysis Of Maximum Entropy Learning

PAC-Bayes Analysis Of Maximum Entropy Learning PAC-Bayes Analysis Of Maxiu Entropy Learning John Shawe-Taylor and David R. Hardoon Centre for Coputational Statistics and Machine Learning Departent of Coputer Science University College London, UK, WC1E

More information

arxiv: v3 [quant-ph] 18 Oct 2017

arxiv: v3 [quant-ph] 18 Oct 2017 Self-guaranteed easureent-based quantu coputation Masahito Hayashi 1,, and Michal Hajdušek, 1 Graduate School of Matheatics, Nagoya University, Furocho, Chikusa-ku, Nagoya 464-860, Japan Centre for Quantu

More information

Ensemble Based on Data Envelopment Analysis

Ensemble Based on Data Envelopment Analysis Enseble Based on Data Envelopent Analysis So Young Sohn & Hong Choi Departent of Coputer Science & Industrial Systes Engineering, Yonsei University, Seoul, Korea Tel) 82-2-223-404, Fax) 82-2- 364-7807

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

A Note on the Applied Use of MDL Approximations

A Note on the Applied Use of MDL Approximations A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention

More information

Support recovery in compressed sensing: An estimation theoretic approach

Support recovery in compressed sensing: An estimation theoretic approach Support recovery in copressed sensing: An estiation theoretic approach Ain Karbasi, Ali Horati, Soheil Mohajer, Martin Vetterli School of Coputer and Counication Sciences École Polytechnique Fédérale de

More information

A Probabilistic and RIPless Theory of Compressed Sensing

A Probabilistic and RIPless Theory of Compressed Sensing A Probabilistic and RIPless Theory of Copressed Sensing Eanuel J Candès and Yaniv Plan 2 Departents of Matheatics and of Statistics, Stanford University, Stanford, CA 94305 2 Applied and Coputational Matheatics,

More information

time time δ jobs jobs

time time δ jobs jobs Approxiating Total Flow Tie on Parallel Machines Stefano Leonardi Danny Raz y Abstract We consider the proble of optiizing the total ow tie of a strea of jobs that are released over tie in a ultiprocessor

More information

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence Best Ar Identification: A Unified Approach to Fixed Budget and Fixed Confidence Victor Gabillon Mohaad Ghavazadeh Alessandro Lazaric INRIA Lille - Nord Europe, Tea SequeL {victor.gabillon,ohaad.ghavazadeh,alessandro.lazaric}@inria.fr

More information

Iterative Decoding of LDPC Codes over the q-ary Partial Erasure Channel

Iterative Decoding of LDPC Codes over the q-ary Partial Erasure Channel 1 Iterative Decoding of LDPC Codes over the q-ary Partial Erasure Channel Rai Cohen, Graduate Student eber, IEEE, and Yuval Cassuto, Senior eber, IEEE arxiv:1510.05311v2 [cs.it] 24 ay 2016 Abstract In

More information

lecture 36: Linear Multistep Mehods: Zero Stability

lecture 36: Linear Multistep Mehods: Zero Stability 95 lecture 36: Linear Multistep Mehods: Zero Stability 5.6 Linear ultistep ethods: zero stability Does consistency iply convergence for linear ultistep ethods? This is always the case for one-step ethods,

More information

A := A i : {A i } S. is an algebra. The same object is obtained when the union in required to be disjoint.

A := A i : {A i } S. is an algebra. The same object is obtained when the union in required to be disjoint. 59 6. ABSTRACT MEASURE THEORY Having developed the Lebesgue integral with respect to the general easures, we now have a general concept with few specific exaples to actually test it on. Indeed, so far

More information

Randomized Accuracy-Aware Program Transformations For Efficient Approximate Computations

Randomized Accuracy-Aware Program Transformations For Efficient Approximate Computations Randoized Accuracy-Aware Progra Transforations For Efficient Approxiate Coputations Zeyuan Allen Zhu Sasa Misailovic Jonathan A. Kelner Martin Rinard MIT CSAIL zeyuan@csail.it.edu isailo@it.edu kelner@it.edu

More information

On Process Complexity

On Process Complexity On Process Coplexity Ada R. Day School of Matheatics, Statistics and Coputer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand, Eail: ada.day@cs.vuw.ac.nz Abstract Process

More information

Non-Parametric Non-Line-of-Sight Identification 1

Non-Parametric Non-Line-of-Sight Identification 1 Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,

More information

Note on generating all subsets of a finite set with disjoint unions

Note on generating all subsets of a finite set with disjoint unions Note on generating all subsets of a finite set with disjoint unions David Ellis e-ail: dce27@ca.ac.uk Subitted: Dec 2, 2008; Accepted: May 12, 2009; Published: May 20, 2009 Matheatics Subject Classification:

More information