arXiv: v2 [stat.ML] 23 Feb 2016

Permutational Rademacher Complexity: A New Complexity Measure for Transductive Learning

Ilya Tolstikhin¹, Nikita Zhivotovskiy²,³, and Gilles Blanchard⁴

¹ Max-Planck-Institute for Intelligent Systems, Tübingen, Germany, ilya@tuebingen.mpg.de
² Moscow Institute of Physics and Technology, Moscow, Russia
³ Institute for Information Transmission Problems, Moscow, Russia, nikita.zhivotovskiy@phystech.edu
⁴ Department of Mathematics, Universität Potsdam, Potsdam, Germany, gilles.blanchard@math.uni-potsdam.de

Abstract. Transductive learning considers situations when a learner observes $m$ labelled training points and $u$ unlabelled test points with the final goal of giving correct answers for the test points. This paper introduces a new complexity measure for transductive learning called Permutational Rademacher Complexity (PRC) and studies its properties. A novel symmetrization inequality is proved, which shows that PRC provides a tighter control over expected suprema of empirical processes compared to what happens in the standard i.i.d. setting. A number of comparison results are also provided, which show the relation between PRC and other popular complexity measures used in statistical learning theory, including Rademacher complexity and Transductive Rademacher Complexity (TRC). We argue that PRC is a more suitable complexity measure for transductive learning. Finally, these results are combined with a standard concentration argument to provide novel data-dependent risk bounds for transductive learning.

Keywords: Transductive Learning, Rademacher Complexity, Statistical Learning Theory, Empirical Processes, Concentration Inequalities

1 Introduction

Rademacher complexities ([14], [2]) play an important role in the widely used concentration-based approach to statistical learning theory [4], which is closely related to the analysis of empirical processes [21]. They measure the complexity of function classes and provide data-dependent risk bounds in the standard i.i.d. framework of inductive learning, thanks to symmetrization and concentration inequalities. Recently, a number of attempts were made to apply this machinery also to the transductive learning setting. In particular, the authors of [10] introduced a notion of transductive Rademacher complexity and provided an extensive study of its properties, as well as general transductive risk bounds based on this new complexity measure.

In transductive learning, a learner observes $m$ labelled training points and $u$ unlabelled test points. The goal is to give correct answers on the test points. Transductive learning naturally appears in many modern large-scale applications, including text mining, recommender systems, and computer vision, where often the objects to be classified are available beforehand.

There are two different settings of transductive learning, defined by V. Vapnik in his book [22, Chap. 8]. The first one assumes that all the objects from the training and test sets are generated i.i.d. from an unknown distribution $P$. The second one is distribution free, and it assumes that the training and test sets are realized by a uniform random partition of a fixed and finite general population of cardinality $N := m+u$ into two disjoint subsets of cardinalities $m$ and $u$; moreover, no assumptions are made regarding the underlying source of this general population. The second setting has gained much attention⁵ [22, 9, 7, 10, 8, 20], probably due to the fact that any upper risk bound for this setting directly implies a risk bound also for the first setting [22, Theorem 8.1]. In essence, the second setting studies uniform deviations of risks computed on two disjoint finite samples. Following Vapnik's discussion in [6, p. 458], we would also like to emphasize that the second setting of transductive learning naturally appears as a middle step in proofs of the standard inductive risk bounds, as a result of symmetrization or the so-called double-sample trick. In this way, better transductive risk bounds also translate into better inductive ones.

An important difference between the two settings discussed above lies in the fact that the elements of the training set in the second setting are interdependent, because they are sampled uniformly without replacement from the general population.
As a result, the standard techniques developed for inductive learning, including the concentration inequalities and Rademacher complexities mentioned at the beginning, cannot be applied in this setting, since they rely heavily on the i.i.d. assumption. Therefore, it is important to study empirical processes in the setting of sampling without replacement.

Previous work. A large step in this direction was made in [10], where the authors presented a version of McDiarmid's bounded difference inequality [5] for sampling without replacement together with the Transductive Rademacher Complexity (TRC). As a main application the authors derived an upper bound on the binary test error of a transductive learning algorithm in terms of TRC. However, the analysis of [10] has a number of shortcomings. Most importantly, TRC depends on the unknown labels of the test set. In order to obtain computable risk bounds, the authors resorted to the contraction inequality [15], which is known to be a loose step [17], since it destroys any dependence on the labels.

Another line of work was presented in [20], where variants of Talagrand's concentration inequality were derived for the setting of sampling without replacement. These inequalities were then applied to achieve transductive risk bounds with fast rates of convergence $o(m^{-1/2})$, following a localized approach [1]. In contrast, in this work we consider only the worst-case analysis based on global complexity measures. An analysis under additional assumptions on the problem at hand, including Mammen-Tsybakov type low noise conditions [4], is an interesting open question and is left for future work.

⁵ For an extensive overview of transductive risk bounds we refer the reader to [18].

Summary of our results. This paper continues the analysis of empirical processes indexed by arbitrary classes of uniformly bounded functions in the setting of sampling without replacement, initiated by [10]. We introduce a new complexity measure called permutational Rademacher complexity (PRC) and argue that it captures the nature of this setting very well. Due to space limitations we present the analysis of PRC only for the special case when the training and test sets have the same size $m = u$, which is nonetheless sufficiently illustrative⁶. We prove a novel symmetrization inequality (Theorem 2), which shows that the expected PRC and the expected supremum of empirical processes when sampling without replacement are equivalent up to multiplicative constants. Quite remarkably, the new upper and lower bounds (the latter is often called the desymmetrization inequality) both hold without any additive terms when $m = u$, in contrast to the standard i.i.d. setting, where an additive term of order $O(m^{-1/2})$ is unavoidable in the lower bound. For TRC even the upper symmetrization inequality [10, Lemma 4] includes an additive term of order $O(m^{-1/2})$, and no desymmetrization inequality is known. This suggests that PRC may be a more suitable complexity measure for transductive learning. We would also like to note that the proof of our new symmetrization inequality is surprisingly simple compared to the one presented in [10].

Next we compare PRC with other popular complexity measures used in statistical learning theory. In particular, we provide achievable upper and lower bounds relating PRC to the conditional Rademacher complexity (Theorem 3).
These bounds show that the PRC is upper and lower bounded by the conditional Rademacher complexity up to additive terms of orders $o(m^{-1/2})$ and $O(m^{-1/2})$ respectively, which are achievable (Lemma 1). In addition, Theorem 3 significantly improves the bounds on the complexity measure called maximum discrepancy presented in [2, Lemma 3]. We also provide a comparison between the expected PRC and TRC (Corollary 1), which shows that their values are close up to small multiplicative constants and additive terms of order $O(m^{-1/2})$. Finally, we apply these results to obtain a new computable data-dependent risk bound for transductive learning based on the PRC (Theorem 5), which holds for any bounded loss function. We conclude by discussing the advantages of the new risk bound over the previously best known one of [10].

⁶ All the results presented in this paper are also available for the general case $m \ne u$, but we defer them to a future extended version of this paper.

2 Notations

We will use calligraphic symbols to denote sets, with subscripts indicating their cardinalities: $\mathrm{card}(\mathcal Z_m) = m$. For any function $f$ we will denote its average value computed on a finite set $\mathcal S$ by $\bar f(\mathcal S)$. In what follows we will consider an arbitrary space $\mathcal Z$ (for instance, a space of input-output pairs) and a class $\mathcal F$ of functions (for instance, loss functions) mapping $\mathcal Z$ to $\mathbb R$. Most of the proofs are deferred to the last section for improved readability.

Arguably, one of the most popular complexity measures used in statistical learning theory is the Rademacher complexity ([15], [14], [2]):

Definition 1 (Conditional Rademacher complexity). Fix any subset $\mathcal Z_m = \{Z_1,\dots,Z_m\}\subseteq\mathcal Z$. The following quantity is commonly known as the conditional Rademacher complexity:
$$\hat R_m(\mathcal F,\mathcal Z_m) = \mathbb E_\epsilon\left[\sup_{f\in\mathcal F}\frac2m\sum_{i=1}^m\epsilon_i f(Z_i)\right],$$
where $\epsilon = \{\epsilon_i\}_{i=1}^m$ are i.i.d. Rademacher signs, taking values $\pm1$ with probabilities $1/2$. When the set $\mathcal Z_m$ is clear from the context we will simply write $\hat R_m(\mathcal F)$.

As discussed in the introduction, Rademacher complexities play an important role in the analysis of empirical processes and statistical learning theory. However, this measure of complexity was devised mainly for the i.i.d. setting, which is different from our setting of sampling without replacement. The following complexity measure was introduced in [10] to overcome this issue:

Definition 2 (Transductive Rademacher complexity). Fix any set $\mathcal Z_N = \{Z_1,\dots,Z_N\}\subseteq\mathcal Z$, positive integers $m,u$ such that $N = m+u$, and $p\in[0,1/2]$. The following quantity is called the Transductive Rademacher complexity (TRC):
$$\hat R^{td}_{m+u}(\mathcal F,\mathcal Z_N,p) = \left(\frac1m+\frac1u\right)\mathbb E_\sigma\left[\sup_{f\in\mathcal F}\sum_{i=1}^N\sigma_i f(Z_i)\right],$$
where $\sigma = \{\sigma_i\}_{i=1}^{m+u}$ are i.i.d. random variables taking the values $+1$ and $-1$ with probability $p$ each and $0$ with probability $1-2p$.

We summarize the importance of these two complexity measures in the analysis of empirical processes when sampling without replacement in the following result:

Theorem 1. Fix an $N$-element subset $\mathcal Z_N\subseteq\mathcal Z$ and let $m<N$ elements $\mathcal Z_m$ be sampled uniformly without replacement from $\mathcal Z_N$. Also let $m$ elements $\mathcal X_m$ be sampled uniformly with replacement from $\mathcal Z_N$. Denote $\mathcal Z_u := \mathcal Z_N\setminus\mathcal Z_m$ with $u := \mathrm{card}(\mathcal Z_u) = N-m$. The following upper bound in terms of the i.i.d. Rademacher complexity was provided in [20]:
$$\mathbb E_{\mathcal Z_m}\left[\sup_{f\in\mathcal F}\bigl(\bar f(\mathcal Z_u)-\bar f(\mathcal Z_m)\bigr)\right]\le\frac Nu\,\mathbb E_{\mathcal X_m}\left[\hat R_m(\mathcal F,\mathcal X_m)\right]. \tag{1}$$
The following bound in terms of TRC was provided in [10]. Assume that the functions in $\mathcal F$ are uniformly bounded by $B$.
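Then for $p_0 := \frac{mu}{N^2}$ and $c_0<5.05$:
$$\mathbb E_{\mathcal Z_m}\left[\sup_{f\in\mathcal F}\bigl(\bar f(\mathcal Z_u)-\bar f(\mathcal Z_m)\bigr)\right]\le\hat R^{td}_{m+u}(\mathcal F,\mathcal Z_N,p_0)+c_0B\,\frac{N\sqrt{\min(m,u)}}{mu}. \tag{2}$$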
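To make Definitions 1 and 2 and the bound (1) concrete, the sketch below evaluates both complexities exactly for a finite class, given as the list of its value vectors $(f(Z_1),\dots,f(Z_N))$, by enumerating all sign patterns, and it compares the two sides of (1) on a tiny population. The function names are hypothetical, the enumeration is exponential (only meant for very small $N$), and the $2/m$ normalisation follows Definition 1 as written above.

```python
from fractions import Fraction
from itertools import combinations, product

def mean(values):
    values = list(values)
    return Fraction(sum(values)) / len(values)

def rademacher_complexity(F, idx):
    """Conditional Rademacher complexity (Definition 1) of the finite class F
    on the points with indices idx (repetitions allowed, as for a sample drawn
    with replacement). Exact: averages over all 2^m sign vectors."""
    m = len(idx)
    total = sum(max(sum(e * f[i] for e, i in zip(eps, idx)) for f in F)
                for eps in product((-1, 1), repeat=m))
    return Fraction(total) * Fraction(2, m) / 2 ** m

def transductive_rademacher(F, m, u, p):
    """Transductive Rademacher complexity (Definition 2) on Z_N, N = m + u.
    sigma_i is +1 or -1 with probability p each and 0 with probability 1 - 2p."""
    p = Fraction(p)
    prob = {-1: p, 0: 1 - 2 * p, 1: p}
    total = Fraction(0)
    for sigma in product((-1, 0, 1), repeat=m + u):
        weight = Fraction(1)
        for s in sigma:
            weight *= prob[s]
        if weight:
            total += weight * max(sum(s * v for s, v in zip(sigma, f)) for f in F)
    return (Fraction(1, m) + Fraction(1, u)) * total

def expected_sup_deviation(F, m):
    """Left-hand side of (1): average over all m-subsets Z_m of Z_N of
    sup_f (mean of f on Z_u - mean of f on Z_m)."""
    N = len(F[0])
    splits = list(combinations(range(N), m))
    total = Fraction(0)
    for Zm in splits:
        Zu = [i for i in range(N) if i not in Zm]
        total += max(mean(f[i] for i in Zu) - mean(f[i] for i in Zm) for f in F)
    return total / len(splits)

def rhs_of_bound_1(F, m):
    """Right-hand side of (1): (N/u) times the average of R_m(F, X_m) over all
    ordered samples X_m drawn with replacement from Z_N."""
    N = len(F[0])
    samples = list(product(range(N), repeat=m))
    avg = sum(rademacher_complexity(F, X) for X in samples) / len(samples)
    return Fraction(N, N - m) * avg
```

For the pair of constant functions $\{1, 0\}$ on four points, `rademacher_complexity` over all four indices gives $3/8$ and `transductive_rademacher(F, 2, 2, Fraction(1, 4))` gives $35/64$. Exact rational arithmetic is used so that inequalities can be compared without rounding issues.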

While (1) did not explicitly appear in [20], it can be immediately derived using [20, Corollary 8] and the i.i.d. symmetrization of [13, Theorem 2.1].

Finally, we introduce our new complexity measure:

Definition 3 (Permutational Rademacher complexity). Let $\mathcal Z_m\subseteq\mathcal Z$ be any fixed set of cardinality $m$. For any $n\in\{1,\dots,m-1\}$ the following quantity will be called a permutational Rademacher complexity (PRC):
$$\hat Q_{m,n}(\mathcal F,\mathcal Z_m) = \mathbb E_{\mathcal Z_n}\left[\sup_{f\in\mathcal F}\bigl(\bar f(\mathcal Z_k)-\bar f(\mathcal Z_n)\bigr)\right],$$
where $\mathcal Z_n$ is a random subset of $\mathcal Z_m$ containing $n$ elements sampled uniformly without replacement and $\mathcal Z_k := \mathcal Z_m\setminus\mathcal Z_n$. When the set $\mathcal Z_m$ is clear from the context we will simply write $\hat Q_{m,n}(\mathcal F)$.

The name PRC is explained by the fact that if $m$ is even then the definitions of $\hat Q_{m,m/2}(\mathcal F)$ and $\hat R_m(\mathcal F)$ are very similar. Indeed, for $n = k = m/2$ we have $\bar f(\mathcal Z_k)-\bar f(\mathcal Z_n) = \frac2m\sum_{i=1}^m\eta_i f(Z_i)$ with $\eta_i = +1$ on $\mathcal Z_k$ and $\eta_i = -1$ on $\mathcal Z_n$, so the only difference is that the expectation in the PRC is over a randomly permuted sequence containing an equal number of $-1$ and $+1$ signs, whereas in the Rademacher complexity the average is w.r.t. all possible sequences of signs. The term permutation complexity has already appeared in [16], where it was used to denote a novel complexity measure for model selection. However, this measure was specific to the i.i.d. setting and the binary loss. Moreover, the bounds presented in [16] were of the same order as the risk bounds based on the Rademacher complexity, with worse constants in the slack term.

3 Symmetrization and Comparison Results

We start by showing a version of the i.i.d. symmetrization inequality (references can be found in [15, 13]) for the setting of sampling without replacement. It shows that the expected supremum of empirical processes in this setting is, up to multiplicative constants, equivalent to the expected PRC.

Theorem 2. Fix an $N$-element subset $\mathcal Z_N\subseteq\mathcal Z$ and let $m<N$ elements $\mathcal Z_m$ be sampled uniformly without replacement from $\mathcal Z_N$. Denote $\mathcal Z_u := \mathcal Z_N\setminus\mathcal Z_m$ with $u := \mathrm{card}(\mathcal Z_u) = N-m$. If $m = u$ and $m$ is even then for any $n\in\{1,\dots,m-1\}$:
$$\frac12\mathbb E_{\mathcal Z_m}\left[\hat Q_{m,m/2}(\mathcal F,\mathcal Z_m)\right]\le\mathbb E_{\mathcal Z_m}\left[\sup_{f\in\mathcal F}\bigl(\bar f(\mathcal Z_u)-\bar f(\mathcal Z_m)\bigr)\right]\le\mathbb E_{\mathcal Z_m}\left[\hat Q_{m,n}(\mathcal F,\mathcal Z_m)\right].$$
The inequalities also hold if we include absolute values inside the suprema.

Proof. The proof can be found in Sect. 5.1.

This inequality should be compared to the previously known complexity bounds of Theorem 1. First of all, in contrast to (1) and (2), the new bound provides two-sided control, which shows that PRC is a correct complexity measure for our setting. It is also remarkable that the lower bound (commonly known as the desymmetrization inequality) does not include any additive terms, since in the standard i.i.d. setting the lower bound holds only up to an additive term of order $O(m^{-1/2})$ [13, Sect. 2.1]. Also note that this result does not assume the boundedness of the functions in $\mathcal F$, which is a necessary assumption both in (2) and in the i.i.d. desymmetrization inequality.

Next we compare PRC with the conditional Rademacher complexity:

Theorem 3. Let $\mathcal Z_m\subseteq\mathcal Z$ be any fixed set of even cardinality $m$. Then:
$$\hat Q_{m,m/2}(\mathcal F,\mathcal Z_m)\le\left(1+\frac{2}{\sqrt{2\pi m}-2}\right)\hat R_m(\mathcal F,\mathcal Z_m). \tag{3}$$
Moreover, if the functions in $\mathcal F$ are absolutely bounded by $B$ then
$$\left|\hat Q_{m,m/2}(\mathcal F,\mathcal Z_m)-\hat R_m(\mathcal F,\mathcal Z_m)\right|\le\frac{2B}{\sqrt m}. \tag{4}$$
The results also hold if we include absolute values inside the suprema in $\hat Q_{m,n}$ and $\hat R_m$.

Proof. Conceptually the proof is based on a coupling between a sequence $\{\epsilon_i\}_{i=1}^m$ of i.i.d. Rademacher signs and a uniform random permutation $\{\eta_i\}_{i=1}^m$ of a set containing $m/2$ plus and $m/2$ minus signs. This idea was inspired by the techniques used in [11]. The detailed proof can be found in Sect. 5.2.

Note that a typical order of $\hat R_m(\mathcal F)$ is $O(m^{-1/2})$, thus the multiplicative upper bound (3) can be much tighter than the upper bound of (4). We would also like to note that Theorem 3 significantly improves the bounds of [2, Lemma 3], which relate the so-called maximal discrepancy of the class $\mathcal F$ to its Rademacher complexity (for further discussion we refer to the Appendix).

Our next result shows that the bounds of Theorem 3 are essentially tight.

Lemma 1. Let $\mathcal Z_m\subseteq\mathcal Z$ with $m$ even. There are two finite classes $\mathcal F_1$ and $\mathcal F_2$ of functions mapping $\mathcal Z$ to $\mathbb R$ and absolutely bounded by 1, such that:
$$\hat Q_{m,m/2}(\mathcal F_1,\mathcal Z_m) = 0,\qquad\frac{1}{\sqrt{2m}}\le\hat R_m(\mathcal F_1,\mathcal Z_m)\le\frac{1}{\sqrt m}; \tag{5}$$
$$\hat Q_{m,m/2}(\mathcal F_2,\mathcal Z_m) = 1,\qquad 1-\frac{2}{\sqrt{2\pi m}}\le\hat R_m(\mathcal F_2,\mathcal Z_m)\le 1-\frac{2}{\sqrt{2\pi(m+1)}}. \tag{6}$$

Proof. The proof can be found in Sect. 5.3.

Inequalities (5) simultaneously show that (a) the order $O(m^{-1/2})$ of the additive bound (4) cannot be improved, and (b) the multiplicative upper bound (3) cannot be reversed. Moreover, it can be shown using (6) that the factor appearing in (3) cannot be improved to $1+o(m^{-1/2})$.
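The examples of Lemma 1 and the two bounds of Theorem 3 can be checked by brute force for a small even $m$. The sketch below computes $\hat Q_{m,m/2}$ and $\hat R_m$ exactly (all subsets and all sign vectors) for the classes $\mathcal F_1$ and $\mathcal F_2$ from the proof of Lemma 1 and for a few random classes with values in $\{-1,0,1\}$ (so $B = 1$). The function names are hypothetical, and the constants tested are those of (3), (4) and (6) as written above.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb, pi, sqrt
import random

def rademacher_complexity(F):
    """Definition 1 (2/m normalisation), exact over all 2^m sign vectors."""
    m = len(F[0])
    total = sum(max(sum(e * v for e, v in zip(eps, f)) for f in F)
                for eps in product((-1, 1), repeat=m))
    return Fraction(total) * Fraction(2, m) / 2 ** m

def permutational_rademacher(F, n):
    """Definition 3, exact: average over all n-subsets Z_n of Z_m of
    sup_f (mean of f on Z_k - mean of f on Z_n), where Z_k = Z_m minus Z_n."""
    m = len(F[0])
    k = m - n
    subsets = list(combinations(range(m), n))
    total = Fraction(0)
    for Zn in subsets:
        chosen = set(Zn)
        total += max(Fraction(sum(v for i, v in enumerate(f) if i not in chosen)) / k
                     - Fraction(sum(f[i] for i in chosen)) / n
                     for f in F)
    return total / len(subsets)

m = 6
F1 = [[1] * m, [0] * m]                          # the class F_1 of Lemma 1
F2 = [[1 if i in ones else 0 for i in range(m)]  # the class F_2 of Lemma 1
      for ones in combinations(range(m), m // 2)]
factor = 1 + 2 / (sqrt(2 * pi * m) - 2)          # constant in (3)

def check_theorem3(F, B):
    Q = permutational_rademacher(F, m // 2)
    R = rademacher_complexity(F)
    assert float(Q) <= factor * float(R) + 1e-12         # inequality (3)
    assert abs(float(Q - R)) <= 2 * B / sqrt(m) + 1e-12  # inequality (4)
    return Q, R

check_theorem3(F1, 1)
check_theorem3(F2, 1)
rng = random.Random(0)
for _ in range(20):
    F = [[rng.choice((-1, 0, 1)) for _ in range(m)] for _ in range(4)]
    check_theorem3(F, B=1)
```

For $m = 6$ the exact values are $\hat R_6(\mathcal F_1) = 5/16$ and $\hat R_6(\mathcal F_2) = 1-\binom{6}{3}/2^6 = 11/16$, while $\hat Q_{6,3}$ equals $0$ and $1$ respectively.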
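Finally, we compare PRC to the transductive Rademacher complexity:

Lemma 2. Fix any set $\mathcal Z_N = \{Z_1,\dots,Z_N\}\subseteq\mathcal Z$. If $m = u$ and $N = m+u$:
$$\hat R_N(\mathcal F,\mathcal Z_N)\le\hat R^{td}_{m+u}(\mathcal F,\mathcal Z_N,1/4)\le2\hat R_N(\mathcal F,\mathcal Z_N).$$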
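Before the proof, both inequalities of Lemma 2 can be checked numerically on small random integer-valued classes. The sketch below is a hypothetical helper, not taken from [10]; it assumes $m = u = N/2$, so that $1/m+1/u = 4/N$, and evaluates both sides exactly.

```python
from fractions import Fraction
from itertools import product
import random

def rademacher_complexity(F):
    """Definition 1 on all N points (2/N normalisation), exact."""
    N = len(F[0])
    total = sum(max(sum(e * v for e, v in zip(eps, f)) for f in F)
                for eps in product((-1, 1), repeat=N))
    return Fraction(total) * Fraction(2, N) / 2 ** N

def trc_quarter(F):
    """Definition 2 with m = u = N/2 and p = 1/4, so 1/m + 1/u = 4/N."""
    N = len(F[0])
    prob = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}
    total = Fraction(0)
    for sigma in product((-1, 0, 1), repeat=N):
        weight = Fraction(1)
        for s in sigma:
            weight *= prob[s]
        total += weight * max(sum(s * v for s, v in zip(sigma, f)) for f in F)
    return Fraction(4, N) * total

rng = random.Random(1)
for _ in range(10):
    F = [[rng.randint(-2, 2) for _ in range(6)] for _ in range(3)]
    R, T = rademacher_complexity(F), trc_quarter(F)
    assert R <= T <= 2 * R  # the two inequalities of Lemma 2
```

The lower inequality mirrors the Jensen step in the proof below: $\sigma_i$ with $p = 1/4$ has the same law as $\epsilon_i\eta_i$ with $\eta_i\sim\mathrm{Bernoulli}(1/2)$.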

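Proof. The upper bound was presented in [10, Lemma 1]. For the lower bound, notice that if $p = 1/4$ the i.i.d. signs $\sigma_i$ appearing in Definition 2 have the same distribution as $\epsilon_i\eta_i$, where $\epsilon_i$ are i.i.d. Rademacher signs and $\eta_i$ are i.i.d. Bernoulli random variables with parameter $1/2$, independent of the $\epsilon_i$. Thus, since $1/m+1/u = 4/N$, Jensen's inequality gives:
$$\hat R^{td}_{m+u}(\mathcal F,\mathcal Z_N,1/4) = \frac4N\mathbb E_{(\epsilon,\eta)}\left[\sup_{f\in\mathcal F}\sum_{i=1}^{m+u}\epsilon_i\eta_i f(Z_i)\right]\ge\frac4N\mathbb E_\epsilon\left[\sup_{f\in\mathcal F}\sum_{i=1}^{m+u}\epsilon_i\frac12f(Z_i)\right] = \hat R_N(\mathcal F,\mathcal Z_N).$$

Together with Theorems 2 and 3 this result shows that when $m = u$ the PRC cannot be much larger than the transductive Rademacher complexity:

Corollary 1. Using the notations of Theorem 2, we have:
$$\mathbb E_{\mathcal Z_m}\left[\hat Q_{m,m/2}(\mathcal F,\mathcal Z_m)\right]\le\left(2+\frac{4}{\sqrt{2\pi N}-2}\right)\hat R^{td}_{m+u}(\mathcal F,\mathcal Z_N,1/4).$$
If the functions in $\mathcal F$ are uniformly bounded by $B$ then we also have a lower bound:
$$\mathbb E_{\mathcal Z_m}\left[\hat Q_{m,m/2}(\mathcal F,\mathcal Z_m)\right]\ge\frac12\hat R^{td}_{m+u}(\mathcal F,\mathcal Z_N,1/4)-\frac{2B}{\sqrt N}.$$

Proof. Simply notice that $\mathbb E_{\mathcal Z_m}\left[\sup_{f\in\mathcal F}\bigl(\bar f(\mathcal Z_u)-\bar f(\mathcal Z_m)\bigr)\right] = \hat Q_{N,m}(\mathcal F,\mathcal Z_N)$ with $m = N/2$, and combine Theorem 2, Theorem 3 applied to $\mathcal Z_N$, and Lemma 2.

4 Transductive Risk Bounds

Next we will use the results of Sect. 3 to obtain a new transductive risk bound. First we briefly describe the setting. We will consider the second, distribution-free setting of transductive learning described in the introduction. Fix any finite general population of input-output pairs $\mathcal Z_N = \{(x_i,y_i)\}_{i=1}^N\subseteq\mathcal X\times\mathcal Y$, where $\mathcal X$ and $\mathcal Y$ are arbitrary input and output spaces. We make no assumptions regarding the underlying source of $\mathcal Z_N$. The learner receives the labeled training set $\mathcal Z_m$ consisting of $m<N$ elements sampled uniformly without replacement from $\mathcal Z_N$. The remaining test set $\mathcal Z_u := \mathcal Z_N\setminus\mathcal Z_m$ is presented to the learner without labels (we will use $\mathcal X_u$ to denote the inputs of $\mathcal Z_u$). The goal of the learner is to find a predictor in the fixed hypothesis class $\mathcal H$, based on the training sample $\mathcal Z_m$ and the unlabelled test points $\mathcal X_u$, which has a small test risk measured using a bounded loss function $\ell\colon\mathcal Y\times\mathcal Y\to[0,1]$. For $h\in\mathcal H$ and $(x,y)\in\mathcal Z_N$ denote $\ell_h(x,y) = \ell(h(x),y)$ and also denote the loss class $\mathcal L_{\mathcal H} = \{\ell_h : h\in\mathcal H\}$. Then the test and training risks of $h\in\mathcal H$ are defined as $\mathrm{err}_u(h) := \bar\ell_h(\mathcal Z_u)$ and $\mathrm{err}_m(h) := \bar\ell_h(\mathcal Z_m)$ respectively.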
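In this transductive setting, Theorem 2 and the identity used in the proof of Corollary 1 can be verified by enumerating every split of a tiny population. The sketch below assumes a hypothetical toy population of $N = 8$ labels in $\{-1,+1\}$, five sign predictors, and the binary loss $\ell_{01}(y,y') = (1-yy')/2$; it computes $\mathbb E_{\mathcal Z_m}[\sup_h(\mathrm{err}_u(h)-\mathrm{err}_m(h))]$ and the expected PRC exactly with rational arithmetic. All names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
import random

def mean(values):
    values = list(values)
    return Fraction(sum(values)) / len(values)

def prc(L, Zm, n):
    """PRC of Definition 3 for the loss class L on the index set Zm (exact)."""
    subsets = list(combinations(Zm, n))
    total = Fraction(0)
    for Zn in subsets:
        Zk = [i for i in Zm if i not in Zn]
        total += max(mean(l[i] for i in Zk) - mean(l[i] for i in Zn) for l in L)
    return total / len(subsets)

def expected_sup_deviation(L, N, m):
    """Average over all training sets Z_m of sup_h (err_u(h) - err_m(h))."""
    splits = list(combinations(range(N), m))
    total = Fraction(0)
    for Zm in splits:
        Zu = [i for i in range(N) if i not in Zm]
        total += max(mean(l[i] for i in Zu) - mean(l[i] for i in Zm) for l in L)
    return total / len(splits)

def expected_prc(L, N, m, n):
    splits = list(combinations(range(N), m))
    return sum(prc(L, Zm, n) for Zm in splits) / len(splits)

# Hypothetical toy population: N = 8 labels and 5 sign predictors, m = u = 4.
rng = random.Random(2)
N, m = 8, 4
y = [rng.choice((-1, 1)) for _ in range(N)]
H = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(5)]
# Binary loss (1 - h(x) y) / 2: 0 for a correct prediction, 1 otherwise.
L = [[(1 - h[i] * y[i]) // 2 for i in range(N)] for h in H]

dev = expected_sup_deviation(L, N, m)
assert dev == prc(L, tuple(range(N)), m)         # identity used for Corollary 1
assert expected_prc(L, N, m, m // 2) / 2 <= dev  # lower bound of Theorem 2
for n in range(1, m):
    assert dev <= expected_prc(L, N, m, n)       # upper bound of Theorem 2
```

Because the enumeration is exact, neither side of Theorem 2 involves any sampling error here; the only assumptions are the toy data.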
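The following risk bound in terms of TRC was presented in [10, Corollary 2]:

Theorem 4 ([10]). If $m = u$ then with probability at least $1-\delta$ over the random training set $\mathcal Z_m$ any $h\in\mathcal H$ satisfies:
$$\mathrm{err}_u(h)\le\mathrm{err}_m(h)+\hat R^{td}_{m+u}(\mathcal L_{\mathcal H},\mathcal Z_N,1/4)+11\sqrt{\frac2N}+\sqrt{\frac{2N\log(1/\delta)}{(N-1/2)^2}}. \tag{7}$$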

Using the results of Sect. 3 we obtain the following risk bound:

Theorem 5. If $m = u$ and $n\in\{1,\dots,m-1\}$ then with probability at least $1-\delta$ over the random training set $\mathcal Z_m$ any $h\in\mathcal H$ satisfies:
$$\mathrm{err}_u(h)\le\mathrm{err}_m(h)+\mathbb E_{\mathcal Z_m}\left[\hat Q_{m,n}(\mathcal L_{\mathcal H},\mathcal Z_m)\right]+\sqrt{\frac{2N\log(1/\delta)}{(N-1/2)^2}}. \tag{8}$$
Moreover, with probability at least $1-\delta$ any $h\in\mathcal H$ satisfies:
$$\mathrm{err}_u(h)\le\mathrm{err}_m(h)+\hat Q_{m,n}(\mathcal L_{\mathcal H},\mathcal Z_m)+2\sqrt{\frac{2N\log(2/\delta)}{(N-1/2)^2}}. \tag{9}$$

Proof. The proof can be found in Sect. 5.4.

We conclude by comparing the risk bounds of Theorems 5 and 4:
1. First of all, the upper bound of (9) is computable. This bound is based on the concentration argument, which shows that the expected PRC (appearing in (8)) can be nicely estimated using the training set. Meanwhile, the upper bound of (7) depends on the unknown labels of the test set through TRC. In order to make it computable, the authors of [10] resorted to the contraction inequality, which allows one to drop any dependence on the labels for Lipschitz losses, and which is known to be a loose step [17].
2. Moreover, we would like to note that for the binary loss function TRC (as well as the Rademacher complexity) does not depend on the labels at all. Indeed, this can be shown by writing $\ell_{01}(y,y') = (1-yy')/2$ for $y,y'\in\{-1,+1\}$ and noting that $\sigma_i$ and $\sigma_iy_i$ are identically distributed for the $\sigma_i$ used in Definition 2. This is not true for PRC, which is sensitive to the labels even in this setting. As future work we hope to use this fact for the analysis in the low noise setting.
3. The slack term appearing in (8) is significantly smaller than the one of (7). For instance, if $\delta = 0.01$ then the latter is 13 times larger. This is caused by the additive term in the symmetrization inequality (2). At the same time, Corollary 1 shows that the complexity term appearing in (8) is at most two times larger than the TRC appearing in (7).
4. The comparison result of Theorem 3 shows that the upper bound of (9) is also tighter than the one which can be obtained using (1) and the conditional Rademacher complexity.
5. Similar upper bounds (up to an extra factor of 2) also hold for the excess risk $\mathrm{err}_u(h_m)-\inf_{h\in\mathcal H}\mathrm{err}_u(h)$, where $h_m$ minimizes the training risk $\mathrm{err}_m$ over $\mathcal H$. This can be proved using an argument similar to the proof of Theorem 5.
6. Finally, one more application of the concentration argument can simplify the computation of PRC, by estimating the expected value appearing in Definition 3 with only one random partition of $\mathcal Z_m$.
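The computable bound (9) needs only the training losses. The sketch below is a hypothetical helper, not the authors' code. It approximates $\hat Q_{m,n}(\mathcal L_{\mathcal H},\mathcal Z_m)$ by averaging over a fixed number of random partitions of the training indices, which is in the spirit of item 6. It then adds the slack term of (9) as written above, assuming $m = u = N/2$. Replacing the exact PRC by a Monte Carlo average introduces an extra estimation error that this sketch does not account for. The result is therefore an approximation of (9), not a certified bound.

```python
import math
import random

def prc_monte_carlo(losses, n, num_partitions, rng):
    """Monte Carlo estimate of Q_{m,n}(L_H, Z_m) from training losses only.

    losses[j][i] is the loss of hypothesis j on training point i. Each draw
    splits the training indices into a random n-subset Z_n and its complement
    Z_k and records the largest gap mean(Z_k) - mean(Z_n) over hypotheses."""
    m = len(losses[0])
    k = m - n
    total = 0.0
    for _ in range(num_partitions):
        Zn = set(rng.sample(range(m), n))
        total += max(
            sum(row[i] for i in range(m) if i not in Zn) / k
            - sum(row[i] for i in Zn) / n
            for row in losses)
    return total / num_partitions

def risk_bound_9(losses, N, delta, n=None, num_partitions=200, seed=0):
    """Approximate right-hand side of (9) for every hypothesis (m = u = N/2)."""
    m = len(losses[0])
    assert 2 * m == N, "Theorem 5 assumes m = u"
    n = m // 2 if n is None else n
    q_hat = prc_monte_carlo(losses, n, num_partitions, random.Random(seed))
    slack = 2 * math.sqrt(2 * N * math.log(2 / delta) / (N - 0.5) ** 2)
    return [sum(row) / m + q_hat + slack for row in losses]
```

For example, `risk_bound_9(train_losses, N=2 * len(train_losses[0]), delta=0.05)` returns one upper estimate of $\mathrm{err}_u(h)$ per hypothesis. The default $n = m/2$ is a choice made here, since Theorem 5 allows any $n\in\{1,\dots,m-1\}$.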

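5 Full Proofs

5.1 Proof of Theorem 2

Lemma 3. For $0<m\le N$ let $\mathcal S_m := \{s_1,\dots,s_m\}$ be sampled uniformly without replacement from a finite set of real numbers $\mathcal C = \{c_1,\dots,c_N\}\subset\mathbb R$. Then:
$$\mathbb E_{\mathcal S_m}\left[\frac1m\sum_{i=1}^ms_i\right] = \binom Nm^{-1}\sum_{\mathcal Z\subseteq\mathcal C,\ \mathrm{card}(\mathcal Z) = m}\frac1m\sum_{z\in\mathcal Z}z = \frac1m\binom Nm^{-1}\binom{N-1}{m-1}\sum_{i=1}^Nc_i = \frac1N\sum_{i=1}^Nc_i.$$

Proof (of Theorem 2). Fix any positive integers $n$ and $k$ such that $n+k = m$, which implies $n<m$ and $k<m = u$. Note that Lemma 3 implies:
$$\bar f(\mathcal Z_u) = \mathbb E_{\mathcal S_k}\left[\bar f(\mathcal S_k)\right],\qquad\bar f(\mathcal Z_m) = \mathbb E_{\mathcal S_n}\left[\bar f(\mathcal S_n)\right],$$
where $\mathcal S_k$ and $\mathcal S_n$ are sampled uniformly without replacement from $\mathcal Z_u$ and $\mathcal Z_m$ respectively. Using Jensen's inequality we get:
$$\mathbb E_{\mathcal Z_m}\left[\sup_{f\in\mathcal F}\bigl(\bar f(\mathcal Z_u)-\bar f(\mathcal Z_m)\bigr)\right] = \mathbb E_{\mathcal Z_m}\left[\sup_{f\in\mathcal F}\Bigl(\mathbb E_{\mathcal S_k}\left[\bar f(\mathcal S_k)\right]-\mathbb E_{\mathcal S_n}\left[\bar f(\mathcal S_n)\right]\Bigr)\right]\le\mathbb E_{(\mathcal Z_m,\mathcal S_k,\mathcal S_n)}\left[\sup_{f\in\mathcal F}\bigl(\bar f(\mathcal S_k)-\bar f(\mathcal S_n)\bigr)\right]. \tag{10}$$
The marginal distribution of $(\mathcal S_k,\mathcal S_n)$ appearing in (10) can be equivalently described by first sampling $\mathcal Z_m$ from $\mathcal Z_N$, then $\mathcal S_n$ from $\mathcal Z_m$ (both times uniformly without replacement), and setting $\mathcal S_k := \mathcal Z_m\setminus\mathcal S_n$ (recall that $n+k = m$): in both descriptions $(\mathcal S_k,\mathcal S_n)$ is a uniformly random pair of disjoint subsets of $\mathcal Z_N$ of sizes $k$ and $n$. Thus
$$\mathbb E_{(\mathcal Z_m,\mathcal S_k,\mathcal S_n)}\left[\sup_{f\in\mathcal F}\bigl(\bar f(\mathcal S_k)-\bar f(\mathcal S_n)\bigr)\right] = \mathbb E_{\mathcal Z_m}\left[\mathbb E_{\mathcal S_n}\left[\sup_{f\in\mathcal F}\bigl(\bar f(\mathcal Z_m\setminus\mathcal S_n)-\bar f(\mathcal S_n)\bigr)\right]\right] = \mathbb E_{\mathcal Z_m}\left[\hat Q_{m,n}(\mathcal F,\mathcal Z_m)\right],$$
which completes the proof of the upper bound.

We have shown that for $n\in\{1,\dots,m-1\}$ and $k := m-n$:
$$\mathbb E_{\mathcal Z_m}\left[\hat Q_{m,n}(\mathcal F,\mathcal Z_m)\right] = \mathbb E_{(\mathcal Z_k,\mathcal Z_n)}\left[\sup_{f\in\mathcal F}\bigl(\bar f(\mathcal Z_k)-\bar f(\mathcal Z_n)\bigr)\right], \tag{11}$$
where $\mathcal Z_n$ and $\mathcal Z_k$ are sampled uniformly without replacement from $\mathcal Z_N$ and $\mathcal Z_N\setminus\mathcal Z_n$ respectively. Let $\mathcal Z'_n$ be sampled uniformly without replacement from $\mathcal Z_N\setminus(\mathcal Z_n\cup\mathcal Z_k)$ and let $\mathcal Z'_k$ be the remaining $u-n = k$ elements of $\mathcal Z_N$. Using Lemma 3 once again we get:
$$\mathbb E\left[\bar f(\mathcal Z'_n)\,\middle|\,\mathcal Z_n,\mathcal Z_k\right] = \mathbb E\left[\bar f(\mathcal Z'_k)\,\middle|\,\mathcal Z_n,\mathcal Z_k\right].$$
We can rewrite the r.h.s. of (11) as:
$$\mathbb E_{(\mathcal Z_n,\mathcal Z_k)}\left[\sup_{f\in\mathcal F}\Bigl(\bar f(\mathcal Z_k)-\bar f(\mathcal Z_n)+\mathbb E\left[\bar f(\mathcal Z'_k)-\bar f(\mathcal Z'_n)\,\middle|\,\mathcal Z_n,\mathcal Z_k\right]\Bigr)\right]\le\mathbb E\left[\sup_{f\in\mathcal F}\bigl(\bar f(\mathcal Z_k)-\bar f(\mathcal Z_n)+\bar f(\mathcal Z'_k)-\bar f(\mathcal Z'_n)\bigr)\right],$$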

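where we have used Jensen's inequality. If we take $n = k = m/2$ we get
$$\mathbb E_{\mathcal Z_m}\left[\hat Q_{m,m/2}(\mathcal F,\mathcal Z_m)\right]\le2\,\mathbb E\left[\sup_{f\in\mathcal F}\bigl(\bar f(\mathcal Z_k\cup\mathcal Z'_k)-\bar f(\mathcal Z_n\cup\mathcal Z'_n)\bigr)\right].$$
It is left to notice that the random subsets $\mathcal Z_k\cup\mathcal Z'_k$ and $\mathcal Z_n\cup\mathcal Z'_n$ have the same joint distribution as $\mathcal Z_u$ and $\mathcal Z_m$.

5.2 Proof of Theorem 3

Let $m = 2n$, let $\epsilon = \{\epsilon_i\}_{i=1}^m$ be i.i.d. Rademacher signs, and let $\eta = \{\eta_i\}_{i=1}^m$ be a uniform random permutation of a set containing $n$ plus and $n$ minus signs. The proof of Theorem 3 is based on a coupling of the random variables $\epsilon$ and $\eta$, which is described in Lemma 4. We will need a number of definitions. Consider the binary cube $\mathcal B_m := \{-1,+1\}^m$. Denote $\mathcal S_m := \{v\in\mathcal B_m : \sum_{i=1}^mv_i = 0\}$, which is the set of all the vectors in $\mathcal B_m$ having an equal number of plus and minus signs. For any $v\in\mathcal B_m$ denote $\|v\|_1 = \sum_{i=1}^m|v_i|$ and consider the following set:
$$T(v) = \operatorname*{arg\,min}_{v'\in\mathcal S_m}\|v-v'\|_1,$$
which consists of the points in $\mathcal S_m$ closest to $v$ in the Hamming metric. For any $v\in\mathcal B_m$ let $t(v)$ be a random element of $T(v)$, distributed uniformly. We will use $t_i(v)$ to denote the $i$-th coordinate of the vector $t(v)$.

Remark 1. If $v\in\mathcal S_m$ then $T(v) = \{v\}$. Otherwise, $T(v)$ contains more than one element of $\mathcal S_m$. Namely, it can be shown that if $\sum_{i=1}^mv_i = q$ for some positive integer $q$, then $q$ is necessarily even and $T(v)$ consists of all the vectors in $\mathcal S_m$ which can be obtained by replacing $q/2$ of the $+1$ signs in $v$ with $-1$ signs; thus in this case $\mathrm{card}(T(v)) = \binom{(m+q)/2}{q/2}$. The case $\sum_{i=1}^mv_i = -q<0$ is symmetric.

Lemma 4 (Coupling). Assume that $m = 2n$. Then the random sequence $t(\epsilon)$ has the same distribution as $\eta$.

Proof. Note that the support of $t(\epsilon)$ is equal to $\mathcal S_m$. From symmetry it is easy to conclude that the distribution of $t(\epsilon)$ is exchangeable. This means that it is invariant under permutations and, as a consequence, uniform on $\mathcal S_m$.

The next result is at the core of the multiplicative upper bound (3).

Lemma 5. Assume that $m = 2n$. For any $q\in\{1,\dots,m\}$ the following holds:
$$\mathbb E\left[\epsilon_q\,\middle|\,t(\epsilon)\right] = \left(1-\frac{1}{2^{2n}}\binom{2n}n\right)t_q(\epsilon),\qquad\text{where}\quad1-\frac{1}{2^{2n}}\binom{2n}n\ge1-\frac{2}{\sqrt{2\pi m}}.$$

Proof. We will first upper bound $\mathbb P\{\epsilon_q\ne t_q(\epsilon)\mid t(\epsilon) = e\}$, where $e = \{e_i\}_{i=1}^m$ is (w.l.o.g.) a sequence of $n$ plus signs followed by a sequence of $n$ minus signs:
$$\mathbb P\{\epsilon_q\ne t_q(\epsilon)\mid t(\epsilon) = e\} = \frac{\mathbb P\{\epsilon_q\ne t_q(\epsilon),\ t(\epsilon) = e\}}{\mathbb P\{t(\epsilon) = e\}} = \binom{2n}n\frac1{2^{2n}}\sum_s\mathbb P\{\epsilon_q\ne t_q(\epsilon),\ t(\epsilon) = e\mid\epsilon = s\}, \tag{12}$$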

where we have used Lemma 4 and the sum is over all $2^{2n}$ different sequences of signs $s = \{s_i\}_{i=1}^{2n}$. For any $s$ denote $S(s) = \sum_{j=1}^{2n}s_j$ and consider the terms in (12) corresponding to $s$ with $S(s) = 0$, $S(s)>0$, and $S(s)<0$:

Case 1: $S(s) = 0$. These terms are zero, since $t(s) = s$.

Case 2: $S(s)>0$. This means that $s$ has more plus signs than it should, and according to Remark 1 the mapping $t(\cdot)$ will replace several of its $+1$ signs with $-1$. In particular, if $s_q = -1$ then $t_q(s) = s_q$, and thus the corresponding terms are zero. If $s_q = 1$ and at the same time $e_q = 1$, the event $\{\epsilon_q\ne t_q(\epsilon),\ t(\epsilon) = e\}$ also cannot hold. Moreover, note that the identity $e = t(s)$ can hold only if $e\in T(s)$, which necessarily leads to
$$\{j\in\{1,\dots,2n\}: s_j = -1\}\subseteq\{j\in\{1,\dots,2n\}: e_j = -1\}. \tag{13}$$
From this we conclude that if $q\in\{1,\dots,n\}$ then all the terms corresponding to $s$ with $S(s)>0$ are zero. We will use $U_q(e)$ to denote the subset of $\mathcal B_{2n}$ consisting of the sequences $s$ such that (a) $S(s)>0$, (b) $s_q = 1$, and (c) condition (13) holds. It can be seen that if $s\in U_q(e)$ then:
$$\mathbb P\{\epsilon_q\ne t_q(\epsilon),\ t(\epsilon) = e\mid\epsilon = s\} = \binom{n+S(s)/2}{S(s)/2}^{-1}.$$
This holds since, according to Remark 1, $t(\epsilon)$ can take exactly $\binom{n+S(s)/2}{S(s)/2}$ different values, while only one of them is equal to $e$. Let us compute the cardinality of $U_q(e)$ for $q\in\{n+1,\dots,2n\}$. It is easy to check that the condition $S(s) = 2j$ for some positive integer $j$ implies that $s$ has exactly $n-j$ minus signs. By (13) these minus signs must be placed among the $n$ positions where $e_i = -1$, excluding the position $q$ (since $s_q = 1$), so that
$$\mathrm{card}\{s\in U_q(e): S(s) = 2j\} = \binom{n-1}{n-j}.$$
Combining everything together we have:
$$\sum_{s:\,S(s)>0}\mathbb P\{\epsilon_q\ne t_q(\epsilon),\ t(\epsilon) = e\mid\epsilon = s\} = \mathbb 1\{q>n\}\sum_{j=1}^n\binom{n-1}{n-j}\binom{n+j}j^{-1}.$$
Finally, it is easy to show using induction that
$$\sum_{j=1}^n\binom{n-1}{n-j}\binom{n+j}j^{-1} = \frac12.$$

Case 3: $S(s)<0$. We can repeat all the steps of the previous case and get:
$$\sum_{s:\,S(s)<0}\mathbb P\{\epsilon_q\ne t_q(\epsilon),\ t(\epsilon) = e\mid\epsilon = s\} = \frac12\mathbb 1\{q\le n\}.$$
Accounting for these three cases in (12) we conclude that
$$\mathbb P\{\epsilon_q\ne t_q(\epsilon)\mid t(\epsilon) = e\} = \frac1{2^{2n+1}}\binom{2n}n\le\frac1{\sqrt{2\pi m}},$$

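where we have used the upper bound on the binomial coefficient from [19, Corollary 2.4]. We can conclude the proof of the lemma by writing:
$$\mathbb E\left[\epsilon_q\,\middle|\,t(\epsilon)\right] = t_q(\epsilon)\bigl(1-2\,\mathbb P\{\epsilon_q\ne t_q(\epsilon)\mid t(\epsilon)\}\bigr) = \left(1-\frac1{2^{2n}}\binom{2n}n\right)t_q(\epsilon),\qquad1-\frac1{2^{2n}}\binom{2n}n\ge1-\frac2{\sqrt{2\pi m}}.$$

Proof (of Theorem 3). First we prove (3). Let $\mathcal Z_m = \{z_1,\dots,z_m\}$ and write $c_m := 1-2^{-2n}\binom{2n}n$. We can write:
$$\hat Q_{m,n}(\mathcal F) = \mathbb E\left[\sup_{f\in\mathcal F}\frac2m\sum_{i=1}^mt_i(\epsilon)f(z_i)\right] \tag{14}$$
$$= \frac1{c_m}\mathbb E\left[\sup_{f\in\mathcal F}\frac2m\sum_{i=1}^m\mathbb E\left[\epsilon_i\,\middle|\,t(\epsilon)\right]f(z_i)\right] \tag{15}$$
$$\le\left(1+\frac2{\sqrt{2\pi m}-2}\right)\mathbb E_\epsilon\left[\sup_{f\in\mathcal F}\frac2m\sum_{i=1}^m\epsilon_if(z_i)\right], \tag{16}$$
where we have used the coupling Lemma 4 in (14), Lemma 5 in (15), and in (16) Jensen's inequality together with $c_m^{-1}\le\bigl(1-2/\sqrt{2\pi m}\bigr)^{-1} = 1+\frac2{\sqrt{2\pi m}-2}$ (the expectation in (15) is nonnegative, again by Jensen's inequality). This completes the proof of (3).

Next we prove (4). We have:
$$\left|\hat Q_{m,n}(\mathcal F)-\hat R_m(\mathcal F)\right| = \left|\mathbb E_\eta\left[\sup_{f\in\mathcal F}\frac2m\sum_{i=1}^m\eta_if(z_i)\right]-\mathbb E_\epsilon\left[\sup_{f\in\mathcal F}\frac2m\sum_{i=1}^m\epsilon_if(z_i)\right]\right|.$$
Using Lemma 4 and Jensen's inequality we further get:
$$\left|\hat Q_{m,n}(\mathcal F)-\hat R_m(\mathcal F)\right|\le\mathbb E_\epsilon\mathbb E_{t|\epsilon}\left|\sup_{f\in\mathcal F}\frac2m\sum_{i=1}^mt_i(\epsilon)f(z_i)-\sup_{f\in\mathcal F}\frac2m\sum_{i=1}^m\epsilon_if(z_i)\right|, \tag{17}$$
where we have, perhaps misleadingly, denoted the conditional expectation with respect to the uniform choice from $T(\epsilon)$ given $\epsilon$ by $\mathbb E_{t|\epsilon}$. Next we have:
$$\left|\sup_{f\in\mathcal F}\frac2m\sum_{i=1}^mt_i(\epsilon)f(z_i)-\sup_{f\in\mathcal F}\frac2m\sum_{i=1}^m\epsilon_if(z_i)\right|\le\sup_{f\in\mathcal F}\left|\frac4m\sum_{i\in S(\epsilon,t)}\epsilon_if(z_i)\right|, \tag{18}$$
where $S(\epsilon,t)\subseteq\{1,\dots,m\}$ is the subset of indices such that $t_i(\epsilon)\ne\epsilon_i$ iff $i\in S(\epsilon,t)$ (on these indices $t_i(\epsilon) = -\epsilon_i$). We can continue by writing
$$\left|\sup_{f\in\mathcal F}\frac2m\sum_{i=1}^mt_i(\epsilon)f(z_i)-\sup_{f\in\mathcal F}\frac2m\sum_{i=1}^m\epsilon_if(z_i)\right|\le\frac4m\sup_{f\in\mathcal F}\sum_{i\in S(\epsilon,t)}|f(z_i)|. \tag{19}$$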

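Note that since the functions in $\mathcal F$ are absolutely bounded by $B$:
$$\sum_{i\in S(\epsilon,t)}|f(z_i)|\le B\,\mathrm{card}(S(\epsilon,t)).$$
Returning to (17) and using Remark 1, which gives $\mathrm{card}(S(\epsilon,t)) = \bigl|\sum_{i=1}^m\epsilon_i\bigr|/2$, we obtain:
$$\left|\hat Q_{m,n}(\mathcal F)-\hat R_m(\mathcal F)\right|\le\frac{4B}m\mathbb E_\epsilon\mathbb E_{t|\epsilon}\left[\mathrm{card}(S(\epsilon,t))\right] = \frac{2B}m\mathbb E_\epsilon\left|\sum_{i=1}^m\epsilon_i\right|.$$
Khinchin's inequality [15, Lemma 4.1] together with the best known constant due to [12] gives $\mathbb E_\epsilon\bigl|\sum_{i=1}^m\epsilon_i\bigr|\le\sqrt m$, which completes the proof of (4).

5.3 Proof of Lemma 1

Proof. Let $\mathcal Z_m = \{z_1,\dots,z_m\}$. Take $\mathcal F_1$ to be the set of two constant functions, $f_1(z) = 1$ and $f_2(z) = 0$ for all $z\in\mathcal Z$. Clearly, $\hat Q_{m,m/2}(\mathcal F_1) = 0$. At the same time:
$$\mathbb E_\epsilon\left[\sup_{f\in\mathcal F_1}\frac2m\sum_{i=1}^m\epsilon_if(z_i)\right] = \mathbb E_\epsilon\left[\max\left\{0,\frac2m\sum_{i=1}^m\epsilon_i\right\}\right] = \frac1m\mathbb E_\epsilon\left|\sum_{i=1}^m\epsilon_i\right|\le\frac1{\sqrt m},$$
where we used Khinchin's inequality. Finally, Khinchin's inequality also gives:
$$\mathbb E_\epsilon\left[\max\left\{0,\frac2m\sum_{i=1}^m\epsilon_i\right\}\right] = \frac1m\mathbb E_\epsilon\left|\sum_{i=1}^m\epsilon_i\right|\ge\frac1{\sqrt{2m}}.$$
Next, let $\mathcal F_2$ contain $\binom m{m/2}$ functions such that their projections on $\mathcal Z_m$ recover all the permutations of a binary vector containing an equal number of 0s and 1s. Clearly, in this case $\hat Q_{m,m/2}(\mathcal F_2) = 1$. Straightforward calculations show that at the same time $\hat R_m(\mathcal F_2) = 1-\frac1{2^{2n}}\binom{2n}n$ with $m = 2n$, and we conclude the proof using upper and lower bounds on the binomial coefficient from [19, Corollary 2.4].

5.4 Proof of Theorem 5

The following version of McDiarmid's bounded difference inequality for the setting of sampling without replacement was presented in [10, Lemma 2] and further improved in [8, Theorem 5]:

Theorem 6 ([10], [8]). Let $\mathcal Z_m$ be sampled uniformly without replacement from a fixed set $\mathcal Z_{m+u}\subseteq\mathcal Z$ of $m+u$ elements. Let $g\colon\mathcal Z^m\to\mathbb R$ be a symmetric function such that for all $i = 1,\dots,m$ and for all $z_1,\dots,z_m\in\mathcal Z$ and $z'_1,\dots,z'_m\in\mathcal Z$,
$$\left|g(z_1,\dots,z_m)-g(z_1,\dots,z_{i-1},z'_i,z_{i+1},\dots,z_m)\right|\le c. \tag{20}$$
Then, if $m = u$, with probability not less than $1-\delta$ the following holds:
$$g(\mathcal Z_m)\le\mathbb E\left[g(\mathcal Z_m)\right]+c\sqrt{\frac{N^3\log(1/\delta)}{8(N-1/2)^2}}.$$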
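The proof of Theorem 5 that follows uses two bounded-difference constants. For the supremum $\sup_h(\mathrm{err}_u(h)-\mathrm{err}_m(h))$, exchanging one training point with one test point moves each training risk by at most $1/m$ and each test risk by at most $1/u$, which gives $c = 2/u$ when $m = u$. For the PRC, averaging over partitions gives $\frac nm\cdot\frac1n+\frac km\cdot\frac1k = \frac2m$. The brute-force sketch below checks both constants for 0-1 losses on a hypothetical toy population with $N = 8$, and checks the arithmetic that turns the slack of Theorem 6 with $c = 2/u$ into $\sqrt{2N\log(1/\delta)/(N-1/2)^2}$. All helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
import math
import random

def mean(values):
    values = list(values)
    return Fraction(sum(values)) / len(values)

def sup_deviation(L, Zm, N):
    """sup over the loss class of err_u - err_m for the training indices Zm."""
    Zu = [i for i in range(N) if i not in Zm]
    return max(mean(l[i] for i in Zu) - mean(l[i] for i in Zm) for l in L)

def prc(L, Zm, n):
    """Exact PRC (Definition 3) of the loss class on the training indices Zm."""
    subsets = list(combinations(Zm, n))
    total = Fraction(0)
    for Zn in subsets:
        Zk = [i for i in Zm if i not in Zn]
        total += max(mean(l[i] for i in Zk) - mean(l[i] for i in Zn) for l in L)
    return total / len(subsets)

def worst_bounded_difference(g, N, m):
    """max |g(Z_m) - g(Z_m')| over all Z_m' obtained from Z_m by exchanging
    one training point with one test point."""
    values = {Zm: g(Zm) for Zm in combinations(range(N), m)}
    worst = Fraction(0)
    for Zm, value in values.items():
        for j in range(m):
            for z in range(N):
                if z not in Zm:
                    swapped = tuple(sorted(Zm[:j] + (z,) + Zm[j + 1:]))
                    worst = max(worst, abs(value - values[swapped]))
    return worst

def slack_theorem6(c, N, delta):
    return c * math.sqrt(N ** 3 * math.log(1 / delta) / (8 * (N - 0.5) ** 2))

rng = random.Random(3)
N, m = 8, 4
L = [[rng.randint(0, 1) for _ in range(N)] for _ in range(4)]  # 0-1 losses

assert worst_bounded_difference(lambda Zm: sup_deviation(L, Zm, N), N, m) <= Fraction(2, N - m)
for n in range(1, m):
    assert worst_bounded_difference(lambda Zm: prc(L, Zm, n), N, m) <= Fraction(2, m)

u = 20  # with c = 2/u and N = 2u the slack of Theorem 6 simplifies as in (21)
assert math.isclose(slack_theorem6(2 / u, 2 * u, 0.05),
                    math.sqrt(2 * 2 * u * math.log(1 / 0.05) / (2 * u - 0.5) ** 2))
```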

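Note that the function $\mathcal Z_m\mapsto\sup_{h\in\mathcal H}\bigl(\mathrm{err}_u(h)-\mathrm{err}_m(h)\bigr)$ maps $(\mathcal X\times\mathcal Y)^m$ to $\mathbb R$ and is of course symmetric. Straightforward calculations show that this function satisfies the bounded difference condition (20) with $c = 2/u$ ([10, Inequality (9)]). Theorem 6 states that with probability not less than $1-\delta$:
$$\sup_{h\in\mathcal H}\bigl(\mathrm{err}_u(h)-\mathrm{err}_m(h)\bigr)\le\mathbb E_{\mathcal Z_m}\left[\sup_{h\in\mathcal H}\bigl(\mathrm{err}_u(h)-\mathrm{err}_m(h)\bigr)\right]+\sqrt{\frac{2N\log(1/\delta)}{(N-1/2)^2}}. \tag{21}$$
Using the upper bound of Theorem 2 with $\mathcal L_{\mathcal H}$ in place of $\mathcal F$ we complete the proof of (8). Next, consider the symmetric function $\mathcal Z_m\mapsto\hat Q_{m,n}(\mathcal L_{\mathcal H},\mathcal Z_m)$, which also maps $(\mathcal X\times\mathcal Y)^m$ to $\mathbb R$. It can be shown again that it satisfies the bounded difference condition (20) with $c = 2/m$. Thus, Theorem 6 (applied to $-\hat Q_{m,n}$) gives that with probability not less than $1-\delta$:
$$\mathbb E_{\mathcal Z_m}\left[\hat Q_{m,n}(\mathcal L_{\mathcal H},\mathcal Z_m)\right]\le\hat Q_{m,n}(\mathcal L_{\mathcal H},\mathcal Z_m)+\sqrt{\frac{2N\log(1/\delta)}{(N-1/2)^2}}. \tag{22}$$
Using this inequality together with (8) in a union bound (with $\delta/2$ for each) we obtain the second inequality of the theorem.

Appendix: Improving Lemma 3 of [2]

Let $\mu$ be a probability distribution on $\mathcal Z$ and let $\mathcal X_m := \{X_1,\dots,X_m\}$ be i.i.d. samples selected according to $\mu$. The maximal discrepancy of $\mathcal F$ was defined in [2] as:
$$\hat D_m(\mathcal F,\mathcal X_m) = \sup_{f\in\mathcal F}\left(\frac2m\sum_{i=1}^{m/2}f(X_i)-\frac2m\sum_{i=m/2+1}^mf(X_i)\right).$$
It was shown in [2] that if the functions in $\mathcal F$ are uniformly bounded by 1 then:
$$\frac12\mathbb E\left[\hat R_m(\mathcal F,\mathcal X_m)\right]-\frac2{\sqrt m}\le\mathbb E\left[\hat D_m(\mathcal F,\mathcal X_m)\right]\le\mathbb E\left[\hat R_m(\mathcal F,\mathcal X_m)\right]+\frac4{\sqrt m}. \tag{23}$$
Since the elements of $\mathcal X_m$ are i.i.d., the distribution of $\hat D_m$ is invariant under their permutations and thus $\mathbb E[\hat D_m(\mathcal F,\mathcal X_m)] = \mathbb E[\hat Q_{m,m/2}(\mathcal F,\mathcal X_m)]$. Now we can use Theorem 3 to significantly improve the bounds in (23):
$$\mathbb E\left[\hat R_m(\mathcal F,\mathcal X_m)\right]-\frac2{\sqrt m}\le\mathbb E\left[\hat D_m(\mathcal F,\mathcal X_m)\right]\le\left(1+\frac2{\sqrt{2\pi m}-2}\right)\mathbb E\left[\hat R_m(\mathcal F,\mathcal X_m)\right].$$

Acknowledgments. The authors are thankful to Marius Kloft and Ruth Urner for useful discussions and to the anonymous reviewers for their comments. GB acknowledges the support of the DFG through the FOR-1735 grant. NZ was supported solely by the Russian Science Foundation grant (project ).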

References

1. Bartlett, P., Bousquet, O., Mendelson, S.: Local Rademacher complexities. The Annals of Statistics 33(4) (2005)
2. Bartlett, P., Mendelson, S.: Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research 3 (2001)
3. Blum, A., Langford, J.: PAC-MDL bounds. In: COLT 2003 (2003)
4. Boucheron, S., Lugosi, G., Bousquet, O.: Theory of classification: a survey of recent advances. ESAIM: Probability and Statistics 9 (2005)
5. Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press (2013)
6. Chapelle, O., Schölkopf, B., Zien, A.: Semi-Supervised Learning. MIT Press (2006)
7. Cortes, C., Mohri, M.: On transductive regression. In: NIPS 2006 (2007)
8. Cortes, C., Mohri, M., Pechyony, D., Rastogi, A.: Stability analysis and learning bounds for transductive regression algorithms. CoRR (2009)
9. Derbeko, P., El-Yaniv, R., Meir, R.: Explicit learning curves for transduction and application to clustering and compression algorithms. Journal of Artificial Intelligence Research 22(1) (2004)
10. El-Yaniv, R., Pechyony, D.: Transductive Rademacher complexity and its applications. Journal of Artificial Intelligence Research 35(1) (2009)
11. Gross, D., Nesme, V.: Note on sampling without replacing from a finite collection of matrices. (2010)
12. Haagerup, U.: The best constants in the Khintchine inequality. Studia Mathematica 70(3) (1981)
13. Koltchinskii, V.: Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Springer (2011)
14. Koltchinskii, V., Panchenko, D.: Rademacher processes and bounding the risk of function learning. In: Giné, E., Mason, D., Wellner, J. (eds.) High Dimensional Probability II. Birkhäuser (1999)
15. Ledoux, M., Talagrand, M.: Probability in Banach Spaces. Springer-Verlag (1991)
16. Magdon-Ismail, M.: Permutation complexity bound on out-sample error. In: Advances in Neural Information Processing Systems (NIPS 2010) (2010)
17. Mendelson, S.: Learning without concentration. CoRR (2014)
18. Pechyony, D.: Theory and Practice of Transductive Learning. PhD thesis (2008)
19. Stanica, P.: Good lower and upper bounds on binomial coefficients. Journal of Inequalities in Pure and Applied Mathematics 2(3) (2001)
20. Tolstikhin, I., Blanchard, G., Kloft, M.: Localized complexities for transductive learning. In: COLT 2014 (2014)
21. Van der Vaart, A.W., Wellner, J.: Weak Convergence and Empirical Processes: With Applications to Statistics. Springer (2000)
22. Vapnik, V.: Statistical Learning Theory. John Wiley & Sons (1998)


More information

PAC-Bayes Analysis Of Maximum Entropy Learning

PAC-Bayes Analysis Of Maximum Entropy Learning PAC-Bayes Analysis Of Maxiu Entropy Learning John Shawe-Taylor and David R. Hardoon Centre for Coputational Statistics and Machine Learning Departent of Coputer Science University College London, UK, WC1E

More information

Domain-Adversarial Neural Networks

Domain-Adversarial Neural Networks Doain-Adversarial Neural Networks Hana Ajakan, Pascal Gerain 2, Hugo Larochelle 3, François Laviolette 2, Mario Marchand 2,2 Départeent d inforatique et de génie logiciel, Université Laval, Québec, Canada

More information