Rademacher Complexity Margin Bounds for Learning with a Large Number of Classes


Vitaly Kuznetsov, Courant Institute of Mathematical Sciences, 251 Mercer Street, New York, NY 10012. VITALY@CIMS.NYU.EDU
Mehryar Mohri, Courant Institute and Google Research, 251 Mercer Street, New York, NY 10012. MOHRI@CS.NYU.EDU
Umar Syed, Google Research, 76 Ninth Avenue, New York, NY 10011. USYED@GOOGLE.COM

Abstract

This paper presents improved Rademacher complexity margin bounds that scale linearly with the number of classes, as opposed to the quadratic dependence of existing Rademacher complexity margin-based learning guarantees. We further use this result to prove a novel generalization bound for multi-class classifier ensembles that depends only on the Rademacher complexity of the hypothesis classes to which the classifiers in the ensemble belong.

1. Introduction

Multi-class classification is one of the central problems in machine learning. Given a sample S = ((x₁, y₁), ..., (x_m, y_m)) drawn i.i.d. from some unknown distribution D over X × {1, ..., c}, the objective of the learner is to find a hypothesis h, selected out of some hypothesis class H, that admits a small expected loss E_{(X,Y)∼D}[L(h(X), Y)], where L is a loss function, typically chosen to be the zero-one loss defined by L(y′, y) = 1_{y′≠y}.

A common approach to multi-class classification consists of learning a scoring function f: X × Y → R that assigns a score f(x, y) to the pair made of an input point x ∈ X and a candidate label y. The label predicted for x is the one with the highest score: h(x) = argmax_{y∈Y} f(x, y). The difference between the score of the correct label and that of the runner-up is the margin achieved for that example, and the fraction of sample points with margin less than a specified constant ρ is the empirical margin loss of h. These quantities play a critical role in an algorithm-agnostic analysis of generalization in the multi-class setting based on data-dependent complexity measures such as Rademacher complexity. In particular, (Koltchinskii & Panchenko, 2002) showed that, with high probability, uniformly over the hypothesis set,

R(h) ≤ R_{S,ρ}(h) + (2c²/ρ) R_m(Π₁(G)) + O(1/√m),

where R(h) is the generalization error of hypothesis h, R_{S,ρ}(h) its empirical margin loss, and R_m(Π₁(G)) the Rademacher complexity of the family of functions Π₁(G) associated to the hypothesis set, which is defined precisely below.

This bound is pessimistic and suggests that learning with an extremely large number of classes may not be possible. Indeed, it is well known that for certain classes of commonly used hypotheses, including linear and kernel-based ones, R_m(Π₁(G)) = O(1/√m). Therefore, for learning to occur, m will need to be of the order of at least c^{4+ε}/ρ, for some ε > 0. In some modern machine learning tasks, such as speech recognition and image classification, c is often greater than 10⁴. The bound above thus suggests that, even for extremely favorable margin values of the order of 10³, the sample required for learning has to be of the order of at least 10¹³ points. However, empirical results in speech recognition and image categorization suggest that it is possible to learn with far fewer samples. The bound is also pessimistic in terms of computational complexity, since storing and processing 10¹³ sample points may not be feasible. In this paper, we show that this bound can be improved to scale linearly with the number of classes.

(Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP volume 37. Copyright 2015 by the authors.)
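To make the difference between the two dependencies concrete, the short Python sketch below (not part of the paper) compares the dominant complexity term of the existing bound, (2c²/ρ) R_m(Π₁(G)), with that of the bound established in Section 2, (4c/ρ) R_m(Π₁(G)), under the illustrative assumption R_m(Π₁(G)) ≈ 1/√m; the specific values of c, ρ, and m are hypothetical.

```python
import math

def complexity_terms(c, rho, m):
    """Dominant terms of the quadratic (2c^2/rho) and linear (4c/rho) margin bounds,
    under the illustrative assumption R_m(Pi_1(G)) ~ 1/sqrt(m)."""
    rad = 1.0 / math.sqrt(m)            # assumed Rademacher complexity of Pi_1(G)
    quadratic = 2 * c ** 2 / rho * rad  # existing (Koltchinskii & Panchenko)-style term
    linear = 4 * c / rho * rad          # Theorem 2-style term
    return quadratic, linear

if __name__ == "__main__":
    c, rho = 10 ** 4, 1.0               # hypothetical: 10^4 classes, margin 1
    for m in (10 ** 6, 10 ** 9, 10 ** 12):
        q, lin = complexity_terms(c, rho, m)
        print(f"m = {m:.0e}: quadratic term {q:.3g}, linear term {lin:.3g}")
```

For instance, with c = 10⁴ and m = 10⁹, the quadratic term is about 6·10³ while the linear term is about 1.3, which illustrates the gap the improved analysis is meant to close.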

We further consider convex ensembles of classification models. Ensemble methods are general techniques in machine learning for combining several multi-class classification hypotheses to further improve accuracy. Learning a linear combination of base classifiers, or a classifier ensemble, is one of the oldest and most powerful ideas in machine learning. Boosting (Freund & Schapire, 1997), also known as forward stagewise additive modeling (Friedman et al., 1998), is a widely used meta-algorithm for ensemble learning. In the boosting approach, the ensemble's misclassification error is replaced by a convex upper bound, called the surrogate loss. The algorithm greedily minimizes the surrogate loss by augmenting the ensemble with a classifier (or adjusting the weight of a classifier already in the ensemble) at each iteration. One of the main advantages of boosting is that, because it is a stagewise procedure, one can efficiently learn a classifier ensemble in which each classifier belongs to a large (and potentially infinite) base hypothesis class, provided that one has an efficient algorithm for learning good base classifiers. For example, decision trees are commonly used as the base hypothesis class. In contrast, generalization bounds for classifier ensembles tend to increase with the complexity of the base hypothesis class (Schapire et al., 1997), and indeed boosting has been observed to overfit in practice (Grove & Schuurmans, 1998; Schapire, 1999; Dietterich, 2000; Rätsch et al., 2001b).

One way to address overfitting in a boosted ensemble is to regularize the weights of the classifiers. Standard regularization penalizes all the weights in the ensemble equally (Rätsch et al., 2001a; Duchi & Singer, 2009), but in some cases it seems they should be penalized unequally. For example, in an ensemble of decision trees, deeper decision trees should have a larger regularization penalty than shallower ones. Based on this idea, we present a novel generalization guarantee for multi-class classifier ensembles that depends only on the Rademacher complexity of the hypothesis classes to which the classifiers in the ensemble belong. (Cortes et al., 2014) developed this idea in an algorithm called DeepBoost, a boosting algorithm in which the decision at each iteration of which classifier to add to the ensemble, and the weight assigned to that classifier, depends in part on the complexity of the hypothesis class to which it belongs. One interpretation of DeepBoost is that it applies the principle of structural risk minimization to each iteration of boosting. (Kuznetsov et al., 2014) extended these ideas to the multi-class setting.

The rest of this paper is organized as follows. In Section 2, we present and prove our improved Rademacher complexity margin bounds that scale linearly with the number of classes. In Section 3, we use this result to prove a novel generalization bound for multi-class classifier ensembles that depends only on the Rademacher complexity of the hypothesis classes to which the classifiers in the ensemble belong. We conclude with some final remarks in Section 4.

2. Multi-class margin bounds

In this section, we present our improved data-dependent learning bound in the multi-class setting. Let X denote the input space. We denote by Y = {1, ..., c} a set of c classes, which, for convenience, we index by the integers in [1, c]. The label associated by a hypothesis f: X × Y → R to x ∈ X is given by argmax_{y∈Y} f(x, y). The margin ρ_f(x, y) of the function f for a labeled example (x, y) ∈ X × Y is defined by

ρ_f(x, y) = f(x, y) − max_{y′≠y} f(x, y′).    (1)

Thus, f misclassifies (x, y) iff ρ_f(x, y) ≤ 0.
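As a concrete reading of definition (1), the following sketch (illustrative only; the score vector is a hypothetical stand-in for f(x, ·)) computes the predicted label and the margin ρ_f(x, y) from the c class scores of a single example.

```python
import numpy as np

def predict_and_margin(scores, y):
    """Given scores[k] = f(x, k) for the c classes and the true label y, return
    (predicted label, margin rho_f(x, y) = f(x, y) - max_{y' != y} f(x, y'))."""
    scores = np.asarray(scores, dtype=float)
    y_pred = int(np.argmax(scores))            # h(x) = argmax_y f(x, y)
    runner_up = np.max(np.delete(scores, y))   # best score among labels y' != y
    margin = scores[y] - runner_up
    return y_pred, margin

# Hypothetical scores for a 4-class problem with true label y = 2.
y_pred, margin = predict_and_margin([0.1, 0.4, 0.9, 0.3], y=2)
print(y_pred, margin)  # 2, 0.5: correctly classified with margin 0.5
```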
We assume that training and test points are drawn i.i.d. according to some distribution D over X × Y, and denote by S = ((x₁, y₁), ..., (x_m, y_m)) a training sample of size m drawn according to D. For any ρ > 0, the generalization error R(f), the ρ-margin error R_ρ(f), and the empirical margin error R_{S,ρ}(f) are defined as follows:

R(f) = E_{(x,y)∼D}[1_{ρ_f(x,y) ≤ 0}],
R_ρ(f) = E_{(x,y)∼D}[1_{ρ_f(x,y) ≤ ρ}],
R_{S,ρ}(f) = E_{(x,y)∼S}[1_{ρ_f(x,y) ≤ ρ}],

where the notation (x, y) ∼ S indicates that (x, y) is drawn according to the empirical distribution defined by S. For any family of hypotheses G mapping X × Y to R, we define Π₁(G) by

Π₁(G) = {x ↦ h(x, y): y ∈ Y, h ∈ G}.    (2)
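The two sample-dependent quantities that drive the bounds below are the empirical margin loss R_{S,ρ}(g) and the Rademacher complexity of Π₁(G). As a rough illustration (not part of the paper), the sketch below computes the first directly from a list of margins and estimates the empirical counterpart of the second by Monte Carlo over random sign vectors, for a small finite hypothesis set encoded by a hypothetical array scores[g, i, y] = g(x_i, y).

```python
import numpy as np

def empirical_margin_loss(margins, rho):
    """Empirical rho-margin loss R_{S,rho}: fraction of points with margin <= rho."""
    return float(np.mean(np.asarray(margins) <= rho))

def empirical_rademacher_pi1(scores, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of
    Pi_1(G) = {x -> g(x, y): y in Y, g in G} for a finite hypothesis set G.
    scores[g, i, y] holds g(x_i, y); shape (num_hypotheses, m, c)."""
    rng = np.random.default_rng(seed)
    num_g, m, c = scores.shape
    rows = scores.transpose(0, 2, 1).reshape(num_g * c, m)  # one row per pair (g, y)
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=m)             # Rademacher variables
        total += np.max(rows @ sigma) / m                   # sup over Pi_1(G)
    return total / n_draws

# Hypothetical toy data: 5 hypotheses, m = 50 points, c = 10 classes, scores in [0, 1].
scores = np.random.default_rng(1).uniform(0.0, 1.0, size=(5, 50, 10))
print(empirical_rademacher_pi1(scores))
print(empirical_margin_loss([0.5, -0.1, 0.3, 0.8], rho=0.25))  # -> 0.25
```

For a finite class such as this one, Massart's lemma bounds the estimated quantity by O(√(log(|G| c)/m)), which is the 1/√m-type behavior assumed in the numerical illustration of the introduction.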

The following result, due to (Koltchinskii & Panchenko, 2002), is a well-known margin bound for the multi-class setting.

Theorem 1. Let G be a family of hypotheses mapping X × Y to R, with Y = {1, ..., c}. Fix ρ > 0. Then, for any δ > 0, with probability at least 1 − δ, the following bound holds for all g ∈ G:

R(g) ≤ R_{S,ρ}(g) + (2c²/ρ) R_m(Π₁(G)) + √(log(1/δ)/(2m)),

where Π₁(G) = {x ↦ g(x, y): y ∈ Y, g ∈ G}.

As discussed in the introduction, the bound of Theorem 1 is pessimistic and suggests that learning with an extremely large number of classes may not be possible. The following theorem presents our margin learning guarantee for multi-class classification with a large number of classes; it scales linearly with the number of classes, as opposed to the quadratic dependency of Theorem 1.

Theorem 2. Let G be a family of hypotheses mapping X × Y to R, with Y = {1, ..., c}. Fix ρ > 0. Then, for any δ > 0, with probability at least 1 − δ, the following bound holds for all g ∈ G:

R(g) ≤ R_{S,ρ}(g) + (4c/ρ) R_m(Π₁(G)) + √(log(1/δ)/(2m)),

where Π₁(G) = {x ↦ g(x, y): y ∈ Y, g ∈ G}.

Note that the bound of Theorem 2 is strictly better than that of Theorem 1 for all c > 2. The bound of Theorem 2 is more optimistic both in terms of the computational resources and the statistical hardness of the problem. To the best of our knowledge, it is an open problem whether the dependence on the number of classes can be further improved in general, that is, for arbitrary hypothesis sets.

Proof. We will need the following definitions for this proof:

ρ_g(x, y) = min_{y′≠y} (g(x, y) − g(x, y′)),
ρ_{θ,g}(x, y) = min_{y′} (g(x, y) − g(x, y′) + θ·1_{y′=y}),

where θ > 0 is an arbitrary constant. Observe that 1_{ρ_g(x,y) ≤ 0} ≤ 1_{ρ_{θ,g}(x,y) ≤ 0}. To verify this claim, it suffices to check that if ρ_g(x, y) ≤ 0 then ρ_{θ,g}(x, y) ≤ 0. Indeed, this follows from the bound

ρ_{θ,g}(x, y) = min_{y′} (g(x, y) − g(x, y′) + θ·1_{y′=y}) ≤ min_{y′≠y} (g(x, y) − g(x, y′) + θ·1_{y′=y}) = min_{y′≠y} (g(x, y) − g(x, y′)) = ρ_g(x, y),

where the inequality follows from taking the minimum over a smaller set.

Let Φ_ρ be the margin loss function defined for all u ∈ R by Φ_ρ(u) = 1_{u ≤ 0} + (1 − u/ρ)·1_{0 < u ≤ ρ}. We also let G̃ = {(x, y) ↦ ρ_{θ,g}(x, y): g ∈ G} and G̃′ = {Φ_ρ ∘ g̃: g̃ ∈ G̃}. By the standard Rademacher complexity bound (Koltchinskii & Panchenko, 2002; Mohri et al., 2012), for any δ > 0, with probability at least 1 − δ, the following holds for all g ∈ G:

R(g) ≤ (1/m) Σ_{i=1}^m Φ_ρ(ρ_{θ,g}(x_i, y_i)) + 2 R_m(G̃′) + √(log(1/δ)/(2m)).

Fixing θ = 2ρ, we observe that Φ_ρ(ρ_{θ,g}(x_i, y_i)) = Φ_ρ(ρ_g(x_i, y_i)) ≤ 1_{ρ_g(x_i, y_i) ≤ ρ}. Indeed, either ρ_{θ,g}(x_i, y_i) = ρ_g(x_i, y_i), or ρ_{θ,g}(x_i, y_i) = 2ρ ≤ ρ_g(x_i, y_i), which implies the desired result. Talagrand's lemma (Ledoux & Talagrand, 1991; Mohri et al., 2012) yields R_m(G̃′) ≤ (1/ρ) R_m(G̃), since Φ_ρ is a (1/ρ)-Lipschitz function. Therefore, for any δ > 0, with probability at least 1 − δ, for all g ∈ G,

R(g) ≤ R_{S,ρ}(g) + (2/ρ) R_m(G̃) + √(log(1/δ)/(2m)),

and to complete the proof it suffices to show that R_m(G̃) ≤ 2c R_m(Π₁(G)). Here, R_m(G̃) can be upper-bounded as follows:

R_m(G̃) = (1/m) E_σ[sup_{g∈G} Σ_{i=1}^m σ_i (g(x_i, y_i) − max_y (g(x_i, y) − 2ρ·1_{y=y_i}))]
≤ (1/m) E_σ[sup_{g∈G} Σ_{i=1}^m σ_i g(x_i, y_i)] + (1/m) E_σ[sup_{g∈G} Σ_{i=1}^m σ_i max_y (g(x_i, y) − 2ρ·1_{y=y_i})].

We first bound the first term above. Observe that

E_σ[sup_{g∈G} Σ_{i=1}^m σ_i g(x_i, y_i)] = E_σ[sup_{g∈G} Σ_{i=1}^m σ_i Σ_{y∈Y} g(x_i, y) 1_{y_i=y}]
≤ Σ_{y∈Y} E_σ[sup_{g∈G} Σ_{i=1}^m σ_i g(x_i, y) 1_{y_i=y}]
= Σ_{y∈Y} E_σ[sup_{g∈G} Σ_{i=1}^m σ_i g(x_i, y) (ε_i + 1)/2],

where ε_i = 2·1_{y_i=y} − 1. Since ε_i ∈ {−1, +1}, σ_i and σ_i ε_i admit the same distribution and, for any y ∈ Y, each of the terms of the right-hand side can be bounded as follows:

(1/m) E_σ[sup_{g∈G} Σ_{i=1}^m σ_i g(x_i, y) (ε_i + 1)/2] ≤ (1/(2m)) E_σ[sup_{g∈G} Σ_{i=1}^m σ_i ε_i g(x_i, y)] + (1/(2m)) E_σ[sup_{g∈G} Σ_{i=1}^m σ_i g(x_i, y)] ≤ R_m(Π₁(G)).

Thus, we can write (1/m) E_σ[sup_{g∈G} Σ_{i=1}^m σ_i g(x_i, y_i)] ≤ c R_m(Π₁(G)). To bound the second term, we first apply Lemma 8.1 of (Mohri et al., 2012), which immediately yields

(1/m) E_σ[sup_{g∈G} Σ_{i=1}^m σ_i max_y (g(x_i, y) − 2ρ·1_{y=y_i})] ≤ Σ_{y∈Y} (1/m) E_σ[sup_{g∈G} Σ_{i=1}^m σ_i (g(x_i, y) − 2ρ·1_{y=y_i})],

and, since Rademacher variables are mean zero, we observe that

(1/m) E_σ[sup_{g∈G} Σ_{i=1}^m σ_i (g(x_i, y) − 2ρ·1_{y=y_i})] = (1/m) E_σ[sup_{g∈G} Σ_{i=1}^m σ_i g(x_i, y)] − (2ρ/m) E_σ[Σ_{i=1}^m σ_i 1_{y=y_i}] = (1/m) E_σ[sup_{g∈G} Σ_{i=1}^m σ_i g(x_i, y)] ≤ R_m(Π₁(G)),

which completes the proof.

3. Multi-class data-dependent learning guarantee for convex ensembles

We consider p families H₁, ..., H_p of functions mapping from X × Y to [0, 1] and the ensemble family F = conv(∪_{k=1}^p H_k), that is, the family of functions f of the form f = Σ_{t=1}^T α_t h_t, where α = (α₁, ..., α_T) is in the simplex and where, for each t ∈ [1, T], h_t is in H_{k_t} for some k_t ∈ [1, p]. The following theorem gives a margin-based Rademacher complexity bound for learning with ensembles of base classifiers with multiple hypothesis sets. As with other Rademacher complexity learning guarantees, our bound is data-dependent, which is an important and favorable characteristic of our results.

Theorem 3. Assume p > 1 and let H₁, ..., H_p be p families of functions mapping from X × Y to [0, 1]. Fix ρ > 0. Then, for any δ > 0, with probability at least 1 − δ over the choice of a sample S of size m drawn i.i.d. according to D, the following inequality holds for all f = Σ_{t=1}^T α_t h_t ∈ F:

R(f) ≤ R_{S,ρ}(f) + (8c/ρ) Σ_{t=1}^T α_t R_m(Π₁(H_{k_t})) + (2/ρ) √(log p / m) + √(⌈(4/ρ²) log(ρ²c²m/(4 log p))⌉ (log p)/m + log(2/δ)/(2m)).

Thus, R(f) ≤ R_{S,ρ}(f) + (8c/ρ) Σ_{t=1}^T α_t R_m(Π₁(H_{k_t})) + O(√((log p)/(ρ²m) · log(ρ²c²m/(4 log p)))).

Before we present the proof of this result, we discuss some of its consequences. For p = 1, that is, for the special case of a single hypothesis set, this bound reduces to a bound of the same form as that of Theorem 2, modulo constant factors. However, the main remarkable benefit of this learning bound is that its complexity term admits an explicit dependency on the mixture coefficients α_t: it is a weighted average of Rademacher complexities with mixture weights α_t, t ∈ [1, T]. Thus, the second term of the bound suggests that, while some of the hypothesis sets H_k used for learning could have a large Rademacher complexity, this may not negatively affect generalization if the corresponding total mixture weight (the sum of the α_t's corresponding to that hypothesis set) is relatively small. Using such potentially complex families could help achieve a better margin on the training sample.

The theorem cannot be proven via the standard Rademacher complexity analysis of Koltchinskii & Panchenko (2002), since the complexity term of the bound would then be R_m(conv(∪_{k=1}^p H_k)) = R_m(∪_{k=1}^p H_k), which does not admit an explicit dependency on the mixture weights and is lower bounded by Σ_{t=1}^T α_t R_m(H_{k_t}). Thus, the theorem provides a finer learning bound than the one obtained via a standard Rademacher complexity analysis.
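To illustrate this point numerically (the example below is not from the paper; all values are hypothetical), the following sketch compares the weighted complexity term of Theorem 3, Σ_t α_t R_m(Π₁(H_{k_t})), with the complexity of the most complex family used, which lower-bounds the union-based term of the standard analysis.

```python
import numpy as np

def weighted_complexity(alpha, family_index, rademacher):
    """Weighted complexity term of Theorem 3: sum_t alpha_t * R_m(Pi_1(H_{k_t})).
    alpha[t] is the mixture weight of h_t, family_index[t] = k_t, and
    rademacher[k] is a (hypothetical) value of R_m(Pi_1(H_k))."""
    alpha = np.asarray(alpha, dtype=float)
    comp = np.asarray([rademacher[k] for k in family_index], dtype=float)
    return float(np.dot(alpha, comp))

# Hypothetical ensemble: most of the weight on a simple family (k = 0),
# a small weight on a complex one (k = 2).
rademacher = [0.01, 0.05, 0.5]   # assumed complexities of H_1, H_2, H_3
alpha = [0.6, 0.3, 0.1]          # mixture weights, summing to 1
family_index = [0, 1, 2]         # h_1 in H_1, h_2 in H_2, h_3 in H_3

print(weighted_complexity(alpha, family_index, rademacher))  # 0.071
print(max(rademacher))  # 0.5: a crude proxy for the weight-blind, union-based term
```

When the complex families carry only a small total mixture weight, the weighted term stays close to the complexity of the simple families, which is precisely the behavior that DeepBoost-style algorithms exploit.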

Our proof makes use of Theorem 2 and a proof technique used in (Schapire et al., 1997).

Proof. For a fixed h = (h₁, ..., h_T), any α in the probability simplex defines a distribution over {h₁, ..., h_T}. Sampling from {h₁, ..., h_T} according to α and averaging leads to functions g of the form g = (1/n) Σ_{t=1}^T n_t h_t, for some (n₁, ..., n_T) with Σ_{t=1}^T n_t = n and h_t ∈ H_{k_t}. For any N = (N₁, ..., N_p) with |N| = n, we consider the family of functions

G_{F,N} = {(1/n) Σ_{k=1}^p Σ_{j=1}^{N_k} h_{k,j} : ∀(k, j) ∈ [1, p] × [1, N_k], h_{k,j} ∈ H_k},

and the union of all such families, G_{F,n} = ∪_{|N|=n} G_{F,N}. Fix ρ > 0. For a fixed N, the Rademacher complexity of Π₁(G_{F,N}) can be bounded as follows for any m ≥ 1:

R_m(Π₁(G_{F,N})) ≤ (1/n) Σ_{k=1}^p N_k R_m(Π₁(H_k)).

Thus, by Theorem 2, the following multi-class margin-based Rademacher complexity bound holds: for any δ > 0, with probability at least 1 − δ, for all g ∈ G_{F,N},

R_ρ(g) ≤ R_{S,ρ}(g) + (4c/(ρn)) Σ_{k=1}^p N_k R_m(Π₁(H_k)) + √(log(1/δ)/(2m)).

Since there are at most pⁿ possible p-tuples N with |N| = n (the number S(p, n) of such p-tuples is known to be precisely the binomial coefficient C(p + n − 1, p − 1)), by the union bound, for any δ > 0, with probability at least 1 − δ, for all g ∈ G_{F,n}, we can write

R_ρ(g) ≤ R_{S,ρ}(g) + (4c/(ρn)) Σ_{k=1}^p N_k R_m(Π₁(H_k)) + √(log(pⁿ/δ)/(2m)).

Thus, with probability at least 1 − δ, for all functions g = (1/n) Σ_{t=1}^T n_t h_t with h_t ∈ H_{k_t}, the following inequality holds:

R_ρ(g) ≤ R_{S,ρ}(g) + (4c/(ρn)) Σ_{k=1}^p Σ_{t: k_t=k} n_t R_m(Π₁(H_{k_t})) + √(log(pⁿ/δ)/(2m)).

Taking the expectation with respect to α and using E_α[n_t/n] = α_t, we obtain that, for any δ > 0, with probability at least 1 − δ, for all g, we can write

R_ρ(g) ≤ R_{S,ρ}(g) + (4c/ρ) Σ_{t=1}^T α_t R_m(Π₁(H_{k_t})) + √(log(pⁿ/δ)/(2m)).

Fix n. Then, for any δ_n > 0, with probability at least 1 − δ_n,

R_{ρ/2}(g) ≤ R_{S,ρ/2}(g) + (8c/ρ) Σ_{t=1}^T α_t R_m(Π₁(H_{k_t})) + √(log(pⁿ/δ_n)/(2m)).

Choose δ_n = δ/(2p^{n−1}) for some δ > 0; then, for p ≥ 2, Σ_{n≥1} δ_n = δ/(2(1 − 1/p)) ≤ δ. Thus, for any δ > 0 and any n ≥ 1, with probability at least 1 − δ, the following holds for all g:

R_{ρ/2}(g) ≤ R_{S,ρ/2}(g) + (8c/ρ) Σ_{t=1}^T α_t R_m(Π₁(H_{k_t})) + √(log(2p^{2n−1}/δ)/(2m)).    (3)

Now, for any f = Σ_{t=1}^T α_t h_t ∈ F and any g = (1/n) Σ_{t=1}^T n_t h_t, we can upper-bound R(f) = Pr_{(x,y)∼D}[ρ_f(x, y) ≤ 0], the generalization error of f, as follows:

R(f) = Pr_{(x,y)∼D}[ρ_f(x, y) − ρ_g(x, y) + ρ_g(x, y) ≤ 0]
≤ Pr_{(x,y)∼D}[ρ_f(x, y) − ρ_g(x, y) < −ρ/2] + Pr_{(x,y)∼D}[ρ_g(x, y) ≤ ρ/2]
= Pr_{(x,y)∼D}[ρ_f(x, y) − ρ_g(x, y) < −ρ/2] + R_{ρ/2}(g).

We can also write

R_{S,ρ/2}(g) = R_{S,ρ/2}(g − f + f) ≤ Pr_{(x,y)∼S}[ρ_g(x, y) − ρ_f(x, y) < −ρ/2] + R_{S,ρ}(f).

Combining these inequalities yields

Pr_{(x,y)∼D}[ρ_f(x, y) ≤ 0] ≤ R_{S,ρ}(f) + Pr_{(x,y)∼D}[ρ_f(x, y) − ρ_g(x, y) < −ρ/2] + Pr_{(x,y)∼S}[ρ_g(x, y) − ρ_f(x, y) < −ρ/2] + R_{ρ/2}(g) − R_{S,ρ/2}(g).

Taking the expectation with respect to α yields

R(f) ≤ R_{S,ρ}(f) + E_{(x,y)∼D, α}[1_{ρ_f(x,y) − ρ_g(x,y) < −ρ/2}] + E_{(x,y)∼S, α}[1_{ρ_g(x,y) − ρ_f(x,y) < −ρ/2}] + E_α[R_{ρ/2}(g) − R_{S,ρ/2}(g)].    (4)

Fix (x, y), and for any function ϕ: X × Y → [0, 1], define y_ϕ as follows: y_ϕ = argmax_{y′≠y} ϕ(x, y′). For any g, by definition of ρ_g, we can write ρ_g(x, y) ≤ g(x, y) − g(x, y_f). In light of this inequality and Hoeffding's bound, the following holds:

E_α[1_{ρ_f(x,y) − ρ_g(x,y) < −ρ/2}] = Pr_α[ρ_f(x, y) − ρ_g(x, y) < −ρ/2]
≤ Pr_α[(f(x, y) − f(x, y_f)) − (g(x, y) − g(x, y_f)) < −ρ/2] ≤ e^{−nρ²/8}.

Similarly, for any g, we can write ρ_f(x, y) ≤ f(x, y) − f(x, y_g). Using this inequality, the union bound, and Hoeffding's bound, the other expectation term appearing on the right-hand side of (4) can be bounded as follows:

E_α[1_{ρ_g(x,y) − ρ_f(x,y) < −ρ/2}] = Pr_α[ρ_g(x, y) − ρ_f(x, y) < −ρ/2]
≤ Pr_α[(g(x, y) − g(x, y_g)) − (f(x, y) − f(x, y_g)) < −ρ/2]
≤ Σ_{y′≠y} Pr_α[(g(x, y) − g(x, y′)) − (f(x, y) − f(x, y′)) < −ρ/2]
≤ (c − 1) e^{−nρ²/8}.

Thus, for any fixed f ∈ F, we can write

R(f) ≤ R_{S,ρ}(f) + c e^{−nρ²/8} + E_α[R_{ρ/2}(g) − R_{S,ρ/2}(g)].

Therefore, the following inequality holds:

sup_{f∈F} [R(f) − R_{S,ρ}(f)] ≤ c e^{−nρ²/8} + sup_g [R_{ρ/2}(g) − R_{S,ρ/2}(g)],

and, in view of (3), for any δ > 0 and any n ≥ 1, with probability at least 1 − δ, the following holds for all f ∈ F:

R(f) ≤ R_{S,ρ}(f) + (8c/ρ) Σ_{t=1}^T α_t R_m(Π₁(H_{k_t})) + c e^{−nρ²/8} + √(((2n − 1) log p + log(2/δ))/(2m)).

Choosing n = ⌈(4/ρ²) log(ρ²c²m/(4 log p))⌉ yields the following inequality:

R(f) ≤ R_{S,ρ}(f) + (8c/ρ) Σ_{t=1}^T α_t R_m(Π₁(H_{k_t})) + (2/ρ) √(log p/m) + √(⌈(4/ρ²) log(ρ²c²m/(4 log p))⌉ (log p)/m + log(2/δ)/(2m)),

which concludes the proof.

(Footnote: to select n, we consider f(n) = c e^{−nu} + √(nv), where u = ρ²/8 and v = (log p)/m. Taking the derivative of f, setting it to zero, and solving for n, we obtain n = −(1/(2u)) W_{−1}(−v/(2c²u)), where W_{−1} is the second branch of the Lambert function, the inverse of x ↦ x e^x. Using the bound log x ≤ −W_{−1}(−1/x) ≤ 2 log x leads to the choice n = (1/(2u)) log(2c²u/v).)

4. Conclusion

We presented improved Rademacher complexity margin bounds that scale linearly with the number of classes, as opposed to the quadratic dependency of the existing Rademacher complexity margin-based learning guarantees. Furthermore, we used this result to prove a novel generalization bound for multi-class classifier ensembles that depends only on the Rademacher complexity of the hypothesis classes to which the classifiers in the ensemble belong. (Cortes et al., 2014) developed this idea in an algorithm called DeepBoost, a boosting algorithm where the decision at each iteration of which classifier to add to the ensemble, and which weight to assign to that classifier, depends on the complexity of the hypothesis class to which it belongs. One interpretation of DeepBoost is that it applies the principle of structural risk minimization to each iteration of boosting. (Kuznetsov et al., 2014) extended these ideas to the multi-class setting.

References

Cortes, Corinna, Mohri, Mehryar, and Syed, Umar. Deep boosting. In ICML, 2014.

Dietterich, Thomas G. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40(2):139–157, 2000.

Duchi, John C. and Singer, Yoram. Boosting with structural sparsity. In ICML, 2009.

Freund, Yoav and Schapire, Robert E. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.

Friedman, Jerome H., Hastie, Trevor, and Tibshirani, Robert. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.

Grove, Adam J. and Schuurmans, Dale. Boosting in the limit: Maximizing the margin of learned ensembles. In AAAI/IAAI, 1998.

Koltchinskii, Vladimir and Panchenko, Dmitry. Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30, 2002.

Kuznetsov, Vitaly, Mohri, Mehryar, and Syed, Umar. Multi-class deep boosting. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 27. Curran Associates, Inc., 2014.

Ledoux, Michel and Talagrand, Michel. Probability in Banach Spaces: Isoperimetry and Processes. Springer, 1991.

Mohri, Mehryar, Rostamizadeh, Afshin, and Talwalkar, Ameet. Foundations of Machine Learning. The MIT Press, 2012.

Rätsch, Gunnar, Mika, Sebastian, and Warmuth, Manfred K. On the convergence of leveraging. In NIPS, 2001a.

Rätsch, Gunnar, Onoda, Takashi, and Müller, Klaus-Robert. Soft margins for AdaBoost. Machine Learning, 42(3):287–320, 2001b.

Schapire, Robert E. Theoretical views of boosting and applications.
In Proceedings of ALT 1999, volume 1720 of Lecture Notes in Computer Science. Springer, 1999.

Schapire, Robert E., Freund, Yoav, Bartlett, Peter, and Lee, Wee Sun. Boosting the margin: A new explanation for the effectiveness of voting methods. In ICML, 1997.
