Robustness and Regularization of Support Vector Machines

Huan Xu (ECE, McGill University, Montreal, QC, Canada), Constantine Caramanis (ECE, The University of Texas at Austin, Austin, TX, USA), Shie Mannor (ECE, McGill University, Montreal, QC, Canada)

Abstract

We consider a robust classification problem and show that the standard regularized SVM is a special case of our formulation, providing an explicit link between regularization and robustness. At the same time, the physical connection between noise and robustness suggests the potential for a broad new family of robust classification algorithms. Finally, we show that robustness is a fundamental property of classification algorithms, by re-proving consistency of support vector machines using only robustness arguments (instead of VC dimension or stability).

1 Introduction

Support Vector Machines, or SVMs [1, 2], find the hyperplane in the feature space that achieves maximum sample margin in the separable case. When the samples are not separable, a penalty term that approximates the total training error is considered [3]. It is well known that minimizing the training error itself can lead to poor classification performance on new unlabeled data because of, essentially, overfitting [4]. One of the most popular methods proposed to combat this problem is minimizing a combination of the training error and a regularization term. The resulting regularized classifier performs better on new data. This phenomenon is often interpreted from a statistical learning theory view: the regularization term restricts the complexity of the classifier, hence the deviation between the testing error and the training error is controlled [5, 6, 7].

We consider a different setup, assuming that some non-i.i.d. (potentially adversarial) disturbance is added to the training samples we observe. We follow a robust optimization approach (e.g., [8, 9]), minimizing the worst possible empirical error under such disturbances. The use of robust optimization in classification is not new (e.g., [10, 11]). Past robust classification models consider only box-type uncertainty sets, which allow the possibility that the data have all been skewed in some non-neutral manner. We develop a new robust classification framework that treats non-box-type uncertainty sets, mitigates conservatism, and provides an explicit connection to regularization. Our contributions include the following.

- We show that the standard regularized SVM classifier is a special case of our robust classification, thus explicitly relating robustness and regularization. This provides an alternative explanation for the success of regularization, and also suggests new physically motivated ways to construct regularizers.

- Our robust SVM formulation permits finer control of the adversarial disturbance, restricting it to satisfy aggregate constraints across data points, therefore controlling the conservatism.

- We show that the robustness perspective, stemming from a non-i.i.d. analysis, can be useful in a standard i.i.d. setup, by using it to give a new proof of consistency for standard SVM classification. This result implies that generalization ability is a direct result of robustness to local disturbances, and that we can construct learning algorithms that generalize well by robustifying non-consistent algorithms.

We explain here how the explicit equivalence of robustness and regularization we derive differs from previous work, and why it is interesting. While certain equivalence relationships between robustness and regularization have been established for problems outside the machine learning field ([8, 9]), their results do not directly apply to the classification problem. Research on classifier regularization mainly focuses on bounding the complexity of the function class (e.g., [5, 6, 7]). Meanwhile, research on robust classification has not attempted to relate robustness and regularization (e.g., [10, 11, 12]), in part due to the robustness formulations used there. The connection of robustness and regularization in the SVM context is important for the following reasons.

It gives an alternative and potentially powerful explanation of the generalization ability of the regularization term. In the classical machine learning literature, the regularization term bounds the complexity of the class of classifiers. The robust view of regularization regards the testing samples as a perturbed copy of the training samples. We show that when the total perturbation is given or bounded, the regularization term bounds the gap between the classification errors of the SVM on these two sets of samples. In contrast to the standard PAC approach, this bound depends neither on how rich the class of candidate classifiers is, nor on an assumption that all samples are picked in an i.i.d. manner.

In addition, this suggests novel approaches to designing good classification algorithms, in particular, designing the regularization term. In Section 3 we use this new view to provide a novel proof of consistency that does not rely on VC-dimension or stability arguments. In the PAC structural-risk-minimization approach, regularization is chosen to minimize a bound on the generalization error based on the training error and a complexity term. This complexity term typically leads to overly emphasizing the regularizer, and indeed this approach is known to often be too pessimistic ([13]). The robust approach offers another avenue. Since both noise and robustness are physical processes, a close investigation of the application and noise characteristics at hand can provide insights into how to properly robustify, and therefore regularize, the classifier. For example, it is widely known that normalizing the samples so that the variance among all features is roughly the same often leads to good generalization performance. From the robustness perspective, this simply says that the noise is skewed (ellipsoidal) rather than spherical, and hence an appropriate robustification must be designed to fit the skew of the physical noise process.

Notation: Capital letters and boldface letters are used to denote matrices and column vectors, respectively. For a given norm $\|\cdot\|$, we use $\|\cdot\|_*$ to denote its dual norm. The set of integers from 1 to $n$ is denoted by $[1:n]$.
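As a concrete illustration of the normalization remark above (this example is not from the paper), here is a minimal sketch assuming NumPy and scikit-learn are available: rescaling each feature to unit variance and then applying the usual l2-regularized linear SVM corresponds, from the robustness viewpoint, to choosing an ellipsoidal rather than spherical uncertainty set shaped like the observed noise. The data, scales, and constants are illustrative only.

```python
# Illustrative sketch (not from the paper): feature scaling as an
# ellipsoidal, rather than spherical, robustification choice.
# Assumes numpy and scikit-learn are installed.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Two features with very different scales (skewed noise geometry).
X = np.column_stack([rng.normal(0, 1.0, 200), rng.normal(0, 100.0, 200)])
y = np.sign(X[:, 0] + 0.01 * X[:, 1] + 0.1 * rng.normal(size=200)).astype(int)

# Normalizing features so their variances match amounts to shaping the
# uncertainty set like the data's noise, then applying the usual
# spherical (l2) regularizer in the rescaled space.
clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0))
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```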
2 Robust Classification and Regularization

We consider the standard binary classification problem, where we are given a finite number of training samples $\{x_i, y_i\}_{i=1}^m \subseteq \mathbb{R}^n \times \{-1, +1\}$ and must find a linear classifier, specified by the function $h(x) = \operatorname{sgn}(\langle w, x\rangle + b)$. For the standard regularized classifier, the parameters $(w, b)$ are obtained by solving the following convex optimization problem:
$$\min_{w,b}\ \Big\{ r(w, b) + \sum_{i=1}^m \max\big[1 - y_i(\langle w, x_i\rangle + b),\, 0\big] \Big\},$$
where $r(w, b)$ is a regularization term. Previous robust classification work [10, 14] considers the classification problem where the inputs are subject to (unknown) disturbances $\delta = (\delta_1, \ldots, \delta_m)$ and essentially solves the following min-max problem:
$$\min_{w,b}\ \max_{\delta \in \mathcal{N}_{box}} \Big\{ r(w, b) + \sum_{i=1}^m \max\big[1 - y_i(\langle w, x_i - \delta_i\rangle + b),\, 0\big] \Big\}, \tag{1}$$
for a box-type uncertainty set $\mathcal{N}_{box}$. That is, letting $\mathcal{N}_i$ denote the projection of $\mathcal{N}_{box}$ onto the $\delta_i$ component, $\mathcal{N}_{box} = \mathcal{N}_1 \times \cdots \times \mathcal{N}_m$. Effectively, this allows simultaneous worst-case disturbances across many samples, and leads to overly conservative solutions.
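To make the conservatism of box-type uncertainty concrete, the following sketch (not from the paper; it assumes cvxpy and NumPy, takes $r(w,b) = 0$, and uses the self-dual l2 norm with a budget c picked arbitrarily) solves the per-sample reduction of problem (1) alongside the aggregate-budget formulation given as problem (8) below; on non-separable data the box-type worst case is no smaller.

```python
# Illustrative sketch (not from the paper): with an l2 ball of radius c on
# each sample independently (box-type uncertainty), the inner maximization
# in (1) decouples per sample, giving the reduced convex program below.
# The aggregate-budget set of Corollary 1 is contained in the box set, so
# its worst-case objective is no larger on non-separable data.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
m, n, c = 50, 3, 0.3
X = rng.normal(size=(m, n))
y = np.sign(X @ np.array([1.0, -1.0, 0.5]) + 0.5 * rng.normal(size=m))

w, b = cp.Variable(n), cp.Variable()
margins = cp.multiply(y, X @ w + b)

# Box-type robust problem (1): every sample hit by its own worst delta_i.
box_obj = cp.sum(cp.pos(1 - margins + c * cp.norm(w, 2)))
box_val = cp.Problem(cp.Minimize(box_obj)).solve()

# Aggregate-budget counterpart, i.e. problem (8) with the same c.
agg_obj = c * cp.norm(w, 2) + cp.sum(cp.pos(1 - margins))
agg_val = cp.Problem(cp.Minimize(agg_obj)).solve()

print("box-type worst case  :", box_val)   # more conservative
print("aggregate worst case :", agg_val)
```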

The goal of this paper is to obtain a robust formulation where the disturbances $\{\delta_i\}$ may be meaningfully taken to be correlated, i.e., to solve for a non-box-type $\mathcal{N}$:
$$\min_{w,b}\ \max_{\delta \in \mathcal{N}} \Big\{ r(w, b) + \sum_{i=1}^m \max\big[1 - y_i(\langle w, x_i - \delta_i\rangle + b),\, 0\big] \Big\}. \tag{2}$$
We define explicitly the correlated disturbance (or uncertainty) which we study below.

Definition 1.
1. A set $\mathcal{N}_0 \subseteq \mathbb{R}^n$ is called an Atomic Uncertainty Set if
(I) $0 \in \mathcal{N}_0$;
(II) $\sup_{\delta \in \mathcal{N}_0} \big[w^\top \delta\big] = \sup_{\delta' \in \mathcal{N}_0} \big[-w^\top \delta'\big] < +\infty$ for all $w \in \mathbb{R}^n$.
2. Let $\mathcal{N}_0$ be an atomic uncertainty set. A set $\mathcal{N} \subseteq \mathbb{R}^{n \times m}$ is called a Concave Correlated Uncertainty Set (CCUS) of $\mathcal{N}_0$ if $\mathcal{N}^- \subseteq \mathcal{N} \subseteq \mathcal{N}^+$. Here
$$\mathcal{N}^- \triangleq \bigcup_{t=1}^m \mathcal{N}_t; \qquad \mathcal{N}_t \triangleq \big\{(\delta_1, \ldots, \delta_m) \mid \delta_t \in \mathcal{N}_0;\ \delta_{i \neq t} = 0\big\};$$
$$\mathcal{N}^+ \triangleq \Big\{(\alpha_1 \delta_1, \ldots, \alpha_m \delta_m) \,\Big|\, \sum_{i=1}^m \alpha_i = 1;\ \alpha_i \geq 0,\ \delta_i \in \mathcal{N}_0,\ \forall i\Big\}.$$

The concave correlated uncertainty definition models the case where the disturbances on each sample are treated identically, but their aggregate behavior across multiple samples is controlled. In particular, $\{(\delta_1, \ldots, \delta_m) \mid \sum_{i=1}^m \|\delta_i\| \leq c\}$ is a CCUS with $\mathcal{N}_0 = \{\delta \mid \|\delta\| \leq c\}$.

Theorem 1. Assume $\{x_i, y_i\}_{i=1}^m$ are non-separable, $r(\cdot) : \mathbb{R}^{n+1} \to \mathbb{R}$ is an arbitrary function, and $\mathcal{N}$ is a CCUS with corresponding atomic uncertainty set $\mathcal{N}_0$. Then the following two problems are equivalent:
$$\min_{w,b}\ \sup_{(\delta_1, \ldots, \delta_m) \in \mathcal{N}} \Big\{ r(w, b) + \sum_{i=1}^m \max\big[1 - y_i(\langle w, x_i - \delta_i\rangle + b),\, 0\big] \Big\}; \tag{3}$$
$$\begin{aligned}
\min_{w,b,\xi}:\ & r(w, b) + \sup_{\delta \in \mathcal{N}_0}(w^\top \delta) + \sum_{i=1}^m \xi_i, \\
\text{s.t.}:\ & \xi_i \geq 1 - y_i(\langle w, x_i\rangle + b), \quad i = 1, \ldots, m; \\
& \xi_i \geq 0, \quad i = 1, \ldots, m.
\end{aligned} \tag{4}$$

Proof. We outline the proof. Let $v(w, b) \triangleq \sup_{\delta \in \mathcal{N}_0}(w^\top \delta) + \sum_{i=1}^m \max\big[1 - y_i(\langle w, x_i\rangle + b),\, 0\big]$. It suffices to show that for any $(\hat w, \hat b) \in \mathbb{R}^{n+1}$,
$$v(\hat w, \hat b) \leq \sup_{(\delta_1, \ldots, \delta_m) \in \mathcal{N}^-} \sum_{i=1}^m \max\big[1 - y_i(\langle \hat w, x_i - \delta_i\rangle + \hat b),\, 0\big]; \tag{5}$$
$$\sup_{(\delta_1, \ldots, \delta_m) \in \mathcal{N}^+} \sum_{i=1}^m \max\big[1 - y_i(\langle \hat w, x_i - \delta_i\rangle + \hat b),\, 0\big] \leq v(\hat w, \hat b). \tag{6}$$
Since the samples $\{x_i, y_i\}_{i=1}^m$ are not separable, there exists $t \in [1:m]$ such that $y_t(\langle \hat w, x_t\rangle + \hat b) < 0$. With some algebra, we have $\sup_{(\delta_1, \ldots, \delta_m) \in \mathcal{N}_t} \sum_{i=1}^m \max\big[1 - y_i(\langle \hat w, x_i - \delta_i\rangle + \hat b),\, 0\big] = v(\hat w, \hat b)$. Since $\mathcal{N}_t \subseteq \mathcal{N}^-$, Inequality (5) follows. Establishing Inequality (6) is standard.

The following corollary thus shows regularized SVM is a special case of robust classification.

Corollary 1. Let $\mathcal{T} \triangleq \big\{(\delta_1, \ldots, \delta_m) \mid \sum_{i=1}^m \|\delta_i\|_* \leq c\big\}$. If the training samples $\{x_i, y_i\}_{i=1}^m$ are non-separable, then the following two optimization problems on $(w, b)$ are equivalent:
$$\min_{w,b}\ \max_{(\delta_1, \ldots, \delta_m) \in \mathcal{T}} \sum_{i=1}^m \max\big[1 - y_i(\langle w, x_i - \delta_i\rangle + b),\, 0\big]; \tag{7}$$
$$\min_{w,b}\ c\|w\| + \sum_{i=1}^m \max\big[1 - y_i(\langle w, x_i\rangle + b),\, 0\big]. \tag{8}$$
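As a quick numerical sanity check of Corollary 1 (not part of the paper; it assumes cvxpy and NumPy, uses the self-dual l2 norm, and random non-separable data with an arbitrary budget c), one can solve problem (8) and then verify that spending the entire disturbance budget on a single violated sample, which is exactly the structure of the set $\mathcal{N}^-$ used in the proof of Theorem 1, already raises the hinge loss to $c\|w\|_2$ plus the nominal hinge loss.

```python
# Sketch (not from the paper): solve problem (8), then check that one
# worst-case single-sample perturbation of size c attains c*||w||_2 plus
# the nominal hinge loss, mirroring the set N^- in Theorem 1's proof.
# Assumes numpy and cvxpy; values should agree up to solver tolerance.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n, c = 40, 2, 0.5
X = rng.normal(size=(m, n))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=m))   # noisy labels -> non-separable

w, b = cp.Variable(n), cp.Variable()
hinge = cp.sum(cp.pos(1 - cp.multiply(y, X @ w + b)))
cp.Problem(cp.Minimize(c * cp.norm(w, 2) + hinge)).solve()   # problem (8)

w_v, b_v = w.value, b.value
losses = np.maximum(1 - y * (X @ w_v + b_v), 0.0)
base_hinge = losses.sum()

# Pick a violated sample t and perturb it by delta_t with ||delta_t||_2 = c
# in the worst direction; its hinge term grows by exactly c*||w||_2.
t = int(np.argmax(1 - y * (X @ w_v + b_v)))
assert losses[t] > 0, "needs a violated sample (non-separable data)"
delta_t = c * y[t] * w_v / np.linalg.norm(w_v)
X_pert = X.copy()
X_pert[t] = X[t] - delta_t                        # the loss uses x_t - delta_t
pert_hinge = np.maximum(1 - y * (X_pert @ w_v + b_v), 0.0).sum()

print("single-sample worst case:", pert_hinge)
print("c*||w||_2 + nominal loss:", base_hinge + c * np.linalg.norm(w_v))
```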

This corollary explains the widely known fact that the regularized classifier tends to be robust. It also suggests that the appropriate way to regularize should come from a disturbance-robustness perspective, e.g., by examining the variation of the data and solving the corresponding robust classification problem. Corollary 1 can easily be generalized to a kernelized version, i.e., a linear classifier in the feature space $\mathcal{H}$, defined as a Hilbert space containing the range of some feature mapping $\Phi(\cdot)$.

Corollary 2. Let $\mathcal{T}_{\mathcal{H}} \triangleq \big\{(\delta_1, \ldots, \delta_m) \mid \sum_{i=1}^m \|\delta_i\|_{\mathcal{H}} \leq c\big\}$. If $\{\Phi(x_i), y_i\}_{i=1}^m$ are non-separable, then the following two optimization problems on $(w, b)$ are equivalent:
$$\min_{w,b}\ \max_{(\delta_1, \ldots, \delta_m) \in \mathcal{T}_{\mathcal{H}}} \sum_{i=1}^m \max\big[1 - y_i(\langle w, \Phi(x_i) - \delta_i\rangle + b),\, 0\big]; \tag{9}$$
$$\min_{w,b}\ c\|w\|_{\mathcal{H}} + \sum_{i=1}^m \max\big[1 - y_i(\langle w, \Phi(x_i)\rangle + b),\, 0\big]. \tag{10}$$
Here, $\|\cdot\|_{\mathcal{H}}$ stands for the RKHS norm, which is self-dual.

Corollary 2 essentially says that the standard kernelized SVM is implicitly a robust classifier with disturbance in the feature space. Disturbance in the feature space is less intuitive than disturbance in the sample space. However, the next lemma relates these two different setups: under certain conditions, a classifier that achieves robustness in the feature space (the SVM, for example) also achieves robustness in the sample space. The proof is straightforward and omitted.

Lemma 1. Suppose there exist $\mathcal{X} \subseteq \mathbb{R}^n$, $\rho > 0$, and a continuous non-decreasing function $f : \mathbb{R}^+ \to \mathbb{R}^+$ satisfying $f(0) = 0$, such that
$$k(x, x) + k(x', x') - 2k(x, x') \leq f\big(\|x - x'\|_2^2\big), \quad \forall x, x' \in \mathcal{X},\ \|x - x'\|_2 \leq \rho.$$
Then
$$\|\Phi(\hat x + \delta) - \Phi(\hat x)\|_{\mathcal{H}} \leq \sqrt{f\big(\|\delta\|_2^2\big)}, \quad \forall \|\delta\|_2 \leq \rho,\ \hat x, \hat x + \delta \in \mathcal{X}.$$

3 Consistency of Regularization

In this section we explore a fundamental connection between learning and robustness, by using robustness properties to re-prove the statistical consistency of the linear classifier, and then of the kernelized SVM. Indeed, our proof mirrors the consistency proof found in [15], with the key difference that we replace the metric entropy, VC-dimension, and stability arguments used there with robustness. In contrast to these standard techniques, which often work only for a limited range of algorithms, the robustness argument works for a much wider range of algorithms and allows a unified approach to showing consistency.

Thus far we have considered the setup where the training samples are corrupted by certain set-inclusive disturbances; we now turn to the standard statistical learning setup. That is, let $\mathcal{X} \subseteq \mathbb{R}^n$ be bounded, and suppose the training samples $(x_i, y_i)_{i=1}^m$ are generated i.i.d. according to an unknown distribution $P$ supported on $\mathcal{X} \times \{-1, +1\}$. The next theorem shows that our robust classifier setup, and equivalently regularized SVM, minimizes an asymptotic upper bound of the expected classification error and hinge loss.

Theorem 2. Denote $K \triangleq \max_{x \in \mathcal{X}} \sqrt{k(x, x)}$. Suppose there exist $\rho > 0$ and a continuous non-decreasing function $f : \mathbb{R}^+ \to \mathbb{R}^+$ satisfying $f(0) = 0$, such that
$$k(x, x) + k(x', x') - 2k(x, x') \leq f\big(\|x - x'\|_2^2\big), \quad \forall x, x' \in \mathcal{X},\ \|x - x'\|_2 \leq \rho.$$
Then there exists a random sequence $\{\gamma_{m,c}\}$ independent of $P$ such that, for all $c > 0$, $\lim_{m \to \infty} \gamma_{m,c} = 0$ almost surely, and for all $(w, b) \in \mathcal{H} \times \mathbb{R}$ the following bounds on the Bayes loss and the hinge loss hold:
$$\mathbb{E}_P\big(\mathbf{1}_{y \neq \operatorname{sgn}(\langle w, \Phi(x)\rangle + b)}\big) \leq \gamma_{m,c} + c\|w\|_{\mathcal{H}} + \frac{1}{m}\sum_{i=1}^m \max\big[1 - y_i(\langle w, \Phi(x_i)\rangle + b),\, 0\big],$$
$$\mathbb{E}_{(x,y) \sim P}\big(\max(1 - y(\langle w, \Phi(x)\rangle + b),\, 0)\big) \leq \gamma_{m,c}\big(1 + K\|w\|_{\mathcal{H}} + |b|\big) + c\|w\|_{\mathcal{H}} + \frac{1}{m}\sum_{i=1}^m \max\big[1 - y_i(\langle w, \Phi(x_i)\rangle + b),\, 0\big].$$
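The smoothness condition shared by Lemma 1 and Theorem 2 is easy to check for common kernels. For the Gaussian RBF kernel $k(x, x') = \exp(-\gamma\|x - x'\|_2^2)$ one has $k(x,x) + k(x',x') - 2k(x,x') = 2(1 - \exp(-\gamma\|x - x'\|_2^2)) \leq 2\gamma\|x - x'\|_2^2$, so $f(\xi) = 2\gamma\xi$ qualifies. The sketch below (not from the paper; the bandwidth and sampling range are chosen only for illustration, assuming NumPy) verifies this numerically.

```python
# Sketch (not from the paper): check the Lemma 1 / Theorem 2 condition for
# the Gaussian RBF kernel k(x, x') = exp(-g * ||x - x'||_2^2).
# Since 1 - exp(-g*d) <= g*d for d >= 0, f(d) = 2*g*d is admissible
# (continuous, non-decreasing, f(0) = 0); g is an illustrative bandwidth.
import numpy as np

g = 0.7
f = lambda d: 2.0 * g * d            # candidate f

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(500, 4))
Xp = rng.uniform(-1, 1, size=(500, 4))

d = np.sum((X - Xp) ** 2, axis=1)                 # ||x - x'||_2^2
lhs = 2.0 * (1.0 - np.exp(-g * d))                # k(x,x) + k(x',x') - 2k(x,x')
assert np.all(lhs <= f(d) + 1e-12), "condition violated"
print("RBF kernel satisfies the condition with f(d) = 2*g*d")
```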

Proof. Step 1: We first prove the theorem for the non-kernelized case. We fix a $c > 0$ and drop the subscript $c$ of $\gamma_{m,c}$. A testing sample $(x', y')$ and a training sample $(x, y)$ are called a sample pair if $y = y'$ and $\|x - x'\|_2 \leq c$. We say a set of $m$ training samples and a set of $m$ testing samples form $l$ pairings if there exist $l$ sample pairs with no data reused. Given $m$ training samples and $m$ testing samples, we use $M_m$ to denote the largest number of pairings.

Lemma 2. Given $c > 0$, $M_m / m \to 1$ almost surely as $m \to +\infty$, uniformly w.r.t. $P$.

Proof. We provide a proof sketch. Partition $\mathcal{X}$ into finitely many (say $T$) sets such that if a training sample and a testing sample fall into the same set, they form a pairing. This is doable due to the finite dimensionality of the sample space. Let $N_i^{tr}$ and $N_i^{te}$ be the number of training samples and testing samples falling in the $i$th set, respectively. Thus, $(N_1^{tr}, \ldots, N_T^{tr})$ and $(N_1^{te}, \ldots, N_T^{te})$ are multinomially distributed random vectors following the same distribution. It is straightforward to show that $\sum_{i=1}^T |N_i^{tr} - N_i^{te}| / m \to 0$ with probability one (e.g., by the Bretagnolle-Huber-Carol inequality), and hence $M_m / m \to 1$ almost surely. Moreover, the convergence rate does not depend on $P$.

Now we proceed to prove the theorem. Let $\mathcal{N}_0 = \{\delta \mid \|\delta\|_2 \leq c\}$. Given $m$ training samples and $m$ testing samples with $M_m$ sample pairs, for these paired samples, both the total testing error and the total testing hinge loss are upper bounded by
$$\max_{(\delta_1, \ldots, \delta_m) \in \mathcal{N}_0 \times \cdots \times \mathcal{N}_0} \sum_{i=1}^m \max\big[1 - y_i(\langle w, x_i - \delta_i\rangle + b),\, 0\big] \leq m c\|w\|_2 + \sum_{i=1}^m \max\big[1 - y_i(\langle w, x_i\rangle + b),\, 0\big].$$
Hence the average testing error (including the unpaired samples) is upper bounded by
$$1 - M_m/m + c\|w\|_2 + \frac{1}{m}\sum_{i=1}^m \max\big[1 - y_i(\langle w, x_i\rangle + b),\, 0\big].$$
Since $\max_{x \in \mathcal{X}} \big(1 - y(\langle w, x\rangle + b)\big) \leq 1 + |b| + K\|w\|_2$, the average hinge loss is upper bounded by
$$(1 - M_m/m)\big(1 + K\|w\|_2 + |b|\big) + c\|w\|_2 + \frac{1}{m}\sum_{i=1}^m \max\big[1 - y_i(\langle w, x_i\rangle + b),\, 0\big].$$
The proof follows since $M_m/m \to 1$ almost surely.

Step 2: Now we generalize the result to the kernelized version. Similarly, we lower-bound the number of sample pairs in the feature space. The multinomial random variable argument used in the proof of Lemma 2 breaks down, due to the possible infinite dimensionality of the feature space. Nevertheless, we are able to lower bound the number of sample pairs in the feature space by the number of sample pairs in the sample space. Define $f^{-1}(\alpha) \triangleq \max\{\beta \geq 0 \mid f(\beta) \leq \alpha\}$. Since $f(\cdot)$ is continuous, $f^{-1}(\alpha) > 0$ for any $\alpha > 0$. By Lemma 1, if $x$ and $x'$ belong to a hyper-cube of edge length $\min\big(\rho/\sqrt{n},\ \sqrt{f^{-1}(c^2)}/\sqrt{n}\big)$ in the sample space, then $\|\Phi(x) - \Phi(x')\|_{\mathcal{H}} \leq c$. Hence the number of sample pairs in the feature space is lower bounded by the number of pairs of samples that fall into the same hyper-cube in the sample space. We can cover $\mathcal{X}$ with finitely many such hyper-cubes since $f^{-1}(c^2) > 0$. The rest of the proof is identical to Step 1.

Notice that the condition in Theorem 2 requires that the feature mapping be smooth and hence preserve the locality of the disturbance, i.e., a small disturbance in the sample space guarantees that the corresponding disturbance in the feature space is also small. This condition is satisfied by most widely used kernels, e.g., homogeneous polynomial kernels and the Gaussian RBF. It is easy to construct non-smooth kernel functions which do not generalize well. For example, consider the following kernel: $k(x, x') = \mathbf{1}_{(x = x')}$, i.e., $k(x, x') = 1$ if $x = x'$, and zero otherwise. A standard RKHS-regularized SVM leads to a decision function $\operatorname{sign}\big(\sum_{i=1}^m \alpha_i k(x, x_i) + b\big)$, which equals $\operatorname{sign}(b)$ and provides no meaningful prediction if the testing sample $x$ is not one of the training samples.
Hence as $m$ increases, the testing error remains as large as 50% regardless of the tradeoff parameter used in the algorithm, while the training error can be arbitrarily small by fine-tuning the parameter.
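The failure mode of this degenerate kernel is easy to reproduce. The sketch below (not from the paper; it assumes NumPy and scikit-learn, with sample sizes and C chosen only for illustration) fits an SVM with $k(x, x') = \mathbf{1}_{(x = x')}$ and shows near-perfect training accuracy together with chance-level test accuracy, since every unseen point is scored $\operatorname{sign}(b)$.

```python
# Sketch (not from the paper's code): the kernel k(x, x') = 1{x = x'} fits
# the training set exactly but gives a constant prediction, sign(b), on
# every unseen test point. Assumes numpy and scikit-learn.
import numpy as np
from sklearn.svm import SVC

def indicator_kernel(A, B):
    # Gram matrix of k(x, x') = 1 if x == x', else 0.
    eq = (A[:, None, :] == B[None, :, :]).all(axis=2)
    return eq.astype(float)

rng = np.random.default_rng(3)
X_train = rng.normal(size=(100, 5))
y_train = rng.choice([-1, 1], size=100)       # labels can even be pure noise
X_test = rng.normal(size=(100, 5))
y_test = rng.choice([-1, 1], size=100)

clf = SVC(kernel=indicator_kernel, C=10.0)
clf.fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))   # near 1 for large C
print("test accuracy :", clf.score(X_test, y_test))     # about 0.5 (constant sign(b))
```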

Theorem 2 can further lead to the standard consistency notion, i.e., convergence to the Bayes risk ([15]). The proof in [15] involves a step showing that the expected hinge loss of the minimizer of the regularized training hinge loss concentrates around the empirical regularized hinge loss, accomplished using concentration inequalities derived from VC-dimension considerations and stability considerations. Instead, we can use our robustness-based results of Theorem 2 to replace these approaches. The detailed proof is omitted due to space limits.

4 Concluding Remarks

This work considers the relationship between robustness and regularized SVM classification. In particular, we prove that the standard norm-regularized SVM classifier is in fact the solution to a robust classification setup, and thus known results about regularized classifiers extend to robust classifiers. To the best of our knowledge, this is the first explicit such link between regularization and robustness in pattern classification. This link suggests that norm-based regularization essentially builds in a robustness to sample noise whose probability level sets are symmetric, and moreover have the structure of the unit ball with respect to the dual of the regularizing norm. It would be interesting to understand the performance gains possible when the noise does not have such characteristics, and the robust setup is used in place of regularization with an appropriately defined uncertainty set. In addition, based on the robustness interpretation of the regularization term, we re-proved the consistency of SVMs without direct appeal to notions of metric entropy, VC-dimension, or stability. Our proof suggests that the ability to handle disturbance is crucial for an algorithm to achieve good generalization ability.

References

[1] B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, New York, NY.
[2] V. Vapnik and A. Lerner. Pattern recognition using generalized portrait method. Automation and Remote Control, 24.
[3] K. Bennett and O. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1(1):23-34.
[4] V. Vapnik and A. Chervonenkis. The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognition and Image Analysis, 1(3).
[5] A. Smola, B. Schölkopf, and K. Müller. The connection between regularization operators and support vector kernels. Neural Networks, 11.
[6] P. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3, November.
[7] V. Koltchinskii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. The Annals of Statistics, 30(1):1-50.
[8] L. El Ghaoui and H. Lebret. Robust solutions to least-squares problems with uncertain data. SIAM Journal on Matrix Analysis and Applications, 18.
[9] A. Ben-Tal and A. Nemirovski. Robust solutions of uncertain linear programs. Operations Research Letters, 25(1):1-13, August.
[10] P. Shivaswamy, C. Bhattacharyya, and A. Smola. Second order cone programming approaches for handling missing and uncertain data. Journal of Machine Learning Research, 7, July.
[11] G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. Jordan. A robust minimax approach to classification. Journal of Machine Learning Research, 3, December.
[12] T. Trafalis and R. Gilbert.
Robust support vector machines for classification and computational issues. Optimization Methods and Software, 22(1), February.
[13] M. Kearns, Y. Mansour, A. Ng, and D. Ron. An experimental and theoretical comparison of model selection methods. Machine Learning, 27:7-50.
[14] C. Bhattacharyya, K. Pannagadatta, and A. Smola. A second order cone programming formulation for classifying missing data. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems (NIPS 17), Cambridge, MA. MIT Press.
[15] I. Steinwart. Consistency of support vector machines and other regularized kernel classifiers. IEEE Transactions on Information Theory, 51(1).
