Foundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research
2 Weak Learning. Definition: a concept class C is weakly PAC-learnable if there exists a (weak) learning algorithm L such that, for a fixed γ > 0: for all δ > 0, for all c ∈ C and all distributions D, for samples S of size m = poly(1/δ),
Pr_{S∼D}[R(h_S) ≤ 1/2 − γ] ≥ 1 − δ.
(Kearns and Valiant, 1994)
3 Boosting Ideas. Finding simple, relatively accurate base classifiers is often not hard: a weak learner. Main ideas: use the weak learner to create a strong learner; combine the base classifiers returned by the weak learner (ensemble method). But how should the base classifiers be combined?
4 AdaBoost (Freund and Schapire, 1997), with H ⊆ {−1, +1}^X.

AdaBoost(S = ((x_1, y_1), ..., (x_m, y_m)))
 1  for i ← 1 to m do
 2      D_1(i) ← 1/m
 3  for t ← 1 to T do
 4      h_t ← base classifier in H with small error ε_t = Pr_{i∼D_t}[h_t(x_i) ≠ y_i]
 5      α_t ← (1/2) log((1 − ε_t)/ε_t)
 6      Z_t ← 2[ε_t(1 − ε_t)]^{1/2}    ▷ normalization factor
 7      for i ← 1 to m do
 8          D_{t+1}(i) ← D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t
 9  f_T ← Σ_{t=1}^T α_t h_t
10  return h = sgn(f_T)
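As a concrete illustration, here is a minimal NumPy sketch of this pseudocode. The exhaustive threshold-stump weak learner is an assumption made for the example, not part of the algorithm, and each round's weighted error is assumed to lie in (0, 1/2):

```python
import numpy as np

def apply_stump(h, X):
    """Evaluate a threshold stump h = (feature, threshold, sign) on X."""
    j, thr, s = h
    return s * np.where(X[:, j] <= thr, 1.0, -1.0)

def best_stump(X, y, D):
    """Exhaustive weak learner: the stump with smallest weighted error (line 4)."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1.0, -1.0):
                pred = s * np.where(X[:, j] <= thr, 1.0, -1.0)
                err = D[pred != y].sum()
                if err < best_err:
                    best, best_err = (j, thr, s), err
    return best

def adaboost(X, y, T):
    """AdaBoost as in the pseudocode; assumes every round's error is in (0, 1/2)."""
    m = X.shape[0]
    D = np.full(m, 1.0 / m)                     # line 2: D_1(i) = 1/m
    hs, alphas = [], []
    for _ in range(T):
        h = best_stump(X, y, D)                 # line 4
        pred = apply_stump(h, X)
        eps = D[pred != y].sum()                # weighted error eps_t
        alpha = 0.5 * np.log((1 - eps) / eps)   # line 5
        Z = 2.0 * np.sqrt(eps * (1 - eps))      # line 6: normalization factor
        D = D * np.exp(-alpha * y * pred) / Z   # line 8
        hs.append(h)
        alphas.append(alpha)
    def predict(Xn):
        f = sum(a * apply_stump(h, Xn) for a, h in zip(alphas, hs))
        return np.sign(f)                       # line 10: h = sgn(f_T)
    return predict
```

On a small 1D sample such as X = [[0], [1], ..., [5]] with labels [1, 1, −1, −1, 1, 1], a few dozen rounds suffice to reach zero training error, since the target is realizable by a convex combination of stumps with positive margin.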
5 Notes. The distribution D_t is initially uniform over the training sample. At each round, the weight of a misclassified example is increased. Observation: D_{t+1}(i) = e^{−y_i f_t(x_i)} / (m ∏_{s=1}^t Z_s), since
D_{t+1}(i) = D_t(i) e^{−α_t y_i h_t(x_i)} / Z_t = D_{t−1}(i) e^{−α_{t−1} y_i h_{t−1}(x_i)} e^{−α_t y_i h_t(x_i)} / (Z_{t−1} Z_t) = (1/m) e^{−y_i Σ_{s=1}^t α_s h_s(x_i)} / ∏_{s=1}^t Z_s.
The weight α_t assigned to base classifier h_t directly depends on the accuracy of h_t at round t.
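The closed-form expression for D_{t+1} can be checked numerically against the round-by-round update. A sketch with arbitrary, randomly generated base-classifier predictions; the telescoping identity holds for any choice of α_t, so the α_t here need not come from an actual weak learner:

```python
import numpy as np

rng = np.random.default_rng(1)
m, T = 8, 5
y = rng.choice([-1.0, 1.0], size=m)
preds = rng.choice([-1.0, 1.0], size=(T, m))    # h_t(x_i) for T arbitrary rounds

D = np.full(m, 1.0 / m)                         # D_1 uniform
f = np.zeros(m)                                 # f_t(x_i) = sum_s alpha_s h_s(x_i)
Zs = []
for t in range(T):
    eps = min(max(D[preds[t] != y].sum(), 1e-9), 1 - 1e-9)  # guard degenerate rounds
    alpha = 0.5 * np.log((1 - eps) / eps)
    w = D * np.exp(-alpha * y * preds[t])
    Z = w.sum()                                 # normalization factor Z_t
    D = w / Z
    f += alpha * preds[t]
    Zs.append(Z)

# closed form from the slide: D_{T+1}(i) = exp(-y_i f_T(x_i)) / (m * prod_s Z_s)
closed = np.exp(-y * f) / (m * np.prod(Zs))
```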
6-8 Illustration. [Figures: example weights and base classifiers at rounds t = 1, 2, 3, and the combined classifier α_1 h_1 + α_2 h_2 + α_3 h_3.]
9 Bound on Empirical Error (Freund and Schapire, 1997). Theorem: the empirical error of the classifier output by AdaBoost verifies:
R̂(h) ≤ exp(−2 Σ_{t=1}^T (1/2 − ε_t)²).
If further, for all t ∈ [1, T], γ ≤ (1/2 − ε_t), then R̂(h) ≤ exp(−2γ²T). γ does not need to be known in advance: adaptive boosting.
10 Proof: since, as we saw, D_{T+1}(i) = e^{−y_i f(x_i)} / (m ∏_{s=1}^T Z_s),
R̂(h) = (1/m) Σ_{i=1}^m 1_{y_i f(x_i) ≤ 0} ≤ (1/m) Σ_{i=1}^m exp(−y_i f(x_i)) = (1/m) Σ_{i=1}^m [m ∏_{t=1}^T Z_t] D_{T+1}(i) = ∏_{t=1}^T Z_t.
Now, since Z_t is a normalization factor,
Z_t = Σ_{i=1}^m D_t(i) e^{−α_t y_i h_t(x_i)} = Σ_{i: y_i h_t(x_i) ≥ 0} D_t(i) e^{−α_t} + Σ_{i: y_i h_t(x_i) < 0} D_t(i) e^{α_t} = (1 − ε_t) e^{−α_t} + ε_t e^{α_t} = (1 − ε_t) √(ε_t/(1 − ε_t)) + ε_t √((1 − ε_t)/ε_t) = 2√(ε_t(1 − ε_t)).
11 Thus,
∏_{t=1}^T Z_t = ∏_{t=1}^T 2√(ε_t(1 − ε_t)) = ∏_{t=1}^T √(1 − 4(1/2 − ε_t)²) ≤ ∏_{t=1}^T exp(−2(1/2 − ε_t)²) = exp(−2 Σ_{t=1}^T (1/2 − ε_t)²).
Notes: α_t is the minimizer of α ↦ (1 − ε_t)e^{−α} + ε_t e^{α}. Since (1 − ε_t)e^{−α_t} = ε_t e^{α_t}, at each round AdaBoost assigns the same probability mass to correctly classified and misclassified instances. For base classifiers taking values in [−1, +1], α_t can be similarly chosen to minimize Z_t.
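The two per-round facts used here, the identity 2√(ε_t(1 − ε_t)) = √(1 − 4(1/2 − ε_t)²) and the upper bound by exp(−2(1/2 − ε_t)²), can be sanity-checked numerically (a small sketch; the sample error values are arbitrary):

```python
import math

def Z(eps):
    """Round-t normalization factor for weighted error eps."""
    return 2.0 * math.sqrt(eps * (1 - eps))

def bound_factor(eps):
    """Per-round factor exp(-2 (1/2 - eps)^2) from the theorem."""
    return math.exp(-2.0 * (0.5 - eps) ** 2)
```

The inequality is the elementary fact √(1 − u) ≤ e^{−u/2}, applied with u = 4(1/2 − ε_t)².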
12 AdaBoost = Coordinate Descent. Objective function: a convex and differentiable upper bound on the zero-one loss (x ↦ e^{−x} vs. the 0-1 loss x ↦ 1_{x ≤ 0}):
F(ᾱ) = (1/m) Σ_{i=1}^m e^{−y_i f(x_i)} = (1/m) Σ_{i=1}^m e^{−y_i Σ_{j=1}^N ᾱ_j h_j(x_i)}.
13 Direction: the unit vector e_k with the best directional derivative:
F′(ᾱ_{t−1}, e_k) = lim_{η→0} [F(ᾱ_{t−1} + η e_k) − F(ᾱ_{t−1})] / η.
Since F(ᾱ_{t−1} + η e_k) = (1/m) Σ_{i=1}^m e^{−y_i Σ_{j=1}^N ᾱ_{t−1,j} h_j(x_i)} e^{−η y_i h_k(x_i)},
F′(ᾱ_{t−1}, e_k) = −(1/m) Σ_{i=1}^m y_i h_k(x_i) e^{−y_i Σ_{j=1}^N ᾱ_{t−1,j} h_j(x_i)} = −Σ_{i=1}^m y_i h_k(x_i) D_t(i) [∏_{s=1}^{t−1} Z_s]
= −[Σ_i D_t(i) 1_{y_i h_k(x_i) = +1} − Σ_i D_t(i) 1_{y_i h_k(x_i) = −1}] ∏_{s=1}^{t−1} Z_s = −[(1 − ε_{t,k}) − ε_{t,k}] ∏_{s=1}^{t−1} Z_s = [2ε_{t,k} − 1] ∏_{s=1}^{t−1} Z_s.
Thus, the best direction corresponds to the base classifier with the smallest error.
14 Step size: chosen to minimize F(ᾱ_{t−1} + η e_k):
dF(ᾱ_{t−1} + η e_k)/dη = 0
⇔ −Σ_{i=1}^m y_i h_k(x_i) e^{−y_i Σ_{j=1}^N ᾱ_{t−1,j} h_j(x_i)} e^{−η y_i h_k(x_i)} = 0
⇔ −Σ_{i=1}^m y_i h_k(x_i) D_t(i) [∏_{s=1}^{t−1} Z_s] e^{−η y_i h_k(x_i)} = 0
⇔ −Σ_{i=1}^m y_i h_k(x_i) D_t(i) e^{−η y_i h_k(x_i)} = 0
⇔ (1 − ε_{t,k}) e^{−η} − ε_{t,k} e^{η} = 0
⇔ η = (1/2) log((1 − ε_{t,k})/ε_{t,k}).
Thus, the step size matches the base-classifier weight of AdaBoost.
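That the closed-form step size minimizes the per-round objective, and that its minimum value is exactly Z_t = 2√(ε(1 − ε)), can be verified numerically (a small sketch; the comparison points below are arbitrary):

```python
import math

def phi(eta, eps):
    """Per-round objective (1 - eps) e^{-eta} + eps e^{eta}, minimized by alpha_t."""
    return (1 - eps) * math.exp(-eta) + eps * math.exp(eta)
```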
15 Alternative Loss Functions:
- boosting loss: x ↦ e^{−x}
- logistic loss: x ↦ log₂(1 + e^{−x})
- square loss: x ↦ (1 − x)² 1_{x ≤ 1}
- hinge loss: x ↦ max(1 − x, 0)
- zero-one loss: x ↦ 1_{x < 0}
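Each of the convex surrogates above upper-bounds the zero-one loss pointwise, which is what drives empirical-error bounds of the kind shown on the previous slides; a quick check:

```python
import math

# the five losses from the slide, as functions of the margin x = y f(x)
losses = {
    "boosting": lambda x: math.exp(-x),
    "logistic": lambda x: math.log2(1 + math.exp(-x)),
    "square":   lambda x: (1 - x) ** 2 if x <= 1 else 0.0,
    "hinge":    lambda x: max(1 - x, 0.0),
    "zero-one": lambda x: 1.0 if x < 0 else 0.0,
}
```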
16 Standard Use in Practice. Base learners: decision trees, quite often just decision stumps (trees of depth one). Boosting stumps: data in R^N, e.g., N = 2, (height(x), weight(x)); associate a stump with each component; pre-sort each component: O(m log m); at each round, find the best component and threshold; total complexity: O((m log m)N + mNT); stumps are not always weak learners: think of the XOR example!
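The pre-sorting trick can be sketched as follows: sort one component once, then sweep the candidate thresholds in order while updating the weighted error incrementally, so each round costs O(m) per component after the O(m log m) sort. The stump convention "predict +1 if x ≤ θ" is an assumption of this sketch:

```python
import numpy as np

def best_threshold(x, y, D):
    """Best stump on one component: returns (threshold, sign, weighted error).

    Sort once, then sweep thresholds; moving a point to the '<= theta' side
    changes the error of the sign=+1 stump by -y_i D_i.
    """
    order = np.argsort(x)
    x_s, yD = x[order], (y * D)[order]
    err = D[y == 1].sum()               # theta below all points: positives wrong
    best_err = min(err, 1 - err)        # sign flip gives error 1 - err
    best_thr, best_sign = -np.inf, (1.0 if err <= 1 - err else -1.0)
    for i in range(len(x_s)):
        err -= yD[i]                    # point i crosses to the '<= theta' side
        if i + 1 < len(x_s) and x_s[i + 1] == x_s[i]:
            continue                    # only cut between distinct values
        e = min(err, 1 - err)
        if e < best_err:
            best_err, best_thr = e, x_s[i]
            best_sign = 1.0 if err <= 1 - err else -1.0
    return best_thr, best_sign, best_err
```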
17 Overfitting? Assume VCdim(H) = d and, for a fixed T, define
F_T = {sgn(Σ_{t=1}^T α_t h_t + b) : α_t, b ∈ R, h_t ∈ H}.
F_T can form a very rich family of classifiers. It can be shown (Freund and Schapire, 1997) that
VCdim(F_T) ≤ 2(d + 1)(T + 1) log₂((T + 1)e).
This suggests that AdaBoost could overfit for large values of T, and that is in fact observed in some cases, but in various others it is not!
18 Empirical Observations. Several empirical observations (not all): AdaBoost does not seem to overfit; furthermore, the test error keeps decreasing with the number of rounds even after the training error reaches zero. [Figure: training and test error vs. number of rounds, for C4.5 decision trees (Schapire et al., 1998).]
19 Rademacher Complexity of Convex Hulls. Theorem: let H be a set of functions mapping from X to R, and define the convex hull of H as
conv(H) = {Σ_{k=1}^p μ_k h_k : p ≥ 1, μ_k ≥ 0, Σ_{k=1}^p μ_k ≤ 1, h_k ∈ H}.
Then, for any sample S, R̂_S(conv(H)) = R̂_S(H).
Proof:
R̂_S(conv(H)) = (1/m) E_σ[sup_{h_k ∈ H, μ ≥ 0, ||μ||_1 ≤ 1} Σ_{i=1}^m σ_i Σ_{k=1}^p μ_k h_k(x_i)]
= (1/m) E_σ[sup_{h_k ∈ H} sup_{μ ≥ 0, ||μ||_1 ≤ 1} Σ_{k=1}^p μ_k Σ_{i=1}^m σ_i h_k(x_i)]
= (1/m) E_σ[sup_{h_k ∈ H} max_{k ∈ [1,p]} Σ_{i=1}^m σ_i h_k(x_i)]
= (1/m) E_σ[sup_{h ∈ H} Σ_{i=1}^m σ_i h(x_i)] = R̂_S(H).
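The crux of the proof is that the linear functional Σ_i σ_i g(x_i) over the convex hull attains its supremum at one of the base hypotheses, so mixing never helps. This can be illustrated numerically (the random ±1 hypotheses and random convex weights below are assumptions of the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 50, 8
H_vals = rng.choice([-1.0, 1.0], size=(p, m))   # p base hypotheses on m points
sigma = rng.choice([-1.0, 1.0], size=m)         # one draw of Rademacher variables

per_h = H_vals @ sigma                          # sum_i sigma_i h_k(x_i), per h_k
best_h = per_h.max()                            # supremum over the base class H

mu = rng.dirichlet(np.ones(p), size=1000)       # random convex weights (mu >= 0, sum = 1)
best_hull = (mu @ per_h).max()                  # same functional over conv(H) points
```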
20 Margin Bound - Ensemble Methods (Koltchinskii and Panchenko, 2002). Corollary: let H be a set of real-valued functions. Fix ρ > 0. For any δ > 0, with probability at least 1 − δ, the following holds for all h ∈ conv(H):
R(h) ≤ R̂_ρ(h) + (2/ρ) R_m(H) + √(log(1/δ)/(2m))
R(h) ≤ R̂_ρ(h) + (2/ρ) R̂_S(H) + 3√(log(2/δ)/(2m)).
Proof: direct consequence of the margin bound of Lecture 4 and R̂_S(conv(H)) = R̂_S(H).
21 Margin Bound - Ensemble Methods (Koltchinskii and Panchenko, 2002); see also (Schapire et al., 1998). Corollary: let H be a family of functions taking values in {−1, +1} with VC dimension d. Fix ρ > 0. For any δ > 0, with probability at least 1 − δ, the following holds for all h ∈ conv(H):
R(h) ≤ R̂_ρ(h) + (2/ρ) √(2d log(em/d) / m) + √(log(1/δ)/(2m)).
Proof: follows directly from the previous corollary and the VC-dimension bound on Rademacher complexity (see Lecture 3).
22 Notes. All of these bounds can be generalized to hold uniformly for all ρ ∈ (0, 1), at the cost of an additional term √((log log₂(2/ρ))/m) and other minor constant factor changes (Koltchinskii and Panchenko, 2002). For AdaBoost, the bound applies to the functions
x ↦ f(x)/||α||_1 = Σ_{t=1}^T α_t h_t(x) / ||α||_1 ∈ conv(H).
Note that T does not appear in the bound.
23 Margin Distribution. Theorem: for any ρ > 0, the following holds:
Pr_{(x,y)∼S}[y f(x)/||α||_1 ≤ ρ] ≤ 2^T ∏_{t=1}^T √(ε_t^{1−ρ} (1 − ε_t)^{1+ρ}).
Proof: using the identity D_{T+1}(i) = e^{−y_i f(x_i)} / (m ∏_{t=1}^T Z_t),
(1/m) Σ_{i=1}^m 1_{y_i f(x_i) ≤ ρ||α||_1} ≤ (1/m) Σ_{i=1}^m exp(−y_i f(x_i) + ρ||α||_1)
= (1/m) e^{ρ||α||_1} Σ_{i=1}^m [m ∏_{t=1}^T Z_t] D_{T+1}(i) = e^{ρ||α||_1} ∏_{t=1}^T Z_t
= ∏_{t=1}^T e^{ρα_t} Z_t = 2^T ∏_{t=1}^T √(ε_t^{1−ρ} (1 − ε_t)^{1+ρ}).
24 Notes. If, for all t ∈ [1, T], γ ≤ (1/2 − ε_t), then the upper bound can be bounded by
Pr_{(x,y)∼S}[y f(x)/||α||_1 ≤ ρ] ≤ [(1 − 2γ)^{1−ρ} (1 + 2γ)^{1+ρ}]^{T/2}.
For ρ < γ, (1 − 2γ)^{1−ρ}(1 + 2γ)^{1+ρ} < 1 and the bound decreases exponentially in T. For the bound to be convergent, ρ must be at least of order 1/√m; thus, γ = Ω(1/√m) is roughly the condition on the edge value.
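A quick numerical check of the threshold behavior of the per-round base for one concrete edge value (γ = 0.2 is an arbitrary choice for the sketch):

```python
import math

def per_round_base(rho, gamma):
    """(1 - 2g)^(1-r) (1 + 2g)^(1+r): the quantity raised to the power T/2."""
    return (1 - 2 * gamma) ** (1 - rho) * (1 + 2 * gamma) ** (1 + rho)
```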
25 Outliers. AdaBoost assigns larger weights to harder examples. Application: detecting mislabeled examples. Dealing with noisy data: regularization based on the average weight assigned to a point (soft margin idea for boosting) (Meir and Rätsch, 2003).
26 L1-Geometric Margin. Definition: the L_1-margin ρ_f(x) of a linear function f = Σ_{t=1}^T α_t h_t with α ≠ 0 at a point x ∈ X is defined by
ρ_f(x) = |f(x)| / ||α||_1 = |Σ_{t=1}^T α_t h_t(x)| / ||α||_1 = |α · h(x)| / ||α||_1.
The L_1-margin of f over a sample S = (x_1, ..., x_m) is its minimum margin at the points in that sample:
ρ_f = min_{i ∈ [1,m]} ρ_f(x_i) = min_{i ∈ [1,m]} |α · h(x_i)| / ||α||_1.
27 SVM vs AdaBoost.
- features or base hypotheses: SVM: Φ(x) = (Φ_1(x), ..., Φ_N(x))ᵀ; AdaBoost: h(x) = (h_1(x), ..., h_N(x))ᵀ.
- predictor: SVM: x ↦ w · Φ(x); AdaBoost: x ↦ α · h(x).
- geometric margin: SVM: |w · Φ(x)| / ||w||_2 = d_2(Φ(x), hyperplane); AdaBoost: |α · h(x)| / ||α||_1 = d_∞(h(x), hyperplane).
- confidence margin: SVM: y(w · Φ(x)); AdaBoost: y(α · h(x)).
- regularization: SVM: ||w||_2; AdaBoost: ||α||_1 (L1-AdaBoost).
28 Maximum-Margin Solutions. [Figure: maximum-margin separating hyperplanes under norm ||·||_2 (SVM) and norm ||·||_∞ (boosting).]
29 But, Does AdaBoost Maximize the Margin? No: AdaBoost may converge to a margin that is significantly below the maximum margin (Rudin et al., 2004) (e.g., 1/3 instead of 3/8)! Lower bound: AdaBoost can asymptotically achieve a margin that is at least ρ_max/2 if the data is separable and some conditions on the base learners hold (Rätsch and Warmuth, 2002). Several boosting-type margin-maximization algorithms exist, but their performance in practice is not clear or not reported.
30 AdaBoost's Weak Learning Condition. Definition: the edge of a base classifier h_t for a distribution D over the training sample is
γ(t) = 1/2 − ε_t = (1/2) Σ_{i=1}^m y_i h_t(x_i) D(i).
Condition: there exists γ > 0 such that, for any distribution D over the training sample and any base classifier h_t, γ(t) ≥ γ.
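A small sketch of the identity between the edge and the weighted error (the toy predictions below are arbitrary):

```python
import numpy as np

def edge(pred, y, D):
    """Edge gamma(t) = (1/2) sum_i y_i h_t(x_i) D(i)."""
    return 0.5 * float(np.sum(y * pred * D))

def weighted_error(pred, y, D):
    """Weighted error eps_t = sum over misclassified points of D(i)."""
    return float(D[pred != y].sum())
```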
31 Zero-Sum Games. Definition: payoff matrix M = (M_ij) ∈ R^{m×n}: m possible actions (pure strategies) for the row player; n possible actions for the column player; M_ij is the payoff for the row player (= loss for the column player) when row plays i and column plays j.
Example (rock-paper-scissors):
            rock   paper  scissors
rock          0     -1      +1
paper        +1      0      -1
scissors     -1     +1       0
32 Mixed Strategies (von Neumann, 1928). Definition: the row player selects a distribution p over the rows, the column player a distribution q over the columns. The expected payoff for the row player is
E_{i∼p, j∼q}[M_ij] = Σ_{i=1}^m Σ_{j=1}^n p_i M_ij q_j = pᵀMq.
von Neumann's minimax theorem:
max_p min_q pᵀMq = min_q max_p pᵀMq,
equivalent to:
max_p min_{j ∈ [1,n]} pᵀM e_j = min_q max_{i ∈ [1,m]} e_iᵀ M q.
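For the rock-paper-scissors example, the uniform mixed strategy is optimal for both players and the value of the game is 0, which the expected-payoff formula pᵀMq lets us check directly:

```python
import numpy as np

# payoff matrix for the row player in rock-paper-scissors
M = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

p = np.full(3, 1 / 3)    # row player's mixed strategy
q = np.full(3, 1 / 3)    # column player's mixed strategy
value = p @ M @ q        # expected payoff p^T M q
```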
33 John von Neumann (1903-1957). [Photograph.]
34 AdaBoost and Game Theory. Game: Player A selects a point x_i, i ∈ [1, m]; Player B selects a base learner h_t, t ∈ [1, T]. Payoff matrix M ∈ {−1, +1}^{m×T}: M_it = y_i h_t(x_i).
von Neumann's theorem (assume H finite):
2γ = min_D max_{h ∈ H} Σ_{i=1}^m D(i) y_i h(x_i) = max_α min_{i ∈ [1,m]} y_i Σ_{t=1}^T α_t h_t(x_i) / ||α||_1 = ρ.
35 Consequences. Weak learning condition = non-zero margin: ρ = 2γ > 0; thus, it is possible to search for a non-zero margin. AdaBoost: a (suboptimal) search for the corresponding α; it achieves at least half of the maximum margin. Weak learning = strong condition: the condition implies linear separability with margin 2γ > 0.
36 Linear Programming Problem. Maximizing the margin:
ρ = max_α min_{i ∈ [1,m]} y_i (α · x_i) / ||α||_1.
This is equivalent to the following convex optimization (LP) problem:
max ρ subject to: y_i (α · x_i) ≥ ρ, i ∈ [1, m]; ||α||_1 = 1.
Note that, for ||α||_1 = 1, |α · x| = d_∞(x, H) with H = {x : α · x = 0}.
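This LP can be solved directly, for instance with scipy.optimize.linprog. The sketch below uses a small hand-built instance (the matrix A of values y_i h_t(x_i) is an assumption of the example) and restricts to α ≥ 0, so that ||α||_1 = Σ_t α_t becomes a linear constraint; by symmetry, the optimal margin for this instance is exactly 1/3:

```python
import numpy as np
from scipy.optimize import linprog

# y_i h_t(x_i) for 6 points and 3 base hypotheses (hypothetical instance)
A = np.array([[ 1., -1.,  1.],
              [ 1., -1.,  1.],
              [ 1.,  1., -1.],
              [ 1.,  1., -1.],
              [-1.,  1.,  1.],
              [-1.,  1.,  1.]])
m, N = A.shape

# variables: (alpha_1, ..., alpha_N, rho); maximize rho == minimize -rho
c = np.zeros(N + 1)
c[-1] = -1.0
# margin constraints: rho - sum_t alpha_t y_i h_t(x_i) <= 0 for each point i
A_ub = np.hstack([-A, np.ones((m, 1))])
b_ub = np.zeros(m)
# simplex constraint: sum_t alpha_t = 1 (with alpha >= 0, this is ||alpha||_1 = 1)
A_eq = np.hstack([np.ones((1, N)), np.zeros((1, 1))])
b_eq = np.array([1.0])
bounds = [(0, None)] * N + [(None, None)]   # alpha >= 0, rho free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
rho_star = -res.fun                          # maximum L1-margin
```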
37 Advantages of AdaBoost. Simple: straightforward implementation. Efficient: complexity O(mN log m + mNT) for stumps; when m and N are not too large, the algorithm is quite fast. Theoretical guarantees: but still many questions; AdaBoost is not designed to maximize the margin; regularized versions of AdaBoost.
38 Weaker Aspects. Parameters: need to determine T, the number of rounds of boosting: stopping criterion; need to determine the base learners: risk of overfitting or low margins. Noise: severely damages the accuracy of AdaBoost (Dietterich, 2000).
39 Other Boosting Algorithms. arc-gv (Breiman, 1996): designed to maximize the margin, but outperformed by AdaBoost in experiments (Reyzin and Schapire, 2006). L1-regularized AdaBoost (Rätsch et al., 2001): outperforms AdaBoost in experiments (Cortes et al., 2014). DeepBoost (Cortes et al., 2014): more favorable learning guarantees; outperforms both AdaBoost and L1-regularized AdaBoost in experiments.
40 References.
Corinna Cortes, Mehryar Mohri, and Umar Syed. Deep boosting. In ICML, 2014.
Leo Breiman. Bagging predictors. Machine Learning, 24(2), 1996.
Thomas G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Machine Learning, 40(2), 2000.
Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 1997.
G. Lebanon and J. Lafferty. Boosting and maximum likelihood for exponential models. In NIPS.
Ron Meir and Gunnar Rätsch. An introduction to boosting and leveraging. In Advanced Lectures on Machine Learning (LNAI 2600), 2003.
J. von Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100, 1928.
41 References.
Cynthia Rudin, Ingrid Daubechies, and Robert E. Schapire. The dynamics of AdaBoost: cyclic behavior and convergence of margins. Journal of Machine Learning Research, 5, 2004.
Gunnar Rätsch and Manfred K. Warmuth. Maximizing the margin with boosting. In Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 02), Sydney, Australia, July 2002.
Lev Reyzin and Robert E. Schapire. How boosting the margin can also boost classifier complexity. In ICML, 2006.
Robert E. Schapire. The boosting approach to machine learning: an overview. In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, and B. Yu, editors, Nonlinear Estimation and Classification. Springer.
Robert E. Schapire and Yoav Freund. Boosting: Foundations and Algorithms. The MIT Press.
Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5), 1998.
More informationNyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison
yströ Method vs : A Theoretical and Epirical Coparison Tianbao Yang, Yu-Feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua Zhou Machine Learning Lab, GE Global Research, San Raon, CA 94583 Michigan State University,
More informationEstimating Parameters for a Gaussian pdf
Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3
More informationList Scheduling and LPT Oliver Braun (09/05/2017)
List Scheduling and LPT Oliver Braun (09/05/207) We investigate the classical scheduling proble P ax where a set of n independent jobs has to be processed on 2 parallel and identical processors (achines)
More informationarxiv: v1 [cs.ds] 3 Feb 2014
arxiv:40.043v [cs.ds] 3 Feb 04 A Bound on the Expected Optiality of Rando Feasible Solutions to Cobinatorial Optiization Probles Evan A. Sultani The Johns Hopins University APL evan@sultani.co http://www.sultani.co/
More informationConsistent Multiclass Algorithms for Complex Performance Measures. Supplementary Material
Consistent Multiclass Algoriths for Coplex Perforance Measures Suppleentary Material Notations. Let λ be the base easure over n given by the unifor rando variable (say U over n. Hence, for all easurable
More informationConvex Programming for Scheduling Unrelated Parallel Machines
Convex Prograing for Scheduling Unrelated Parallel Machines Yossi Azar Air Epstein Abstract We consider the classical proble of scheduling parallel unrelated achines. Each job is to be processed by exactly
More informationMultiple Instance Learning with Query Bags
Multiple Instance Learning with Query Bags Boris Babenko UC San Diego bbabenko@cs.ucsd.edu Piotr Dollár California Institute of Technology pdollar@caltech.edu Serge Belongie UC San Diego sjb@cs.ucsd.edu
More informationBoosting. Acknowledgment Slides are based on tutorials from Robert Schapire and Gunnar Raetsch
. Machine Learning Boosting Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut für Informatik Technische Fakultät Albert-Ludwigs-Universität Freiburg riedmiller@informatik.uni-freiburg.de
More informationInteractive Markov Models of Evolutionary Algorithms
Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary
More informationLearnability and Stability in the General Learning Setting
Learnability and Stability in the General Learning Setting Shai Shalev-Shwartz TTI-Chicago shai@tti-c.org Ohad Shair The Hebrew University ohadsh@cs.huji.ac.il Nathan Srebro TTI-Chicago nati@uchicago.edu
More informationNew Bounds for Learning Intervals with Implications for Semi-Supervised Learning
JMLR: Workshop and Conference Proceedings vol (1) 1 15 New Bounds for Learning Intervals with Iplications for Sei-Supervised Learning David P. Helbold dph@soe.ucsc.edu Departent of Coputer Science, University
More informationarxiv: v1 [cs.lg] 8 Jan 2019
Data Masking with Privacy Guarantees Anh T. Pha Oregon State University phatheanhbka@gail.co Shalini Ghosh Sasung Research shalini.ghosh@gail.co Vinod Yegneswaran SRI international vinod@csl.sri.co arxiv:90.085v
More informationBounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks
Bounds on the Miniax Rate for Estiating a Prior over a VC Class fro Independent Learning Tasks Liu Yang Steve Hanneke Jaie Carbonell Deceber 01 CMU-ML-1-11 School of Coputer Science Carnegie Mellon University
More informationResearch in Area of Longevity of Sylphon Scraies
IOP Conference Series: Earth and Environental Science PAPER OPEN ACCESS Research in Area of Longevity of Sylphon Scraies To cite this article: Natalia Y Golovina and Svetlana Y Krivosheeva 2018 IOP Conf.
More informationFoundations of Machine Learning Lecture 5. Mehryar Mohri Courant Institute and Google Research
Foundations of Machine Learning Lecture 5 Mehryar Mohri Courant Institute and Google Research ohri@cis.nyu.edu Kernel Methods Motivation Non-linear decision boundary. Efficient coputation of inner products
More informationA Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words)
1 A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine (1900 words) Contact: Jerry Farlow Dept of Matheatics Univeristy of Maine Orono, ME 04469 Tel (07) 866-3540 Eail: farlow@ath.uaine.edu
More informationLecture 21. Interior Point Methods Setup and Algorithm
Lecture 21 Interior Point Methods In 1984, Kararkar introduced a new weakly polynoial tie algorith for solving LPs [Kar84a], [Kar84b]. His algorith was theoretically faster than the ellipsoid ethod and
More informationSymmetrization and Rademacher Averages
Stat 928: Statistical Learning Theory Lecture: Syetrization and Radeacher Averages Instructor: Sha Kakade Radeacher Averages Recall that we are interested in bounding the difference between epirical and
More informationA Simple Regression Problem
A Siple Regression Proble R. M. Castro March 23, 2 In this brief note a siple regression proble will be introduced, illustrating clearly the bias-variance tradeoff. Let Y i f(x i ) + W i, i,..., n, where
More informationCOMS 4771 Lecture Boosting 1 / 16
COMS 4771 Lecture 12 1. Boosting 1 / 16 Boosting What is boosting? Boosting: Using a learning algorithm that provides rough rules-of-thumb to construct a very accurate predictor. 3 / 16 What is boosting?
More informationPrecise Statements of Convergence for AdaBoost and arc-gv
Contemporary Mathematics Precise Statements of Convergence for AdaBoost and arc-gv Cynthia Rudin, Robert E Schapire, and Ingrid Daubechies We wish to dedicate this paper to Leo Breiman Abstract We present
More informationPredictive Vaccinology: Optimisation of Predictions Using Support Vector Machine Classifiers
Predictive Vaccinology: Optiisation of Predictions Using Support Vector Machine Classifiers Ivana Bozic,2, Guang Lan Zhang 2,3, and Vladiir Brusic 2,4 Faculty of Matheatics, University of Belgrade, Belgrade,
More information