Advanced Machine Learning
1 Advanced Machine Learning: Learning Kernels. MEHRYAR MOHRI, COURANT INSTITUTE & GOOGLE RESEARCH.
2 Outline: Kernel methods. Learning kernels scenario. Learning bounds. Algorithms.
3 Machine Learning Components: user → features → sample → algorithm → h.
4 Machine Learning Components: user → features (critical task) → sample → algorithm (main focus of the ML literature) → h.
5 Kernel Methods. Features $\Phi \colon X \to H$ are implicitly defined via the choice of a PDS kernel $K \colon X \times X \to \mathbb{R}$, interpreted as a similarity measure: for all $x, y \in X$, $\langle \Phi(x), \Phi(y) \rangle = K(x, y)$. Flexibility: the PDS kernel can be chosen arbitrarily. Helps extend a variety of algorithms to non-linear predictors, e.g., SVMs, KRR, SVR, KPCA. The PDS condition is directly related to the convexity of the optimization problem.
6 Example - Polynomial Kernels. Definition: for all $x, y \in \mathbb{R}^N$, $K(x, y) = (x \cdot y + c)^d$, $c > 0$. Example: for $N = 2$ and $d = 2$, $K(x, y) = (x_1 y_1 + x_2 y_2 + c)^2 = \big\langle (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2, \sqrt{2c}\, x_1, \sqrt{2c}\, x_2, c),\ (y_1^2, y_2^2, \sqrt{2}\, y_1 y_2, \sqrt{2c}\, y_1, \sqrt{2c}\, y_2, c) \big\rangle$.
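The expansion above can be verified numerically: the kernel value coincides with the inner product of the explicit six-dimensional feature vectors. A small sketch, with helper names that are ours:

```python
import numpy as np

# Degree-2 polynomial kernel on R^2 and its explicit feature map (c = 1).
def poly_kernel(x, y, c=1.0):
    return (np.dot(x, y) + c) ** 2

def phi(x, c=1.0):
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1,
                     np.sqrt(2 * c) * x2,
                     c])

x = np.array([0.5, -1.0])
y = np.array([2.0, 3.0])
# K(x, y) = <phi(x), phi(y)> for any pair of points.
assert np.isclose(poly_kernel(x, y), np.dot(phi(x), phi(y)))
```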
7 XOR Problem. Use the second-degree polynomial kernel with $c = 1$: the four points $(1,1), (-1,1), (-1,-1), (1,-1)$, linearly non-separable in the input space, are mapped to $(1, 1, +\sqrt{2}, +\sqrt{2}, +\sqrt{2}, 1)$, $(1, 1, -\sqrt{2}, -\sqrt{2}, +\sqrt{2}, 1)$, $(1, 1, +\sqrt{2}, -\sqrt{2}, -\sqrt{2}, 1)$, $(1, 1, -\sqrt{2}, +\sqrt{2}, -\sqrt{2}, 1)$, which are linearly separable by the hyperplane $x_1 x_2 = 0$.
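The separability claim can be checked in a few lines: the $\sqrt{2}\, x_1 x_2$ coordinate of the feature map already splits the XOR labels. A small sketch (variable names are ours):

```python
import numpy as np

# The four XOR points and their labels (+1 when x1 * x2 > 0).
X = np.array([(1, 1), (-1, 1), (-1, -1), (1, -1)], dtype=float)
y = np.array([+1, -1, +1, -1])

# The x1*x2 coordinate of the degree-2 feature map (c = 1) separates them,
# i.e. the hyperplane x1 x2 = 0 in feature space classifies all four points.
feature = np.sqrt(2) * X[:, 0] * X[:, 1]
assert np.all(np.sign(feature) == y)
```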
8 Other Standard PDS Kernels. Gaussian kernels: $K(x, y) = \exp\big({-\frac{\|x - y\|^2}{2\sigma^2}}\big)$, $\sigma \neq 0$; this is the normalized kernel of $(x, x') \mapsto \exp\big(\frac{x \cdot x'}{\sigma^2}\big)$. Sigmoid kernels: $K(x, y) = \tanh(a (x \cdot y) + b)$, $a, b \geq 0$.
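The normalization claim can be checked directly: dividing $K'(x, y) = \exp(x \cdot y / \sigma^2)$ by $\sqrt{K'(x, x) K'(y, y)}$ recovers the Gaussian kernel. A small sketch (function names are ours):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    d = x - y
    return np.exp(-np.dot(d, d) / (2 * sigma**2))

def normalized(kprime, x, y):
    # Normalized kernel: K'(x, y) / sqrt(K'(x, x) K'(y, y)).
    return kprime(x, y) / np.sqrt(kprime(x, x) * kprime(y, y))

kprime = lambda x, y: np.exp(np.dot(x, y))  # sigma = 1
x, y = np.array([1.0, 2.0]), np.array([0.0, -1.0])
assert np.isclose(normalized(kprime, x, y), gaussian_kernel(x, y))
assert np.isclose(gaussian_kernel(x, x), 1.0)  # normalized kernels: K(x, x) = 1
```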
9 SVM (Cortes and Vapnik, 1995; Boser, Guyon, and Vapnik, 1992). Primal: $\min_{w, b} \frac{1}{2} \|w\|^2 + C \sum_{i=1}^m \max\big(0,\ 1 - y_i (w \cdot \Phi(x_i) + b)\big)$. Dual: $\max_{\alpha} \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i,j=1}^m \alpha_i \alpha_j y_i y_j K(x_i, x_j)$ subject to: $0 \leq \alpha_i \leq C \wedge \sum_{i=1}^m \alpha_i y_i = 0$, $i \in [1, m]$.
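The dual above is a concave quadratic over a box, so it can be illustrated with plain projected gradient ascent. A minimal sketch on made-up separable data, with the bias $b$ dropped (which removes the equality constraint $\sum_i \alpha_i y_i = 0$); kernel choice, step size, and iteration count are all ours:

```python
import numpy as np

# Tiny linearly separable toy sample (separable without a bias term).
X = np.array([[2.0, 1.0], [1.5, -1.0], [-2.0, 0.5], [-1.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T                      # linear kernel matrix
C, eta = 10.0, 0.01
alpha = np.zeros(len(y))
for _ in range(2000):
    # Gradient of the dual objective: 1 - y_i sum_j alpha_j y_j K_ij.
    grad = 1.0 - y * (K @ (alpha * y))
    alpha = np.clip(alpha + eta * grad, 0.0, C)   # project onto [0, C]^m

# Predictions h(x_i) = sum_j alpha_j y_j K(x_j, x_i) separate the sample.
pred = np.sign(K @ (alpha * y))
assert np.all(pred == y)
```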
10 Kernel Ridge Regression (Hoerl and Kennard, 1970; Saunders et al., 1998). Primal: $\min_{w, b} \lambda \|w\|^2 + \sum_{i=1}^m (w \cdot \Phi(x_i) + b - y_i)^2$. Dual: $\max_{\alpha \in \mathbb{R}^m} -\alpha^\top (K + \lambda I)\alpha + 2 \alpha^\top y$.
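The KRR dual admits the closed-form solution $\alpha = (K + \lambda I)^{-1} y$, with predictions $h(x) = \sum_i \alpha_i K(x_i, x)$. A minimal sketch on made-up data, with the offset omitted and the regularization value ours:

```python
import numpy as np

# Synthetic linear data; with a linear kernel and tiny regularization,
# the dual solution nearly interpolates the training targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5])
lam = 1e-6

K = X @ X.T                                   # linear kernel matrix
alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)   # dual solution
y_hat = K @ alpha                             # fitted values on the sample
assert np.allclose(y_hat, y, atol=1e-3)
```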
11 Questions. How should the user choose the kernel? The problem is similar to that of selecting features for other learning algorithms: with a poor choice, learning is made very difficult; with a good choice, even poor learners could succeed. The requirement placed on the user is thus critical. Can this requirement be lessened? Is a more automatic selection of features possible?
12 Outline: Kernel methods. Learning kernels scenario. Learning bounds. Algorithms.
13 Standard Learning with Kernels: user → kernel K → sample → algorithm → h.
14 Learning Kernel Framework: user → kernel family K → sample → algorithm → (K, h).
15 Kernel Families. Most frequently used kernel families, for $q \geq 1$: $\mathcal{K}_q = \big\{ K_\mu : K_\mu = \sum_{k=1}^p \mu_k K_k,\ \mu \in \Delta_q \big\}$, with $\Delta_q = \{\mu : \mu \geq 0,\ \|\mu\|_q = 1\}$. Hypothesis sets: $H_q = \{h \in H_K : K \in \mathcal{K}_q,\ \|h\|_{H_K} \leq 1\}$.
16 Relation between Norms. Lemma: for $p, q \in (0, +\infty]$ with $p \leq q$, the following holds: $\forall x \in \mathbb{R}^N$, $\|x\|_q \leq \|x\|_p \leq N^{\frac{1}{p} - \frac{1}{q}} \|x\|_q$. Proof: for the left inequality, observe that for $x \neq 0$, $\Big[\frac{\|x\|_p}{\|x\|_q}\Big]^p = \sum_{i=1}^N \Big[\frac{|x_i|}{\|x\|_q}\Big]^p \geq \sum_{i=1}^N \Big[\frac{|x_i|}{\|x\|_q}\Big]^q = 1$, since each ratio is at most $1$ and $p \leq q$. The right inequality follows from Hölder's inequality: $\|x\|_p = \Big[\sum_{i=1}^N |x_i|^p \cdot 1\Big]^{\frac{1}{p}} \leq \Big[\Big(\sum_{i=1}^N (|x_i|^p)^{\frac{q}{p}}\Big)^{\frac{p}{q}} \Big(\sum_{i=1}^N 1^{\frac{q}{q-p}}\Big)^{1 - \frac{p}{q}}\Big]^{\frac{1}{p}} = \|x\|_q\, N^{\frac{1}{p} - \frac{1}{q}}$.
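The lemma can be sanity-checked numerically on random vectors. A small sketch (names ours) testing both inequalities for a few pairs $(p, q)$:

```python
import numpy as np

def lp_norm(x, p):
    # L_p norm for finite p > 0.
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

rng = np.random.default_rng(1)
x = rng.normal(size=50)
N = len(x)
for p, q in [(1, 2), (2, 3), (1.5, 4)]:
    # ||x||_q <= ||x||_p <= N^(1/p - 1/q) ||x||_q, for p <= q.
    assert lp_norm(x, q) <= lp_norm(x, p) + 1e-12
    assert lp_norm(x, p) <= N ** (1/p - 1/q) * lp_norm(x, q) + 1e-12
```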
17 Single Kernel Guarantee (Koltchinskii and Panchenko, 2002). Theorem: fix $\rho > 0$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, the following holds for all $h \in H_1$: $R(h) \leq \widehat{R}_\rho(h) + \frac{2}{\rho} \frac{\sqrt{\mathrm{Tr}[K]}}{m} + \sqrt{\frac{\log \frac{1}{\delta}}{2m}}$.
18 Multiple Kernel Guarantee (Cortes, MM, and Rostamizadeh, 2010). Theorem: fix $\rho > 0$. Let $q, r \geq 1$ with $\frac{1}{q} + \frac{1}{r} = 1$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, the following holds for all $h \in H_q$ and any integer $1 \leq s \leq r$: $R(h) \leq \widehat{R}_\rho(h) + \frac{2}{\rho} \frac{\sqrt{s \|u\|_s}}{m} + \sqrt{\frac{\log \frac{1}{\delta}}{2m}}$, with $u = (\mathrm{Tr}[K_1], \ldots, \mathrm{Tr}[K_p])$.
19 Proof. Let $q, r \geq 1$ with $\frac{1}{q} + \frac{1}{r} = 1$. Then: $\widehat{\mathfrak{R}}_S(H_q) = \frac{1}{m} \mathbb{E}_\sigma\Big[\sup_{h \in H_q} \sum_{i=1}^m \sigma_i h(x_i)\Big] = \frac{1}{m} \mathbb{E}_\sigma\Big[\sup_{\mu \in \Delta_q,\ \alpha^\top K_\mu \alpha \leq 1} \sum_{i,j=1}^m \alpha_i \sigma_j K_\mu(x_i, x_j)\Big] = \frac{1}{m} \mathbb{E}_\sigma\Big[\sup_{\mu \in \Delta_q,\ \|K_\mu^{1/2}\alpha\| \leq 1} \big\langle K_\mu^{1/2}\alpha,\ K_\mu^{1/2}\sigma \big\rangle\Big] = \frac{1}{m} \mathbb{E}_\sigma\Big[\sup_{\mu \in \Delta_q} \sqrt{\sigma^\top K_\mu \sigma}\Big]$ (Cauchy-Schwarz) $= \frac{1}{m} \mathbb{E}_\sigma\Big[\sup_{\mu \in \Delta_q} \sqrt{\mu \cdot u_\sigma}\Big]$ with $u_\sigma = (\sigma^\top K_1 \sigma, \ldots, \sigma^\top K_p \sigma)^\top$ $= \frac{1}{m} \mathbb{E}_\sigma\big[\sqrt{\|u_\sigma\|_r}\big]$ (definition of the dual norm).
20 Lemma (Cortes, MM, and Rostamizadeh, 2010). Let $K$ be the kernel matrix of a PDS kernel for a finite sample. Then, for any integer $r \geq 1$, $\mathbb{E}_\sigma\big[(\sigma^\top K \sigma)^r\big] \leq \big(r\, \mathrm{Tr}[K]\big)^r$. Proof: combinatorial argument.
21 Proof. For any integer $1 \leq s \leq r$: $\widehat{\mathfrak{R}}_S(H_q) = \frac{1}{m} \mathbb{E}_\sigma\big[\sqrt{\|u_\sigma\|_r}\big] \leq \frac{1}{m} \mathbb{E}_\sigma\big[\sqrt{\|u_\sigma\|_s}\big] = \frac{1}{m} \mathbb{E}_\sigma\Big[\Big(\sum_{k=1}^p (\sigma^\top K_k \sigma)^s\Big)^{\frac{1}{2s}}\Big] \leq \frac{1}{m} \Big(\sum_{k=1}^p \mathbb{E}_\sigma\big[(\sigma^\top K_k \sigma)^s\big]\Big)^{\frac{1}{2s}}$ (Jensen's inequality) $\leq \frac{1}{m} \Big(\sum_{k=1}^p \big(s\, \mathrm{Tr}[K_k]\big)^s\Big)^{\frac{1}{2s}} = \frac{\sqrt{s \|u\|_s}}{m}$ (lemma).
22 $L_1$ Learning Bound (Cortes, MM, and Rostamizadeh, 2010). Corollary: fix $\rho > 0$. For any $\delta > 0$, with probability at least $1 - \delta$, the following holds for all $h \in H_1$: $R(h) \leq \widehat{R}_\rho(h) + \frac{2}{\rho} \frac{\sqrt{e \lceil \log p \rceil \max_{k=1}^p \mathrm{Tr}[K_k]}}{m} + \sqrt{\frac{\log \frac{1}{\delta}}{2m}}$. Weak dependency on $p$. Bound valid for $p \gg m$: $\mathrm{Tr}[K_k] \leq m \max_x K_k(x, x)$.
23 Proof. For $q = 1$ (thus $r = +\infty$), the bound $R(h) \leq \widehat{R}_\rho(h) + \frac{2}{\rho} \frac{\sqrt{s \|u\|_s}}{m} + \sqrt{\frac{\log \frac{1}{\delta}}{2m}}$ holds for any integer $s \geq 1$, with $s \|u\|_s = s \Big[\sum_{k=1}^p \mathrm{Tr}[K_k]^s\Big]^{\frac{1}{s}} \leq s\, p^{\frac{1}{s}} \max_{k=1}^p \mathrm{Tr}[K_k]$. The function $s \mapsto s\, p^{\frac{1}{s}}$ reaches its minimum at $s = \log p$; choosing $s = \lceil \log p \rceil$ gives $s\, p^{\frac{1}{s}} \leq e \lceil \log p \rceil$.
24 Lower Bound. Tight bound: the $\log p$ dependency cannot be improved; argument based on the VC dimension or on an explicit example. Observations: consider the case $X = \{-1, +1\}^p$ with the canonical projection kernels $K_k(x, x') = x_k x'_k$. Then $H_1$ contains $J_p = \{x \mapsto s\, x_k : k \in [1, p],\ s \in \{-1, +1\}\}$, with $\mathrm{VCdim}(J_p) = \Theta(\log p)$. For $\rho = 1$ and $h \in J_p$, $\widehat{R}_\rho(h) = \widehat{R}(h)$. VC lower bound: $\Omega\big(\sqrt{\mathrm{VCdim}(J_p)/m}\big)$.
25 Pseudo-Dimension Bound (Srebro and Ben-David, 2006). Assume that $K_k(x, x) \leq R^2$ for all $x$ and all $k \in [1, p]$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, for any $h$, $R(h) \leq \widehat{R}_\rho(h) + \sqrt{\frac{8}{m}\Big[2 + p \log \frac{128\, e\, m^3 R^2}{\rho^2 p} + 256 \frac{R^2}{\rho^2} \log \frac{\rho e m}{8R} \log \frac{128\, m R^2}{\rho^2}\Big]} + \sqrt{\frac{\log \frac{1}{\delta}}{2m}}$. Observations: bound additive in $p$ (modulo log terms); not informative for $p > m$; based on the pseudo-dimension of the kernel family; similar guarantees for other families.
26 Comparison. [Figure: comparison of the bounds as a function of $p$, for $\rho/R = .2$.]
27 $L_q$ Learning Bound (Cortes, MM, and Rostamizadeh, 2010). Corollary: fix $\rho > 0$. Let $q, r \geq 1$ with $\frac{1}{q} + \frac{1}{r} = 1$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, the following holds for all $h \in H_q$: $R(h) \leq \widehat{R}_\rho(h) + \frac{2}{\rho} \frac{\sqrt{r\, p^{\frac{1}{r}} \max_{k=1}^p \mathrm{Tr}[K_k]}}{m} + \sqrt{\frac{\log \frac{1}{\delta}}{2m}}$. Mild dependency on $p$: $\mathrm{Tr}[K_k] \leq m \max_x K_k(x, x)$.
28 Lower Bound. Tight bound: the $p^{\frac{1}{2r}}$ dependency cannot be improved; in particular, $p^{\frac{1}{4}}$ is tight for $L_2$ regularization. Observations: consider $p$ equal kernels, $K_1 = \cdots = K_p$. Then $K_\mu = \big(\sum_{k=1}^p \mu_k\big) K_1$ and $\|h\|^2_{H_{K_\mu}} = \|h\|^2_{H_{K_1}} / \sum_{k=1}^p \mu_k$ for $\sum_{k=1}^p \mu_k \neq 0$. By Hölder's inequality, $\sum_{k=1}^p \mu_k \leq p^{\frac{1}{r}} \|\mu\|_q = p^{\frac{1}{r}}$. Thus, $H_q$ coincides with $\{h \in H_{K_1} : \|h\|_{H_{K_1}} \leq p^{\frac{1}{2r}}\}$.
29 Outline: Kernel methods. Learning kernels scenario. Learning bounds. Algorithms.
30 General LK Formulation - SVMs. Notation: $\mathcal{K}$, set of PDS kernel functions; $\mathbf{K}$, set of kernel matrices associated to $\mathcal{K}$, assumed convex; $Y \in \mathbb{R}^{m \times m}$, diagonal matrix with $Y_{ii} = y_i$. Optimization problem: $\min_{K \in \mathbf{K}} \max_{\alpha} 2 \alpha^\top \mathbf{1} - \alpha^\top Y K Y \alpha$ subject to: $0 \leq \alpha \leq C \wedge \alpha^\top y = 0$. Convex problem: the function is linear in $K$, and convexity is preserved by the pointwise maximum.
31 Parameterized LK Formulation. Notation: $(K_\mu)_{\mu \in M}$, parameterized set of PDS kernel functions, with $M$ a convex set and $\mu \mapsto K_\mu$ a concave function; $Y \in \mathbb{R}^{m \times m}$, diagonal matrix with $Y_{ii} = y_i$. Optimization problem: $\min_{\mu \in M} \max_{\alpha} 2 \alpha^\top \mathbf{1} - \alpha^\top Y K_\mu Y \alpha$ subject to: $0 \leq \alpha \leq C \wedge \alpha^\top y = 0$. Convex problem: the function is convex in $\mu$, and convexity is preserved by the pointwise maximum.
32 Non-Negative Combinations. Let $K_\mu = \sum_{k=1}^p \mu_k K_k$ with $\mu \in \Delta_1$. By von Neumann's generalized minimax theorem (convexity w.r.t. $\mu$, concavity w.r.t. $\alpha$, $\Delta_1$ convex and compact, $A$ convex and compact): $\min_{\mu \in \Delta_1} \max_{\alpha \in A} 2 \alpha^\top \mathbf{1} - \alpha^\top Y K_\mu Y \alpha = \max_{\alpha \in A} \min_{\mu \in \Delta_1} 2 \alpha^\top \mathbf{1} - \alpha^\top Y K_\mu Y \alpha = \max_{\alpha \in A} 2 \alpha^\top \mathbf{1} - \max_{\mu \in \Delta_1} \alpha^\top Y K_\mu Y \alpha = \max_{\alpha \in A} 2 \alpha^\top \mathbf{1} - \max_{k \in [1, p]} \alpha^\top Y K_k Y \alpha$.
33 Non-Negative Combinations (Lanckriet et al., 2004). Optimization problem: in view of the previous analysis, the problem can be rewritten as the following QCQP: $\max_{\alpha, t} 2 \alpha^\top \mathbf{1} - t$ subject to: $\forall k \in [1, p],\ t \geq \alpha^\top Y K_k Y \alpha$; $0 \leq \alpha \leq C \wedge \alpha^\top y = 0$. Complexity (interior-point methods): $O(p m^3)$.
34 Equivalent Primal Formulation. Optimization problem: $\min_{w, \mu \in \Delta_q} \frac{1}{2} \sum_{k=1}^p \frac{\|w_k\|^2}{\mu_k} + C \sum_{i=1}^m \max\Big(0,\ 1 - y_i \sum_{k=1}^p w_k \cdot \Phi_k(x_i)\Big)$.
35 Lots of Optimization Solutions. QCQP (Lanckriet et al., 2004). Wrapper methods interleaving calls to an SVM solver and updates of $\mu$: SILP (Sonnenburg et al., 2006); reduced gradient (SimpleMKL) (Rakotomamonjy et al., 2008); Newton's method (Kloft et al., 2009); mirror descent (Nath et al., 2009); on-line method (Orabona & Jie, 2011). Many other methods proposed.
36 Does It Work? Experiments: this algorithm and its different optimization solutions often do not significantly outperform the simple uniform combination kernel in practice! These observations are corroborated by NIPS workshops. Alternative algorithms: significant improvement (see the empirical results of (Gönen and Alpaydin, 2011)); centered alignment-based LK algorithms (Cortes, MM, and Rostamizadeh, 2010 and 2012); non-linear combinations of kernels (Cortes, MM, and Rostamizadeh, 2009).
37 LK Formulation - KRR (Cortes, MM, and Rostamizadeh, 2009). Kernel family: non-negative combinations, $L_q$ regularization. Optimization problem: $\min_{\mu} \max_{\alpha} -\alpha^\top \Big(\sum_{k=1}^p \mu_k K_k + \lambda I\Big)\alpha + 2 \alpha^\top y$ subject to: $\mu \geq 0 \wedge \|\mu - \mu_0\|_q \leq \Lambda$. Convex optimization: linearity in $\mu$ and convexity of the pointwise maximum.
38 Projected Gradient. Solving the maximization problem in $\alpha$ in closed form, $\alpha = (K_\mu + \lambda I)^{-1} y$, reduces the problem to $\min_{\mu} F(\mu) = y^\top (K_\mu + \lambda I)^{-1} y$ subject to: $\mu \geq 0 \wedge \|\mu - \mu_0\|_2 \leq \Lambda$. Convex optimization problem; one solution uses projection-based gradient descent: $\frac{\partial F}{\partial \mu_k} = -\mathrm{Tr}\Big[(K_\mu + \lambda I)^{-1} y y^\top (K_\mu + \lambda I)^{-1} \frac{\partial (K_\mu + \lambda I)}{\partial \mu_k}\Big] = -\mathrm{Tr}\big[(K_\mu + \lambda I)^{-1} y y^\top (K_\mu + \lambda I)^{-1} K_k\big] = -y^\top (K_\mu + \lambda I)^{-1} K_k (K_\mu + \lambda I)^{-1} y = -\alpha^\top K_k \alpha$.
39 Proj. Grad. KRR - $L_2$ Reg.
ProjectionBasedGradientDescent($(K_k)_{k \in [1,p]}$, $\mu_0$)
1  $\mu \leftarrow \mu_0$
2  $\mu' \leftarrow \infty$
3  while $\|\mu' - \mu\| > \epsilon$ do
4      $\mu' \leftarrow \mu$
5      $\alpha \leftarrow (K_\mu + \lambda I)^{-1} y$
6      $\mu \leftarrow \mu + \eta\, (\alpha^\top K_1 \alpha, \ldots, \alpha^\top K_p \alpha)$
7      for $k \leftarrow 1$ to $p$ do
8          $\mu_k \leftarrow \max(0, \mu_k)$
9      $\mu \leftarrow \mu_0 + \Lambda\, \frac{\mu - \mu_0}{\|\mu - \mu_0\|}$
10 return $\mu$
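A minimal Python sketch of ProjectionBasedGradientDescent above, on made-up toy data; the projection onto the $L_2$ ball is applied only when the iterate leaves it, and the step size $\eta$, regularization $\lambda$, radius $\Lambda$, and tolerance are all our choices:

```python
import numpy as np

def lk_krr_projected_gradient(Ks, y, mu0, lam=0.1, Lam=1.0, eta=0.01,
                              eps=1e-6, max_iter=1000):
    """Projected gradient descent on F(mu) = y^T (K_mu + lam I)^{-1} y."""
    mu = mu0.copy()
    for _ in range(max_iter):
        mu_prev = mu.copy()
        K_mu = sum(m * K for m, K in zip(mu, Ks))
        alpha = np.linalg.solve(K_mu + lam * np.eye(len(y)), y)
        # dF/dmu_k = -alpha^T K_k alpha, so a descent step adds eta * alpha^T K_k alpha.
        mu = mu + eta * np.array([alpha @ K @ alpha for K in Ks])
        mu = np.maximum(mu, 0.0)                       # enforce mu >= 0
        d = mu - mu0
        nd = np.linalg.norm(d)
        if nd > Lam:                                   # project onto the L2 ball
            mu = mu0 + Lam * d / nd
        if np.linalg.norm(mu - mu_prev) <= eps:
            break
    return mu

# Toy problem: one rank-1 base kernel per input feature.
rng = np.random.default_rng(2)
X = rng.normal(size=(15, 4))
y_data = X @ np.array([1.0, 0.0, -1.0, 0.5])
Ks = [np.outer(X[:, k], X[:, k]) for k in range(4)]
mu = lk_krr_projected_gradient(Ks, y_data, mu0=np.ones(4))
```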
40 Interpolated Step KRR - $L_2$ Reg.
InterpolatedIterativeAlgorithm($(K_k)_{k \in [1,p]}$, $\mu_0$)
1  $\alpha' \leftarrow \infty$
2  $\alpha \leftarrow (K_{\mu_0} + \lambda I)^{-1} y$
3  while $\|\alpha' - \alpha\| > \epsilon$ do
4      $\alpha' \leftarrow \alpha$
5      $v \leftarrow (\alpha^\top K_1 \alpha, \ldots, \alpha^\top K_p \alpha)$
6      $\mu \leftarrow \mu_0 + \Lambda\, \frac{v}{\|v\|}$
7      $\alpha \leftarrow \eta \alpha' + (1 - \eta)(K_\mu + \lambda I)^{-1} y$
8  return $\alpha$
Simple and very efficient: few iterations (fewer than 15).
41 $L_2$-Regularized Combinations (Cortes, MM, and Rostamizadeh, 2009). Dense combinations are beneficial when using many kernels. Combining kernels based on single features can be viewed as principled feature weighting. [Figure: results on the DVD and Reuters (acq) datasets, comparing the baseline with $L_1$- and $L_2$-regularized combinations.]
42 Conclusion. Solid theoretical guarantees suggesting the use of a large number of base kernels. Broad literature on optimization techniques, but often no significant improvement over the uniform combination. Recent algorithms with significant improvements, in particular non-linear combinations. Still many theoretical and algorithmic questions left to explore.
43 References
Olivier Bousquet and Daniel J. L. Herrmann. On the complexity of learning the kernel matrix. In NIPS, 2002.
Corinna Cortes, Marius Kloft, and Mehryar Mohri. Learning kernels using local Rademacher complexity. In NIPS, 2013.
Corinna Cortes, Mehryar Mohri, and Afshin Rostamizadeh. Learning non-linear combinations of kernels. In NIPS, 2009.
Corinna Cortes, Mehryar Mohri, and Afshin Rostamizadeh. Generalization bounds for learning kernels. In ICML, 2010.
Corinna Cortes, Mehryar Mohri, and Afshin Rostamizadeh. Two-stage learning kernel methods. In ICML, 2010.
Corinna Cortes, Mehryar Mohri, and Afshin Rostamizadeh. Algorithms for learning kernels based on centered alignment. JMLR, 13, 2012.
Corinna Cortes, Mehryar Mohri, and Afshin Rostamizadeh. Tutorial: Learning Kernels. ICML 2011, Bellevue, Washington, July 2011.
Zakria Hussain and John Shawe-Taylor. Improved loss bounds for multiple kernel learning. In AISTATS, 2011 [see arXiv for corrected version].
Sham M. Kakade, Shai Shalev-Shwartz, and Ambuj Tewari. Regularization techniques for learning with matrices. JMLR, 13, 2012.
Vladimir Koltchinskii and Dmitry Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30, 2002.
Vladimir Koltchinskii and Ming Yuan. Sparse recovery in large ensembles of kernel machines. In COLT, 2008.
Gert Lanckriet, Nello Cristianini, Peter Bartlett, Laurent El Ghaoui, and Michael Jordan. Learning the kernel matrix with semidefinite programming. JMLR, 5, 2004.
Mehmet Gönen and Ethem Alpaydin. Multiple kernel learning algorithms. JMLR, 12, 2011.
Nathan Srebro and Shai Ben-David. Learning bounds for support vector machines with learned kernels. In COLT, 2006.
Yiming Ying and Colin Campbell. Generalization bounds for learning the kernel problem. In COLT, 2009.
More informationLarge-Scale Training of SVMs with Automata Kernels
Large-Scale Training of SVMs with Automata Kernels Cyril Allauzen, Corinna Cortes, and Mehryar Mohri, Google Research, 76 Ninth Avenue, New York, NY Courant Institute of Mathematical Sciences, 5 Mercer
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationKernel Methods. Outline
Kernel Methods Quang Nguyen University of Pittsburgh CS 3750, Fall 2011 Outline Motivation Examples Kernels Definitions Kernel trick Basic properties Mercer condition Constructing feature space Hilbert
More informationUniform concentration inequalities, martingales, Rademacher complexity and symmetrization
Uniform concentration inequalities, martingales, Rademacher complexity and symmetrization John Duchi Outline I Motivation 1 Uniform laws of large numbers 2 Loss minimization and data dependence II Uniform
More informationModel Selection for Gaussian Processes
Institute for Adaptive and Neural Computation School of Informatics,, UK December 26 Outline GP basics Model selection: covariance functions and parameterizations Criteria for model selection Marginal
More informationApplied inductive learning - Lecture 7
Applied inductive learning - Lecture 7 Louis Wehenkel & Pierre Geurts Department of Electrical Engineering and Computer Science University of Liège Montefiore - Liège - November 5, 2012 Find slides: http://montefiore.ulg.ac.be/
More informationStatistical Methods for SVM
Statistical Methods for SVM Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot,
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationSupport Vector Machines for Classification: A Statistical Portrait
Support Vector Machines for Classification: A Statistical Portrait Yoonkyung Lee Department of Statistics The Ohio State University May 27, 2011 The Spring Conference of Korean Statistical Society KAIST,
More informationMachine Learning Lecture 6 Note
Machine Learning Lecture 6 Note Compiled by Abhi Ashutosh, Daniel Chen, and Yijun Xiao February 16, 2016 1 Pegasos Algorithm The Pegasos Algorithm looks very similar to the Perceptron Algorithm. In fact,
More informationIntroduction to Machine Learning
Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationJeff Howbert Introduction to Machine Learning Winter
Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable
More informationDistributed Box-Constrained Quadratic Optimization for Dual Linear SVM
Distributed Box-Constrained Quadratic Optimization for Dual Linear SVM Lee, Ching-pei University of Illinois at Urbana-Champaign Joint work with Dan Roth ICML 2015 Outline Introduction Algorithm Experiments
More informationMachine Learning Support Vector Machines. Prof. Matteo Matteucci
Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way
More informationLASSO Review, Fused LASSO, Parallel LASSO Solvers
Case Study 3: fmri Prediction LASSO Review, Fused LASSO, Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade May 3, 2016 Sham Kakade 2016 1 Variable
More informationStatistical Machine Learning for Structured and High Dimensional Data
Statistical Machine Learning for Structured and High Dimensional Data (FA9550-09- 1-0373) PI: Larry Wasserman (CMU) Co- PI: John Lafferty (UChicago and CMU) AFOSR Program Review (Jan 28-31, 2013, Washington,
More informationA Distributed Solver for Kernelized SVM
and A Distributed Solver for Kernelized Stanford ICME haoming@stanford.edu bzhe@stanford.edu June 3, 2015 Overview and 1 and 2 3 4 5 Support Vector Machines and A widely used supervised learning model,
More informationKernel Methods. Machine Learning A W VO
Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance
More informationRisk Bounds for Lévy Processes in the PAC-Learning Framework
Risk Bounds for Lévy Processes in the PAC-Learning Framework Chao Zhang School of Computer Engineering anyang Technological University Dacheng Tao School of Computer Engineering anyang Technological University
More informationSupport Vector Machines II. CAP 5610: Machine Learning Instructor: Guo-Jun QI
Support Vector Machines II CAP 5610: Machine Learning Instructor: Guo-Jun QI 1 Outline Linear SVM hard margin Linear SVM soft margin Non-linear SVM Application Linear Support Vector Machine An optimization
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Shivani Agarwal Support Vector Machines (SVMs) Algorithm for learning linear classifiers Motivated by idea of maximizing margin Efficient extension to non-linear
More information