Learning Kernels - Tutorial Part III: Theoretical Guarantees
1 Learning Kernels - Tutorial Part III: Theoretical Guarantees. Corinna Cortes (Google Research), Mehryar Mohri (Courant Institute & Google Research), Afshin Rostami (UC Berkeley).
2 Standard Learning with Kernels. [Diagram: the user supplies a kernel $K$; the kernel and the sample are fed to the algorithm, which returns a hypothesis $h$.]
3 Learning Kernel Framework. [Diagram: the user supplies a kernel family $\mathcal{K}$; the family and the sample are fed to the algorithm, which returns a pair $(K, h)$.]
4 Learning Kernels. Theoretical questions: what is the price to pay for relaxing the requirement that the user specify a single kernel? how does the choice of the kernel family affect generalization?
5 Part III. Non-negative combinations. General case.
6 Kernel Families. Most frequently used kernel families, for $q \ge 1$: $\mathcal{K}_q = \big\{ K_\mu = \sum_{k=1}^p \mu_k K_k : \mu \in \Delta_q \big\}$, with $\Delta_q = \{ \mu : \mu \ge 0,\ \|\mu\|_q = 1 \}$. Hypothesis sets: $H_q = \{ h \in \mathbb{H}_K : K \in \mathcal{K}_q,\ \|h\|_{\mathbb{H}_K} \le 1 \}$.
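As a small numerical sketch of the family $\mathcal{K}_q$ above (the function names, the two toy base kernels, and the sample values are illustrative, not from the tutorial), the following forms the Gram matrix of a non-negative combination $K_\mu = \sum_k \mu_k K_k$ with the weights normalized to $\|\mu\|_q = 1$:

```python
import numpy as np

def combined_gram(grams, mu, q=1):
    """Gram matrix of K_mu = sum_k mu_k K_k, with mu >= 0 rescaled so that
    ||mu||_q = 1 (the family K_q of the slides)."""
    mu = np.asarray(mu, dtype=float)
    assert np.all(mu >= 0), "weights must be non-negative"
    mu = mu / np.linalg.norm(mu, ord=q)            # enforce ||mu||_q = 1
    return sum(m_k * g_k for m_k, g_k in zip(mu, grams))

# two toy base kernels on a 1-D sample: linear and Gaussian
x = np.array([[-1.0], [0.0], [2.0]])
K1 = x @ x.T                                       # linear kernel
sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
K2 = np.exp(-sq)                                   # Gaussian kernel, sigma = 1
K_mu = combined_gram([K1, K2], mu=[3.0, 1.0], q=1)
```

A convex combination of PSD Gram matrices is again PSD, so `K_mu` is a valid kernel matrix.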
7 Rademacher Complexity. Empirical Rademacher complexity of a hypothesis set $H$ for a sample $S = (x_1, \ldots, x_m)$: $\widehat{\mathfrak{R}}_S(H) = \mathbb{E}_\sigma\big[\sup_{h \in H} \frac{1}{m} \sum_{i=1}^m \sigma_i h(x_i)\big]$, where the $\sigma_i$s are independent uniform random variables taking values in $\{-1, +1\}$. Rademacher complexity of $H$: $\mathfrak{R}_m(H) = \mathbb{E}_{S \sim D^m}\big[\widehat{\mathfrak{R}}_S(H)\big]$.
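The empirical Rademacher complexity of $H_1$ can be estimated by Monte Carlo. The sketch below (function name and sample values are illustrative) assumes the dual-norm identity derived later in the proof on slides 13-14: for $q = 1$, $\sup_{h \in H_1} \sum_i \sigma_i h(x_i) = \max_k \sqrt{\sigma^\top K_k \sigma}$ over the base Gram matrices:

```python
import numpy as np

def rademacher_H1(grams, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of H_1.
    Assumes each Gram matrix in `grams` is PSD, and uses the identity
    sup_{h in H_1} sum_i sigma_i h(x_i) = max_k sqrt(sigma' K_k sigma)."""
    rng = np.random.default_rng(seed)
    m = grams[0].shape[0]
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=m)    # uniform Rademacher draws
        total += max(np.sqrt(sigma @ K @ sigma) for K in grams)
    return total / (n_draws * m)
```

As a sanity check, with a single identity Gram matrix ($\sigma^\top I \sigma = m$ for every draw) the estimate is exactly $1/\sqrt{m}$.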
8 Single Kernel Margin Bound. Theorem (Koltchinskii and Panchenko, 2002): fix $\rho > 0$. Assume that $K(x, x) \le R^2$ for all $x$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, for any $h \in H_1$: $R(h) \le \widehat{R}_\rho(h) + 2\sqrt{\frac{R^2/\rho^2}{m}} + \sqrt{\frac{\log\frac{1}{\delta}}{2m}}$.
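The bound above is easy to evaluate numerically. A minimal sketch (function name and sample values are illustrative):

```python
import math

def margin_bound(emp_margin_loss, R, rho, m, delta):
    """Value of the (Koltchinskii and Panchenko, 2002) margin bound:
    R(h) <= hat{R}_rho(h) + 2*sqrt((R^2/rho^2)/m) + sqrt(log(1/delta)/(2m))."""
    complexity = 2.0 * math.sqrt((R ** 2 / rho ** 2) / m)
    confidence = math.sqrt(math.log(1.0 / delta) / (2.0 * m))
    return emp_margin_loss + complexity + confidence

# e.g. 5% empirical margin loss, R/rho = 5, m = 10000 points, delta = 0.01
b = margin_bound(0.05, R=1.0, rho=0.2, m=10_000, delta=0.01)
```

With these sample values the complexity term is $2\sqrt{25/10000} = 0.1$, so the bound is roughly $0.165$.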
9 Early Learning Kernel Bounds (Bousquet and Herrmann, 2003; Lanckriet et al., 2004). For any $\delta > 0$, with probability at least $1 - \delta$, for any $h \in H_1$, bounds whose complexity term scales as $\frac{1}{\rho}\sqrt{\frac{\max_{k \in [1,p]} \mathrm{Tr}(K_k)}{m}}$, up to $\log\frac{1}{\delta}$ factors. But: these bounds are always greater than one (Srebro and Ben-David, 2006)! The other bound of (Lanckriet et al., 2004) for the linear combination case is also always greater than one!
10 Multiplicative Learning Bound (Lanckriet et al., 2004). Assume that $K_k(x, x) \le R^2$ for all $k \in [1, p]$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, for any $h \in H_1$: $R(h) \le \widehat{R}_\rho(h) + O\Big(\sqrt{\frac{p R^2/\rho^2}{m}}\Big)$. Bound multiplicative in $p$ (number of kernels).
11 Additive Learning Bound (Srebro and Ben-David, 2006). Assume that $K_k(x, x) \le R^2$ for all $k \in [1, p]$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, for any $h \in H_1$: $R(h) \le \widehat{R}_\rho(h) + \sqrt{\frac{8\big(2 + p \log\frac{128 e m^3 R^2}{\rho^2 p} + 256 \frac{R^2}{\rho^2} \log\frac{\rho e m}{8R} \log\frac{128 m R^2}{\rho^2}\big) + \log(1/\delta)}{m}}$. Bound additive in $p$ (modulo log terms). Not informative for $p > m$. Based on the pseudo-dimension of the kernel family. Similar guarantees for other families.
12 New Data-Dependent Bound (CC, MM, and AR, 2010). Theorem: for any sample $S$ of size $m$ and any integer $r \ge 1$, $\widehat{\mathfrak{R}}_S(H_1) \le \frac{\sqrt{\eta_0\, r\, \|u\|_r}}{m}$, with $u = (\mathrm{Tr}[K_1], \ldots, \mathrm{Tr}[K_p])$ and $\eta_0 = 23/22$. Similarity with the single kernel bound. Can be used directly to derive an algorithm.
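The data-dependent bound can be computed directly from the traces of the base Gram matrices and minimized over the integer $r$. A sketch (function name and the toy kernels are illustrative; $\eta_0 = 23/22$ is the constant from the cited ICML 2010 paper and should be treated as an assumption of this snippet):

```python
import numpy as np

def data_dependent_bound(grams, r, eta0=23.0 / 22.0):
    """sqrt(eta0 * r * ||u||_r) / m with u = (Tr[K_1], ..., Tr[K_p]),
    the data-dependent Rademacher complexity bound of slide 12."""
    m = grams[0].shape[0]
    u = np.array([np.trace(K) for K in grams])
    u_r = np.sum(u ** r) ** (1.0 / r)              # ||u||_r
    return np.sqrt(eta0 * r * u_r) / m

# the bound holds for every integer r >= 1, so it can be minimized over r
grams = [np.eye(100) for _ in range(50)]           # 50 toy kernels, Tr = 100 each
best = min(data_dependent_bound(grams, r) for r in range(1, 10))
```

With these toy kernels the minimum is attained at a moderate $r$ (here $r = 4$), well below the $r = 1$ value, illustrating why the freedom in $r$ matters.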
13 New Data-Dependent Bound. Proof: let $q, r \ge 1$ with $\frac{1}{q} + \frac{1}{r} = 1$. Then $\widehat{\mathfrak{R}}_S(H_q) = \frac{1}{m}\mathbb{E}_\sigma\big[\sup_{h \in H_q} \sum_{i=1}^m \sigma_i h(x_i)\big] = \frac{1}{m}\mathbb{E}_\sigma\big[\sup_{\mu \in \Delta_q,\ \alpha^\top K_\mu \alpha \le 1} \sum_{i,j=1}^m \sigma_i \alpha_j K_\mu(x_i, x_j)\big] = \frac{1}{m}\mathbb{E}_\sigma\big[\sup_{\mu \in \Delta_q,\ \alpha^\top K_\mu \alpha \le 1} \sigma^\top K_\mu \alpha\big] = \frac{1}{m}\mathbb{E}_\sigma\big[\sup_{\mu \in \Delta_q,\ \|K_\mu^{1/2}\alpha\| \le 1} \langle K_\mu^{1/2}\sigma,\ K_\mu^{1/2}\alpha\rangle\big] = \frac{1}{m}\mathbb{E}_\sigma\big[\sup_{\mu \in \Delta_q} \sqrt{\sigma^\top K_\mu \sigma}\big]$ (Cauchy-Schwarz) $= \frac{1}{m}\mathbb{E}_\sigma\big[\sup_{\mu \in \Delta_q} \sqrt{\mu \cdot u_\sigma}\big]$, with $u_\sigma = (\sigma^\top K_1 \sigma, \ldots, \sigma^\top K_p \sigma)$, $= \frac{1}{m}\mathbb{E}_\sigma\big[\|u_\sigma\|_r^{1/2}\big]$ (definition of the dual norm).
14 New Data-Dependent Bound. Proof (continued): $\widehat{\mathfrak{R}}_S(H_1) = \frac{1}{m}\mathbb{E}_\sigma\big[\|u_\sigma\|_r^{1/2}\big]$, where in the following $r \ge 1$ is an arbitrary integer (for $q = 1$, $\|u_\sigma\|_\infty \le \|u_\sigma\|_r$ for all $r \ge 1$). Then $\frac{1}{m}\mathbb{E}_\sigma\big[\|u_\sigma\|_r^{1/2}\big] = \frac{1}{m}\mathbb{E}_\sigma\Big[\big(\sum_{k=1}^p (\sigma^\top K_k \sigma)^r\big)^{\frac{1}{2r}}\Big] \le \frac{1}{m}\Big[\sum_{k=1}^p \mathbb{E}_\sigma\big[(\sigma^\top K_k \sigma)^r\big]\Big]^{\frac{1}{2r}}$ (Jensen's inequality) $\le \frac{1}{m}\Big[\sum_{k=1}^p (\eta_0\, r\, \mathrm{Tr}[K_k])^r\Big]^{\frac{1}{2r}}$ (lemma) $= \frac{\sqrt{\eta_0\, r\, \|u\|_r}}{m}$.
15 Key Lemma. Lemma: let $K$ be a kernel matrix for a finite sample. Then, for any integer $r \ge 1$, $\mathbb{E}_\sigma\big[(\sigma^\top K \sigma)^r\big] \le (\eta_0\, r\, \mathrm{Tr}[K])^r$, with $\eta_0 = 23/22$. Proof based on a combinatorial argument.
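The lemma can be sanity-checked by Monte Carlo on a random PSD matrix. A sketch (function name and sample values are illustrative; $\eta_0 = 23/22$ is taken from the cited paper as an assumption):

```python
import numpy as np

def moment_check(K, r, n_draws=20_000, eta0=23.0 / 22.0, seed=0):
    """Monte Carlo check of the key lemma:
    E_sigma[(sigma' K sigma)^r] <= (eta0 * r * Tr[K])^r."""
    rng = np.random.default_rng(seed)
    m = K.shape[0]
    sigmas = rng.choice([-1.0, 1.0], size=(n_draws, m))
    quad = np.einsum('ni,ij,nj->n', sigmas, K, sigmas)   # sigma' K sigma per draw
    return float(np.mean(quad ** r)), (eta0 * r * np.trace(K)) ** r

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
K = B @ B.T                                              # random PSD kernel matrix
empirical, bound = moment_check(K, r=3)
```

For this random $K$ and $r = 3$ the empirical moment sits comfortably below the bound; the lemma is what turns per-kernel trace information into the $\|u\|_r$ term of the previous slides.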
16 New Learning Bound - L1 (CC, MM, and AR, 2010). Theorem: assume that $K_k(x, x) \le R^2$ for all $k \in [1, p]$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, for any $h \in H_1$: $R(h) \le \widehat{R}_\rho(h) + 2\sqrt{\frac{\eta_0\, e \lceil \log p \rceil R^2/\rho^2}{m}} + \sqrt{\frac{\log\frac{1}{\delta}}{2m}}$. Very weak dependency on $p$, no extra log terms. Analysis based on Rademacher complexity. Bound valid for $p \gg m$. See also (Kakade et al., 2010).
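The very weak $\lceil \log p \rceil$ dependency is easy to see numerically. A sketch (function name and sample values are illustrative; $\eta_0 = 23/22$ from the cited paper is an assumption of this snippet):

```python
import math

def l1_learning_bound(emp_margin_loss, R, rho, p, m, delta, eta0=23.0 / 22.0):
    """Value of the L1 learning-kernels bound of slide 16:
    hat{R}_rho(h) + 2*sqrt(eta0 * e * ceil(log p) * (R/rho)^2 / m)
                  + sqrt(log(1/delta) / (2m))."""
    complexity = 2.0 * math.sqrt(eta0 * math.e * math.ceil(math.log(p))
                                 * (R / rho) ** 2 / m)
    confidence = math.sqrt(math.log(1.0 / delta) / (2.0 * m))
    return emp_margin_loss + complexity + confidence

# going from 10 to 10000 kernels only multiplies ceil(log p) by ~3
b_small = l1_learning_bound(0.05, R=1.0, rho=0.2, p=10,     m=10_000, delta=0.01)
b_large = l1_learning_bound(0.05, R=1.0, rho=0.2, p=10_000, m=10_000, delta=0.01)
```

A thousand-fold increase in $p$ less than doubles the bound here, which is the point of slides 16-17.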
17 Comparison. [Plot: bound value (log scale, 1 to 100) vs. $m$ in millions, comparing the (Srebro & Ben-David, 2006) bound with our 2010 bound, for $p = m$, $m^{5/6}$, $m^{4/5}$, $m^{3/5}$, $m^{1/3}$, $p = \ldots$; $\rho/R = .2$.]
18 Lower Bound. Tight bound: the $\log p$ dependency cannot be improved. Argument based on VC dimension, or by example. Observations: case $X = \{-1, +1\}^p$; canonical projection kernels $K_k(x, x') = x_k x'_k$; $H_1$ contains $J_p = \{x \mapsto s x_k : k \in [1, p],\ s \in \{-1, +1\}\}$; $\mathrm{VCdim}(J_p) = \Omega(\log p)$; for $\rho = 1$ and $h \in J_p$, $\widehat{R}_\rho(h) = \widehat{R}(h)$; VC lower bound: $R(h) \ge \Omega\big(\sqrt{\mathrm{VCdim}(J_p)/m}\big)$.
19 New Paper. Recent claim (Hussain and Shawe-Taylor, AISTATS 2011): additive bound in terms of $\log p$, instead of multiplicative. Main proof incorrect: probabilistic bound on the Rademacher complexity, but a slack term is left out of the proof of Theorem 8; adding it back yields a multiplicative bound. However: the authors are preparing a new version (private communication: J. Shawe-Taylor).
20 New Learning Bound - Lq (CC, MM, and AR, 2010). Theorem: let $q, r \ge 1$ with $\frac{1}{q} + \frac{1}{r} = 1$ and $r$ an integer. Assume that $K_k(x, x) \le R^2$ for all $k \in [1, p]$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, for any $h \in H_q$: $R(h) \le \widehat{R}_\rho(h) + 2 p^{\frac{1}{2r}} \sqrt{\frac{\eta_0\, r\, R^2/\rho^2}{m}} + \sqrt{\frac{\log\frac{1}{\delta}}{2m}}$. Mild dependency on $p$. Analysis based on Rademacher complexity.
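For $L_2$ regularization ($q = r = 2$) the bound above carries a $p^{1/4}$ factor. A sketch evaluating it (function name and sample values are illustrative; $\eta_0 = 23/22$ from the cited paper is an assumption):

```python
import math

def lq_learning_bound(emp_margin_loss, R, rho, p, m, delta, r, eta0=23.0 / 22.0):
    """Value of the Lq bound of slide 20 (1/q + 1/r = 1, r integer):
    hat{R}_rho(h) + 2 * p**(1/(2r)) * sqrt(eta0 * r * (R/rho)^2 / m)
                  + sqrt(log(1/delta) / (2m))."""
    complexity = 2.0 * p ** (1.0 / (2 * r)) * math.sqrt(eta0 * r * (R / rho) ** 2 / m)
    confidence = math.sqrt(math.log(1.0 / delta) / (2.0 * m))
    return emp_margin_loss + complexity + confidence

# L2 regularization: q = r = 2, hence a p**(1/4) dependency
b_l2 = lq_learning_bound(0.05, R=1.0, rho=0.2, p=16, m=10_000, delta=0.01, r=2)
```

With $p = 16$ the $p^{1/4} = 2$ factor roughly doubles the complexity term relative to a single kernel, in contrast with the $\lceil \log p \rceil$ behavior of the $L_1$ bound.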
21 Lower Bound. Tight bound: the $p^{\frac{1}{2r}}$ dependency cannot be improved; in particular, $p^{\frac{1}{4}}$ is tight for $L_2$ regularization. Observations: take equal kernels, so that $\sum_{k=1}^p \mu_k K_k = \big(\sum_{k=1}^p \mu_k\big) K_1$. For $\sum_{k=1}^p \mu_k \ne 0$, $\|h\|^2_{\mathbb{H}_{K_1}} = \big(\sum_{k=1}^p \mu_k\big) \|h\|^2_{\mathbb{H}_{K_\mu}}$. By Hölder's inequality, $\sum_{k=1}^p \mu_k \le p^{\frac{1}{r}} \|\mu\|_q = p^{\frac{1}{r}}$. Thus, $H_q$ coincides with $\{h \in \mathbb{H}_{K_1} : \|h\|_{\mathbb{H}_{K_1}} \le p^{\frac{1}{2r}}\}$.
22 Comparison L1 vs L2. [Plot: bound value vs. $m$ in millions, for $p = m$, $p = m^{1/3}$, $p = 20$.]
23 Conclusion. Theory: tight generalization bounds for learning kernels with $L_1$ or $L_q$ regularization; mild dependency on $p$; similar proof and analysis for other regularizations. Applications: results suggest using a large number of kernels; recent results show significant improvements (CC, MM, AR, ICML 2010).
24 Part III. Non-negative combinations. General case.
25 Kernel Family. General case: $\mathcal{K}$ a family of kernels bounded by $R$, with finite pseudo-dimension: $\mathrm{Pdim}(\mathcal{K}) < \infty$. General hypothesis set: $H_{\mathcal{K}} = \{h \in \mathbb{H}_K : K \in \mathcal{K},\ \|h\|_{\mathbb{H}_K} \le 1\}$.
26 Shattering. Definition: let $H$ be a hypothesis set of functions from $X$ to $\mathbb{R}$. $A = \{x_1, \ldots, x_m\}$ is shattered by $H$ if there exist $t_1, \ldots, t_m \in \mathbb{R}$ such that $\big|\big\{\big(\mathrm{sgn}(L(h(x_1), f(x_1)) - t_1), \ldots, \mathrm{sgn}(L(h(x_m), f(x_m)) - t_m)\big) : h \in H\big\}\big| = 2^m$. [Figure: thresholds $t_1, t_2$ at points $x_1, x_2$.]
27 Pseudo-Dimension. Definition: let $H$ be a hypothesis set of functions from $X$ to $\mathbb{R}$. The pseudo-dimension of $H$, $\mathrm{Pdim}(H)$, is the size of the largest set shattered by $H$. Definition (equivalent, see also (Vapnik, 1995)) (Pollard, 1984): $\mathrm{Pdim}(H) = \mathrm{VCdim}\big(\{(x, t) \mapsto 1_{(h(x) - t) > 0} : h \in H\}\big)$.
28 Pseudo-Dimension - Properties. Theorem (pseudo-dimension of hyperplanes): $\mathrm{Pdim}(\{x \mapsto w \cdot x + b : w \in \mathbb{R}^N, b \in \mathbb{R}\}) = N + 1$. Theorem (pseudo-dimension of a vector space of real-valued functions $H$): $\mathrm{Pdim}(H) = \dim(H)$. Theorem (pseudo-dimension of $\phi(H) = \{\phi \circ h : h \in H\}$, where $\phi$ is a monotone function): $\mathrm{Pdim}(\phi(H)) \le \mathrm{Pdim}(H)$.
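The shattering witness behind these statements can be exhibited by brute force for $N = 1$, where affine functions $x \mapsto wx + b$ have pseudo-dimension $N + 1 = 2$. The sketch below (function names and the grid of parameters are illustrative) realizes all $2^2$ sign patterns of $h(x_i) - t_i$ on a 2-point set:

```python
from itertools import product

def realized_patterns(points, thresholds, params):
    """Sign patterns (h(x_i) > t_i) realized by h(x) = w*x + b over a
    finite grid of parameters (w, b)."""
    pats = set()
    for w, b in params:
        pats.add(tuple(w * x + b > t for x, t in zip(points, thresholds)))
    return pats

# a small grid of slopes and offsets suffices to shatter {0, 1} at thresholds (0, 0)
grid = list(product((-3.0, -1.0, 0.0, 1.0, 3.0), (-2.0, -1.0, 0.0, 1.0, 2.0)))
pats = realized_patterns([0.0, 1.0], [0.0, 0.0], grid)
# all 2**2 patterns realized -> the 2-point set is pseudo-shattered
```

This only certifies the lower bound $\mathrm{Pdim} \ge 2$; the matching upper bound is the $N + 1$ theorem above.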
29 General Pdim Learning Bound (Srebro and Ben-David, 2006). Let $\mathcal{K}$ be a family of kernel functions bounded by $R$ and let $d = \mathrm{Pdim}(\mathcal{K})$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, for any $h \in H_{\mathcal{K}}$: $R(h) \le \widehat{R}_\rho(h) + \sqrt{\frac{8\big(2 + d \log\frac{128 e m^3 R^2}{\rho^2 d} + 256 \frac{R^2}{\rho^2} \log\frac{\rho e m}{8R} \log\frac{128 m R^2}{\rho^2}\big) + \log(1/\delta)}{m}}$. Bound additive in $d$ (modulo log terms). Not informative for $d > m$.
30 Application: Linear Combinations. Linear and non-negative combinations of base kernels (previous section): $\mathcal{K}_{\mathrm{lin}} = \big\{K_\mu = \sum_{k=1}^p \mu_k K_k : \sum_{k=1}^p \mu_k = 1\big\}$, $\mathcal{K}_1 = \big\{K_\mu = \sum_{k=1}^p \mu_k K_k : \sum_{k=1}^p \mu_k = 1,\ \mu \ge 0\big\}$. Since $\mathcal{K}_1 \subseteq \mathcal{K}_{\mathrm{lin}}$: $\mathrm{Pdim}(\mathcal{K}_1) \le \mathrm{Pdim}(\mathcal{K}_{\mathrm{lin}}) \le \dim\big(\big\{\sum_{k=1}^p \mu_k K_k : \mu \in \mathbb{R}^p\big\}\big) = p$.
31 Application: Gaussian Kernels. Gaussian kernels parameterized by a covariance matrix: $\mathcal{K}_{\mathrm{Gaussian}} = \{(x_1, x_2) \mapsto \exp(-(x_2 - x_1)^\top A (x_2 - x_1)) : A \in S^N_+\}$. Since $\exp$ is monotone, and since $\{(x_1, x_2) \mapsto (x_2 - x_1)^\top A (x_2 - x_1) : A \in S^N_+\} = \{(x_1, x_2) \mapsto \sum_{i,j=1}^N A_{ij} (x_2 - x_1)_i (x_2 - x_1)_j : A \in S^N_+\} \subseteq \mathrm{span}\big(\{(x_1, x_2) \mapsto (x_2 - x_1)_i (x_2 - x_1)_j : 1 \le i \le j \le N\}\big)$: $\mathrm{Pdim}(\mathcal{K}_{\mathrm{Gaussian}}) \le \frac{N(N+1)}{2}$. Similarly, for diagonal $A$: $\mathrm{Pdim}(\mathcal{K}_{\mathrm{Gaussian}}) \le N$.
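The key fact here is that the quadratic forms $z \mapsto z^\top A z$ (with $z = x_2 - x_1$) live in the span of the $N(N+1)/2$ monomials $z_i z_j$, $i \le j$. That dimension count can be confirmed numerically via matrix rank; the sketch below (function name and sample sizes are illustrative) evaluates the monomials at random points:

```python
import numpy as np

def monomial_rank(N, n_samples=200, seed=0):
    """Rank of the feature matrix of monomials z_i * z_j (i <= j) evaluated
    at random z in R^N; equals the dimension of their span, N(N+1)/2."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_samples, N))
    feats = np.stack([Z[:, i] * Z[:, j]
                      for i in range(N) for j in range(i, N)], axis=1)
    return int(np.linalg.matrix_rank(feats))

# for N = 4 there are 4 * 5 / 2 = 10 independent monomials
```

The vector-space property of slide 28 then turns this dimension into the $\mathrm{Pdim}$ bound, after composing with the monotone $\exp$.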
32 References. Bousquet, Olivier and Herrmann, Daniel J. L. On the complexity of learning the kernel matrix. In NIPS, 2003. Cortes, Corinna, Mohri, Mehryar, and Rostamizadeh, Afshin. Generalization bounds for learning kernels. In ICML, 2010. Cortes, Corinna, Mohri, Mehryar, and Rostamizadeh, Afshin. Two-stage learning kernel methods. In ICML, 2010. Hussain, Zakria and Shawe-Taylor, John. Improved loss bounds for multiple kernel learning. In AISTATS, 2011. Kakade, Sham M., Shalev-Shwartz, Shai, and Tewari, Ambuj. Applications of strong convexity-strong smoothness duality to learning with matrices. arXiv, 2010. Koltchinskii, V. and Panchenko, D. Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30, 2002. Koltchinskii, Vladimir and Yuan, Ming. Sparse recovery in large ensembles of kernel machines. In COLT, 2008.
33 References (continued). Lanckriet, Gert, Cristianini, Nello, Bartlett, Peter, El Ghaoui, Laurent, and Jordan, Michael. Learning the kernel matrix with semidefinite programming. JMLR, 5, 2004. Srebro, Nathan and Ben-David, Shai. Learning bounds for support vector machines with learned kernels. In COLT, 2006. Ying, Yiming and Campbell, Colin. Generalization bounds for learning the kernel problem. In COLT, 2009.
More informationLearning Bound for Parameter Transfer Learning
Learning Bound for Parameter Transfer Learning Wataru Kumagai Faculty of Engineering Kanagawa University kumagai@kanagawa-u.ac.jp Abstract We consider a transfer-learning problem by using the parameter
More informationEfficient classification for metric data
Efficient classification for metric data Lee-Ad Gottlieb Weizmann Institute of Science lee-ad.gottlieb@weizmann.ac.il Aryeh Leonid Kontorovich Ben Gurion University karyeh@cs.bgu.ac.il Robert Krauthgamer
More informationSupport'Vector'Machines. Machine(Learning(Spring(2018 March(5(2018 Kasthuri Kannan
Support'Vector'Machines Machine(Learning(Spring(2018 March(5(2018 Kasthuri Kannan kasthuri.kannan@nyumc.org Overview Support Vector Machines for Classification Linear Discrimination Nonlinear Discrimination
More informationAdaptive Online Gradient Descent
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 6-4-2007 Adaptive Online Gradient Descent Peter Bartlett Elad Hazan Alexander Rakhlin University of Pennsylvania Follow
More informationMACHINE LEARNING. Support Vector Machines. Alessandro Moschitti
MACHINE LEARNING Support Vector Machines Alessandro Moschitti Department of information and communication technology University of Trento Email: moschitti@dit.unitn.it Summary Support Vector Machines
More informationSupport Vector Machine Classification via Parameterless Robust Linear Programming
Support Vector Machine Classification via Parameterless Robust Linear Programming O. L. Mangasarian Abstract We show that the problem of minimizing the sum of arbitrary-norm real distances to misclassified
More informationRobust Novelty Detection with Single Class MPM
Robust Novelty Detection with Single Class MPM Gert R.G. Lanckriet EECS, U.C. Berkeley gert@eecs.berkeley.edu Laurent El Ghaoui EECS, U.C. Berkeley elghaoui@eecs.berkeley.edu Michael I. Jordan Computer
More informationKernel Methods. Outline
Kernel Methods Quang Nguyen University of Pittsburgh CS 3750, Fall 2011 Outline Motivation Examples Kernels Definitions Kernel trick Basic properties Mercer condition Constructing feature space Hilbert
More informationLearning Kernel Parameters by using Class Separability Measure
Learning Kernel Parameters by using Class Separability Measure Lei Wang, Kap Luk Chan School of Electrical and Electronic Engineering Nanyang Technological University Singapore, 3979 E-mail: P 3733@ntu.edu.sg,eklchan@ntu.edu.sg
More information