Introduction to Machine Learning Lecture 9. Mehryar Mohri Courant Institute and Google Research
1 Introduction to Machine Learning Lecture 9 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
2 Kernel Methods
3 Motivation Non-linear decision boundary. Efficient computation of inner products in high dimension. Flexible selection of more complex features. page 3
4 This Lecture Definitions. SVMs with kernels. Closure properties. Sequence kernels. page 4
5 Non-Linear Separation Linear separation impossible in most problems. Non-linear mapping from input space to high-dimensional feature space: Φ: X → F. Generalization ability: independent of dim(F), depends only on the margin ρ and the sample size m. page 5
6 Kernel Methods Idea: define K: X × X → R, called a kernel, such that: Φ(x)·Φ(y) = K(x, y). K is often interpreted as a similarity measure. Benefits: Efficiency: K is often more efficient to compute than Φ and the dot product. Flexibility: K can be chosen arbitrarily so long as the existence of Φ is guaranteed (symmetry and positive-definiteness condition). page 6
7 PDS Condition Definition: a kernel K: X × X → R is positive definite symmetric (PDS) if for any {x_1, ..., x_m} ⊆ X, the matrix K = [K(x_i, x_j)]_{ij} ∈ R^{m×m} is symmetric positive semi-definite (SPSD). K is SPSD if it is symmetric and satisfies one of the two equivalent conditions: its eigenvalues are non-negative; for any c ∈ R^{m×1}, cᵀKc = Σ_{i,j=1}^m c_i c_j K(x_i, x_j) ≥ 0. Terminology: PDS for kernels, SPSD for kernel matrices (see (Berg et al., 1984)). page 7
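The SPSD condition on a kernel matrix is easy to check numerically. A minimal sketch (not from the slides; the helper name `is_spsd` is illustrative), using the eigenvalue characterization above:

```python
import numpy as np

def is_spsd(K, tol=1e-10):
    """Check that a kernel matrix is symmetric positive semi-definite."""
    if not np.allclose(K, K.T):
        return False
    # Eigenvalues of a symmetric matrix are real; SPSD iff all are >= 0.
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

# Gram matrix of the linear kernel K(x, y) = x . y on three points:
# K = X X^T is SPSD by construction (c'Kc = ||X'c||^2 >= 0).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K = X @ X.T
print(is_spsd(K))    # True
```

The same check applied to the Gram matrix of an arbitrary symmetric function, rather than a kernel, will in general fail.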
8 Example - Polynomial Kernels Definition: ∀x, y ∈ R^N, K(x, y) = (x·y + c)^d, c > 0. Example: for N = 2 and d = 2, K(x, y) = (x_1y_1 + x_2y_2 + c)² = (x_1², x_2², √2 x_1x_2, √(2c) x_1, √(2c) x_2, c) · (y_1², y_2², √2 y_1y_2, √(2c) y_1, √(2c) y_2, c). page 8
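The identity between the kernel and the explicit six-dimensional feature map can be verified directly. A small sketch (helper names `poly_kernel` and `phi` are illustrative, not from the slides):

```python
import numpy as np

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x . y + c)^d."""
    return (np.dot(x, y) + c) ** d

def phi(x, c=1.0):
    """Explicit feature map for (x . y + c)^2 in dimension N = 2."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2.0) * x1 * x2,
                     np.sqrt(2 * c) * x1, np.sqrt(2 * c) * x2, c])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
# K(x, y) and phi(x) . phi(y) agree term by term.
assert np.isclose(poly_kernel(x, y), np.dot(phi(x), phi(y)))
```

Computing K directly costs O(N), while the explicit map lives in dimension O(N^d): this is the efficiency benefit from the earlier slide.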
9 XOR Problem Use the second-degree polynomial kernel with c = 1, mapping (x_1, x_2) to (x_1², x_2², √2 x_1x_2, √2 x_1, √2 x_2, 1): (1, 1) → (1, 1, +√2, +√2, +√2, 1); (−1, 1) → (1, 1, −√2, −√2, +√2, 1); (−1, −1) → (1, 1, +√2, −√2, −√2, 1); (1, −1) → (1, 1, −√2, +√2, −√2, 1). Linearly non-separable in the input space; linearly separable in feature space by the hyperplane x_1x_2 = 0. page 9
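The separating coordinate can be exhibited numerically: after the mapping, the single feature √2 x_1x_2 already has the sign of the XOR label. A sketch of the four points above (the `phi` helper is illustrative):

```python
import numpy as np

pts = np.array([(1, 1), (-1, 1), (-1, -1), (1, -1)], dtype=float)
labels = np.array([+1, -1, +1, -1])   # XOR labeling: sign of x1 * x2

def phi(p, c=1.0):
    """Feature map of the degree-2 polynomial kernel with c = 1."""
    x1, x2 = p
    s = np.sqrt(2.0)
    return np.array([x1**2, x2**2, s * x1 * x2, s * x1, s * x2, c])

# The third coordinate, sqrt(2) x1 x2, separates the two classes on its own:
feature = np.array([phi(p)[2] for p in pts])
assert np.all(np.sign(feature) == labels)
```

No linear function of (x_1, x_2) alone can achieve this, which is the point of the slide.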
10 Other Standard PDS Kernels Gaussian kernels: K(x, y) = exp(−‖x − y‖² / (2σ²)), σ ≠ 0. Sigmoid kernels: K(x, y) = tanh(a(x·y) + b), a, b ≥ 0. page 10
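A Gaussian Gram matrix can be computed in a vectorized way from pairwise squared distances; its diagonal is 1 and it is PSD, as the PDS property requires. A sketch (the function name `gaussian_gram` is illustrative):

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix of K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x . y, computed for all pairs.
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_gram(X, sigma=1.0)
assert np.allclose(np.diag(K), 1.0)              # K(x, x) = 1
assert np.all(np.linalg.eigvalsh(K) >= -1e-10)   # SPSD, as PDS requires
```

The sigmoid kernel, by contrast, is not PDS for all parameter choices, which is why the slide states the condition a, b ≥ 0.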
11 This Lecture Definitions. SVMs with kernels. Closure properties. Sequence kernels. page 11
12 Reproducing Kernel Hilbert Space (Aronszajn, 1950) Theorem: let K: X × X → R be a PDS kernel. Then, there exists a Hilbert space H and a mapping Φ from X to H such that: ∀x, y ∈ X, K(x, y) = Φ(x)·Φ(y). Furthermore, the following reproducing property holds: ∀f ∈ H, ∀x ∈ X, f(x) = ⟨f, Φ(x)⟩ = ⟨f, K(x, ·)⟩. page 12
13 Notes: H is called the reproducing kernel Hilbert space (RKHS) associated to K. A Hilbert space such that there exists Φ: X → H with K(x, y) = Φ(x)·Φ(y) for all x, y ∈ X is also called a feature space associated to K. Φ is called a feature mapping. Feature spaces associated to K are in general not unique. page 13
14 Consequence: SVMs with PDS Kernels (Boser, Guyon, and Vapnik, 1992) Constrained optimization: max_α Σ_{i=1}^m α_i − (1/2) Σ_{i,j=1}^m α_iα_j y_iy_j K(x_i, x_j), subject to: 0 ≤ α_i ≤ C ∧ Σ_{i=1}^m α_iy_i = 0, i ∈ [1, m]. Solution: h(x) = sgn(Σ_{i=1}^m α_iy_i K(x_i, x) + b), with b = y_i − Σ_{j=1}^m α_jy_j K(x_j, x_i) for any x_i with 0 < α_i < C. The kernel values K(x_i, x_j), K(x_i, x), and K(x_j, x_i) replace the inner products Φ(x_i)·Φ(x_j), Φ(x_i)·Φ(x), and Φ(x_j)·Φ(x_i) of the linear case. page 14
15 SVMs with PDS Kernels Constrained optimization: max_α 2·1ᵀα − (α ∘ y)ᵀK(α ∘ y), subject to: 0 ≤ α ≤ C ∧ αᵀy = 0, where ∘ denotes the Hadamard product. Solution: h = sgn(Σ_{i=1}^m α_iy_i K(x_i, ·) + b), with b = y_i − (α ∘ y)ᵀKe_i for any x_i with 0 < α_i < C. page 15
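The vector form translates directly into a few lines of linear algebra. A sketch evaluating the dual objective and the resulting hypothesis (not a solver; the helper names and the toy data are illustrative):

```python
import numpy as np

def dual_objective(alpha, y, K):
    """Vector-form SVM dual: 2 1'alpha - (alpha o y)' K (alpha o y),
    where o denotes the Hadamard (entrywise) product."""
    ay = alpha * y
    return 2.0 * np.sum(alpha) - ay @ K @ ay

def decision(alpha, y, b, K_train_test):
    """h(x) = sgn(sum_i alpha_i y_i K(x_i, x) + b), one column per test point."""
    return np.sign((alpha * y) @ K_train_test + b)

# Toy check with the linear kernel on two separable points.
X = np.array([[1.0], [-1.0]])
y = np.array([+1.0, -1.0])
K = X @ X.T
alpha = np.array([0.5, 0.5])      # feasible, and the dual optimum for C >= 0.5
print(dual_objective(alpha, y, K))          # 1.0
K_test = X @ np.array([[2.0], [-3.0]]).T    # kernel values, train x test
print(decision(alpha, y, 0.0, K_test))      # [ 1. -1.]
```

In practice a QP solver maximizes `dual_objective` over the feasible set; only the kernel matrix K enters, never Φ.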
16 Generalization: Representer Theorem (Kimeldorf and Wahba, 1971) Theorem: let K: X × X → R be a PDS kernel and H its corresponding RKHS. Then, for any non-decreasing function G: R → R and any L: R^m → R ∪ {+∞}, the optimization problem argmin_{h∈H} F(h) = argmin_{h∈H} G(‖h‖²_H) + L(h(x_1), ..., h(x_m)) admits a solution of the form h = Σ_{i=1}^m α_i K(x_i, ·). If G is further assumed to be increasing, then any solution has this form. page 16
17 Proof: let H_1 = span({K(x_i, ·): i ∈ [1, m]}). Any h ∈ H admits the decomposition h = h_1 + h^⊥ according to H = H_1 ⊕ H_1^⊥. Since G is non-decreasing, G(‖h_1‖²) ≤ G(‖h_1‖² + ‖h^⊥‖²) = G(‖h‖²). By the reproducing property, for all i ∈ [1, m], h(x_i) = ⟨h, K(x_i, ·)⟩ = ⟨h_1, K(x_i, ·)⟩ = h_1(x_i). Thus, L(h(x_1), ..., h(x_m)) = L(h_1(x_1), ..., h_1(x_m)) and F(h_1) ≤ F(h). If G is increasing, then F(h_1) < F(h) when h^⊥ ≠ 0, and any solution of the optimization problem must be in H_1. page 17
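Kernel ridge regression is a simple instance of the theorem, with G(u) = λu and L the squared loss (not discussed on the slide; shown here as an illustrative sketch): the representer theorem guarantees h = Σ_i α_i K(x_i, ·), and the coefficients solve (K + λI)α = y.

```python
import numpy as np

def krr_fit(K, y, lam=0.1):
    """Kernel ridge regression: minimize lam ||h||^2_H + sum_i (h(x_i) - y_i)^2.
    By the representer theorem h = sum_i alpha_i K(x_i, .), and the
    optimal coefficients are alpha = (K + lam I)^{-1} y."""
    m = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(m), y)

def krr_predict(alpha, K_train_test):
    """h evaluated via kernel values between training and test points."""
    return alpha @ K_train_test

# Fit sin(3x) on 20 points with a Gaussian kernel.
X = np.linspace(-1, 1, 20)[:, None]
y = np.sin(3 * X[:, 0])
sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / 0.5)
alpha = krr_fit(K, y, lam=1e-3)
# Fitted values track the targets closely on the training points.
assert np.max(np.abs(krr_predict(alpha, K) - y)) < 0.2
```

The optimization over the infinite-dimensional H reduces to an m-dimensional linear system, exactly as the theorem predicts.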
18 Kernel-Based Algorithms PDS kernels used to extend a variety of algorithms in classification and other areas: regression, ranking, dimensionality reduction, clustering. But how do we define PDS kernels? page 18
19 This Lecture Definitions. SVMs with kernels. Closure properties. Sequence kernels. page 19
20 Closure Properties of PDS Kernels Theorem: Positive definite symmetric (PDS) kernels are closed under: sum, product, tensor product, pointwise limit, composition with a power series. page 20
21 Closure Properties - Proof Proof: closure under sum: cᵀKc ≥ 0 ∧ cᵀK′c ≥ 0 ⇒ cᵀ(K + K′)c ≥ 0. Closure under product: with K = MMᵀ, Σ_{i,j=1}^m c_ic_j (K_ij K′_ij) = Σ_{i,j=1}^m c_ic_j [Σ_{k=1}^m M_ik M_jk] K′_ij = Σ_{k=1}^m [Σ_{i,j=1}^m (c_iM_ik)(c_jM_jk) K′_ij] = Σ_{k=1}^m z_kᵀK′z_k ≥ 0, with z_k = (c_1M_1k, ..., c_mM_mk)ᵀ. page 21
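The product step is the Schur product theorem in matrix form: the entrywise product of two SPSD matrices is SPSD. A quick numeric check of this claim on random SPSD matrices (the helper `random_spsd` is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spsd(m):
    """Random SPSD matrix: K = M M^T is SPSD by construction."""
    M = rng.standard_normal((m, m))
    return M @ M.T

K1, K2 = random_spsd(5), random_spsd(5)
hadamard = K1 * K2   # entrywise (Schur) product, as in the closure proof
# Closure under product: the entrywise product is again SPSD.
assert np.all(np.linalg.eigvalsh(hadamard) >= -1e-8)
```

Note that the closure is for the entrywise product of kernel values, not the matrix product K1 @ K2, which need not even be symmetric.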
22 Closure under tensor product: definition: for all x_1, x_2, y_1, y_2 ∈ X, (K_1 ⊗ K_2)(x_1, y_1, x_2, y_2) = K_1(x_1, x_2) K_2(y_1, y_2). Thus, it is a PDS kernel as the product of the two PDS kernels (x_1, y_1, x_2, y_2) ↦ K_1(x_1, x_2) and (x_1, y_1, x_2, y_2) ↦ K_2(y_1, y_2). Closure under pointwise limit: if for all x, y ∈ X, lim_{n→∞} K_n(x, y) = K(x, y), then (∀n, cᵀK_nc ≥ 0) ⇒ lim_{n→∞} cᵀK_nc = cᵀKc ≥ 0. page 22
23 Closure under composition with a power series: assumptions: K a PDS kernel with |K(x, y)| < ρ for all x, y ∈ X, and f(x) = Σ_{n=0}^∞ a_nx^n, a_n ≥ 0, a power series with radius of convergence ρ. Then f ∘ K is a PDS kernel: K^n is PDS by closure under product, Σ_{n=0}^N a_nK^n is PDS by closure under sum, and f ∘ K is PDS by closure under pointwise limit. Example: for any PDS kernel K, exp(K) is PDS. page 23
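The exp(K) example can be checked numerically: applying exp entrywise to an SPSD matrix (the power series Σ xⁿ/n! has all coefficients ≥ 0 and infinite radius of convergence) yields another SPSD matrix. A small sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
K = M @ M.T                 # an SPSD kernel matrix
expK = np.exp(K)            # entrywise exp: power series with a_n >= 0
# f o K remains SPSD, as closure under composition with a power series states.
assert np.all(np.linalg.eigvalsh(expK) >= -1e-8)
```

This is exactly how the Gaussian kernel can be seen to be PDS: it is a normalized version of exp applied to the linear kernel.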
24 This Lecture Definitions. SVMs with kernels. Closure properties. Sequence kernels. page 24
25 Sequence Kernels Definition: Kernels defined over pairs of strings. Motivation: computational biology, text and speech classification. Idea: two sequences are related when they share some common substrings or subsequences. Example: sum of the product of the counts of common substrings. page 25
26 Weighted Transducers [figure: a weighted transducer over states 0-3, with transitions such as b:a/0.6, b:a/0.2, a:b/0.1, a:b/0.5, a:a/0.4, b:a/0.3 and final weight 0.1] T(x, y) = sum of the weights of all accepting paths with input x and output y. The figure's example computes T(abb, baa) from the paths with input abb and output baa. page 26
27 Rational Kernels over Strings (Cortes et al., 2004) Definition: a kernel K: Σ* × Σ* → R is rational if K = T for some weighted transducer T. Definition: let T_1: Σ* × Δ* → R and T_2: Δ* × Ω* → R be two weighted transducers. Then, the composition of T_1 and T_2 is defined for all x ∈ Σ*, y ∈ Ω* by (T_1 ∘ T_2)(x, y) = Σ_z T_1(x, z) T_2(z, y). Definition: the inverse of a transducer T: Σ* × Δ* → R is the transducer T⁻¹: Δ* × Σ* → R obtained from T by swapping input and output labels. page 27
28 Composition Theorem: the composition of two weighted transducers is also a weighted transducer. Proof: constructive proof based on the composition algorithm: states identified with pairs; ε-free case: transitions defined by E′ = {((q_1, q_1′), a, c, w_1 ⊗ w_2, (q_2, q_2′)) : (q_1, a, b, w_1, q_2) ∈ E_1, (q_1′, b, c, w_2, q_2′) ∈ E_2}; general case: use of an intermediate ε-filter. page 28
29 Composition Algorithm ε-free Case [figure: two weighted transducers and the result of their composition; each state of the result is a pair of states of the operands, e.g. (0, 1), (2, 1), (3, 2), and each transition weight is the product of the matched component weights] Complexity: O(|T_1||T_2|) in general, linear in some cases. page 29
30 Redundant ε-paths Problem (MM et al. 1996) 0 a:a 1 b:! 2 c:! 3 d:d 4 0 a:d 1!:e 2 d:a T 1 3 T 2 T 1 0 a:a 1 b:!2 2 c:!2 3 d:d 4 0 a:d 1!1: e 2 d:a 3 T2!:!1!:!1!:!1!:!1!:!1!2:!!2:!!2:!!2:!!"%!"& ).) '.' '.( #"!,!#"!#-,!#"!#- $"! #"&!"& (.' (.( *.'!"& *.(,!#"!!$ #"!,/"/-,!!"!!-,!!"!!-,!!"!!- $"! %"!,!#"!#-,!#"!#-,/"/- +.*!2:!1 x:x 0!1:!1 x:x!2:!2 x:x!1:!1 1!2:!2 F T = T 1 F T 2. 2 page 30
31 PDS Rational Kernels General Construction Theorem: for any weighted transducer T: Σ* × Δ* → R, the function K = T ∘ T⁻¹ is a PDS rational kernel. Proof: by definition, for all x, y ∈ Σ*, K(x, y) = Σ_z T(x, z) T(y, z). K is the pointwise limit of (K_n)_{n≥0} defined by: ∀x, y ∈ Σ*, K_n(x, y) = Σ_{|z|≤n} T(x, z) T(y, z). K_n is PDS since for any sample (x_1, ..., x_m), K_n = AAᵀ with A = (T(x_i, z_j))_{i∈[1,m], j∈[1,N]}. page 31
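For a finite set of strings, the construction K = T ∘ T⁻¹ can be simulated with a plain weight table and checked to be SPSD. A sketch with hypothetical transducer weights, purely illustrative:

```python
import numpy as np

# A toy transducer as a finite weight table T[x][z] (hypothetical values).
T = {
    "ab":  {"a": 1.0, "b": 2.0},
    "ba":  {"a": 2.0, "b": 1.0},
    "abb": {"a": 1.0, "b": 3.0, "bb": 1.0},
}

def K(x, y):
    """K(x, y) = sum_z T(x, z) * T(y, z), i.e. K = T o T^{-1}."""
    Tx, Ty = T[x], T[y]
    return sum(w * Ty[z] for z, w in Tx.items() if z in Ty)

xs = list(T)
G = np.array([[K(x, y) for y in xs] for x in xs])
assert np.allclose(G, G.T)                      # symmetric by construction
assert np.all(np.linalg.eigvalsh(G) >= -1e-10)  # SPSD, as the theorem states
```

The Gram matrix is AAᵀ for the matrix A of values T(x_i, z_j), which is why positivity holds regardless of the weights chosen.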
32 Counting Transducers [figure: counting transducer T_Z with self-loops b:ε/1 and a:ε/1 on both states and a transition 0 —Z:Z/1→ 1/1] Example: Z = ab, X = bbabaabba; the two occurrences of Z in X correspond to the alignments εεabεεεεε and εεεεεabεε. Z may be a string or an automaton representing a regular expression. Counts of Z in X: sum of the weights of the accepting paths of X ∘ T_Z. page 32
33 Transducer Counting Bigrams [figure: bigram counting transducer T_bigram, with self-loops b:ε/1 and a:ε/1 on states 0 and 2 and transitions 0 —a:a/1, b:b/1→ 1 —a:a/1, b:b/1→ 2/1] The count of a bigram such as ab in Z is given by the weight Z ∘ T_bigram assigns to it. page 33
34 Transducer Counting Gappy Bigrams [figure: T_gappy bigram, identical to T_bigram except that the middle state carries self-loops b:ε/λ and a:ε/λ, allowing gaps at a per-symbol penalty λ] The count of a gappy bigram such as ab in Z is given by the weight Z ∘ T_gappy bigram assigns to it, with gap penalty λ ∈ (0, 1). page 34
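The weight computed by the gappy-bigram transducer on a string can be reproduced with a direct enumeration: each occurrence of the pair (a, b) with g symbols in between contributes λ^g. A sketch (the function name is illustrative, and this brute-force loop stands in for the transducer composition):

```python
def gappy_bigram_count(x, a, b, lam):
    """Weighted count of the gappy bigram (a, b) in x: an occurrence
    with g symbols between a and b contributes lam**g."""
    total = 0.0
    for i, ci in enumerate(x):
        if ci != a:
            continue
        for j in range(i + 1, len(x)):
            if x[j] == b:
                total += lam ** (j - i - 1)
    return total

# In "abab": contiguous ab at positions 0-1 and 2-3, plus a at 0 with b at 3
# (gap 2), so the weighted count is 1 + 1 + lam**2.
print(gappy_bigram_count("abab", "a", "b", 0.5))   # 2.25
```

At λ → 0 only contiguous occurrences survive (matching T_bigram of the previous slide), and at λ = 1 all gapped pairs count equally.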
35 Kernels for Other Discrete Structures Similarly, PDS kernels can be defined on other discrete structures: Images, graphs, parse trees, automata, weighted automata. page 35
36 References
N. Aronszajn. Theory of Reproducing Kernels. Trans. Amer. Math. Soc., 68, 1950.
Peter Bartlett and John Shawe-Taylor. Generalization performance of support vector machines and other pattern classifiers. In Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge, MA, USA, 1999.
Christian Berg, Jens Peter Reus Christensen, and Paul Ressel. Harmonic Analysis on Semigroups. Springer-Verlag: Berlin-New York, 1984.
Bernhard Boser, Isabelle M. Guyon, and Vladimir Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of COLT 1992, Pittsburgh, PA, 1992.
Corinna Cortes, Patrick Haffner, and Mehryar Mohri. Rational Kernels: Theory and Algorithms. Journal of Machine Learning Research (JMLR), 5, 2004.
Corinna Cortes and Vladimir Vapnik. Support-Vector Networks. Machine Learning, 20, 1995.
G. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Mathematical Analysis and Applications, 33, 1 (1971).
page 36 Courant Institute, NYU
37 References
James Mercer. Functions of positive and negative type, and their connection with the theory of integral equations. In Proceedings of the Royal Society of London, Series A, Vol. 83, No. 559, 1909.
Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Weighted automata in text and speech processing. In Proceedings of the 12th biennial European Conference on Artificial Intelligence (ECAI-96), Workshop on Extended Finite State Models of Language. Budapest, Hungary, 1996.
Fernando C. N. Pereira and Michael D. Riley. Speech recognition by composition of weighted finite automata. In Finite-State Language Processing. MIT Press, 1997.
I. J. Schoenberg. Metric spaces and positive definite functions. Transactions of the American Mathematical Society, Vol. 44, No. 3, 1938.
Vladimir N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer, Berlin, 1982.
Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
Vladimir N. Vapnik. Statistical Learning Theory. Wiley-Interscience, New York, 1998.
page 37
38 Appendix
39 Shortest-Distance Problem Definition: for any regulated weighted transducer T, define the shortest distance from state q to the final states F as d(q, F) = ⊕_{π∈P(q,F)} w[π]. Problem: compute d(q, F) for all states q ∈ Q. Algorithms: Generalization of Floyd-Warshall. Single-source shortest-distance algorithm. page 39
40 All-Pairs Shortest-Distance Algorithm (MM, 2002) Assumption: closed semiring (not necessarily idempotent). Idea: generalization of the Floyd-Warshall algorithm. Properties: Time complexity: Ω(|Q|³ (T_⊕ + T_⊗ + T_⋆)). Space complexity: Ω(|Q|²) with an in-place implementation. page 40
41 Closed Semirings (Lehmann, 1977) Definition: a semiring is closed if the closure x⋆ is well defined for all elements and if associativity, commutativity, and distributivity apply to countable sums. Examples: Tropical semiring. Probability semiring when including infinity or when restricted to well-defined closures. page 41
42 Pseudocode page 42
43 Single-Source Shortest-Distance Algorithm (MM, 2002) Assumption: k-closed semiring: ∀x ∈ K, ⊕_{i=0}^{k+1} xⁱ = ⊕_{i=0}^{k} xⁱ. Idea: generalization of relaxation, but must keep track of the weight added to d[q] since the last time q was enqueued. Properties: works with any queue discipline and any k-closed semiring. Classical algorithms are special instances. page 43
44 Pseudocode
Generic-Single-Source-Shortest-Distance(G, s)
 1  for i ← 1 to |Q|
 2      do d[i] ← r[i] ← 0̄
 3  d[s] ← r[s] ← 1̄
 4  S ← {s}
 5  while S ≠ ∅
 6      do q ← head(S)
 7         Dequeue(S)
 8         r′ ← r[q]
 9         r[q] ← 0̄
10         for each e ∈ E[q]
11             do if d[n[e]] ≠ d[n[e]] ⊕ (r′ ⊗ w[e])
12                 then d[n[e]] ← d[n[e]] ⊕ (r′ ⊗ w[e])
13                      r[n[e]] ← r[n[e]] ⊕ (r′ ⊗ w[e])
14                      if n[e] ∉ S
15                          then Enqueue(S, n[e])
16  d[s] ← 1̄
page 44
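The pseudocode above can be instantiated directly, parameterized by the semiring operations. A sketch (variable names mirror the pseudocode; the edge representation is an assumption of this sketch), instantiated with the tropical semiring, where it reduces to a classical shortest-path computation:

```python
from collections import deque

def shortest_distance(Q, E, s, oplus, otimes, zero, one):
    """Generic single-source shortest-distance over a k-closed semiring.
    E maps each state q to a list of (weight, next_state) edges."""
    d = {q: zero for q in Q}   # tentative shortest distances
    r = {q: zero for q in Q}   # weight added to d[q] since q was last dequeued
    d[s] = r[s] = one
    S = deque([s])
    while S:
        q = S.popleft()
        rq, r[q] = r[q], zero
        for w, nxt in E.get(q, []):
            new = oplus(d[nxt], otimes(rq, w))
            if new != d[nxt]:
                d[nxt] = new
                r[nxt] = oplus(r[nxt], otimes(rq, w))
                if nxt not in S:
                    S.append(nxt)
    d[s] = one                 # line 16 of the pseudocode
    return d

# Tropical semiring (min, +, inf, 0): classical shortest paths, FIFO queue.
INF = float("inf")
E = {0: [(1.0, 1), (4.0, 2)], 1: [(2.0, 2)]}
d = shortest_distance([0, 1, 2], E, 0, min, lambda a, b: a + b, INF, 0.0)
print(d)   # {0: 0.0, 1: 1.0, 2: 3.0}
```

Swapping in (+, ×, 0, 1) would give the probability semiring, provided the graph satisfies the k-closedness assumption.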
45 Notes Complexity: depends on the queue discipline used: O(|Q| + (T_⊕ + T_⊗ + C(A)) Σ_{q∈Q} N(q) |E[q]| + (C(I) + C(E)) Σ_{q∈Q} N(q)). Coincides with that of Dijkstra and Bellman-Ford for shortest-first and FIFO orders. Linear, O(|Q| + (T_⊕ + T_⊗)|E|), for acyclic graphs using the topological order. Approximation: ε-k-closed semirings, e.g., for graphs in the probability semiring. page 45
More informationSample Selection Bias Correction
Sample Selection Bias Correction Afshin Rostamizadeh Joint work with: Corinna Cortes, Mehryar Mohri & Michael Riley Courant Institute & Google Research Motivation Critical Assumption: Samples for training
More informationSupport Vector Machines.
Support Vector Machines www.cs.wisc.edu/~dpage 1 Goals for the lecture you should understand the following concepts the margin slack variables the linear support vector machine nonlinear SVMs the kernel
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationA note on the generalization performance of kernel classifiers with margin. Theodoros Evgeniou and Massimiliano Pontil
MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES A.I. Memo No. 68 November 999 C.B.C.L
More informationReproducing Kernel Hilbert Spaces Class 03, 15 February 2006 Andrea Caponnetto
Reproducing Kernel Hilbert Spaces 9.520 Class 03, 15 February 2006 Andrea Caponnetto About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert
More informationLearning Linearly Separable Languages
Learning Linearly Separable Languages Leonid Kontorovich 1, Corinna Cortes 2, and Mehryar Mohri 3,2 1 Carnegie Mellon University 5000 Forbes Avenue, Pittsburgh, PA 15213. 2 Google Research, 1440 Broadway,
More informationML (cont.): SUPPORT VECTOR MACHINES
ML (cont.): SUPPORT VECTOR MACHINES CS540 Bryan R Gibson University of Wisconsin-Madison Slides adapted from those used by Prof. Jerry Zhu, CS540-1 1 / 40 Support Vector Machines (SVMs) The No-Math Version
More informationSVM TRADE-OFF BETWEEN MAXIMIZE THE MARGIN AND MINIMIZE THE VARIABLES USED FOR REGRESSION
International Journal of Pure and Applied Mathematics Volume 87 No. 6 2013, 741-750 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu doi: http://dx.doi.org/10.12732/ijpam.v87i6.2
More informationKernel Method: Data Analysis with Positive Definite Kernels
Kernel Method: Data Analysis with Positive Definite Kernels 2. Positive Definite Kernel and Reproducing Kernel Hilbert Space Kenji Fukumizu The Institute of Statistical Mathematics. Graduate University
More informationLearning with Imperfect Data
Mehryar Mohri Courant Institute and Google mohri@cims.nyu.edu Joint work with: Yishay Mansour (Tel-Aviv & Google) and Afshin Rostamizadeh (Courant Institute). Standard Learning Assumptions IID assumption.
More informationFoundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research
Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.
More informationThe Representor Theorem, Kernels, and Hilbert Spaces
The Representor Theorem, Kernels, and Hilbert Spaces We will now work with infinite dimensional feature vectors and parameter vectors. The space l is defined to be the set of sequences f 1, f, f 3,...
More informationLinear Classification and SVM. Dr. Xin Zhang
Linear Classification and SVM Dr. Xin Zhang Email: eexinzhang@scut.edu.cn What is linear classification? Classification is intrinsically non-linear It puts non-identical things in the same class, so a
More informationNearest Neighbor. Machine Learning CSE546 Kevin Jamieson University of Washington. October 26, Kevin Jamieson 2
Nearest Neighbor Machine Learning CSE546 Kevin Jamieson University of Washington October 26, 2017 2017 Kevin Jamieson 2 Some data, Bayes Classifier Training data: True label: +1 True label: -1 Optimal
More informationKernel Methods in Machine Learning
Kernel Methods in Machine Learning Autumn 2015 Lecture 1: Introduction Juho Rousu ICS-E4030 Kernel Methods in Machine Learning 9. September, 2015 uho Rousu (ICS-E4030 Kernel Methods in Machine Learning)
More informationSupport Vector Machines: Kernels
Support Vector Machines: Kernels CS6780 Advanced Machine Learning Spring 2015 Thorsten Joachims Cornell University Reading: Murphy 14.1, 14.2, 14.4 Schoelkopf/Smola Chapter 7.4, 7.6, 7.8 Non-Linear Problems
More informationKaggle.
Administrivia Mini-project 2 due April 7, in class implement multi-class reductions, naive bayes, kernel perceptron, multi-class logistic regression and two layer neural networks training set: Project
More informationJeff Howbert Introduction to Machine Learning Winter
Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable
More informationStatistical Natural Language Processing
199 CHAPTER 4 Statistical Natural Language Processing 4.0 Introduction............................. 199 4.1 Preliminaries............................. 200 4.2 Algorithms.............................. 201
More information10/05/2016. Computational Methods for Data Analysis. Massimo Poesio SUPPORT VECTOR MACHINES. Support Vector Machines Linear classifiers
Computational Methods for Data Analysis Massimo Poesio SUPPORT VECTOR MACHINES Support Vector Machines Linear classifiers 1 Linear Classifiers denotes +1 denotes -1 w x + b>0 f(x,w,b) = sign(w x + b) How
More informationbelow, kernel PCA Eigenvectors, and linear combinations thereof. For the cases where the pre-image does exist, we can provide a means of constructing
Kernel PCA Pattern Reconstruction via Approximate Pre-Images Bernhard Scholkopf, Sebastian Mika, Alex Smola, Gunnar Ratsch, & Klaus-Robert Muller GMD FIRST, Rudower Chaussee 5, 12489 Berlin, Germany fbs,
More informationBoosting Ensembles of Structured Prediction Rules
Boosting Ensembles of Structured Prediction Rules Corinna Cortes Google Research 76 Ninth Avenue New York, NY 10011 corinna@google.com Vitaly Kuznetsov Courant Institute 251 Mercer Street New York, NY
More informationAnalysis of Multiclass Support Vector Machines
Analysis of Multiclass Support Vector Machines Shigeo Abe Graduate School of Science and Technology Kobe University Kobe, Japan abe@eedept.kobe-u.ac.jp Abstract Since support vector machines for pattern
More informationEcon 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines
Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the
More informationKernel Principal Component Analysis
Kernel Principal Component Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationA GENERAL FORMULATION FOR SUPPORT VECTOR MACHINES. Wei Chu, S. Sathiya Keerthi, Chong Jin Ong
A GENERAL FORMULATION FOR SUPPORT VECTOR MACHINES Wei Chu, S. Sathiya Keerthi, Chong Jin Ong Control Division, Department of Mechanical Engineering, National University of Singapore 0 Kent Ridge Crescent,
More information