Kernel Methods. Konstantin Tretyakov (kt@ut.ee). MTAT.03.227 Machine Learning
1 Kernel Methods Konstantin Tretyakov MTAT Machine Learning
2 So far Supervised machine learning Linear models Non-linear models Unsupervised machine learning Generic scaffolding
3 So far Supervised machine learning Linear models Least squares regression, SVR Fisher's discriminant, Perceptron, Logistic model, SVM Non-linear models Neural networks, Decision trees, Association rules Unsupervised machine learning Clustering/EM, PCA Generic scaffolding Probabilistic modeling, ML/MAP estimation Performance evaluation, Statistical learning theory Linear algebra, Optimization methods
4 Coming up next Supervised machine learning Linear models Least squares regression, SVR Fisher's discriminant, Perceptron, Logistic model, SVM Non-linear models Neural networks, Decision trees, Association rules Kernel-XXX Unsupervised machine learning Clustering/EM, PCA, Kernel-XXX Generic scaffolding Probabilistic modeling, ML/MAP estimation Performance evaluation, Statistical learning theory Linear algebra, Optimization methods Kernels
5 Logistic regression, Perceptron, Max. margin, Fisher's discriminant, Linear regression, Ridge regression, LASSO, …: f(x) = ?
6 Logistic regression, Perceptron, Max. margin, Fisher's discriminant, Linear regression, Ridge regression, LASSO, …: f(x) = wᵀx + b
7 Logistic regression, Perceptron, Max. margin, Fisher's discriminant, Linear regression, Ridge regression, LASSO, …: f(x) = wᵀx + b. PCA, LDA, ICA, …: f(x) = ?
8 Logistic regression, Perceptron, Max. margin, Fisher's discriminant, Linear regression, Ridge regression, LASSO, …: f(x) = wᵀx + b. PCA, LDA, ICA, …: f(x) = Ax
9 Logistic regression, Perceptron, Max. margin, Fisher's discriminant, Linear regression, Ridge regression, LASSO, …: f(x) = wᵀx + b. PCA, LDA, ICA, …: f(x) = Ax. K-means: cᵢ = (1/|Cᵢ|) Σ_{x∈Cᵢ} x. CCA, GLM, …
10 Too much linear Logistic regression, Perceptron, Max. margin, Fisher's discriminant, Linear regression, Ridge regression, LASSO, …: f(x) = wᵀx + b. PCA, LDA, ICA, …: f(x) = Ax. K-means: cᵢ = (1/|Cᵢ|) Σ_{x∈Cᵢ} x. CCA, GLM, …
11 Linear is not enough Limited generalization ability
12 Linear is not enough Limited generalization ability
13 Linear is not enough Limited applicability Text? Ordinal/Nominal data? Graphs/Trees/Networks? Shapes? Graph nodes?
14 Solutions
15 Solutions Feature space Kernels
16 Solutions Feature space Nonlinear feature spaces Important idea #1 Kernels The Kernel Trick Dual representation Important idea #2 Important idea #3
17 f(x) = wx
18 x → φ(x) = (x, x², x³)
19 Nonlinear feature space x → φ(x) = (x, x², x³), f(x) = w₁x + w₂x² + w₃x³
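The slide above can be sketched numerically (a minimal illustration assuming NumPy; the data and coefficients are invented for the demo): mapping x to φ(x) = (x, x², x³) turns cubic regression into ordinary linear least squares in the feature space.

```python
import numpy as np

# Sketch: fit f(x) = w1*x + w2*x^2 + w3*x^3 by ordinary least squares
# on the mapped features phi(x) = (x, x^2, x^3).
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = 0.5 * x - 1.2 * x**2 + 0.3 * x**3 + rng.normal(scale=0.05, size=x.shape)

Phi = np.column_stack([x, x**2, x**3])       # feature map, applied row-wise
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # linear fit in feature space
```

The fitted w recovers the three cubic coefficients, even though the solver itself is purely linear.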
20
21 x → φ(x) = (x, x³ − x)
22 x → φ(x) = (x, x³ − x)
23 x → φ(x) = (x, x³ − x)
24 Nonlinear feature space f(x) = wᵀφ(x)
25 + Support for arbitrary data types: φ(text) = word counts, φ(graph) = node degrees, φ(tree) = path lengths
26 What if the dimensionality is high? (x₁, x₂, …, xₘ) → (x₁x₁, x₁x₂, …, xₘxₘ)
27 What if the dimensionality is high? (x₁, x₂, …, xₘ) → (x₁x₁, x₁x₂, …, xₘxₘ): O(m²) elements. For all k-wise products: O(mᵏ)
28 The Kernel Trick Let φ(x) = (x₁x₁, x₁x₂, …, xₘxₘ). Consider ⟨φ(x), φ(y)⟩ = Σᵢⱼ φ(x)ᵢⱼ φ(y)ᵢⱼ
29 The Kernel Trick Let φ(x) = (x₁x₁, x₁x₂, …, xₘxₘ). Consider ⟨φ(x), φ(y)⟩ = Σᵢⱼ φ(x)ᵢⱼ φ(y)ᵢⱼ = Σᵢⱼ xᵢxⱼyᵢyⱼ
30 The Kernel Trick Let φ(x) = (x₁x₁, x₁x₂, …, xₘxₘ). Consider ⟨φ(x), φ(y)⟩ = Σᵢⱼ φ(x)ᵢⱼ φ(y)ᵢⱼ = Σᵢⱼ xᵢxⱼyᵢyⱼ = Σᵢⱼ (xᵢyᵢ)(xⱼyⱼ)
31 The Kernel Trick Let φ(x) = (x₁x₁, x₁x₂, …, xₘxₘ). Consider ⟨φ(x), φ(y)⟩ = Σᵢⱼ φ(x)ᵢⱼ φ(y)ᵢⱼ = Σᵢⱼ xᵢxⱼyᵢyⱼ = Σᵢⱼ (xᵢyᵢ)(xⱼyⱼ) = (Σᵢ xᵢyᵢ)(Σⱼ xⱼyⱼ)
32 The Kernel Trick Let φ(x) = (x₁x₁, x₁x₂, …, xₘxₘ). Consider ⟨φ(x), φ(y)⟩ = Σᵢⱼ φ(x)ᵢⱼ φ(y)ᵢⱼ = Σᵢⱼ xᵢxⱼyᵢyⱼ = Σᵢⱼ (xᵢyᵢ)(xⱼyⱼ) = (Σᵢ xᵢyᵢ)²
33 The Kernel Trick Let φ(x) = (x₁x₁, x₁x₂, …, xₘxₘ). Consider ⟨φ(x), φ(y)⟩ = Σᵢⱼ φ(x)ᵢⱼ φ(y)ᵢⱼ = (Σᵢ xᵢyᵢ)² = ⟨x, y⟩²
34 The Kernel Trick Let φ(x) = (x₁x₁, x₁x₂, …, xₘxₘ). Consider ⟨φ(x), φ(y)⟩ = Σᵢⱼ φ(x)ᵢⱼ φ(y)ᵢⱼ = (Σᵢ xᵢyᵢ)² = ⟨x, y⟩²
35 The Kernel Trick Let φ(x) = (x₁x₁, x₁x₂, …, xₙxₙ). Consider ⟨φ(x), φ(y)⟩ = Σᵢⱼ φ(x)ᵢⱼ φ(y)ᵢⱼ = (Σᵢ xᵢyᵢ)² = ⟨x, y⟩². Polynomial kernel: K(x, y) = (⟨x, y⟩ + R)ᵈ
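The derivation above is easy to check numerically (a small sketch assuming NumPy): the explicit O(m²) feature-space inner product and the O(m) kernel evaluation give the same number.

```python
import numpy as np

# Check that for phi(x) = (x_i * x_j for all i, j),
# <phi(x), phi(y)> equals <x, y>^2 without ever forming phi.
def phi(x):
    return np.outer(x, x).ravel()    # all m^2 pairwise products

rng = np.random.default_rng(1)
x, y = rng.normal(size=5), rng.normal(size=5)

lhs = phi(x) @ phi(y)                # O(m^2): explicit feature space
rhs = (x @ y) ** 2                   # O(m): the kernel trick
```

The same comparison works for the polynomial kernel (⟨x, y⟩ + R)ᵈ, whose feature space grows as O(mᵈ) while the kernel stays O(m).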
36 The Kernel Trick What about: K(x, y) = ⟨x, y⟩ + ½⟨x, y⟩²?
37 The Kernel Trick What about: K(x, y) = ⟨x, y⟩ + ½⟨x, y⟩² = Σᵢ xᵢyᵢ + Σᵢⱼ φᵢⱼ(x)φᵢⱼ(y)
38 The Kernel Trick What about: K(x, y) = ⟨x, y⟩ + ½⟨x, y⟩² = Σᵢ xᵢyᵢ + Σᵢⱼ φᵢⱼ(x)φᵢⱼ(y) = ⟨(x₁, …, xₘ, √0.5·x₁x₁, …, √0.5·xₘxₘ), (y₁, …, yₘ, √0.5·y₁y₁, …, √0.5·yₘyₘ)⟩
39 The Kernel Trick What about: K(x, y) = ⟨x, y⟩ + ½⟨x, y⟩² = Σᵢ xᵢyᵢ + Σᵢⱼ φᵢⱼ(x)φᵢⱼ(y) = ⟨(x₁, …, xₘ, √0.5·x₁x₁, …, √0.5·xₘxₘ), (y₁, …, yₘ, √0.5·y₁y₁, …, √0.5·yₘyₘ)⟩
40 The Kernel Trick What about: K(x, y) = 1 + ⟨x, y⟩ + ⟨x, y⟩²/2! + ⟨x, y⟩³/3! + ⟨x, y⟩⁴/4!?
41 The Kernel Trick What about: K(x, y) = Σᵢ₌₀^∞ ⟨x, y⟩ⁱ/i!
42 The Kernel Trick What about: K(x, y) = Σᵢ₌₀^∞ ⟨x, y⟩ⁱ/i! = exp⟨x, y⟩
43 The Kernel Trick K(x, y) = Σᵢ₌₀^∞ ⟨x, y⟩ⁱ/i! = exp⟨x, y⟩. Infinite-dimensional feature space!
44 The Kernel Trick Gaussian kernel: K(x, y) = exp(−γ‖x − y‖²) = exp(−‖x − y‖²/2σ²). Infinite-dimensional feature space!
45 The Kernel Trick Exponential kernel: K(x, y) = exp(−‖x − y‖/2σ²). Infinite-dimensional feature space!
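A short sketch of the Gaussian kernel above (assuming NumPy; the sample points are arbitrary): the feature space is infinite-dimensional, yet each kernel evaluation is a cheap closed-form expression.

```python
import numpy as np

# Gaussian (RBF) kernel, with gamma = 1 / (2 * sigma^2) as on the slide.
def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2))

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
k_xx = gaussian_kernel(x, x)   # any point has unit "self-similarity"
k_xy = gaussian_kernel(x, y)   # decays with squared distance
```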
46 Kernels
47 Structured data kernels String kernels P-spectrum kernels All-subsequences kernels Gap-weighted subsequences kernels Graph & tree kernels Co-rooted subtrees All subtrees Random walks
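One item from the list above, sketched in code (a hedged illustration; the function names are mine, not from the slides): the p-spectrum string kernel maps a string to the counts of its length-p substrings, and the kernel is the dot product of those count vectors.

```python
from collections import Counter

# p-spectrum kernel: phi(s) = counts of length-p substrings of s,
# K(s, t) = <phi(s), phi(t)>.
def p_spectrum(s, p):
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))

def spectrum_kernel(s, t, p=2):
    cs, ct = p_spectrum(s, p), p_spectrum(t, p)
    return sum(cs[g] * ct[g] for g in cs)   # sparse dot product

k = spectrum_kernel("abab", "babab", p=2)   # shared 2-grams: "ab", "ba"
```

Strings with no length-p substring in common get kernel value 0, exactly as orthogonal feature vectors should.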
48 Kernel A function K(x, y) is a kernel if K(x, y) = ⟨φ(x), φ(y)⟩ for some feature map φ.
49 Kernel matrix For a given kernel function K and a finite dataset (x₁, x₂, …, xₙ), the n×n matrix Kᵢⱼ = K(xᵢ, xⱼ) is called the kernel matrix.
50 Kernel matrix Let X be the data matrix; then K = XXᵀ is the kernel matrix for the linear kernel K(x, y) = xᵀy.
51 Kernel matrix Let X be the data matrix; then K = XXᵀ is the kernel matrix for the linear kernel K(x, y) = xᵀy. Let φ be a feature mapping. Then* K = φ(X)φ(X)ᵀ is the kernel matrix for the corresponding kernel K(x, y) = ⟨φ(x), φ(y)⟩.
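Both constructions above can be checked against each other (a sketch assuming NumPy, using K(x, y) = ⟨x, y⟩² with its explicit feature map from the kernel-trick slides):

```python
import numpy as np

# Kernel matrix two ways: K = phi(X) phi(X)^T versus direct kernel
# evaluation, for K(x, y) = <x, y>^2 with phi(x) = (x_i * x_j for all i, j).
rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))                          # rows are data points

Phi = np.stack([np.outer(x, x).ravel() for x in X])  # phi(X), row-wise
K_features = Phi @ Phi.T                             # via explicit features
K_kernel = (X @ X.T) ** 2                            # via the kernel trick
```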
52 Kernel theorem Not every function K is a kernel! Example?
53 Kernel theorem Not every function K is a kernel! e.g. K(x, y) = −1 is not. Not every n×n matrix is a kernel matrix!
54 Kernel theorem Theorem: K is a kernel function ⟺ K is symmetric positive semidefinite. A function is positive semidefinite iff for any finite dataset {x₁, x₂, …, xₙ} the corresponding kernel matrix is positive semidefinite.
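The theorem gives a practical test (a sketch assuming NumPy): a valid kernel matrix has no negative eigenvalues beyond rounding error, while the non-kernel K(x, y) = −1 from the previous slide fails on any dataset.

```python
import numpy as np

# PSD check: valid kernel matrices have eigenvalues >= 0 (up to rounding);
# K(x, y) = -1 produces a matrix with a clearly negative eigenvalue.
rng = np.random.default_rng(3)
X = rng.normal(size=(5, 2))

sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_gauss = np.exp(-sq_dists)          # Gaussian kernel matrix (gamma = 1)
K_bad = -np.ones((5, 5))             # matrix of the non-kernel K(x, y) = -1

min_eig_gauss = np.linalg.eigvalsh(K_gauss).min()
min_eig_bad = np.linalg.eigvalsh(K_bad).min()
```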
55 Kernel closure
56 Kernel closure Sum: K = K₁ + K₂ (feature space concatenation)
57 Kernel closure Scaling: K = cK₁, c > 0 (feature space scaling)
58 Kernel closure Product: K = K₁ · K₂ (feature space tensor product)
59 Kernel closure Composition: K(x, y) = K₁(ψ(x), ψ(y)) (feature map composition)
60 Kernel normalization Let φ̂(x) = φ(x)/‖φ(x)‖. Then K̂(x, y) = ⟨φ̂(x), φ̂(y)⟩ = ⟨φ(x), φ(y)⟩/(‖φ(x)‖·‖φ(y)‖) = K(x, y)/√(K(x, x)·K(y, y))
61 Kernel matrix normalization K̂(x, y) = K(x, y)/√(K(x, x)·K(y, y)), i.e. Kᵢⱼ → Kᵢⱼ/√(Kᵢᵢ·Kⱼⱼ)
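In matrix form the normalization is one line (a sketch assuming NumPy): dividing by √(Kᵢᵢ·Kⱼⱼ) rescales every φ(x) to unit length, so the result has a unit diagonal and entries bounded by 1 in absolute value.

```python
import numpy as np

# Kernel matrix normalization: K_ij / sqrt(K_ii * K_jj).
rng = np.random.default_rng(4)
X = rng.normal(size=(5, 3))
K = (X @ X.T + 1.0) ** 2             # a valid (polynomial) kernel matrix

d = np.sqrt(np.diag(K))
K_norm = K / np.outer(d, d)          # unit diagonal, |entries| <= 1
```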
62 Kernel matrix centering xᵢ → xᵢ − (1/n) Σₖ xₖ
63 Kernel matrix centering xᵢ → xᵢ − (1/n) Σₖ xₖ
64 Kernel matrix centering xᵢ → xᵢ − (1/n) Σₖ xₖ, i.e. X → X − (1/n) 1ₙ1ₙᵀ X
65 Kernel matrix centering X → X − (1/n) 1ₙ1ₙᵀ X, so XXᵀ → (X − (1/n) 11ᵀX)(X − (1/n) 11ᵀX)ᵀ
66 Kernel matrix centering XXᵀ → (X − (1/n) 11ᵀX)(X − (1/n) 11ᵀX)ᵀ = XXᵀ − (1/n) 11ᵀXXᵀ − (1/n) XXᵀ11ᵀ + (1/n²) 11ᵀXXᵀ11ᵀ
67 Kernel matrix centering K_cent = XXᵀ − (1/n) 11ᵀXXᵀ − (1/n) XXᵀ11ᵀ + (1/n²) 11ᵀXXᵀ11ᵀ = K − (1/n) 11ᵀK − (1/n) K11ᵀ + (1/n²) 11ᵀK11ᵀ
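The centering formula above can be verified against centering the features explicitly (a sketch assuming NumPy, using the linear kernel so both routes are available):

```python
import numpy as np

# K_cent = K - (1/n) 11^T K - (1/n) K 11^T + (1/n^2) 11^T K 11^T
# must equal Xc Xc^T, where Xc = X - mean(X).
rng = np.random.default_rng(5)
n = 6
X = rng.normal(size=(n, 3))
K = X @ X.T                          # linear kernel matrix

J = np.ones((n, n)) / n              # (1/n) * 1 1^T
K_cent = K - J @ K - K @ J + J @ K @ J

Xc = X - X.mean(axis=0)              # same centering, done on features
```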
68 The Dual Representation Let A be the input space, and let B be the higher-dimensional feature space. Let φ: A → B be the feature map. Fix a dataset {x₁, x₂, …, xₙ} ⊆ A. Let w = Σᵢ αᵢφ(xᵢ) ∈ B. We say that the αᵢ are the dual coordinates for w.
69 Dual coordinates w = Σᵢ αᵢφ(xᵢ) = φ(X)ᵀα = Ξᵀα. Note that ΞΞᵀ = φ(X)φ(X)ᵀ = K. Now we can do all of the useful stuff using dual coordinates only.
70 Dual coordinates Let w = Ξᵀα, u = Ξᵀβ. Then 2w = ?
71 Dual coordinates Let w = Ξᵀα, u = Ξᵀβ. Then 2w = Ξᵀ(2α)
72 Dual coordinates Let w = Ξᵀα, u = Ξᵀβ. Then 2w = Ξᵀ(2α), w + u = ?
73 Dual coordinates Let w = Ξᵀα, u = Ξᵀβ. Then 2w = Ξᵀ(2α), w + u = Ξᵀ(α + β)
74 Dual coordinates Let w = Ξᵀα, u = Ξᵀβ. Then 2w = Ξᵀ(2α), w + u = Ξᵀ(α + β), ⟨w, u⟩ = ?
75 Dual coordinates Let w = Ξᵀα, u = Ξᵀβ. Then 2w = Ξᵀ(2α), w + u = Ξᵀ(α + β), ⟨w, u⟩ = wᵀu = αᵀΞΞᵀβ = αᵀKβ
76 Dual coordinates Let w = Ξᵀα, u = Ξᵀβ. Then 2w = Ξᵀ(2α), w + u = Ξᵀ(α + β), ⟨w, u⟩ = wᵀu = αᵀΞΞᵀβ = αᵀKβ, ‖w − u‖² = ?
77 Dual coordinates Let w = Ξᵀα, u = Ξᵀβ. Then 2w = Ξᵀ(2α), w + u = Ξᵀ(α + β), ⟨w, u⟩ = wᵀu = αᵀΞΞᵀβ = αᵀKβ, ‖w − u‖² = wᵀw + uᵀu − 2wᵀu = ?
78 Dual coordinates Let w = Ξᵀα, u = Ξᵀβ. Then 2w = Ξᵀ(2α), w + u = Ξᵀ(α + β), ⟨w, u⟩ = wᵀu = αᵀΞΞᵀβ = αᵀKβ, ‖w − u‖² = wᵀw + uᵀu − 2wᵀu = αᵀKα + βᵀKβ − 2αᵀKβ
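These identities check out numerically (a sketch assuming NumPy; Ξ is a random matrix standing in for the stacked feature vectors): every inner product and distance in feature space reduces to arithmetic on K and the dual coordinates.

```python
import numpy as np

# With w = Xi^T alpha, u = Xi^T beta and K = Xi Xi^T:
#   <w, u> = alpha^T K beta
#   ||w - u||^2 = alpha^T K alpha + beta^T K beta - 2 alpha^T K beta
rng = np.random.default_rng(6)
Xi = rng.normal(size=(5, 8))         # rows of Xi play the role of phi(x_i)
K = Xi @ Xi.T
alpha, beta = rng.normal(size=5), rng.normal(size=5)

w, u = Xi.T @ alpha, Xi.T @ beta
dot_dual = alpha @ K @ beta
dist2_dual = alpha @ K @ alpha + beta @ K @ beta - 2 * dot_dual
```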
79 Kernelization Recall the Perceptron:
80 Kernelization Recall the Perceptron: Initialize w ← 0. Find a misclassified example (xᵢ, yᵢ). Update weights: w ← w + μyᵢxᵢ, b ← b + μyᵢ
81 Kernelization Recall the Perceptron: Initialize w ← 0, α ← 0. Find a misclassified example (xᵢ, yᵢ). Update weights: w ← w + μyᵢxᵢ, b ← b + μyᵢ
82 Kernelization Recall the Perceptron: Initialize w ← 0, α ← 0. Find a misclassified example (xᵢ, yᵢ). Update weights: w ← w + μyᵢxᵢ, i.e. αᵢ ← αᵢ + μyᵢ, b ← b + μyᵢ
83 Kernelization Recall the Perceptron: Initialize α ← 0. Find a misclassified example (xᵢ, yᵢ). Update weights: αᵢ ← αᵢ + μyᵢ, b ← b + μyᵢ
84 Kernelization Recall the Perceptron: Initialize α ← 0. Find a misclassified example (xᵢ, yᵢ): sign(wᵀxᵢ + b) ≠ yᵢ ⟺ sign(Σⱼ αⱼxⱼᵀxᵢ + b) ≠ yᵢ. Update weights: αᵢ ← αᵢ + μyᵢ, b ← b + μyᵢ
85 Kernelization Recall the Perceptron: Initialize α ← 0. Find a misclassified example (xᵢ, yᵢ): sign(wᵀxᵢ + b) ≠ yᵢ ⟺ sign(Kᵢα + b) ≠ yᵢ. Update weights: αᵢ ← αᵢ + μyᵢ, b ← b + μyᵢ
86 Kernelization Recall the Perceptron: Initialize α ← 0. Find a misclassified example (xᵢ, yᵢ): sign(Kᵢα + b) ≠ yᵢ. Update weights: αᵢ ← αᵢ + μyᵢ, b ← b + μyᵢ
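The kernelized update rule above can be run end to end (a hedged sketch assuming NumPy; μ, the epoch cap and the XOR toy data are invented for the demo): only α, b and the kernel matrix are ever touched, and w is never formed.

```python
import numpy as np

# Kernelized perceptron: misclassification test and update expressed
# entirely through dual coordinates alpha and the kernel matrix K.
def kernel_perceptron(K, y, mu=1.0, epochs=200):
    n = len(y)
    alpha, b = np.zeros(n), 0.0
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            if y[i] * (K[i] @ alpha + b) <= 0:   # misclassified example
                alpha[i] += mu * y[i]            # dual weight update
                b += mu * y[i]
                mistakes += 1
        if mistakes == 0:                        # all examples correct
            break
    return alpha, b

# XOR is not linearly separable, but becomes separable under the
# polynomial kernel (<x, y> + 1)^2.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])
K = (X @ X.T + 1.0) ** 2
alpha, b = kernel_perceptron(K, y)
pred = np.sign(K @ alpha + b)
```

The same substitution (wᵀx → Kα) kernelizes any algorithm whose updates touch the data only through inner products.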
87 Quiz Today we heard three important ideas. Important idea #1: ___ Important idea #2: ___ Important idea #3: ___ Function/matrix K is a kernel function/matrix iff it is ___. Dual representation: w = ___
88 Quiz These algorithms have kernelized versions: ___
89
More information(Kernels +) Support Vector Machines
(Kernels +) Support Vector Machines Machine Learning Torsten Möller Reading Chapter 5 of Machine Learning An Algorithmic Perspective by Marsland Chapter 6+7 of Pattern Recognition and Machine Learning
More informationLecture 11. Kernel Methods
Lecture 11. Kernel Methods COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne This lecture The kernel trick Efficient computation of a dot product
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Shivani Agarwal Support Vector Machines (SVMs) Algorithm for learning linear classifiers Motivated by idea of maximizing margin Efficient extension to non-linear
More informationLINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning
LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES Supervised Learning Linear vs non linear classifiers In K-NN we saw an example of a non-linear classifier: the decision boundary
More informationOutline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22
Outline Basic concepts: SVM and kernels SVM primal/dual problems Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels Basic concepts: SVM and kernels SVM primal/dual problems
More informationThe Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017
The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)
More informationProbabilistic & Unsupervised Learning
Probabilistic & Unsupervised Learning Gaussian Processes Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College London
More informationSupport Vector Machines. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Support Vector Machines CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 A Linearly Separable Problem Consider the binary classification
More informationLearning with kernels and SVM
Learning with kernels and SVM Šámalova chata, 23. května, 2006 Petra Kudová Outline Introduction Binary classification Learning with Kernels Support Vector Machines Demo Conclusion Learning from data find
More informationLinear Classification and SVM. Dr. Xin Zhang
Linear Classification and SVM Dr. Xin Zhang Email: eexinzhang@scut.edu.cn What is linear classification? Classification is intrinsically non-linear It puts non-identical things in the same class, so a
More informationKernel Methods and Support Vector Machines
Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete
More informationMachine Learning & SVM
Machine Learning & SVM Shannon "Information is any difference that makes a difference. Bateman " It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible
More information