Learning From Data Lecture 26 Kernel Machines

Popular Kernels · The Kernel Measures Similarity · Kernels in Different Applications
M. Magdon-Ismail, CSCI 4100/6100

recap: The Kernel Allows Us to Bypass Z-space

Solve the QP:
  minimize_α  ½ αᵀGα − 1ᵀα
  subject to: yᵀα = 0,  C ≥ α ≥ 0.

Pick a free support vector index s (one with C > α*_s > 0). The SVM (or pseudo-inverse) solution needs only inner products of the data x_n ∈ X, which the kernel K(·,·) supplies:
  g(x) = sign( Σ_{α*_n>0} α*_n y_n K(x_n, x) + b* ),
  b* = y_s − Σ_{α*_n>0} α*_n y_n K(x_n, x_s).
(One can compute b* for several free support vectors and average.)

Overfitting: high d gives a complicated separator, but few support vectors means low effective complexity, so we can go to high (even infinite) d.
Computation: high d makes the computation expensive or infeasible, but the kernel makes it computationally feasible, so we can go to high (even infinite) d.

© AML. Creator: Malik Magdon-Ismail. Kernel Machines: 2/12.
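The recap above can be sketched in code. This is an illustrative evaluation of the final hypothesis only; the support vectors and α values below are assumed (hand-solved for a toy 1-d problem with a linear kernel), not the output of an actual QP solver, and the function names are ours.

```python
import numpy as np

def kernel_bias(X_sv, y_sv, alpha_sv, free_idx, K):
    """b* = y_s - sum_n alpha_n y_n K(x_n, x_s), averaged over free SVs."""
    bs = []
    for s in free_idx:
        bs.append(y_sv[s] - sum(a * y * K(xn, X_sv[s])
                                for a, y, xn in zip(alpha_sv, y_sv, X_sv)))
    return float(np.mean(bs))

def g(x, X_sv, y_sv, alpha_sv, b, K):
    """g(x) = sign( sum_n alpha_n y_n K(x_n, x) + b )."""
    s = sum(a * y * K(xn, x) for a, y, xn in zip(alpha_sv, y_sv, X_sv))
    return int(np.sign(s + b))

# Toy hard-margin problem: x = -1 (y = -1) and x = +1 (y = +1).
# With the linear kernel K(x, x') = x x', the optimal alphas are 0.5 each.
K = lambda u, v: u * v
X_sv, y_sv, alpha_sv = [-1.0, 1.0], [-1, 1], [0.5, 0.5]
b = kernel_bias(X_sv, y_sv, alpha_sv, free_idx=[0, 1], K=K)
print(b, g(2.0, X_sv, y_sv, alpha_sv, b, K))  # 0.0 1
```

Swapping in any other kernel K changes the separator without touching this evaluation code; that is the point of bypassing Z-space.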

Polynomial Kernel

2nd-order polynomial kernel. The explicit transform is
  Φ(x) = (x_1, …, x_d, x_1², …, x_d², √2·x_1x_2, √2·x_1x_3, …, √2·x_{d−1}x_d),
so
  K(x, x') = Φ(x)ᵀΦ(x')
           = Σ_{i=1}^d x_i x'_i + Σ_{i=1}^d x_i² x'_i² + 2 Σ_{i<j} x_i x_j x'_i x'_j   [O(d²) in Z-space]
           = (½ + xᵀx')² − ¼,
which is computed quickly in X-space, in O(d).

Q-th order polynomial kernel:
  K(x, x') = (r + xᵀx')^Q   (inhomogeneous kernel)
  K(x, x') = (xᵀx')^Q       (homogeneous kernel)
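A quick numerical check of the 2nd-order identity above (the function names are ours): the explicit transform costs O(d²) features, while (½ + xᵀx')² − ¼ gives the same inner product in O(d).

```python
import numpy as np

def phi2(x):
    """Explicit 2nd-order features: x_i, x_i^2, and sqrt(2) x_i x_j for i < j."""
    d = len(x)
    feats = list(x) + [v * v for v in x]
    feats += [np.sqrt(2.0) * x[i] * x[j] for i in range(d) for j in range(i + 1, d)]
    return np.array(feats)

def poly2_kernel(x, xp):
    """K(x, x') = (1/2 + x^T x')^2 - 1/4, computed directly in X-space."""
    return (0.5 + np.dot(x, xp)) ** 2 - 0.25

x = np.array([1.0, 2.0, -1.0])
xp = np.array([0.5, -1.0, 3.0])
print(phi2(x) @ phi2(xp), poly2_kernel(x, xp))  # both 15.75
```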

RBF-Kernel

One-dimensional RBF-kernel. The (infinite-dimensional) transform is
  Φ(x) = e^{−x²} (1, √(2¹/1!)·x, √(2²/2!)·x², √(2³/3!)·x³, √(2⁴/4!)·x⁴, …),
which is not feasible to compute explicitly. But
  K(x, x') = Φ(x)ᵀΦ(x') = e^{−x²} e^{−x'²} Σ_{i=0}^∞ (2xx')^i / i! = e^{−x²} e^{−x'²} e^{2xx'} = e^{−(x−x')²}
is computed quickly in X-space, in O(d).

[Figures: Hard Margin (γ = 2000, C = ∞); Soft Margin (γ = 2000, C = 0.25); Soft Margin (γ = 100, C = 0.25).]

d-dimensional RBF-kernel:
  K(x, x') = e^{−γ‖x−x'‖²}   (γ > 0).
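The 1-d identity above can be checked numerically (the truncation count is our choice): the infinite feature map, truncated at 30 terms, already matches e^{−(x−x')²} to machine precision for small inputs, because the series for e^{2xx'} converges very fast.

```python
import numpy as np
from math import factorial

def phi_rbf(x, n_terms=30):
    """Truncated 1-d RBF feature map: e^{-x^2} * sqrt(2^i / i!) * x^i."""
    i = np.arange(n_terms)
    coeffs = np.array([np.sqrt(2.0 ** k / factorial(int(k))) for k in i])
    return np.exp(-x ** 2) * coeffs * x ** i

def rbf_kernel(x, xp):
    """K(x, x') = e^{-(x - x')^2}, computed directly in X-space."""
    return np.exp(-(x - xp) ** 2)

x, xp = 0.7, -0.3
print(np.isclose(phi_rbf(x) @ phi_rbf(xp), rbf_kernel(x, xp)))  # True
```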

Choosing the RBF-Kernel Width γ

K(x, x') = e^{−γ‖x−x'‖²}. [Figures: resulting decision boundaries for small γ, medium γ, and large γ.]

RBF-Kernel Simulates a k-RBF-Network

RBF-kernel:     g(x) = sign( Σ_{α*_n>0} α*_n y_n e^{−γ‖x−x_n‖²} + b )
k-RBF-network:  g(x) = sign( Σ_{j=1}^k w_j e^{−γ‖x−μ_j‖²} + w_0 )

RBF-kernel: the centers are at the support vectors, and the number of centers is auto-determined.
k-RBF-network: the centers are chosen to represent the data, and the number of centers k is an input.

Neural Network Kernel

K(x, x') = tanh(κ xᵀx' + c)

Neural network kernel:    g(x) = sign( Σ_{α*_n>0} α*_n y_n tanh(κ x_nᵀx + c) + b )
2-layer neural network:   g(x) = sign( Σ_{j=1}^m w_j tanh(v_jᵀx) + w_0 )

Kernel: the first-layer weights are the support vectors, and the number of hidden nodes is auto-determined.
Network: the first-layer weights are arbitrary, and the number of hidden nodes m is an input.
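A minimal evaluation of the neural-network kernel (the κ and c values here are arbitrary illustrations, not recommended settings):

```python
import numpy as np

def nn_kernel(x, xp, kappa=1.0, c=-1.0):
    """K(x, x') = tanh(kappa * x^T x' + c); kappa and c are illustrative."""
    return np.tanh(kappa * np.dot(x, xp) + c)

x, xp = np.array([1.0, 0.5]), np.array([0.5, 1.0])
print(nn_kernel(x, xp))  # tanh(1.0 - 1.0) = 0.0
```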

The Inner Product Measures Similarity

K(x, x') = zᵀz' = ‖z‖ ‖z'‖ cos θ_{z,z'} = ‖z‖ ‖z'‖ CosSim(z, z').

Normalizing for size, the kernel measures the similarity between input vectors.
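The "normalizing for size" step can be sketched directly: any kernel K can be rescaled to K(x,x') / √(K(x,x)·K(x',x')), which equals cos θ between z = Φ(x) and z' = Φ(x') and lies in [−1, 1]. The function name is ours; the linear kernel (z = x) makes the check easy.

```python
import numpy as np

def normalized_kernel(x, xp, K):
    """CosSim(z, z') = K(x, x') / sqrt(K(x, x) * K(x', x'))."""
    return K(x, xp) / np.sqrt(K(x, x) * K(xp, xp))

K_lin = lambda u, v: float(np.dot(u, v))  # linear kernel: z = x
x, xp = np.array([3.0, 0.0]), np.array([1.0, 1.0])
print(normalized_kernel(x, xp, K_lin))  # cos(45 degrees) = 0.7071...
```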

Designing Kernels

Construct a similarity measure for the data. A linear model should be plausible in the transformed space.

String Kernels

Applications: DNA sequences, text.

ACGGTGTCAAACGTGTCAGTGTG
GTCGGGTCAAAACGTGAT

"Dear Sir, With reference to your letter dated 26th March, I want to confirm the Order No. 34-09-10 placed on 3rd March, 2010. I would appreciate if you could send me the account details where the payment has to be made. As per the invoice, we are entitled to a cash discount of 2%. Can you please let us know whether it suits you if we make a wire transfer instead of a cheque?"

"Dear Jane, I am terribly sorry to hear the news of your hip fracture. I can only imagine what a terrible time you must be going through. I hope you and the family are coping well. If there is any help you need, don't hesitate to let me know."

Similar? Yes, if classifying spam versus non-spam. No, if classifying business versus personal.

To design the kernel, measure similarity between strings:
Bag of words (number of occurrences of each atom).
Co-occurrence of substrings or subsequences.
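The bag-of-words idea above reduces to a few lines (the whitespace tokenization here is our simplification): each string maps to a vector of word counts, and the kernel is the inner product of those count vectors.

```python
from collections import Counter

def bow_kernel(s, t):
    """Bag-of-words string kernel: inner product of word-count vectors."""
    cs, ct = Counter(s.lower().split()), Counter(t.lower().split())
    return sum(cs[w] * ct[w] for w in cs)

print(bow_kernel("send me the account details",
                 "let me know the details"))  # 3: "me", "the", "details"
```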

Graph Kernels

Performing classification on:
Graph structures (e.g. protein networks for function prediction).
Graph nodes within a network (e.g. whether or not to advertise to Facebook users).

Similarity between graphs: random walks; degree sequences, connectivity properties, mixing properties.

Measuring similarity between nodes by looking at their neighborhoods:
  K(v, v') = |N(v) ∩ N(v')| / |N(v) ∪ N(v')|.
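The neighborhood-overlap node kernel above is easy to sketch; G here is a small hypothetical graph given as an adjacency list.

```python
def node_kernel(G, v, vp):
    """K(v, v') = |N(v) ∩ N(v')| / |N(v) ∪ N(v')| (Jaccard on neighborhoods)."""
    Nv, Nvp = set(G[v]), set(G[vp])
    union = Nv | Nvp
    return len(Nv & Nvp) / len(union) if union else 0.0

G = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
print(node_kernel(G, "a", "b"))  # |{c}| / |{a, b, c}| = 1/3
```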

Image Kernels

Similar? Yes, if trying to recognize pictures with faces. No, if trying to distinguish Malik from Christos.