Learning From Data Lecture 26 Kernel Machines

Popular Kernels · The Kernel Measures Similarity · Kernels in Different Applications
M. Magdon-Ismail, CSCI 4100/6100

recap: The Kernel Allows Us to Bypass Z-space

Solve the QP:
  minimize_α  ½ αᵀGα − 1ᵀα
  subject to: yᵀα = 0,  C ≥ α ≥ 0.

Pick a free support vector index s (one with C > α*_s > 0). The SVM (or pseudo-inverse) solution needs only inner products of the data x_n ∈ X, which the kernel K(·,·) supplies:
  g(x) = sign( Σ_{α*_n>0} α*_n y_n K(x_n, x) + b* ),
  b* = y_s − Σ_{α*_n>0} α*_n y_n K(x_n, x_s).
(One can compute b* for several free support vectors and average.)

Overfitting: high d gives a complicated separator, but few support vectors means low effective complexity, so we can go to high (even infinite) d.
Computation: high d makes the computation expensive or infeasible, but the kernel makes it computationally feasible, so we can go to high (even infinite) d.

© AML. Creator: Malik Magdon-Ismail. Kernel Machines: 2/12.
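The recap above can be sketched in code. This is an illustrative evaluation of the final hypothesis only; the support vectors and α values below are assumed (hand-solved for a toy 1-d problem with a linear kernel), not the output of an actual QP solver, and the function names are ours.

```python
import numpy as np

def kernel_bias(X_sv, y_sv, alpha_sv, free_idx, K):
    """b* = y_s - sum_n alpha_n y_n K(x_n, x_s), averaged over free SVs."""
    bs = []
    for s in free_idx:
        bs.append(y_sv[s] - sum(a * y * K(xn, X_sv[s])
                                for a, y, xn in zip(alpha_sv, y_sv, X_sv)))
    return float(np.mean(bs))

def g(x, X_sv, y_sv, alpha_sv, b, K):
    """g(x) = sign( sum_n alpha_n y_n K(x_n, x) + b )."""
    s = sum(a * y * K(xn, x) for a, y, xn in zip(alpha_sv, y_sv, X_sv))
    return int(np.sign(s + b))

# Toy hard-margin problem: x = -1 (y = -1) and x = +1 (y = +1).
# With the linear kernel K(x, x') = x x', the optimal alphas are 0.5 each.
K = lambda u, v: u * v
X_sv, y_sv, alpha_sv = [-1.0, 1.0], [-1, 1], [0.5, 0.5]
b = kernel_bias(X_sv, y_sv, alpha_sv, free_idx=[0, 1], K=K)
print(b, g(2.0, X_sv, y_sv, alpha_sv, b, K))  # 0.0 1
```

Swapping in any other kernel K changes the separator without touching this evaluation code; that is the point of bypassing Z-space.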

Polynomial Kernel

2nd-order polynomial kernel. The explicit transform is
  Φ(x) = (x_1, …, x_d, x_1², …, x_d², √2·x_1x_2, √2·x_1x_3, …, √2·x_{d−1}x_d),
so
  K(x, x') = Φ(x)ᵀΦ(x')
           = Σ_{i=1}^d x_i x'_i + Σ_{i=1}^d x_i² x'_i² + 2 Σ_{i<j} x_i x_j x'_i x'_j   [O(d²) in Z-space]
           = (½ + xᵀx')² − ¼,
which is computed quickly in X-space, in O(d).

Q-th order polynomial kernel:
  K(x, x') = (r + xᵀx')^Q   (inhomogeneous kernel)
  K(x, x') = (xᵀx')^Q       (homogeneous kernel)
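A quick numerical check of the 2nd-order identity above (the function names are ours): the explicit transform costs O(d²) features, while (½ + xᵀx')² − ¼ gives the same inner product in O(d).

```python
import numpy as np

def phi2(x):
    """Explicit 2nd-order features: x_i, x_i^2, and sqrt(2) x_i x_j for i < j."""
    d = len(x)
    feats = list(x) + [v * v for v in x]
    feats += [np.sqrt(2.0) * x[i] * x[j] for i in range(d) for j in range(i + 1, d)]
    return np.array(feats)

def poly2_kernel(x, xp):
    """K(x, x') = (1/2 + x^T x')^2 - 1/4, computed directly in X-space."""
    return (0.5 + np.dot(x, xp)) ** 2 - 0.25

x = np.array([1.0, 2.0, -1.0])
xp = np.array([0.5, -1.0, 3.0])
print(phi2(x) @ phi2(xp), poly2_kernel(x, xp))  # both 15.75
```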

RBF-Kernel

One-dimensional RBF-kernel. The (infinite-dimensional) transform is
  Φ(x) = e^{−x²} (1, √(2¹/1!)·x, √(2²/2!)·x², √(2³/3!)·x³, √(2⁴/4!)·x⁴, …),
which is not feasible to compute explicitly. But
  K(x, x') = Φ(x)ᵀΦ(x') = e^{−x²} e^{−x'²} Σ_{i=0}^∞ (2xx')^i / i! = e^{−x²} e^{−x'²} e^{2xx'} = e^{−(x−x')²}
is computed quickly in X-space, in O(d).

[Figures: Hard Margin (γ = 2000, C = ∞); Soft Margin (γ = 2000, C = 0.25); Soft Margin (γ = 100, C = 0.25).]

d-dimensional RBF-kernel:
  K(x, x') = e^{−γ‖x−x'‖²}   (γ > 0).
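The 1-d identity above can be checked numerically (the truncation count is our choice): the infinite feature map, truncated at 30 terms, already matches e^{−(x−x')²} to machine precision for small inputs, because the series for e^{2xx'} converges very fast.

```python
import numpy as np
from math import factorial

def phi_rbf(x, n_terms=30):
    """Truncated 1-d RBF feature map: e^{-x^2} * sqrt(2^i / i!) * x^i."""
    i = np.arange(n_terms)
    coeffs = np.array([np.sqrt(2.0 ** k / factorial(int(k))) for k in i])
    return np.exp(-x ** 2) * coeffs * x ** i

def rbf_kernel(x, xp):
    """K(x, x') = e^{-(x - x')^2}, computed directly in X-space."""
    return np.exp(-(x - xp) ** 2)

x, xp = 0.7, -0.3
print(np.isclose(phi_rbf(x) @ phi_rbf(xp), rbf_kernel(x, xp)))  # True
```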

Choosing the RBF-Kernel Width γ

K(x, x') = e^{−γ‖x−x'‖²}. [Figures: resulting decision boundaries for small γ, medium γ, and large γ.]

RBF-Kernel Simulates a k-RBF-Network

RBF-kernel:     g(x) = sign( Σ_{α*_n>0} α*_n y_n e^{−γ‖x−x_n‖²} + b )
k-RBF-network:  g(x) = sign( Σ_{j=1}^k w_j e^{−γ‖x−μ_j‖²} + w_0 )

RBF-kernel: the centers are at the support vectors, and the number of centers is auto-determined.
k-RBF-network: the centers are chosen to represent the data, and the number of centers k is an input.

Neural Network Kernel

K(x, x') = tanh(κ xᵀx' + c)

Neural network kernel:    g(x) = sign( Σ_{α*_n>0} α*_n y_n tanh(κ x_nᵀx + c) + b )
2-layer neural network:   g(x) = sign( Σ_{j=1}^m w_j tanh(v_jᵀx) + w_0 )

Kernel: the first-layer weights are the support vectors, and the number of hidden nodes is auto-determined.
Network: the first-layer weights are arbitrary, and the number of hidden nodes m is an input.
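A minimal evaluation of the neural-network kernel (the κ and c values here are arbitrary illustrations, not recommended settings):

```python
import numpy as np

def nn_kernel(x, xp, kappa=1.0, c=-1.0):
    """K(x, x') = tanh(kappa * x^T x' + c); kappa and c are illustrative."""
    return np.tanh(kappa * np.dot(x, xp) + c)

x, xp = np.array([1.0, 0.5]), np.array([0.5, 1.0])
print(nn_kernel(x, xp))  # tanh(1.0 - 1.0) = 0.0
```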

The Inner Product Measures Similarity

K(x, x') = zᵀz' = ‖z‖ ‖z'‖ cos θ_{z,z'} = ‖z‖ ‖z'‖ CosSim(z, z').

Normalizing for size, the kernel measures the similarity between input vectors.
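The "normalizing for size" step can be sketched directly: any kernel K can be rescaled to K(x,x') / √(K(x,x)·K(x',x')), which equals cos θ between z = Φ(x) and z' = Φ(x') and lies in [−1, 1]. The function name is ours; the linear kernel (z = x) makes the check easy.

```python
import numpy as np

def normalized_kernel(x, xp, K):
    """CosSim(z, z') = K(x, x') / sqrt(K(x, x) * K(x', x'))."""
    return K(x, xp) / np.sqrt(K(x, x) * K(xp, xp))

K_lin = lambda u, v: float(np.dot(u, v))  # linear kernel: z = x
x, xp = np.array([3.0, 0.0]), np.array([1.0, 1.0])
print(normalized_kernel(x, xp, K_lin))  # cos(45 degrees) = 0.7071...
```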

Designing Kernels

Construct a similarity measure for the data. A linear model should be plausible in the transformed space.

String Kernels

Applications: DNA sequences, text.

ACGGTGTCAAACGTGTCAGTGTG
GTCGGGTCAAAACGTGAT

"Dear Sir, With reference to your letter dated 26th March, I want to confirm the Order No. 34-09-10 placed on 3rd March, 2010. I would appreciate if you could send me the account details where the payment has to be made. As per the invoice, we are entitled to a cash discount of 2%. Can you please let us know whether it suits you if we make a wire transfer instead of a cheque?"

"Dear Jane, I am terribly sorry to hear the news of your hip fracture. I can only imagine what a terrible time you must be going through. I hope you and the family are coping well. If there is any help you need, don't hesitate to let me know."

Similar? Yes, if classifying spam versus non-spam. No, if classifying business versus personal.

To design the kernel, measure similarity between strings:
Bag of words (number of occurrences of each atom).
Co-occurrence of substrings or subsequences.
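The bag-of-words idea above reduces to a few lines (the whitespace tokenization here is our simplification): each string maps to a vector of word counts, and the kernel is the inner product of those count vectors.

```python
from collections import Counter

def bow_kernel(s, t):
    """Bag-of-words string kernel: inner product of word-count vectors."""
    cs, ct = Counter(s.lower().split()), Counter(t.lower().split())
    return sum(cs[w] * ct[w] for w in cs)

print(bow_kernel("send me the account details",
                 "let me know the details"))  # 3: "me", "the", "details"
```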

Graph Kernels

Performing classification on:
Graph structures (e.g. protein networks for function prediction).
Graph nodes within a network (e.g. whether or not to advertise to Facebook users).

Similarity between graphs: random walks; degree sequences, connectivity properties, mixing properties.

Measuring similarity between nodes by looking at their neighborhoods:
  K(v, v') = |N(v) ∩ N(v')| / |N(v) ∪ N(v')|.
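The neighborhood-overlap node kernel above is easy to sketch; G here is a small hypothetical graph given as an adjacency list.

```python
def node_kernel(G, v, vp):
    """K(v, v') = |N(v) ∩ N(v')| / |N(v) ∪ N(v')| (Jaccard on neighborhoods)."""
    Nv, Nvp = set(G[v]), set(G[vp])
    union = Nv | Nvp
    return len(Nv & Nvp) / len(union) if union else 0.0

G = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
print(node_kernel(G, "a", "b"))  # |{c}| / |{a, b, c}| = 1/3
```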

Image Kernels

Similar? Yes, if trying to recognize pictures with faces. No, if trying to distinguish Malik from Christos.