A Kernel Between Sets of Vectors

1 A Kernel Between Sets of Vectors
Risi Kondor and Tony Jebara, Columbia University, New York, USA.

2 A Kernel between Sets of Vectors
In SVMs, Gaussian processes and kernel PCA, the kernel $K$ defines a feature map $\Phi : \mathcal{X} \to \mathcal{H}$ such that $K(x, x') = \langle \Phi(x), \Phi(x') \rangle$.
The algorithm becomes linear in $\mathcal{H}$.
The kernel captures prior knowledge about the domain and plays a crucial role in performance, hence kernel engineering.
A new kernel between composite objects.

3 Conventional Kernels between Images
Represent $N \times N$ images as vectors in $\mathbb{R}^{N^2}$, e.g.
$K(x, x') = e^{-\|x - x'\|^2/(2\sigma^2)} = \exp\left[-\sum_i \frac{(x_i - x'_i)^2}{2\sigma^2}\right]$
Only sensitive to similarity between matching pixels: no sense of distance within the image, and sensitive to translations, rotations, etc.
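The limitation described on this slide can be illustrated with a minimal sketch. The 8x8 bar image and the helper name `rbf_kernel` are hypothetical, chosen only for illustration: once two shifted copies of a shape no longer overlap, the pixelwise RBF kernel gives the same value whether the shift is one pixel or three.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel between two images, compared pixel by pixel."""
    diff = np.asarray(x, dtype=float).ravel() - np.asarray(y, dtype=float).ravel()
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

# Hypothetical 8x8 binary image containing a single vertical bar in column 3.
img = np.zeros((8, 8))
img[2:6, 3] = 1.0

# The same bar shifted one pixel and three pixels to the right (no wrap-around).
shift1 = np.roll(img, 1, axis=1)
shift3 = np.roll(img, 3, axis=1)

k_same = rbf_kernel(img, img)
k_shift1 = rbf_kernel(img, shift1)
k_shift3 = rbf_kernel(img, shift3)

# The kernel cannot tell a small shift from a large one: both shifted bars
# differ from the original in exactly the same number of pixels.
print(k_same, k_shift1, k_shift3)
```

Both shifted images differ from the original in eight pixels, so the two kernel values coincide even though one bar is much closer to the original than the other.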

4 The Bag of Tuples Representation
E.g. for images: the set of (x, y) pairs for each foreground pixel, or the set of (x, y, intensity) triplets:
x = {(3, 8), (2, 15), (6, 9), (14, 8), ...}
Similar natural bag of vectors representations exist for sequences, time series, etc.
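A minimal sketch of how such a bag could be extracted from an image array. The function name `image_to_bag`, the foreground threshold, and the convention that x is the column index and y the row index are all assumptions made for illustration; the slides do not specify them.

```python
import numpy as np

def image_to_bag(img, threshold=0.5, with_intensity=False):
    """Return one tuple per foreground pixel: (x, y) or (x, y, intensity).

    Assumption: a pixel is foreground when its value exceeds `threshold`,
    x is the column index and y is the row index.
    """
    img = np.asarray(img, dtype=float)
    rows, cols = np.nonzero(img > threshold)
    if with_intensity:
        return np.column_stack([cols, rows, img[rows, cols]])
    return np.column_stack([cols, rows])

# Hypothetical 4x4 image with two foreground pixels.
img = np.zeros((4, 4))
img[1, 2] = 1.0
img[3, 0] = 0.8

bag = image_to_bag(img)                        # (x, y) pairs
bag3 = image_to_bag(img, with_intensity=True)  # (x, y, intensity) triplets
print(bag.tolist())
```

The order of the rows in the result carries no meaning: the representation is a set, which is why the kernel built on top of it should be invariant to permutations of the tuples.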

5 Cloud of Points in Feature Space
Take a base kernel $\kappa$ between tuples, and consider the feature map $\Phi : \mathbb{R}^N \to \mathcal{H}$ satisfying $\kappa(x, x') = \langle \Phi(x), \Phi(x') \rangle$,
e.g. $\kappa(x, x') = e^{-\|x - x'\|^2/(2\sigma^2)}$.

6 Producing a Kernel Between Examples
$K(x, x') = \,?$
Fit distributions $p$ and $p'$ to the bags $x$ and $x'$ and define $K(x, x') = K(p, p')$.

7 The Bhattacharyya Kernel
Between distributions $p$ and $p'$:
$K(p, p') = \int \sqrt{p(x)}\,\sqrt{p'(x)}\,dx$
Related by $H = \sqrt{2 - 2K}$ to the Hellinger distance
$H(p, p') = \left[\int \left(\sqrt{p(x)} - \sqrt{p'(x)}\right)^2 dx\right]^{1/2}$.
Positive definite and symmetric (Mercer) by construction. Also $K(x, x) = 1$.
Invariant to permutations of the vectors.
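The relation between the kernel and the Hellinger distance follows by expanding the square, using only the fact that $p$ and $p'$ each integrate to one. A short derivation:

```latex
\begin{align*}
H(p,p')^2 &= \int \left(\sqrt{p(x)} - \sqrt{p'(x)}\right)^2 dx \\
          &= \int p(x)\,dx + \int p'(x)\,dx - 2\int \sqrt{p(x)}\,\sqrt{p'(x)}\,dx \\
          &= 2 - 2\,K(p,p').
\end{align*}
```

Setting $p' = p$ gives $K(p, p) = \int p(x)\,dx = 1$ and $H(p, p) = 0$, which is the normalization property stated on the slide.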

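8 The Bhattacharyya Kernel between Normal Distributions
$p = \mathcal{N}(\mu, \Sigma)$, $p' = \mathcal{N}(\mu', \Sigma')$
$K(p, p') = \int \sqrt{p(x)}\,\sqrt{p'(x)}\,dx = \left[\frac{|\Sigma^\dagger|}{|\Sigma|^{1/2}\,|\Sigma'|^{1/2}}\right]^{1/2} \exp\left(-\tfrac{1}{4}\mu^\top \Sigma^{-1} \mu - \tfrac{1}{4}\mu'^\top \Sigma'^{-1} \mu' + \tfrac{1}{2}\mu^{\dagger\top} \Sigma^\dagger \mu^\dagger\right)$
where $\Sigma^\dagger = \left(\tfrac{1}{2}\Sigma^{-1} + \tfrac{1}{2}\Sigma'^{-1}\right)^{-1}$ and $\mu^\dagger = \tfrac{1}{2}\Sigma^{-1}\mu + \tfrac{1}{2}\Sigma'^{-1}\mu'$.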
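The closed form on this slide translates directly into a few lines of NumPy. The sketch below assumes finite-dimensional, invertible covariance matrices; the function name `bhattacharyya_gaussian` and the example parameters are illustrative only. In the code, `cov_d` and `mu_d` stand for the two auxiliary quantities defined after "where" on the slide.

```python
import numpy as np

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Bhattacharyya kernel between N(mu1, cov1) and N(mu2, cov2)."""
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    mu2 = np.atleast_1d(np.asarray(mu2, dtype=float))
    cov1 = np.atleast_2d(np.asarray(cov1, dtype=float))
    cov2 = np.atleast_2d(np.asarray(cov2, dtype=float))

    prec1 = np.linalg.inv(cov1)
    prec2 = np.linalg.inv(cov2)
    # Auxiliary covariance: inverse of the averaged precision matrices.
    cov_d = np.linalg.inv(0.5 * prec1 + 0.5 * prec2)
    # Auxiliary mean: average of the precision-weighted means.
    mu_d = 0.5 * prec1 @ mu1 + 0.5 * prec2 @ mu2

    prefactor = np.sqrt(np.linalg.det(cov_d)) / (
        np.linalg.det(cov1) * np.linalg.det(cov2)) ** 0.25
    exponent = (-0.25 * mu1 @ prec1 @ mu1
                - 0.25 * mu2 @ prec2 @ mu2
                + 0.5 * mu_d @ cov_d @ mu_d)
    return float(prefactor * np.exp(exponent))

# One-dimensional example: N(0, 1) against N(1.5, 4).
k_self = bhattacharyya_gaussian(0.0, 1.0, 0.0, 1.0)
k12 = bhattacharyya_gaussian(0.0, 1.0, 1.5, 4.0)
k21 = bhattacharyya_gaussian(1.5, 4.0, 0.0, 1.0)
print(k_self, k12, k21)
```

Explicit matrix inverses are used to keep the code close to the formula; for ill-conditioned covariances a Cholesky-based variant would likely be preferable.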

9 Fitting Normal Distributions in the Original Image Space
Limited representational power.
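For reference, the original-space variant can be sketched end to end: fit one Gaussian per bag of tuples, then evaluate the Bhattacharyya kernel between the two fitted Gaussians. The function names, the second example bag, and the small ridge term `eta` (added here only to keep the covariance invertible) are assumptions for illustration, not details taken from the slides.

```python
import numpy as np

def fit_gaussian(bag, eta=1e-3):
    """Maximum-likelihood mean and covariance of a bag, plus a small ridge."""
    X = np.asarray(bag, dtype=float)
    mu = X.mean(axis=0)
    centered = X - mu
    cov = centered.T @ centered / X.shape[0] + eta * np.eye(X.shape[1])
    return mu, cov

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya kernel between two Gaussians."""
    prec1, prec2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    cov_d = np.linalg.inv(0.5 * prec1 + 0.5 * prec2)
    mu_d = 0.5 * prec1 @ mu1 + 0.5 * prec2 @ mu2
    prefactor = np.sqrt(np.linalg.det(cov_d)) / (
        np.linalg.det(cov1) * np.linalg.det(cov2)) ** 0.25
    exponent = (-0.25 * mu1 @ prec1 @ mu1 - 0.25 * mu2 @ prec2 @ mu2
                + 0.5 * mu_d @ cov_d @ mu_d)
    return float(prefactor * np.exp(exponent))

def bag_kernel(bag_a, bag_b, eta=1e-3):
    """Kernel between two bags: fit a Gaussian to each, then compare them."""
    return bhattacharyya(*fit_gaussian(bag_a, eta), *fit_gaussian(bag_b, eta))

# First bag taken from the slide's example; the second one is hypothetical.
a = np.array([(3, 8), (2, 15), (6, 9), (14, 8)], dtype=float)
b = np.array([(3, 8), (3, 15), (7, 9), (13, 8), (8, 10)], dtype=float)

k_aa = bag_kernel(a, a)
k_ab = bag_kernel(a, b)
k_ab_permuted = bag_kernel(a[::-1], b)  # same bag, tuples listed in reverse
print(k_aa, k_ab, k_ab_permuted)
```

A single Gaussian in the original coordinates can only describe an elliptical blob of pixels, which is the limited representational power noted on the slide and the motivation for fitting the Gaussian in feature space instead.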

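10 Fitting Normal Distributions in Feature Space
Regularized estimators:
$\hat\mu = \frac{1}{k}\sum_{i=1}^{k} \Phi(x_i)$
$\hat\Sigma_{\mathrm{reg}} = \sum_{l=1}^{r} v_l \lambda_l v_l^\top + \eta \sum_i \zeta_i \zeta_i^\top$
where $\zeta_1, \zeta_2, \ldots$ is a basis and $v_1, \ldots, v_r$ are the first $r$ eigenvectors of
$\hat\Sigma = \frac{1}{k}\sum_{i=1}^{k} (\Phi(x_i) - \hat\mu)(\Phi(x_i) - \hat\mu)^\top$.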
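A finite-dimensional sketch of the regularized estimator, written with an explicit feature space so that the matrices can be formed directly. It assumes the basis vectors form an orthonormal basis of the whole space, so that the regularization term reduces to `eta` times the identity; that reading, the function name, and the random example data are assumptions for illustration.

```python
import numpy as np

def regularized_covariance(points, r, eta):
    """Keep the top-r eigen-directions of the sample covariance and add eta * I.

    Assumption: the regularizing basis is orthonormal and spans the whole
    space, so the sum of its outer products is the identity matrix.
    """
    X = np.asarray(points, dtype=float)
    mu = X.mean(axis=0)
    centered = X - mu
    cov = centered.T @ centered / X.shape[0]

    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:r]      # indices of the r largest
    V, lam = eigvecs[:, top], eigvals[top]

    # (V * lam) scales column l of V by lam[l], giving sum_l v_l lam_l v_l^T.
    cov_reg = (V * lam) @ V.T + eta * np.eye(X.shape[1])
    return mu, cov_reg

# Hypothetical data: 50 points in a 4-dimensional explicit feature space.
rng = np.random.default_rng(0)
points = rng.normal(size=(50, 4))
mu, cov_reg = regularized_covariance(points, r=2, eta=0.1)
print(np.linalg.eigvalsh(cov_reg))
```

In the printed spectrum the two discarded directions carry variance exactly `eta`, while the two retained directions carry their original eigenvalue plus `eta`, so the estimate is always invertible.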

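11 Dirac bra-ket notation
$\langle x| = \Phi(x)^\top$ (bra), $|x\rangle = \Phi(x)$ (ket)
Inner product: $\langle x | x' \rangle = \langle \Phi(x), \Phi(x') \rangle = \kappa(x, x')$
Bilinear forms: $\sum_i |x_i\rangle\, a_i\, \langle x_i|$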

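12 Finding $v_1, \ldots, v_r$ with Kernel PCA
Assume $\hat\mu = 0$. Want to solve:
$\hat\Sigma\, |v_l\rangle = \lambda\, |v_l\rangle$, where $\hat\Sigma = \frac{1}{k}\sum_{j=1}^{k} |x_j\rangle\langle x_j|$ [matrix acting on $\mathcal{H}$].
Observation: $|v\rangle = \sum_{i=1}^{k} \alpha_i\, |x_i\rangle$.
Multiplying by $\langle x_l|$:
$\frac{1}{k}\sum_{j=1}^{k}\sum_{i=1}^{k} \langle x_l | x_j \rangle \langle x_j | x_i \rangle\, \alpha_i = \lambda \sum_{i=1}^{k} \langle x_l | x_i \rangle\, \alpha_i$.
Reduces to $K\alpha = k\lambda\alpha$ with $K_{i,j} = \langle x_i | x_j \rangle$ [$k \times k$ matrix].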
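The reduction to a $k \times k$ eigenproblem can be sketched as follows. As on the slide, the mean is assumed to be zero, so no centering of the Gram matrix is performed. The function names, the bandwidth, and the normalization of each coefficient vector so that the corresponding eigenvector has unit length in feature space are choices made for this illustration.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix K[i, j] = kappa(x_i, x_j) for the Gaussian base kernel."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def kernel_pca(K, r):
    """Solve K alpha = (k lambda) alpha and return the top-r (lambda, alpha).

    Assumes zero mean in feature space (no centering) and that the r largest
    eigenvalues of K are strictly positive.
    """
    k = K.shape[0]
    eigvals, vecs = np.linalg.eigh(K)        # ascending eigenvalues of K
    order = np.argsort(eigvals)[::-1][:r]
    lam = eigvals[order] / k                 # eigenvalues of the covariance
    # Rescale so that <v|v> = alpha^T K alpha = 1 for every component.
    alphas = vecs[:, order] / np.sqrt(eigvals[order])
    return lam, alphas

# Bag from the earlier slide; sigma = 3 is an arbitrary illustrative choice.
bag = np.array([(3, 8), (2, 15), (6, 9), (14, 8)], dtype=float)
K = gaussian_gram(bag, sigma=3.0)
lam, alphas = kernel_pca(K, r=2)
print(lam)
```

Each column of `alphas` holds the expansion coefficients of one eigenvector in terms of the mapped data points, so the eigenvectors never have to be formed explicitly in feature space.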

13 The First Three Principal Components of R

14 Reconstruction from the first 1, 2 and 3 components
$\mathrm{intensity}(x) \propto e^{-\langle x | \Sigma^{-1} | x \rangle}$

15 Reconstruction from the first 3 components with regularization
Panels: $\eta = 0.01$, $\eta = 0.1$, $\eta = 1$
$\mathrm{intensity}(x) \propto e^{-\langle x | \Sigma_{\mathrm{reg}}^{-1} | x \rangle}$

16 Properties of Bhattacharyya Kernels with Regularization
Smoothing controlled by $\eta$.
Graceful behavior under natural transformations such as translations and rotations: these just rotate the cloud in $\mathcal{H}$.

17 Relationship to Gaussian Processes
What are $\xi \in \mathcal{H}$? By $f(x) = \langle x | \xi \rangle$ they are really images themselves (RKHS view).
$\mathcal{N}(\hat\mu, \hat\Sigma)$ defines a distribution over such functions (images), in fact a Gaussian process with
$E[f(x)] = \frac{1}{k}\sum_i \kappa(x_i, x)$
$\mathrm{Cov}(f(x), f(x')) = \kappa(x, \cdot)^\top\, \hat\Sigma\, \kappa(x', \cdot)$.
Not the usual Bayesian Gaussian process training procedure!
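The mean function above is simply the average of the base kernel between the query location and every tuple in the bag, so it can be evaluated without touching the feature space. A minimal sketch, assuming the Gaussian base kernel; the function name `mean_function`, the bandwidth, and the query points are illustrative.

```python
import numpy as np

def mean_function(bag, query, sigma=1.0):
    """E[f(x)] = (1/k) * sum_i kappa(x_i, x) for a Gaussian base kernel."""
    bag = np.asarray(bag, dtype=float)
    query = np.atleast_2d(np.asarray(query, dtype=float))
    # Squared distances between every query point and every tuple in the bag.
    d2 = np.sum((query[:, None, :] - bag[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2)).mean(axis=1)

# Bag from the earlier slide, queried at one of its tuples and far away.
bag = np.array([(3, 8), (2, 15), (6, 9), (14, 8)], dtype=float)
at_point, far_away = mean_function(bag, [(3, 8), (100, 100)])
print(at_point, far_away)
```

Evaluating `mean_function` on a full pixel grid gives a smoothed version of the original foreground mask, which is one way to read the statement that elements of the feature space are really images themselves.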

18 Experiment: crosses and squares
SVM trained on 100 examples of images. Gaussian $\kappa$ with $r = 4$ and $\eta$ = (value missing).
[Figure: classification error against regularization C, one panel each for $\sigma^2 = 1, 2, 4, 8, 16$. Solid: Bhattacharyya kernel; dotted: conventional RBF.]

19 Experiment: NIST digit recognition
Artificially hard problem: SVM on 10 classes (one vs. all); only 30 pixels sampled from each image and 12 images per class in the training set; $r = 10$ and $\eta = 0.1$.
[Figure: classification error against regularization C. Curves: dot product (dotted); RBF with sigma = 1, 5, 10 (dashed); Bhattacharyya with sigma = 1, 5 and a third value missing (solid).]

20 Summary
Bag of vectors representation.
Kernel trick employed on two levels ($\kappa$ and $K$).
Semiparametric; the Bhattacharyya kernel $K(p, p') = \int \sqrt{p(x)}\,\sqrt{p'(x)}\,dx$ is computable in closed form for Normal distributions, even in $\mathcal{H}$.
Graceful behavior under natural transformations.
Possibly applicable to many other data types, not just images: sequences, time series, 3D objects, proteins, ...

21 References
L. Wolf and A. Shashua: Kernel Principal Angles for Classification Machines with Applications to Image Sequence Interpretation. CVPR 2003. [very similar ideas developed independently]
T. Gärtner, P. A. Flach, A. Kowalczyk and A. J. Smola: Multi-Instance Kernels. ICML.
T. Jebara and R. Kondor: Bhattacharyya and Expected Likelihood Kernels. COLT/KW.
A. Bhattacharyya: On a Measure of Divergence between Two Statistical Populations Defined by their Probability Distributions. Bull. Calcutta Math. Soc. 35 (1943).
