A Kernel on Persistence Diagrams for Machine Learning
1 A Kernel on Persistence Diagrams for Machine Learning
Jan Reininghaus 1, Stefan Huber 1, Roland Kwitt 2, Ulrich Bauer 1
1 Institute of Science and Technology Austria
2 FB Computer Science, Universität Salzburg, Austria
Toposys Annual Meeting, IST Austria, September 9, 2014
Jan Reininghaus, Stefan Huber, Roland Kwitt, Ulrich Bauer: A kernel on persistence diagrams
4 Pipeline in topological machine learning
Tasks: shape retrieval, object recognition, texture recognition.
Pipeline: topological data analysis → machine learning (SVM, kernel PCA, k-means).
We want a robust, multi-scale kernel on the set of persistence diagrams!
5 Background on kernels
Definition. Given a non-empty set X, a (positive definite) kernel is a symmetric, positive definite function k: X × X → R, i.e.,
k(x, x′) = k(x′, x) for all x, x′ ∈ X, (1)
[k(x_i, x_j)]_{i,j=1}^{n,n} is positive semi-definite for all x_1, …, x_n ∈ X, n > 0. (2)
Example: an inner product on any vector space V is a kernel.
6 Background on kernels
Theorem. For any kernel k on a set X there exists a Hilbert space H and a mapping ϕ: X → H with
k(x, x′) = ⟨ϕ(x), ϕ(x′)⟩_H for all x, x′ ∈ X. (3)
7 Background on kernels: Examples
Given a vector space V with inner product ⟨·, ·⟩:
- k(x, x′) = ⟨x, x′⟩ is a kernel.
- k(x, x′) = ⟨x, x′⟩^d, with d ≥ 0, is a kernel.
- k(x, x′) = e^{−λ‖x−x′‖²} is a kernel for all λ > 0: the radial basis function kernel (RBF kernel).
Let k_1, …, k_m be kernels on a set X. Then ∑_{i=1}^{m} λ_i k_i, with λ_i ≥ 0, is a kernel.
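These examples can be illustrated numerically. The following is a minimal plain-Python sketch (helper names are ours, not from the talk) that builds the Gram matrix of the RBF kernel for three points in R² and verifies positive definiteness via Sylvester's criterion, i.e., all leading principal minors are positive:

```python
import math

def rbf(x, y, lam=1.0):
    """RBF kernel on R^2: k(x, y) = exp(-lam * ||x - y||^2)."""
    return math.exp(-lam * ((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2))

def gram(points, k):
    """Gram matrix [k(x_i, x_j)] for a list of points."""
    return [[k(p, q) for q in points] for p in points]

def det3(m):
    """Determinant of a 3x3 matrix via Laplace expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

pts = [(0.0, 0.0), (1.0, 0.5), (-0.5, 2.0)]
G = gram(pts, rbf)

# Sylvester's criterion: for distinct points the RBF Gram matrix is
# strictly positive definite, so all leading principal minors are > 0.
minors = [G[0][0],
          G[0][0] * G[1][1] - G[0][1] * G[1][0],
          det3(G)]
```

For a positive semi-definite (but not strictly definite) matrix one would instead need all principal minors to be nonnegative; the strict criterion suffices here because the sample points are distinct.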
8 Background on kernels: Optimal assignment kernel
Goal: a kernel on the set of finite point sets in R².
Definition. Let X, Y be two finite point sets in R². The optimal assignment kernel is given by
k_A(X, Y) = max_{π: X→Y} ∑_{x∈X} k(x, π(x)) if |X| ≤ |Y|, and k_A(X, Y) = max_{π: Y→X} ∑_{y∈Y} k(π(y), y) otherwise, (4)
with k(·, ·) being a positive definite kernel for points in R².
However, the optimal assignment kernel with k(x, y) = e^{−λ‖x−y‖²}, λ > 0, is not positive definite, i.e., not a kernel.
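A brute-force sketch of the assignment construction in (4), in plain Python (exponential in the set size, for intuition only; names are ours). Note the slide's point: although this function is symmetric, it fails to be positive definite for the RBF base kernel, so it is not a valid kernel.

```python
import math
from itertools import permutations

def rbf(x, y, lam=1.0):
    """RBF base kernel on points of R^2."""
    return math.exp(-lam * ((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2))

def assignment_kernel(X, Y, k=rbf):
    """Optimal assignment value (4): maximize the sum of k over
    injections of the smaller set into the larger one, by brute force."""
    if len(X) > len(Y):
        X, Y = Y, X
    # permutations(Y, len(X)) enumerates all injections of X into Y
    return max(sum(k(x, py) for x, py in zip(X, pi))
               for pi in permutations(Y, len(X)))

X = [(0.0, 0.0), (1.0, 1.0)]
Y = [(0.1, 0.0), (2.0, 2.0), (5.0, 5.0)]
```

Since k(x, x) = 1 is the maximum attainable value of the RBF kernel, the identity assignment is always optimal when comparing a set with itself, so k_A(X, X) = |X|.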
10 Hilbert-space embedding
Goal: a kernel on persistence diagrams, i.e., on finite multi-sets of points in R².
Approach: an embedding Ψ: D → H from the set D of persistence diagrams into a Hilbert space H. The inner product of H will serve as the kernel:
k(F, G) = ⟨Ψ(F), Ψ(G)⟩_H for F, G ∈ D.
Alternatively, we may use the RBF kernel on H, or a linear combination of kernels, or …
11 Embedding a multi-set of points
Given a persistence diagram D, replace each point p ∈ D with a Dirac delta δ_p centered at p:
D → H^{−2}(R²), D ↦ ∑_{p∈D} δ_p.
However, the induced metric in H^{−2}(R²) is not stable. We therefore apply heat diffusion with ∑_{p∈D} δ_p as initial condition and Dirichlet boundary conditions at the diagonal. Solutions are in L²(R²). Claim: solutions are stable.
12 Heat diffusion: Definition
Let Ω = {(x, y) ∈ R²: y ≥ x}.
Heat-diffusion PDE. For a given diagram D we consider the solution u: Ω × R_{≥0} → R of
∂_xx u + ∂_yy u = ∂_t u in Ω × R_{>0}, (5)
u = 0 on ∂Ω × R_{≥0}, (6)
u = ∑_{p∈D} δ_p on Ω × {0}. (7)
At scale σ > 0, the embedding of diagrams is given by
Ψ_σ(D) = u(·, ·, σ). (8)
Ψ_σ(D) is in L²(Ω); that is, Ψ_σ is an L²-embedding.
14 Heat diffusion: The solution
Extend Ω to R²: mirror and negate each δ_p at the diagonal; for p = (a, b), we set p̄ = (b, a).
Heat-diffusion PDE (extended). For a given diagram D we consider the solution u: R² × R_{≥0} → R of
∂_xx u + ∂_yy u = ∂_t u in R² × R_{>0}, (9)
u = ∑_{p∈D} (δ_p − δ_{p̄}) on R² × {0}. (10)
Restricting this solution to Ω solves the original problem. The solution is given by convolution with the Gaussian kernel:
u(x, y, t) = 1/(4πt) ∑_{p∈D} [ e^{−‖(x,y)−p‖²/(4t)} − e^{−‖(x,y)−p̄‖²/(4t)} ]. (11)
15 Heat diffusion: The solution
The L²-embedding at scale σ results in:
Ψ_σ(D): Ω → R, (x, y) ↦ 1/(4πσ) ∑_{p∈D} [ e^{−‖(x,y)−p‖²/(4σ)} − e^{−‖(x,y)−p̄‖²/(4σ)} ]. (12)
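Equation (12) can be evaluated directly (a plain-Python sketch; function names are ours). A useful sanity check is the Dirichlet boundary condition: on the diagonal x = y, each point p and its mirror p̄ contribute equal Gaussians with opposite signs, so Ψ_σ(D) vanishes there.

```python
import math

def psi(D, sigma):
    """Pointwise evaluation of the L2-embedding (12) of diagram D
    at scale sigma, on the half-plane Omega = {y >= x}."""
    def u(x, y):
        s = 0.0
        for (a, b) in D:
            s += math.exp(-((x - a) ** 2 + (y - b) ** 2) / (4 * sigma))
            # mirrored, negated point p̄ = (b, a)
            s -= math.exp(-((x - b) ** 2 + (y - a) ** 2) / (4 * sigma))
        return s / (4 * math.pi * sigma)
    return u

D = [(0.2, 1.0), (0.4, 1.7)]
u = psi(D, sigma=0.5)
```

Evaluating u at a diagram point gives a positive value, while u(x, x) is exactly zero for any x.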
20 The inner product
Given two diagrams F and G, we compute the inner product explicitly:
⟨Ψ_σ(F), Ψ_σ(G)⟩ = ∫_Ω Ψ_σ(F) Ψ_σ(G) = 1/(8πσ) ∑_{p∈F} ∑_{q∈G} [ e^{−‖p−q‖²/(8σ)} − e^{−‖p−q̄‖²/(8σ)} ]. (13)
O(|F| · |G|) time, no approximation.
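Equation (13) is what makes the approach practical: the kernel is a closed-form double sum, with no PDE solver needed. A minimal plain-Python sketch (our naming):

```python
import math

def pss_kernel(F, G, sigma):
    """Closed-form inner product (13): <Psi_sigma(F), Psi_sigma(G)>,
    computed exactly in O(|F|*|G|) time."""
    s = 0.0
    for (pa, pb) in F:
        for (qa, qb) in G:
            s += math.exp(-((pa - qa) ** 2 + (pb - qb) ** 2) / (8 * sigma))
            # mirrored point q̄ = (qb, qa)
            s -= math.exp(-((pa - qb) ** 2 + (pb - qa) ** 2) / (8 * sigma))
    return s / (8 * math.pi * sigma)

F = [(0.1, 0.9), (0.3, 1.4)]
G = [(0.2, 1.0)]
```

By construction the function is symmetric, and k(F, F) = ‖Ψ_σ(F)‖² is positive for a nonempty diagram.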
21 Robustness
Goal: we want to bound ‖Ψ_σ(F) − Ψ_σ(G)‖_{L²} by W_q(F, G).
Observation. A linear embedding Ψ: D → V, with V a normed vector space, cannot be stable w.r.t. W_q for q ∈ {2, …, ∞}. By linear we mean Ψ(F + G) = Ψ(F) + Ψ(G) and Ψ(∅) = 0.
Set G = ∅ and F_n = {u, …, u} (u taken n times). We have
W_q(F_n, G) = min_{η: F_n → G} ( ∑_{(u,v)∈η} ‖u − v‖^q )^{1/q} = n^{1/q} W_q(F_1, G),
but ‖Ψ(F_n) − Ψ(G)‖_V = n ‖Ψ(F_1)‖_V.
22 Robustness
Theorem. For any two persistence diagrams F and G, and any σ > 0, we have
‖Ψ_σ(F) − Ψ_σ(G)‖_{L²(Ω)} ≤ (1 / (σ√(8π))) W₁(F, G). (14)
Multi-scale behavior and noise: noise on the input data causes W₁(F, G) to increase. We can counteract this by increasing σ.
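The bound (14) can be checked numerically for one-point diagrams, where W₁ is the minimum of matching the two points to each other or matching both to the diagonal (a plain-Python sketch under these assumptions, with Euclidean ground distance; names are ours):

```python
import math

def pss_kernel(F, G, sigma):
    """Closed-form inner product (13), self-contained copy."""
    s = 0.0
    for (pa, pb) in F:
        for (qa, qb) in G:
            s += math.exp(-((pa - qa) ** 2 + (pb - qb) ** 2) / (8 * sigma))
            s -= math.exp(-((pa - qb) ** 2 + (pb - qa) ** 2) / (8 * sigma))
    return s / (8 * math.pi * sigma)

p, q = (0.2, 1.0), (0.25, 1.1)
sigma = 1.0

# W_1 for one-point diagrams: match p to q, or both to the diagonal
w1 = min(math.dist(p, q),
         (p[1] - p[0]) / math.sqrt(2) + (q[1] - q[0]) / math.sqrt(2))

# distance in L2(Omega) via the kernel trick
kFF = pss_kernel([p], [p], sigma)
kGG = pss_kernel([q], [q], sigma)
kFG = pss_kernel([p], [q], sigma)
dist = math.sqrt(max(kFF + kGG - 2 * kFG, 0.0))

bound = w1 / (sigma * math.sqrt(8 * math.pi))
```

For this example the embedding distance stays well below the Wasserstein bound.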
23 Robustness: Proof I
Proof. Let η = argmin_{η: F→G} ∑_{(u,v)∈η} ‖u − v‖. (Augmenting a diagram D with diagonal points leaves Ψ_σ(D) unchanged.) Denote N_u(x) = 1/(4πσ) e^{−‖x−u‖²/(4σ)}. Then
‖Ψ_σ(F) − Ψ_σ(G)‖_{L²(Ω)} = ‖ ∑_{(u,v)∈η} (N_u − N_ū) − (N_v − N_v̄) ‖_{L²(Ω)} ≤ ∑_{(u,v)∈η} 2 ‖N_u − N_v‖_{L²(Ω)}.
24 Robustness: Proof II
Using ‖N_u − N_v‖_{L²(Ω)} = √( 1/(4πσ) · (1 − e^{−‖u−v‖₂²/(8σ)}) ), we have
‖Ψ_σ(F) − Ψ_σ(G)‖_{L²(Ω)} ≤ (1/√(πσ)) ∑_{(u,v)∈η} √(1 − e^{−‖u−v‖₂²/(8σ)}).
By the convexity of x ↦ e^{−x}, we have e^{−λx} ≥ 1 − λx, and hence
‖Ψ_σ(F) − Ψ_σ(G)‖_{L²(Ω)} ≤ (1/√(πσ)) ∑_{(u,v)∈η} √( ‖u−v‖₂² / (8σ) ) = (1 / (σ√(8π))) ∑_{(u,v)∈η} ‖u−v‖₂ = (1 / (σ√(8π))) W₁(F, G). ∎
31 Persistence landscapes: Definition
Introduced by Bubenik [Bubenik, 2013]. Each point (a, b) of the diagram (birth a, death b) contributes the tent function x ↦ max(0, min(x − a, b − x)); the landscape function λ_k(x) is the k-th largest value of these tent functions at x, so that λ_1 ≥ λ_2 ≥ λ_3 ≥ … [Figure: a diagram point (a, b) in birth/death coordinates and the resulting tent functions λ_1, λ_2, λ_3 over the interval [a, b].]
32 Persistence landscapes: Definition
The persistence landscape of a persistence diagram can be interpreted as a function λ(·, ·) in L^p(R²). Let Ψ_L: D → L^p(R²), D ↦ λ(·, ·), denote this embedding, where λ(k, x) for k = 1, 2, … is the k-th landscape function.
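The landscape functions can be computed directly from the tent-function description (a plain-Python sketch, naming ours): λ_k(x) is the k-th largest value of max(0, min(x − a, b − x)) over the diagram points (a, b).

```python
def landscape(D, k, x):
    """k-th persistence landscape function lambda_k(x) of diagram D:
    the k-th largest tent value max(0, min(x - a, b - x)) over (a, b) in D."""
    vals = sorted((max(0.0, min(x - a, b - x)) for (a, b) in D), reverse=True)
    return vals[k - 1] if k <= len(vals) else 0.0

# two features: one born at 0 dying at 2, one born at 1 dying at 5
D = [(0.0, 2.0), (1.0, 5.0)]
```

At x = 1 only the first feature contributes (λ_1(1) = 1, λ_2(1) = 0), while at x = 3 only the second does, at its peak height 2.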
33 Persistence landscapes: Stability
Lemma (Bottleneck stability). For two persistence diagrams F and G it holds that
‖Ψ_L(F) − Ψ_L(G)‖_{L^∞(R²)} ≤ W_∞(F, G). (15)
Lemma (Weighted Wasserstein stability). For two persistence diagrams F and G it holds that
‖Ψ_L(F) − Ψ_L(G)‖_{L^q(R²)}^q ≤ min_{η: F→G} ∑_{(u,v)∈η} [ pers(u) ‖u − v‖_∞^q + (2/(q+1)) ‖u − v‖_∞^{q+1} ]. (16)
Note: Ψ_L: D → L^q(R²) is not a linear embedding.
34 Stability comparison: Experiment 1
Let G = ∅ and F_t = {(−t, t)}, with t ∈ R_{≥0}. Then:
‖Ψ_L(F_t) − Ψ_L(G)‖_{L^q} ∼ t^{1+1/q},
W_q(F_t, G) ∼ t,
‖Ψ_σ(F_t) − Ψ_σ(G)‖_{L²} = √( 1/(8πσ) · (1 − e^{−t²/σ}) ), which stays bounded as t → ∞.
35 Stability comparison: Experiment 2
Let F_t = {(−t, t)} and G_t = {(−t + 1, t + 1)}, with t ∈ R_{≥0} (the point shifted by (1, 1), parallel to the diagonal). Then:
‖Ψ_L(F_t) − Ψ_L(G_t)‖_{L^q} ∼ t^{1/q},
W_q(F_t, G_t) = O(1),
‖Ψ_σ(F_t) − Ψ_σ(G_t)‖_{L²} → √( 1/(4πσ) · (1 − e^{−2/(8σ)}) ) as t → ∞, i.e., O(1).
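Experiment 2 can be reproduced numerically with the closed-form kernel (13): the feature-space distance between the two diagrams approaches a constant as t grows (a plain-Python sketch; names are ours).

```python
import math

def pss_kernel(F, G, sigma):
    """Closed-form inner product (13), self-contained copy."""
    s = 0.0
    for (pa, pb) in F:
        for (qa, qb) in G:
            s += math.exp(-((pa - qa) ** 2 + (pb - qb) ** 2) / (8 * sigma))
            s -= math.exp(-((pa - qb) ** 2 + (pb - qa) ** 2) / (8 * sigma))
    return s / (8 * math.pi * sigma)

def pss_dist(F, G, sigma):
    """Feature-space distance via the kernel trick."""
    d2 = pss_kernel(F, F, sigma) + pss_kernel(G, G, sigma) - 2 * pss_kernel(F, G, sigma)
    return math.sqrt(max(d2, 0.0))

sigma = 1.0
d5 = pss_dist([(-5.0, 5.0)], [(-4.0, 6.0)], sigma)    # t = 5
d50 = pss_dist([(-50.0, 50.0)], [(-49.0, 51.0)], sigma)  # t = 50
limit = math.sqrt((1 - math.exp(-2 / (8 * sigma))) / (4 * math.pi * sigma))
```

Both distances agree with each other and with the limiting constant to high precision.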
36 Stability comparison: Experiment 3
Persistence landscapes emphasize high-persistence features: noise at high-persistence points can hide significant mid-range features, so it becomes hard to distinguish between Class 1 and Class 2. [Figure: diagrams of Class 1 and Class 2 varying with a parameter t.]
37 Experiments: PSS kernel vs. landscape kernel
SHREC 2014 dataset of shapes: 300 meshes, 20 classes (poses). Morse filtration of a heat-signature scalar field on each mesh, with an additional scale parameter ("frequency"). Two tasks:
Retrieval: get the nearest neighbor in Hilbert space.
Classification: train an SVM with the kernel.
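The retrieval task only needs distances in the Hilbert space, which the kernel trick provides without ever constructing Ψ_σ explicitly: ‖Ψ(F) − Ψ(G)‖² = k(F, F) − 2 k(F, G) + k(G, G). A toy plain-Python sketch with made-up diagrams (not the SHREC data; names are ours):

```python
import math

def pss_kernel(F, G, sigma):
    """Closed-form inner product (13), self-contained copy."""
    s = 0.0
    for (pa, pb) in F:
        for (qa, qb) in G:
            s += math.exp(-((pa - qa) ** 2 + (pb - qb) ** 2) / (8 * sigma))
            s -= math.exp(-((pa - qb) ** 2 + (pb - qa) ** 2) / (8 * sigma))
    return s / (8 * math.pi * sigma)

def hilbert_dist2(F, G, sigma):
    """Squared feature-space distance via the kernel trick."""
    return (pss_kernel(F, F, sigma) - 2 * pss_kernel(F, G, sigma)
            + pss_kernel(G, G, sigma))

def nearest_neighbor(query, database, sigma=1.0):
    """Index of the database diagram nearest to the query in Hilbert space."""
    return min(range(len(database)),
               key=lambda i: hilbert_dist2(query, database[i], sigma))

db = [[(0.1, 0.9)],
      [(0.5, 2.0), (0.2, 0.6)],
      [(1.0, 3.0)]]
query = [(0.95, 3.1)]
```

The query diagram is a slightly perturbed copy of the third database entry, so that entry is retrieved.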
38 Experiments: Retrieval
[Figure: nearest-neighbor (NN) retrieval performance vs. computation time for the PSS and landscape kernels at frequency 6; the numeric values were lost in transcription.]
39 Experiments: Classification
Table: Comparison of the PSS vs. landscape kernel (classification accuracy in %, mean ± standard deviation per frequency) using 5-fold cross-validation on the SHREC 2014 (Synthetic) dataset for dimension 1. [The numeric entries were lost in transcription.]
40 Bibliography
Bubenik, P. (2013). Statistical topological data analysis using persistence landscapes. arXiv preprint.
Learning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking Dengyong Zhou zhou@tuebingen.mpg.de Dept. Schölkopf, Max Planck Institute for Biological Cybernetics, Germany Learning from
More informationInstance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows Kn-Nearest
More informationMachine Learning. Lecture 6: Support Vector Machine. Feng Li.
Machine Learning Lecture 6: Support Vector Machine Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Warm Up 2 / 80 Warm Up (Contd.)
More information5.6 Nonparametric Logistic Regression
5.6 onparametric Logistic Regression Dmitri Dranishnikov University of Florida Statistical Learning onparametric Logistic Regression onparametric? Doesnt mean that there are no parameters. Just means that
More informationCITS 4402 Computer Vision
CITS 4402 Computer Vision A/Prof Ajmal Mian Adj/A/Prof Mehdi Ravanbakhsh Lecture 06 Object Recognition Objectives To understand the concept of image based object recognition To learn how to match images
More informationKernel-based Feature Extraction under Maximum Margin Criterion
Kernel-based Feature Extraction under Maximum Margin Criterion Jiangping Wang, Jieyan Fan, Huanghuang Li, and Dapeng Wu 1 Department of Electrical and Computer Engineering, University of Florida, Gainesville,
More information(i) The optimisation problem solved is 1 min
STATISTICAL LEARNING IN PRACTICE Part III / Lent 208 Example Sheet 3 (of 4) By Dr T. Wang You have the option to submit your answers to Questions and 4 to be marked. If you want your answers to be marked,
More informationKaggle.
Administrivia Mini-project 2 due April 7, in class implement multi-class reductions, naive bayes, kernel perceptron, multi-class logistic regression and two layer neural networks training set: Project
More informationSupport Vector Machines. Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar
Data Mining Support Vector Machines Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar 02/03/2018 Introduction to Data Mining 1 Support Vector Machines Find a linear hyperplane
More informationWhat is semi-supervised learning?
What is semi-supervised learning? In many practical learning domains, there is a large supply of unlabeled data but limited labeled data, which can be expensive to generate text processing, video-indexing,
More informationData Preprocessing Tasks
Data Tasks 1 2 3 Data Reduction 4 We re here. 1 Dimensionality Reduction Dimensionality reduction is a commonly used approach for generating fewer features. Typically used because too many features can
More informationSupport Vector Machine
Support Vector Machine Kernel: Kernel is defined as a function returning the inner product between the images of the two arguments k(x 1, x 2 ) = ϕ(x 1 ), ϕ(x 2 ) k(x 1, x 2 ) = k(x 2, x 1 ) modularity-
More informationBits of Machine Learning Part 1: Supervised Learning
Bits of Machine Learning Part 1: Supervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology) Outline of the Course 1. Supervised Learning Regression and Classification
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationTopological Simplification in 3-Dimensions
Topological Simplification in 3-Dimensions David Letscher Saint Louis University Computational & Algorithmic Topology Sydney Letscher (SLU) Topological Simplification CATS 2017 1 / 33 Motivation Original
More informationFinite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product
Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a pre-hilbert space, or a unitary space) if there is a mapping (, )
More informationBayesian Linear Regression. Sargur Srihari
Bayesian Linear Regression Sargur srihari@cedar.buffalo.edu Topics in Bayesian Regression Recall Max Likelihood Linear Regression Parameter Distribution Predictive Distribution Equivalent Kernel 2 Linear
More informationKronecker Decomposition for Image Classification
university of innsbruck institute of computer science intelligent and interactive systems Kronecker Decomposition for Image Classification Sabrina Fontanella 1,2, Antonio Rodríguez-Sánchez 1, Justus Piater
More informationSupport Vector Machines: Kernels
Support Vector Machines: Kernels CS6780 Advanced Machine Learning Spring 2015 Thorsten Joachims Cornell University Reading: Murphy 14.1, 14.2, 14.4 Schoelkopf/Smola Chapter 7.4, 7.6, 7.8 Non-Linear Problems
More informationarxiv: v1 [cs.cv] 13 Jul 2017
Deep Learning with Topological Signatures Christoph Hofer Department of Computer Science University of Salzburg Salzburg, Austria, 5020 chofer@cosy.sbg.ac.at Roland Kwitt Department of Computer Science
More informationOutline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22
Outline Basic concepts: SVM and kernels SVM primal/dual problems Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels Basic concepts: SVM and kernels SVM primal/dual problems
More informationMATHEMATICAL entities that do not form Euclidean
Kernel Methods on Riemannian Manifolds with Gaussian RBF Kernels Sadeep Jayasumana, Student Member, IEEE, Richard Hartley, Fellow, IEEE, Mathieu Salzmann, Member, IEEE, Hongdong Li, Member, IEEE, and Mehrtash
More informationAdaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer Vicente L. Malave February 23, 2011 Outline Notation minimize a number of functions φ
More informationMultiple Kernel Learning
CS 678A Course Project Vivek Gupta, 1 Anurendra Kumar 2 Sup: Prof. Harish Karnick 1 1 Department of Computer Science and Engineering 2 Department of Electrical Engineering Indian Institute of Technology,
More informationLecture 10: A brief introduction to Support Vector Machine
Lecture 10: A brief introduction to Support Vector Machine Advanced Applied Multivariate Analysis STAT 2221, Fall 2013 Sungkyu Jung Department of Statistics, University of Pittsburgh Xingye Qiao Department
More informationSupport Vector Machines and Kernel Methods
2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More informationCausal Inference by Minimizing the Dual Norm of Bias. Nathan Kallus. Cornell University and Cornell Tech
Causal Inference by Minimizing the Dual Norm of Bias Nathan Kallus Cornell University and Cornell Tech www.nathankallus.com Matching Zoo It s a zoo of matching estimators for causal effects: PSM, NN, CM,
More information