A Kernel on Persistence Diagrams for Machine Learning


1 A Kernel on Persistence Diagrams for Machine Learning
Jan Reininghaus¹, Stefan Huber¹, Roland Kwitt², Ulrich Bauer¹
¹ Institute of Science and Technology Austria
² FB Computer Science, Universität Salzburg, Austria
Toposys Annual Meeting, IST Austria, September 9, 2014

2–4 Pipeline in topological machine learning
Tasks: shape retrieval, object recognition, texture recognition.
Pipeline: topological data analysis, followed by machine learning (SVM, PCA, k-means), connected through a kernel.
We want a robust, multi-scale kernel on the set of persistence diagrams!

5 Background on kernels
Definition. Given a non-empty set X, a (positive definite) kernel is a symmetric, positive-definite function k \colon X \times X \to \mathbb{R}, i.e.,
    k(x, x') = k(x', x) \quad \forall x, x' \in X,   (1)
    [k(x_i, x_j)]_{i,j=1}^{n} is positive semi-definite \quad \forall x_1, \dots, x_n \in X, \; n > 0.   (2)
Example: an inner product on any vector space V is a kernel.
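As an illustration (not part of the slides), the two defining properties can be checked numerically for the RBF kernel; the sample points and the bandwidth lam are arbitrary choices for this sketch.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))
    lam = 0.5

    # Gram matrix K[i, j] = exp(-lam * ||x_i - x_j||^2)
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-lam * sq_dists)

    assert np.allclose(K, K.T)                    # symmetry, property (1)
    assert np.linalg.eigvalsh(K).min() > -1e-9    # pos. semi-definiteness, property (2)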

6 Background on kernels
Theorem. For any kernel k on a set X there exists a Hilbert space H and a mapping \varphi \colon X \to H with
    k(x, x') = (\varphi(x), \varphi(x'))_H \quad \forall x, x' \in X.   (3)

7 Background on kernels: Examples
Given a vector space V with inner product (\cdot, \cdot):
- k(x, x') = (x, x') is a kernel.
- k(x, x') = (x, x')^d, with integer d \ge 0, is a kernel.
- k(x, x') = e^{-\lambda \|x - x'\|^2} is a kernel for all \lambda > 0: the radial basis function kernel (RBF kernel).
Let k_1, \dots, k_m be kernels on a set X. Then \sum_{i=1}^{m} \lambda_i k_i, with \lambda_i \ge 0, is a kernel.

8 Background on kernels: Optimal assignment kernel
Goal: a kernel on the set of finite point sets in \mathbb{R}^2.
Definition. Let X, Y be two finite point sets in \mathbb{R}^2. The optimal assignment kernel is given as
    k_A(X, Y) = \max_{\pi \colon X \hookrightarrow Y} \sum_{x \in X} k(x, \pi(x)) for |X| \le |Y|, and k_A(X, Y) = \max_{\pi \colon Y \hookrightarrow X} \sum_{y \in Y} k(\pi(y), y) otherwise,   (4)
with k(\cdot, \cdot) being a positive definite kernel for points in \mathbb{R}^2.
The optimal assignment kernel with k(x, y) = e^{-\lambda \|x - y\|^2} and \lambda > 0 is not positive definite, i.e., not a kernel.
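For illustration (not from the slides): the maximization in (4) is a linear assignment problem, so its value can be computed with SciPy's Hungarian-algorithm solver. A minimal sketch with the RBF base kernel:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def optimal_assignment(X, Y, lam=1.0):
        """Value of eq. (4) with base kernel k(x, y) = exp(-lam * ||x - y||^2)."""
        X, Y = np.asarray(X, float), np.asarray(Y, float)
        if len(X) > len(Y):          # ensure the smaller set is matched injectively
            X, Y = Y, X
        K = np.exp(-lam * ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
        rows, cols = linear_sum_assignment(K, maximize=True)
        return K[rows, cols].sum()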

9–10 Hilbert-space embedding
Goal: a kernel on persistence diagrams, i.e., on finite multi-sets of points in \mathbb{R}^2.
Goal: an embedding \Psi \colon \mathcal{D} \to H from the set \mathcal{D} of persistence diagrams into a Hilbert space H. The inner product of H will serve as the kernel:
    k(F, G) = (\Psi(F), \Psi(G))_H, with F, G \in \mathcal{D}.
Or we use the RBF kernel on H, or a linear combination of kernels, or ...

11 Embedding a multi-set of points
Given a persistence diagram D: replace each point p \in D with a Dirac delta \delta_p centered at p:
    D \to H^{-2}(\mathbb{R}^2) \colon D \mapsto \sum_{p \in D} \delta_p
However, the induced metric in H^{-2}(\mathbb{R}^2) is not stable. We apply heat diffusion with \sum_{p \in D} \delta_p as the initial condition and Dirichlet boundary conditions at the diagonal. Solutions are in L^2(\mathbb{R}^2). Claim: solutions are stable.

12 Heat diffusion: Definition
Let \Omega = \{ (x, y) \in \mathbb{R}^2 : y \ge x \}.
Heat-diffusion PDE. For a given diagram D we consider the solution u \colon \Omega \times \mathbb{R}_{\ge 0} \to \mathbb{R} of
    \partial_{xx} u + \partial_{yy} u = \partial_t u   in \Omega \times \mathbb{R}_{>0},   (5)
    u = 0   on \partial\Omega \times \mathbb{R}_{\ge 0},   (6)
    u = \sum_{p \in D} \delta_p   on \Omega \times \{0\}.   (7)
At scale \sigma > 0, the embedding of diagrams is given as
    \Psi_\sigma(D) = u(\cdot, \cdot, \sigma).   (8)
\Psi_\sigma(D) is in L^2(\Omega), that is, \Psi_\sigma is an L^2-embedding.

13–14 Heat diffusion: The solution
Extend \Omega to \mathbb{R}^2: mirror and negate each \delta_p at the diagonal. For p = (a, b), we set \bar{p} = (b, a). For a given diagram D we consider the solution u \colon \mathbb{R}^2 \times \mathbb{R}_{\ge 0} \to \mathbb{R} of
    \partial_{xx} u + \partial_{yy} u = \partial_t u   in \mathbb{R}^2 \times \mathbb{R}_{>0},   (9)
    u = \sum_{p \in D} (\delta_p - \delta_{\bar{p}})   on \mathbb{R}^2 \times \{0\}.   (10)
Restricting this solution to \Omega solves the original problem: the initial condition is anti-symmetric about the diagonal, so u vanishes there, as the Dirichlet condition requires. The solution is given by convolution with a Gaussian kernel:
    u(x, y, t) = \frac{1}{4\pi t} \sum_{p \in D} \left( e^{-\|(x,y) - p\|^2 / (4t)} - e^{-\|(x,y) - \bar{p}\|^2 / (4t)} \right).   (11)

15–19 Heat diffusion: The solution
The L^2-embedding at scale \sigma results in
    \Psi_\sigma(D) \colon \Omega \to \mathbb{R}, \quad (x, y) \mapsto \frac{1}{4\pi\sigma} \sum_{p \in D} \left( e^{-\|(x,y) - p\|^2 / (4\sigma)} - e^{-\|(x,y) - \bar{p}\|^2 / (4\sigma)} \right).   (12)
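A minimal sketch (an illustration, not the authors' code) of evaluating the embedding (12) on a grid; the function name psi_sigma and the example grid are assumptions for this sketch.

    import numpy as np

    def psi_sigma(D, sigma):
        """The embedding Psi_sigma(D) of eq. (12), returned as a callable on Omega."""
        D = np.asarray(D, dtype=float)
        D_bar = D[:, ::-1]                      # p_bar = (b, a) for p = (a, b)
        def u(x, y):
            q = np.stack([np.asarray(x, float), np.asarray(y, float)], axis=-1)
            d  = ((q[..., None, :] - D) ** 2).sum(axis=-1)      # ||(x,y) - p||^2
            db = ((q[..., None, :] - D_bar) ** 2).sum(axis=-1)  # ||(x,y) - p_bar||^2
            return (np.exp(-d / (4 * sigma))
                    - np.exp(-db / (4 * sigma))).sum(axis=-1) / (4 * np.pi * sigma)
        return u

    # Example: evaluate on a grid; the values vanish along the diagonal y = x.
    u = psi_sigma([[0.0, 1.0]], sigma=0.05)
    xs, ys = np.meshgrid(np.linspace(-1, 2, 61), np.linspace(-1, 2, 61))
    vals = u(xs, ys)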

20 The inner product
Given two diagrams F and G, we compute the inner product explicitly:
    (\Psi_\sigma(F), \Psi_\sigma(G)) = \int_\Omega \Psi_\sigma(F) \, \Psi_\sigma(G) = \frac{1}{8\pi\sigma} \sum_{p \in F} \sum_{q \in G} \left( e^{-\|p - q\|^2 / (8\sigma)} - e^{-\|p - \bar{q}\|^2 / (8\sigma)} \right).   (13)
O(|F| \cdot |G|) time, no approximation.
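The closed form (13) translates directly into code. A minimal sketch (an illustration, not the authors' implementation), with diagrams given as (n, 2) arrays of (birth, death) points:

    import numpy as np

    def pss_kernel(F, G, sigma):
        """The inner product of eq. (13), computed in O(|F|*|G|) time."""
        F = np.asarray(F, dtype=float)
        G = np.asarray(G, dtype=float)
        G_bar = G[:, ::-1]                                  # q_bar = (d, b) for q = (b, d)
        d  = ((F[:, None, :] - G) ** 2).sum(axis=-1)        # ||p - q||^2
        db = ((F[:, None, :] - G_bar) ** 2).sum(axis=-1)    # ||p - q_bar||^2
        return (np.exp(-d / (8 * sigma))
                - np.exp(-db / (8 * sigma))).sum() / (8 * np.pi * sigma)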

21 Robustness
Goal: we want to bound \|\Psi_\sigma(F) - \Psi_\sigma(G)\|_{L^2} by W_q(F, G).
Observation. A linear embedding \Psi \colon \mathcal{D} \to V, with V a normed vector space, cannot be stable w.r.t. W_q for q \in \{2, \dots, \infty\}. By linear we mean \Psi(F + G) = \Psi(F) + \Psi(G) and \Psi(\emptyset) = 0.
Set G = \emptyset and F_n = \{u, \dots, u\} (n copies of u). We have
    W_q(F_n, G) = \min_{\eta \colon F_n \to G} \left( \sum_{(u,v) \in \eta} \|u - v\|^q \right)^{1/q} = n^{1/q} \, W_q(F_1, G),
but \|\Psi(F_n) - \Psi(G)\|_V = n \, \|\Psi(F_1)\|_V.

22 Robustness
Theorem. For any two persistence diagrams F and G, and any \sigma > 0, we have
    \|\Psi_\sigma(F) - \Psi_\sigma(G)\|_{L^2(\Omega)} \le \frac{1}{\sigma \sqrt{8\pi}} \, W_1(F, G).   (14)
Multi-scale and noise: noise in the input data causes W_1(F, G) to increase. We can counteract this by increasing \sigma.
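As a sanity check (an illustration, not from the slides), the bound (14) can be tested numerically: the left-hand side follows from the kernel trick \|\Psi_\sigma(F) - \Psi_\sigma(G)\|^2 = k(F,F) - 2k(F,G) + k(G,G) with the pss_kernel sketched above, and W_1 from a linear assignment over the diagrams augmented with their diagonal projections. The helper w1 and the example diagrams are assumptions for this sketch.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def w1(F, G):
        """W_1 between diagrams via assignment over diagonally augmented point sets."""
        F, G = np.asarray(F, float), np.asarray(G, float)
        projF = F.mean(axis=1, keepdims=True).repeat(2, axis=1)  # diagonal projections
        projG = G.mean(axis=1, keepdims=True).repeat(2, axis=1)
        A = np.vstack([F, projG])       # points of F plus diagonal slots for G
        B = np.vstack([G, projF])       # points of G plus diagonal slots for F
        C = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
        C[len(F):, len(G):] = 0.0       # matching diagonal to diagonal is free
        r, c = linear_sum_assignment(C)
        return C[r, c].sum()

    sigma = 1.0
    F = np.array([[0.0, 2.0], [1.0, 3.0]])
    G = np.array([[0.1, 2.2], [1.0, 2.5]])
    lhs = np.sqrt(pss_kernel(F, F, sigma) - 2 * pss_kernel(F, G, sigma)
                  + pss_kernel(G, G, sigma))
    assert lhs <= w1(F, G) / (sigma * np.sqrt(8 * np.pi))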

23 Robustness: Proof I
Proof. Let \eta = \arg\min_{\eta \colon F \to G} \sum_{(u,v) \in \eta} \|u - v\|. (Augmenting a diagram with diagonal points leaves \Psi_\sigma(D) unchanged, so F and G may be matched bijectively.) Denote N_u(x) = \frac{1}{4\pi\sigma} e^{-\|x - u\|^2 / (4\sigma)}. Then
    \|\Psi_\sigma(F) - \Psi_\sigma(G)\|_{L^2(\Omega)} = \Big\| \sum_{(u,v) \in \eta} (N_u - N_{\bar{u}}) - (N_v - N_{\bar{v}}) \Big\|_{L^2(\Omega)} \le \sum_{(u,v) \in \eta} 2 \, \|N_u - N_v\|_{L^2(\Omega)},
using the triangle inequality and \|N_{\bar{u}} - N_{\bar{v}}\| = \|N_u - N_v\| (reflection at the diagonal is an isometry).

24 Robustness: Proof II
Using \|N_u - N_v\|_{L^2(\Omega)} = \sqrt{\tfrac{1}{4\pi\sigma}} \sqrt{1 - e^{-\|u - v\|_2^2 / (8\sigma)}}, we have
    \|\Psi_\sigma(F) - \Psi_\sigma(G)\|_{L^2(\Omega)} \le \frac{1}{\sqrt{\pi\sigma}} \sum_{(u,v) \in \eta} \sqrt{1 - e^{-\|u - v\|_2^2 / (8\sigma)}}.
By the convexity of x \mapsto e^{-x}, we have e^{-\lambda x} \ge 1 - \lambda x, and hence
    \|\Psi_\sigma(F) - \Psi_\sigma(G)\|_{L^2(\Omega)} \le \frac{1}{\sqrt{\pi\sigma}} \sum_{(u,v) \in \eta} \sqrt{\tfrac{1}{8\sigma} \|u - v\|_2^2} = \frac{1}{\sigma\sqrt{8\pi}} \sum_{(u,v) \in \eta} \|u - v\|_2 = \frac{1}{\sigma\sqrt{8\pi}} \, W_1(F, G).

25–31 Persistence landscapes: Definition
Introduced by Bubenik [Bubenik, 2013].
[Figure, built up over several slides: a persistence diagram with birth/death axes; each point (a, b) is turned into a tent function over the interval [a, b] on the x-axis, and \lambda_1, \lambda_2, \lambda_3 denote the pointwise largest, second-largest, and third-largest of these tent functions, so that \lambda_k(x) is the k-th largest value at x.]

32 Persistence landscapes: Definition
The persistence landscape of a persistence diagram can be interpreted as a function in L^p(\mathbb{R}^2). Let \Psi_L \colon \mathcal{D} \to L^p(\mathbb{R}^2) \colon D \mapsto \lambda(\cdot, \cdot) denote this embedding.
[Figure: the landscape \lambda(k, x) plotted over x for k = 1, 2, \dots]
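A minimal sketch (an illustration, assuming Bubenik's standard definition \lambda_k(x) = k-th largest value of \min(x - a, b - x)_+ over the points (a, b) of the diagram):

    import numpy as np

    def landscape(diagram, k, xs):
        """Evaluate the k-th landscape function lambda_k at the points xs (k is 1-based)."""
        D = np.asarray(diagram, dtype=float)
        xs = np.asarray(xs, dtype=float)
        tents = np.minimum(xs[:, None] - D[None, :, 0],      # x - birth
                           D[None, :, 1] - xs[:, None])      # death - x
        tents = np.maximum(tents, 0.0)                       # clip tent functions at 0
        tents.sort(axis=1)                                   # ascending per evaluation point
        n = D.shape[0]
        return tents[:, n - k] if k <= n else np.zeros_like(xs)

    # Example: lambda_1 of a two-point diagram, sampled on a grid.
    xs = np.linspace(0.0, 4.0, 9)
    lam1 = landscape([[0.0, 2.0], [1.0, 4.0]], k=1, xs=xs)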

33 Persistence landscapes: Stability
Lemma (Bottleneck stability). For two persistence diagrams F and G it holds that
    \|\Psi_L(F) - \Psi_L(G)\|_{L^\infty(\mathbb{R}^2)} \le W_\infty(F, G).   (15)
Lemma (Weighted Wasserstein stability). For two persistence diagrams F and G it holds that
    \|\Psi_L(F) - \Psi_L(G)\|_{L^q(\mathbb{R}^2)} \le \sqrt[q]{ \min_{\eta \colon F \to G} \sum_{(u,v) \in \eta} \left( \mathrm{pers}(u) \, \|u - v\|^q + \frac{2}{q+1} \, \|u - v\|^{q+1} \right) }.   (16)
\Psi_L \colon \mathcal{D} \to L^q(\mathbb{R}^2) is not a linear embedding.

34 Stability comparison: Experiment 1
Let G = \emptyset and let F_t = \{(-t, t)\}, with t \in \mathbb{R}_{\ge 0}. Then:
    \|\Psi_L(F_t) - \Psi_L(G)\|_{L^q} \sim t^{1 + 1/q},
    W_q(F_t, G) \sim t,
    \|\Psi_\sigma(F_t) - \Psi_\sigma(G)\|_{L^2} = \sqrt{\tfrac{1}{4\pi\sigma}} \sqrt{1 - e^{-t^2/(8\sigma)}},
which stays bounded as t grows.

35 Stability comparison: Experiment 2
Let F_t = \{(-t, t)\} and G_t = \{(-t + 1, t + 1)\}, with t \in \mathbb{R}_{\ge 0}. Then:
    \|\Psi_L(F_t) - \Psi_L(G_t)\|_{L^q} \sim t^{1/q},
    W_q(F_t, G_t) = O(1),
    \|\Psi_\sigma(F_t) - \Psi_\sigma(G_t)\|_{L^2} \to \sqrt{\tfrac{1}{4\pi\sigma}} \sqrt{1 - e^{-2/(8\sigma)}} = O(1).
So the landscape distance grows unboundedly in t even though the diagrams stay W_q-close; the embedding \Psi_\sigma does not.

36 Stability comparison: Experiment 3
Persistence landscapes emphasize high-persistence features. Noise at high-persistence points can hide significant mid-range features: it becomes hard to distinguish between Class 1 and Class 2.
[Figure: example diagrams for Class 1 and Class 2.]

37 Experiments: PSS vs. landscape kernel
SHREC 2014 dataset of shapes: 300 meshes, 20 classes (poses). Morse filtration of a heat-signature scalar field on each mesh; an additional scale parameter ("frequency").
Two tasks:
- Retrieval: get the nearest neighbor in Hilbert space.
- Classification: train an SVM with the kernel (see the sketch below).
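A sketch of the classification setup (assumed, not the authors' code; train_diagrams, test_diagrams, and y_train are hypothetical placeholders), using scikit-learn's SVC with a precomputed Gram matrix built from the pss_kernel sketched earlier:

    import numpy as np
    from sklearn.svm import SVC

    sigma = 1.0
    K_train = np.array([[pss_kernel(F, G, sigma) for G in train_diagrams]
                        for F in train_diagrams])
    K_test = np.array([[pss_kernel(F, G, sigma) for G in train_diagrams]
                       for F in test_diagrams])

    clf = SVC(kernel='precomputed')   # SVM operating directly on the Gram matrix
    clf.fit(K_train, y_train)
    pred = clf.predict(K_test)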

38 Experiments: Retrieval
[Figure: nearest-neighbor (NN) retrieval score versus computation time for the PSS and landscape kernels, at frequency 6.]

39 Experiments: Classification
[Table: comparison of the PSS vs. the landscape kernel using 5-fold cross-validation on the SHREC 2014 (Synthetic) dataset for dimension 1; classification accuracy in % (mean ± standard deviation) per frequency.]

40 Bibliography I
Bubenik, P. (2013). Statistical topological data analysis using persistence landscapes. arXiv preprint.
