A Gaussian Type Kernel for Persistence Diagrams


1 TRIPODS Summer Bootcamp: Topology and Machine Learning, Brown University, August 2018. A Gaussian Type Kernel for Persistence Diagrams. Mathieu Carrière, joint work with S. Oudot and M. Cuturi.

2 Persistence diagrams: point cloud → 1-parameter filtration (by scale) → barcode / diagram.

3 Persistence diagrams. Def: a persistence diagram is a finite multiset of points in the open half-plane above the diagonal (the figure labels its points with their multiplicities (2), (1), (1)).

4 Persistence diagrams. Def: a finite multiset of points in the open half-plane above the diagonal. Given a partial matching M : X ↔ Y between two diagrams:
- cost of a matched pair (x, y) ∈ M: c_p(x, y) := ‖x − y‖^p
- cost of an unmatched point z ∈ X ∪ Y: c_p(z) := ‖z − π_Δ(z)‖^p, where π_Δ(z) is the projection of z onto the diagonal Δ
- cost of M: c_p(M) := ( Σ_{(x,y) matched} c_p(x, y) + Σ_{z unmatched} c_p(z) )^{1/p}
p-th diagram distance (extended metric): d_p(X, Y) := inf_{M : X↔Y} c_p(M)
bottleneck distance: d_∞(X, Y) := lim_{p→∞} d_p(X, Y)

5 Persistence diagrams (same definitions as slide 4). Prop. [Cohen-Steiner, Edelsbrunner, Harer 2007]: the diagram distances are stable; d_p and d_∞ are related by a bound of the form d_p(D, D′) ≤ C^{1/p} · d_∞(D, D′)^{1 − 1/p} for a constant C depending on the diagrams.

6 Persistence diagrams (same content as slides 4 and 5). Remark: the d_p distances are similar to Wasserstein distances between probability measures... but not quite the same (different total masses, special role of the diagonal).
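
For small diagrams, d_p can be computed exactly by reducing the partial matching to a standard assignment problem. The following is a hedged sketch, not from the slides: the ℓ∞ ground norm and the toy diagrams are assumptions added on top of the definitions above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def diagram_distance(X, Y, p=1.0):
    """d_p between two diagrams (arrays of (birth, death) rows), via an assignment:
    each point may be matched either to a point of the other diagram or to the
    diagonal; diagonal-to-diagonal matches cost nothing.
    The ground norm is assumed to be l-infinity here (a common convention)."""
    n, m = len(X), len(Y)
    dX = np.abs(X[:, 1] - X[:, 0]) / 2.0      # l-inf distance of each point of X to the diagonal
    dY = np.abs(Y[:, 1] - Y[:, 0]) / 2.0
    C = np.zeros((n + m, n + m))
    C[:n, :m] = np.max(np.abs(X[:, None, :] - Y[None, :, :]), axis=-1) ** p  # point-to-point costs
    C[:n, m:] = dX[:, None] ** p              # a point of X sent to the diagonal
    C[n:, :m] = dY[None, :] ** p              # a point of Y sent to the diagonal
    rows, cols = linear_sum_assignment(C)     # optimal partial matching
    return C[rows, cols].sum() ** (1.0 / p)

X = np.array([[0.1, 0.6], [0.3, 0.5]])
Y = np.array([[0.12, 0.58], [0.4, 0.45], [0.2, 0.9]])
print(diagram_distance(X, Y, p=1.0))          # the 1st diagram distance d_1
```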

7 Persistence diagrams as descriptors for data. Pipeline: input data → domain / filter → persistence diagram (the figure shows 0-dimensional and 1-dimensional homology generators); the diagram can then serve as a descriptor. Pros: topological descriptors carry information of a different nature; strong invariance and stability properties; flexible and versatile.

8 Persistence diagrams as descriptors for data (Pros as on slide 7). Cons: the space of persistence diagrams is not a linear space, which implies that linear classifiers cannot be used (so far PDs have been mostly used in unsupervised settings): bad for learning and statistics. Moreover, diagrams are slow to compute and, more importantly, to compare (long construction times are fine as long as comparisons can be made fast, but here they cannot): bad for applications.

9 Persistence diagrams as descriptors for data (Pros and Cons as on slides 7 and 8). A solution: Hilbert space embedding (one possible solution among others).

10 Hilbert space embedding and kernel trick. (Figure: toy data with classes labelled −1 and +1, an SVM classifier and its support vectors.) Training is a convex optimization problem; testing is an immediate computation.

11 Hilbert space embedding and kernel trick. (Figure: a linear SVM that misclassifies points, versus a polynomial SVM obtained by mapping the data to a feature space.)

12 Hilbert space embedding and kernel trick. X: a space in which we want to compare/classify elements. Feature map φ : X → H, where H is equipped with an inner product ⟨·,·⟩_H. Lift the training/testing data to H through φ, then solve the learning problem in (H, ⟨·,·⟩_H).

13 Hilbert space embedding and kernel trick (same setup as slide 12). Observation: many learning methods only use the inner product, so there is no need to lift the data explicitly; it suffices to compute the kernel k(x, y) = ⟨φ(x), φ(y)⟩_H.

14 Hilbert space embedding and kernel trick. Def.: a reproducing kernel is a map k : X × X → R such that k(·,·) = ⟨φ(·), φ(·)⟩_H for some pair (φ, H). Thm. [Moore, Aronszajn]: such a pair (φ, H) exists (and is essentially unique) whenever k is positive semidefinite, i.e. for all n ∈ N and x_1, …, x_n ∈ X the Gram matrix (k(x_i, x_j))_{i,j} is positive semidefinite. H is called the Reproducing Kernel Hilbert Space (RKHS) of k.
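
The positive semidefiniteness condition is easy to probe numerically. Below is a small illustrative sketch, not from the slides (the point set and bandwidth are arbitrary), building the Gram matrix of a Gaussian kernel on random points and checking that its spectrum is nonnegative.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))        # 50 random points in R^3
K = gaussian_gram(X, sigma=0.7)
eigvals = np.linalg.eigvalsh(K)     # real spectrum of the symmetric Gram matrix
print(eigvals.min() >= -1e-10)      # True: the Gram matrix is (numerically) PSD
```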

15 Hilbert space embedding and kernel trick. Classification / regression: given input (x_1, y_1), …, (x_n, y_n) ∈ X × Y, solve
min_{w,b} (1/n) Σ_{i=1}^n L(y_i, ⟨w, x_i⟩ + b) + γ ‖w‖^p
(e.g. L(y, t) = max{0, 1 − t·y} for SVMs, p = 1 for sparsity). The regularization term γ ‖w‖^p prevents the optimal hyperplane from overfitting the training data.

16 Hilbert space embedding and kernel trick. Same problem with the data lifted through φ:
min_{w,b} (1/n) Σ_{i=1}^n L(y_i, ⟨w, φ(x_i)⟩ + b) + γ ‖w‖^p
(e.g. L(y, t) = max{0, 1 − t·y} for SVMs, p = 1 for sparsity).
Thm. (Representer) [Schölkopf, Herbrich, Smola]: the argmin w admits a representation of the form w = Σ_{j=1}^n α_j φ(x_j) = Σ_{j=1}^n α_j k(x_j, ·). Thanks to the representer theorem, the optimization can be restricted to the linear subspace span{φ(x_1), …, φ(x_n)}: search for the argmin there and optimize over the α_j.
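
To make this concrete, here is a hedged sketch (toy data, arbitrary labels and bandwidth, not from the slides) showing that a kernel SVM can be trained and applied from Gram matrices alone, for instance with scikit-learn's precomputed-kernel mode; the representation w = Σ_j α_j k(x_j, ·) stays implicit inside the solver.

```python
import numpy as np
from sklearn.svm import SVC

def gaussian_gram(A, B, sigma=1.0):
    """Cross Gram matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 2))
y_train = np.where(X_train[:, 0] * X_train[:, 1] > 0, 1, -1)   # toy XOR-like labels
X_test = rng.normal(size=(20, 2))

K_train = gaussian_gram(X_train, X_train)     # all the learner ever sees
K_test = gaussian_gram(X_test, X_train)       # rows: test points, columns: training points
clf = SVC(kernel="precomputed").fit(K_train, y_train)
y_pred = clf.predict(K_test)                  # predictions from inner products only
```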

17 Hilbert space embedding and kernel trick. Examples:
- linear kernel: k(x, y) = ⟨x, y⟩
- polynomial kernel: k(x, y) = (α ⟨x, y⟩ + 1)^β
- Gaussian kernel: k(x, y) = exp(−‖x − y‖₂² / (2σ²))
The Gaussian kernel is of the kind that depends only on the distance between the observations; it has many interesting properties, in particular an infinite-dimensional RKHS, and it is the Swiss army knife of kernel-based learning in practice.

18 Hilbert space embedding and kernel trick. Thm. [Berg, Christensen, Ressel 1984]: if d : X × X → R_+ is symmetric and conditionally negative semidefinite (cnsd), i.e.
for all n ∈ N, x_1, …, x_n ∈ X and α_1, …, α_n ∈ R, Σ_{i=1}^n α_i = 0 implies Σ_{i=1}^n Σ_{j=1}^n α_i α_j d(x_i, x_j) ≤ 0,
then k(x, y) := exp(−d(x, y) / (2σ²)) is positive semidefinite. (Typically, the Euclidean distance and its powers up to 2 (included) are cnsd.)
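
The cnsd condition can also be checked mechanically: a symmetric matrix of pairwise distances is cnsd exactly when it is negative semidefinite on the zero-sum subspace. A small illustrative check on random Euclidean data (the point cloud and sizes are arbitrary, not from the slides):

```python
import numpy as np

def is_cnsd(D, tol=1e-9):
    """D (symmetric) is cnsd iff -J D J is PSD, where J = I - (1/n) 1 1^T is the
    orthogonal projector onto the zero-sum subspace {a : sum(a) = 0}."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return np.linalg.eigvalsh(-J @ D @ J).min() >= -tol

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(is_cnsd(dists))        # True: the Euclidean distance is cnsd
print(is_cnsd(dists ** 2))   # True: its square is cnsd as well (powers up to 2)
```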

19 Hilbert space embedding and kernel trick (Berg-Christensen-Ressel theorem as on slide 18). Q: does this apply to persistence diagrams? Pb: d_p is not cnsd, for any p > 0.

20 Kernels for persistence diagrams. View persistence diagrams as:
- landscapes (collections of 1-d functions) [Bubenik 2012] [Bubenik, Dłotko 2015] (a landscape is a polyline with slopes ±1)
- discrete measures: histograms [Bendich et al. 2014]; convolution with a fixed kernel [Chepushtanova et al. 2015] (if you do the convolution naively as in [Chepushtanova et al. 2015] you don't get ...); convolution with a weighted kernel [Kusano, Fukumizu, Hiraoka]; heat diffusion [Reininghaus et al. 2015] + exponential [Kwitt et al. 2015]
- finite metric spaces [C., Oudot, Ovsjanikov 2015]
- polynomial roots or evaluations [Di Fabio, Ferri 2015] [Kališnik 2016]

21 Kernels for persistence diagrams. Comparison of the approaches (columns: landscapes / discrete measures / metric spaces / polynomials):
- ambient Hilbert space: L²(N × R) / L²(R²) / (R^d, ‖·‖₂) / ℓ²(R)
- stability: ‖φ(·) − φ(·)‖_H ≤ f(d_p)
- injectivity; lower bound ‖φ(·) − φ(·)‖_H ≥ g(d_p): ???
- algorithmic cost: O(n²) / O(n²) / feature map O(n²), kernel O(d) / feature map O(nd), kernel O(d)
- universality
- additivity
Notes: L²(N × R) is defined by taking the product of the counting measure on N and the Lebesgue measure on R. Attention: L² is not an RKHS, just an ambient Hilbert space in which the feature maps take their values.

22-23 Kernels for persistence diagrams (same comparison table as slide 21).

24 Kernels for persistence diagrams (same list of approaches as slide 20).

25 Persistence diagrams as discrete measures (I). Idea: treat diagrams as measures and take their densities as feature vectors (this builds the feature map, from which the kernel itself is then derived). A diagram D, with points in birth/death coordinates, becomes the discrete measure μ_D := Σ_{x∈D} δ_x.

26 Persistence diagrams as discrete measures (I). Pb: μ_D is unstable (points close to the diagonal can appear and disappear). Weighting: μ_D^w := Σ_{x∈D} w(x) δ_x, with w(x) := arctan(c · d(x, Δ)^r), c, r > 0.

27 Persistence diagrams as discrete measures (I). Convolution: smooth the weighted measure into μ_D^w ∗ N(0, σ). Def: φ(D) is the density function of μ_D^w ∗ N(0, σ) w.r.t. the Lebesgue measure:
φ(D) := (1 / (2πσ)) Σ_{x∈D} arctan(c · d(x, Δ)^r) exp(−‖· − x‖² / (2σ²))
k(D, D′) := ⟨φ(D), φ(D′)⟩_{L²(R × R_+)}
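
As an illustration of this feature map, here is a hedged sketch that evaluates φ(D) on a regular grid for a toy diagram; the grid, the bandwidth σ and the weight parameters c, r are arbitrary choices, and the 1/(2πσ) normalization simply follows the formula above.

```python
import numpy as np

def feature_map(diagram, grid_x, grid_y, sigma=0.1, c=1.0, r=1.0):
    """Discretized phi(D): sum over diagram points of
    arctan(c * d(x, diagonal)^r) times a Gaussian bump centered at x."""
    xs, ys = np.meshgrid(grid_x, grid_y, indexing="ij")
    phi = np.zeros_like(xs)
    for b, d in diagram:
        dist_to_diag = abs(d - b) / np.sqrt(2.0)   # Euclidean distance from (b, d) to the diagonal
        weight = np.arctan(c * dist_to_diag ** r)
        phi += weight * np.exp(-((xs - b) ** 2 + (ys - d) ** 2) / (2.0 * sigma ** 2))
    return phi / (2.0 * np.pi * sigma)

D = np.array([[0.1, 0.5], [0.2, 0.9], [0.4, 0.45]])  # toy diagram: rows are (birth, death)
grid = np.linspace(0.0, 1.0, 50)
phi_D = feature_map(D, grid, grid)                   # a 50 x 50 image-like feature vector
# k(D, D') is then approximated by the discretized L2 inner product of phi_D and phi_D'.
```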

28 Persistence diagrams as discrete measures (I) (same construction as slide 27). Prop. [Kusano, Fukumizu, Hiraoka]: ‖φ(D) − φ(D′)‖_H ≤ C · d_p(D, D′); moreover, φ is injective and exp(k) is universal. Pb: the convolution reduces discriminativity, because nearby points/modes start mixing up together; the weighting also reduces discriminativity. Idea: use the discrete measure itself instead.

29 Persistence diagrams as discrete measures (II). Smoothing into densities might be the reason for the lack of discriminativity, since it requires compromises; instead, we now deal with the discrete measure μ_D := Σ_{x∈D} δ_x directly. Pb: d_p(D, D′) cannot be related to W_p(μ_D, μ_{D′}); in fact W_p does not even make sense here, because the two discrete measures have different total masses.

30 Persistence diagrams as discrete measures (II). Fix: given D, D′, let
μ̃_D := Σ_{x∈D} δ_x + Σ_{y∈D′} δ_{π_Δ(y)} and μ̃_{D′} := Σ_{y∈D′} δ_y + Σ_{x∈D} δ_{π_Δ(x)},
where π_Δ denotes the orthogonal projection onto the diagonal Δ. Then d_p(D, D′) ≤ W_p(μ̃_D, μ̃_{D′}) ≤ 2 d_p(D, D′): this is a quasi-isometric embedding.

31 Persistence diagrams as discrete measures (II) (same construction as slide 30). Pb: μ̃_D depends on D′; this is bad in practice because, basically, μ̃_D has to be rebuilt for every diagram D is compared to.

32 Persistence diagrams as discrete measures (II). Solution: transfer the diagonal mass negatively to μ_D:
μ̄_D := Σ_{x∈D} δ_x − Σ_{x∈D} δ_{π_Δ(x)} ∈ M_0(R²),
a signed discrete measure of total mass zero (M_0(R²) is the space of signed measures on R² with total mass zero).

33 Persistence diagrams as discrete measures (II) (same construction as slide 32). Q: what metric should we use? The Wasserstein metric only makes sense for certain (positive, equal-mass) measures. A: the Kantorovich norm ‖·‖_K.

34 Persistence diagrams as discrete measures (II). Hahn decomposition thm.: for any μ ∈ M_0(X, Σ), there exist measurable sets P, N such that: (i) P ∪ N = X and P ∩ N = ∅; (ii) μ(B) ≥ 0 for every measurable set B ⊆ P; (iii) μ(B) ≤ 0 for every measurable set B ⊆ N. Moreover, the decomposition is essentially unique (unique up to μ-null sets). P is a positive set for μ and N is a negative set for μ.

35 Persistence diagrams as discrete measures (II) (Hahn decomposition as on slide 34). For B ∈ Σ, let μ⁺(B) := μ(B ∩ P) and μ⁻(B) := −μ(B ∩ N); μ⁺ is the positive part of μ, and μ⁺(X) = μ⁻(X) since μ has total mass zero. Def.: ‖μ‖_K := W₁(μ⁺, μ⁻). Prop.: for all μ, ν ∈ M_0(X), W₁(μ⁺ + ν⁻, ν⁺ + μ⁻) = ‖μ − ν‖_K.

36 Persistence diagrams as discrete measures (II) (Kantorovich norm as on slide 35). For persistence diagrams: W₁(μ̃_D, μ̃_{D′}) = ‖μ̄_D − μ̄_{D′}‖_K. We can therefore identify the space of persistence diagrams with a subset of the normed space (M_0(R²), ‖·‖_K).
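
For discrete diagrams this quantity is computable with an assignment solver. A hedged sketch (Euclidean ground cost and toy diagrams assumed, not taken from the slides) that evaluates W₁(μ_D + π_Δ#μ_{D′}, μ_{D′} + π_Δ#μ_D), which equals ‖μ̄_D − μ̄_{D′}‖_K by the proposition above:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def diag_proj(P):
    """Orthogonal projection of (birth, death) points onto the diagonal {y = x}."""
    m = (P[:, 0] + P[:, 1]) / 2.0
    return np.stack([m, m], axis=1)

def kantorovich_dist(D1, D2):
    """W1(mu_D1 + pi#mu_D2, mu_D2 + pi#mu_D1) = ||signed measure difference||_K.
    Both sides carry the same number of unit-mass atoms, so W1 reduces to an
    optimal assignment between the two atom sets."""
    pos = np.vstack([D1, diag_proj(D2)])
    neg = np.vstack([D2, diag_proj(D1)])
    cost = np.linalg.norm(pos[:, None, :] - neg[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].sum()

D1 = np.array([[0.1, 0.6], [0.3, 0.5]])
D2 = np.array([[0.15, 0.55], [0.2, 0.9], [0.4, 0.45]])
print(kantorovich_dist(D1, D2))
```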

37 A Wasserstein Gaussian kernel for PDs? (Berg-Christensen-Ressel theorem as on slide 18.) Pb: W₁ is not cnsd, and neither is d₁; indeed, it is easy to generate counterexamples by randomly sampling diagrams. Solutions: relax the measures (e.g. convolution), the path taken by [Aguey, Carlier] and [Oh...]; or relax the metric (e.g. regularization, slicing).

38 Sliced Wasserstein metric. Special case X = R: let μ, ν be discrete measures of mass n, μ := Σ_{i=1}^n δ_{x_i}, ν := Σ_{i=1}^n δ_{y_i}. Sort the atoms of μ and ν along the real line, so that x_i ≤ x_{i+1} and y_i ≤ y_{i+1} for all i (one can then see μ and ν as n-dimensional vectors). Then W₁(μ, ν) = Σ_{i=1}^n |x_i − y_i| = ‖(x_1, …, x_n) − (y_1, …, y_n)‖₁.

39 Sliced Wasserstein metric (same setting as slide 38). Consequence: on the real line, W₁ is cnsd and easy to compute (and the same holds for ‖·‖_K for signed measures).
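
A minimal sketch of this one-dimensional reduction (random atoms, purely illustrative):

```python
import numpy as np

def w1_on_line(xs, ys):
    """W1 between sum_i delta_{x_i} and sum_i delta_{y_i} (same number of unit-mass
    atoms): sort both lists and take the l1 distance between the sorted vectors."""
    return np.abs(np.sort(xs) - np.sort(ys)).sum()

rng = np.random.default_rng(3)
x, y = rng.normal(size=20), rng.normal(size=20)
print(w1_on_line(x, y))
```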

40 Sliced Wasserstein metric. Def (sliced Wasserstein distance): for μ, ν ∈ M_+(R²),
SW₁(μ, ν) := (1 / (2π)) ∫_{θ∈S¹} W₁(π_θ#μ, π_θ#ν) dθ,
where π_θ is the orthogonal projection onto the line through the origin with angle θ. (Note: a natural idea would be to integrate over all lines of the plane, i.e. over RP¹.)

41 Sliced Wasserstein metric (same definition as slide 40). Props (inherited from W₁ over R) [Rabin, Peyré, Delon, Bernot 2011]:
- satisfies the axioms of a metric
- well-defined barycenters, fast to compute via stochastic gradient descent, etc.
- conditionally negative semidefinite

42 Sliced Wasserstein kernel. Def: given σ > 0, for any μ, ν ∈ M_+(R²):
k_SW(μ, ν) := exp(−SW₁(μ, ν) / (2σ²)).
Corollary [Kolouri, Zou, Rohde]: k_SW is positive semidefinite (from SW₁ being cnsd and Berg's theorem).

43 Sliced Wasserstein kernel (definition as on slide 42). Application to persistence diagrams: set μ_D := Σ_{x∈D} δ_x and μ̄_D := μ_D − π_Δ#μ_D, and define
SW₁(D, D′) := ∫_{θ∈S¹} ‖π_θ#μ̄_D − π_θ#μ̄_{D′}‖_K dθ
(this is the same as SW₁(μ_D + π_Δ#μ_{D′}, μ_{D′} + π_Δ#μ_D));
k_SW(D, D′) := exp(−SW₁(D, D′) / (2σ²)).

44 Sliced Wasserstein kernel (same definitions as slide 43). Properties: positive semidefinite; simple and fast to compute (essentially sorting and vector norm computations).
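
A hedged sketch of how k_SW can be approximated in practice: the integral over directions is replaced by an average over finitely many angles. The toy diagrams, the number of directions and σ are arbitrary, and the overall scale of SW₁ depends on the normalization convention chosen for the angular measure.

```python
import numpy as np

def diag_proj(P):
    """Orthogonal projection of (birth, death) points onto the diagonal {y = x}."""
    m = (P[:, 0] + P[:, 1]) / 2.0
    return np.stack([m, m], axis=1)

def sliced_wasserstein(D1, D2, n_directions=100):
    """Approximate SW_1(D1, D2): augment each diagram with the diagonal projections
    of the other one, project onto angles theta in [0, pi), and average the sorted
    l1 distances (opposite directions give the same 1-d W1)."""
    A = np.vstack([D1, diag_proj(D2)])      # atoms of mu_D1 + pi_Delta # mu_D2
    B = np.vstack([D2, diag_proj(D1)])      # atoms of mu_D2 + pi_Delta # mu_D1
    thetas = np.linspace(0.0, np.pi, n_directions, endpoint=False)
    total = 0.0
    for t in thetas:
        u = np.array([np.cos(t), np.sin(t)])
        total += np.abs(np.sort(A @ u) - np.sort(B @ u)).sum()
    return total / n_directions             # average over directions

D1 = np.array([[0.1, 0.6], [0.3, 0.5]])
D2 = np.array([[0.15, 0.55], [0.2, 0.9], [0.4, 0.45]])
sw = sliced_wasserstein(D1, D2)
k_sw = np.exp(-sw / (2.0 * 0.5 ** 2))       # sliced Wasserstein kernel value with sigma = 0.5
```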

45 Sliced Wasserstein kernel (application to persistence diagrams as on slide 43). Thm. [C., Cuturi, Oudot]: the metrics d₁ and SW₁ on the space D_N of persistence diagrams of size bounded by N are strongly equivalent, namely, for all D, D′ ∈ D_N:
d₁(D, D′) / (N(2N − 1)) ≤ SW₁(D, D′) ≤ 2√2 · d₁(D, D′).

46 Sliced Wasserstein kernel (equivalence theorem as on slide 45). Corollary: the feature map φ associated with k_SW is weakly metric-preserving: there exist functions g, h, nonzero except at 0, such that g ∘ d₁ ≤ ‖φ(·) − φ(·)‖_H ≤ h ∘ d₁.

47 Sliced Wasserstein kernel (equivalence theorem and corollary as on slides 45-46). Thm. [C., Bauer]: no feature map φ : D_N → R^d can preserve the metric, whatever d and N are, i.e. either the lower bound is 0 or the upper bound is +∞.

48 Metric distortion in practice.

49 Application to supervised orbit classification. Goal: classify orbits of the linked twisted map, a discrete dynamical system modelling fluid flow dynamics. Orbits are generated by (depending on the parameter r):
x_{n+1} = x_n + r · y_n (1 − y_n) mod 1
y_{n+1} = y_n + r · x_{n+1} (1 − x_{n+1}) mod 1
The behavior of the orbits depends on r; each class label (1 to 5) corresponds to a different value of r. (Figure: example orbits for labels 1 to 5.)
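
A sketch of the orbit generation (the value of r, the orbit length and the random initial condition are illustrative choices; the slides' exact experimental parameters are not reproduced here):

```python
import numpy as np

def linked_twisted_map_orbit(r, n_points=1000, seed=0):
    """Generate one orbit of the linked twisted map from a random initial condition."""
    rng = np.random.default_rng(seed)
    x, y = rng.random(), rng.random()
    orbit = np.empty((n_points, 2))
    for i in range(n_points):
        x = (x + r * y * (1.0 - y)) % 1.0
        y = (y + r * x * (1.0 - x)) % 1.0   # uses the already-updated x, as in the recurrence
        orbit[i] = (x, y)
    return orbit

orbit = linked_twisted_map_orbit(r=4.1)     # a point cloud in [0, 1)^2 whose persistence
                                            # diagram is then fed to the kernels above
```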

50 Application to supervised orbit classification. Accuracies (%) using only TDA descriptors (kernels on barcodes), on the Orbit dataset: k_PSS 64.0 ± ..., k_PWG ... ± ..., k_SW ... ± 1.1 (PDs as discrete measures).

51 Application to supervised orbit classification (accuracies as on slide 50). Running times (in seconds, over an N-sized parameter space, from 100 orbits): k_PSS ... ± 65.6, k_PWG 69.2 ± ..., k_SW ... For k_PSS and k_PWG, φ(·) is recomputed for each σ, whereas SW₁ is computed only once.

52 Application to supervised shape segmentation. Goal: segment 3d shapes based on examples. Approach: train a (multiclass) classifier on PDs extracted from the training shapes, then apply the classifier to PDs extracted from the query shape. (Figure: training shapes labelled Head, Torso, Hand, Foot; a test shape with Label = ?.)
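
A self-contained mechanics sketch of this approach, with toy random diagrams and labels standing in for the real per-point diagrams and segment labels (the sliced Wasserstein approximation and σ follow the earlier sketches): precompute the k_SW Gram matrices and train a multiclass SVM with a precomputed kernel.

```python
import numpy as np
from sklearn.svm import SVC

def diag_proj(P):
    m = (P[:, 0] + P[:, 1]) / 2.0
    return np.stack([m, m], axis=1)

def sw1(D1, D2, n_dir=50):
    """Approximate sliced Wasserstein distance between two diagrams (cf. slide 43)."""
    A, B = np.vstack([D1, diag_proj(D2)]), np.vstack([D2, diag_proj(D1)])
    total = 0.0
    for t in np.linspace(0.0, np.pi, n_dir, endpoint=False):
        u = np.array([np.cos(t), np.sin(t)])
        total += np.abs(np.sort(A @ u) - np.sort(B @ u)).sum()
    return total / n_dir

def sw_gram(Ds_a, Ds_b, sigma=0.5):
    """Gram matrix of the sliced Wasserstein kernel between two lists of diagrams."""
    return np.array([[np.exp(-sw1(Da, Db) / (2.0 * sigma ** 2)) for Db in Ds_b] for Da in Ds_a])

# Toy stand-ins for the per-point diagrams and segment labels of the real pipeline.
rng = np.random.default_rng(4)
def toy_diagram():
    b = rng.random(5)
    return np.stack([b, b + rng.random(5)], axis=1)

train_dgms = [toy_diagram() for _ in range(30)]
train_labels = rng.integers(0, 4, size=30)              # 4 classes, e.g. head/torso/hand/foot
query_dgms = [toy_diagram() for _ in range(5)]

clf = SVC(kernel="precomputed").fit(sw_gram(train_dgms, train_dgms), train_labels)
pred = clf.predict(sw_gram(query_dgms, train_dgms))     # predicted segment label per query PD
```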

53 Application to supervised shape segmentation. Accuracies (%) using TDA descriptors (kernels on barcodes):
         Human         Airplane   Ant    FourLeg   Octopus   Bird   Fish
k_PSS    68.5 ± ...    ...        ...    ...       ...       ...    ... ± 1.6
k_PWG    64.2 ± ...    ...        ...    ...       ...       ...    ... ± 0.5
k_SW     74.0 ± ...    ...        ...    ...       ...       ...    ... ± ...

54 Recap. Pipeline: input data → domain / filter → persistence diagram (0- and 1-dimensional homology generators) → signed measure (points δ_x minus their diagonal projections δ_{π_Δ(x)}). Kernels for persistence diagrams:
- stable
- additive, universal, etc.
- easy to compute
- distance-preserving (approximately)
- (approximate) Gaussian kernel

55 Recap. Theoretical framework: quasi-isometric embedding of diagrams as signed measures; Wasserstein metric relaxation (sliced Wasserstein), which is cnsd. Thank you!
