A Gaussian Type Kernel for Persistence Diagrams
1 TRIPODS Summer Bootcamp: Topology and Machine Learning, Brown University, August 2018. A Gaussian Type Kernel for Persistence Diagrams. Mathieu Carrière, joint work with S. Oudot and M. Cuturi.
2 Persistence diagrams. Pipeline: point cloud → 1-parameter filtration (by scale) → barcode / diagram.
3 Persistence diagrams. Def: a finite multiset of points in the open half-plane above the diagonal. [Figure: a diagram whose points carry multiplicities (2), (1), (1).]
4 Persistence diagrams. Def: a finite multiset of points in the open half-plane above the diagonal. Given a partial matching M : X ↔ Y:
- cost of a matched pair (x, y) ∈ M: c_p(x, y) := ‖x − y‖^p
- cost of an unmatched point z ∈ X ∪ Y: c_p(z) := ‖z − z̄‖^p, where z̄ is the projection of z onto the diagonal
- cost of M: c_p(M) := (Σ_{(x,y) matched} c_p(x, y) + Σ_{z unmatched} c_p(z))^{1/p}
p-th diagram distance (extended metric): d_p(X, Y) := inf_{M : X ↔ Y} c_p(M)
bottleneck distance: d_∞(X, Y) := lim_{p→∞} d_p(X, Y)
5 Persistence diagrams. Prop. [Cohen-Steiner, Edelsbrunner, Harer 2007] (stability): d_p(D, D') is controlled by a total-persistence term of the form (Σ_z ‖z − z̄‖)^{1/p} times d_∞(D, D')^{1 − 1/p}.
6 Persistence diagrams. The diagram distances are similar to Wasserstein distances between probability measures, but not quite the same: the diagrams may carry different masses (numbers of points), and unmatched points are sent to the diagonal.
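For small diagrams, d_p can be computed exactly by reducing the optimal partial matching to a square assignment problem: each diagram is padded with "diagonal" slots, so that leaving a point unmatched amounts to matching it to the diagonal. A minimal sketch in Python (not from the talk; it assumes the Euclidean norm on the plane and uses SciPy's assignment solver):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def diagram_distance(X, Y, p=2.0):
    """p-th diagram distance d_p between two small diagrams X, Y (lists of (birth, death) pairs).
    Reduction to a square assignment problem: each diagram gets |other| extra diagonal slots;
    matching a point to a diagonal slot costs its distance to the diagonal, diagonal-diagonal is free."""
    X, Y = np.atleast_2d(np.asarray(X, float)), np.atleast_2d(np.asarray(Y, float))
    n, m = len(X), len(Y)
    dX = np.abs(X[:, 1] - X[:, 0]) / np.sqrt(2)   # Euclidean distance of each point of X to the diagonal
    dY = np.abs(Y[:, 1] - Y[:, 0]) / np.sqrt(2)
    C = np.zeros((n + m, n + m))
    C[:n, :m] = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1) ** p  # point-to-point costs
    C[:n, m:] = (dX ** p)[:, None]   # point of X left unmatched
    C[n:, :m] = (dY ** p)[None, :]   # point of Y left unmatched
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].sum() ** (1.0 / p)

# tiny usage example
D1 = [(0.0, 1.0), (0.2, 0.9)]
D2 = [(0.0, 1.1)]
print(diagram_distance(D1, D2, p=1.0))
```

Taking p large approximates the bottleneck distance d_∞, although in practice d_∞ is usually computed with a dedicated algorithm.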
7 Persistence diagrams as descriptors for data. Pipeline: input data → domain / filter → persistence diagram (0-dimensional and 1-dimensional homology generators). Pros: topological descriptors carry information of a different nature; strong invariance and stability properties; flexible and versatile.
8 Persistence diagrams as descriptors for data. Cons: the space of persistence diagrams is not a linear space, so linear classifiers cannot be used (PDs have mostly been used in unsupervised settings), which is bad for learning and statistics; descriptors can be slow to compute and, more importantly, slow to compare (long construction times are fine as long as comparisons are fast, but here they are not), which is bad for applications.
9 Persistence diagrams as descriptors for data. A solution (one among others): embed persistence diagrams into a Hilbert space.
10 Hilbert space embedding and kernel trick. [Figure: SVM classifier separating two classes of training data, with support vectors highlighted.] Training is a convex optimization problem; testing is an immediate computation.
11 Hilbert space embedding and kernel trick. [Figure: a polynomial SVM classifier, and the same data in feature space; some training points are misclassified.]
12 Hilbert space embedding and kernel trick. X: a space in which we want to compare/classify elements. Feature map φ : X → H, where H is equipped with an inner product ⟨·, ·⟩_H. Lift training/testing data to H through φ, then solve the learning problem there.
13 Hilbert space embedding and kernel trick. Observation: many learning methods only use the inner product, so there is no need to lift the data explicitly; instead, compute the kernel k(x, y) = ⟨φ(x), φ(y)⟩_H.
14 Hilbert space embedding and kernel trick. Def.: A reproducing kernel is a map k : X × X → R such that k(·, ·) = ⟨φ(·), φ(·)⟩_H for some pair (φ, H). Thm. [Moore, Aronszajn]: such a pair (φ, H) exists (and is unique) whenever k is positive semidefinite, i.e. for all n ∈ N and x_1, ..., x_n ∈ X the Gram matrix (k(x_i, x_j))_{i,j} is positive semidefinite. H is called the Reproducing Kernel Hilbert Space (RKHS) of k.
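A small numerical illustration of the positive semidefiniteness condition in the Moore-Aronszajn theorem: build the Gram matrix of a candidate kernel on a sample and check that its eigenvalues are nonnegative. This is only a sketch with a Gaussian kernel on random vectors; passing the check on one sample is of course not a proof.

```python
import numpy as np

def gram_matrix(xs, kernel):
    """Gram matrix (k(x_i, x_j))_{i,j} for a list of points and a kernel function."""
    return np.array([[kernel(x, y) for y in xs] for x in xs])

def is_psd(K, tol=1e-10):
    """A symmetric matrix is positive semidefinite iff all its eigenvalues are >= 0 (up to tolerance)."""
    return np.all(np.linalg.eigvalsh((K + K.T) / 2) >= -tol)

gauss = lambda x, y, sigma=1.0: np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))
xs = [np.random.randn(3) for _ in range(20)]
print(is_psd(gram_matrix(xs, gauss)))   # expected: True, the Gaussian kernel is psd
```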
15 Hilbert space embedding and kernel trick. Classification / regression. Input: (x_1, y_1), ..., (x_n, y_n) ∈ X × Y. Solve

min_{w,b} (1/n) Σ_{i=1}^n L(y_i, ⟨w, x_i⟩ + b) + γ ‖w‖_p

(e.g. L(y, t) = max{0, 1 − t·y} for SVMs; p = 1 for sparsity). The regularization term prevents the optimal hyperplane from overfitting, e.g. from relying on too many features.
16 Hilbert space embedding and kernel trick. Same problem with the lifted data φ(x_i). Thm. (Representer) [Schölkopf, Herbrich, Smola]: the argmin w admits a representation of the form w = Σ_{j=1}^n α_j φ(x_j) = Σ_{j=1}^n α_j k(x_j, ·). Thanks to the representer theorem, the optimization can be restricted to the linear subspace span{φ(x_1), ..., φ(x_n)}: search the argmin there and optimize over the α_j.
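In practice the kernel trick means a learning method such as an SVM only ever needs Gram matrices. A hedged sketch using scikit-learn's precomputed-kernel interface on synthetic data (the data and the Gaussian kernel here are placeholders; later in the talk the Gram matrix would instead be filled with a kernel between persistence diagrams):

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic 2-class data (placeholder for whatever objects the kernel compares).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

def gaussian_gram(A, B, sigma=1.0):
    """Gram matrix of the Gaussian kernel between two sets of points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# The SVM never sees the feature map, only inner products (the Gram matrix).
K_train = gaussian_gram(X, X)
clf = SVC(kernel="precomputed").fit(K_train, y)

X_test = rng.normal(1.5, 1, (5, 2))
K_test = gaussian_gram(X_test, X)          # kernel values between test and training points
print(clf.predict(K_test))
```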
17 Hilbert space embedding and kernel trick. Examples:
- linear kernel: k(x, y) = ⟨x, y⟩
- polynomial kernel: k(x, y) = (α ⟨x, y⟩ + 1)^β
- Gaussian kernel: k(x, y) = exp(−‖x − y‖² / (2σ²)), i.e. a kernel that depends only on the distance between the observations; it has many interesting properties, in particular an infinite-dimensional RKHS (the Swiss-army knife of kernel-based learning in practice).
18 Hilbert space embedding and kernel trick. Thm. [Berg, Christensen, Ressel 1984]: if d : X × X → R_+ is symmetric and conditionally negative semidefinite (cnsd), i.e. for all n ∈ N, x_1, ..., x_n ∈ X and all α_1, ..., α_n with Σ_i α_i = 0 one has Σ_i Σ_j α_i α_j d(x_i, x_j) ≤ 0, then k(x, y) := exp(−d(x, y) / (2σ²)) is positive semidefinite. Typically, the Euclidean distance and its powers up to 2 (included) are cnsd.
19 Hilbert space embedding and kernel trick. Q: does this apply to persistence diagrams? Pb: d_p is not cnsd, for any p > 0 (it is easy to generate counterexamples by randomly sampling diagrams).
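One way to exhibit such counterexamples numerically is to test the cnsd condition on random samples: draw coefficients α summing to zero and check the sign of αᵀDα for the matrix of pairwise distances. A sketch follows; the `diagram_distance` helper from the earlier sketch (or any other distance) can be plugged in, a single violation proves the distance is not cnsd on that sample, while the absence of one proves nothing.

```python
import numpy as np

def is_cnsd_on_sample(points, dist, trials=200, tol=1e-9):
    """Empirically test the cnsd condition on a finite sample:
    for random alphas summing to 0, check that alpha^T D alpha <= 0.
    A single violation proves the distance is NOT cnsd on this sample."""
    n = len(points)
    D = np.array([[dist(points[i], points[j]) for j in range(n)] for i in range(n)])
    for _ in range(trials):
        a = np.random.randn(n)
        a -= a.mean()                       # enforce sum(alpha) = 0
        if a @ D @ a > tol:
            return False                    # counterexample found
    return True                             # no violation found (not a proof of cnsd)
```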
20 Kernels for persistence diagrams. View persistence diagrams as:
- landscapes (collections of 1-d functions) [Bubenik 2012] [Bubenik, Dłotko 2015]: a landscape is a polyline of slopes ±1
- discrete measures: histogram [Bendich et al. 2014]; convolution with a fixed kernel [Chepushtanova et al. 2015] (done naively, the resulting descriptor is not stable); convolution with a weighted kernel [Kusano, Fukumizu, Hiraoka]; heat diffusion [Reininghaus et al. 2015] + exponential [Kwitt et al. 2015]
- finite metric spaces [C., Oudot, Ovsjanikov 2015]
- polynomial roots or evaluations [Di Fabio, Ferri 2015] [Kališnik 2016]
21-23 Kernels for persistence diagrams: comparison. Each viewpoint embeds diagrams into an ambient Hilbert space (note: L² here is not an RKHS, just an ambient Hilbert space; L²(N × R) is taken with the product of the counting measure on N and the Lebesgue measure on R). Criteria compared across the four viewpoints:
- ambient Hilbert space: L²(N × R) for landscapes, L²(R²) for discrete measures, (R^d, ‖·‖_2) for metric spaces, l²(R) for polynomials
- stability: ‖φ(·) − φ(·)‖_H ≤ f(d_p)
- injectivity
- discriminativity: ‖φ(·) − φ(·)‖_H ≥ g(d_p)?
- algorithmic cost: O(n²) for landscapes and discrete measures; feature map O(n²) / kernel O(d) for metric spaces; feature map O(nd) / kernel O(d) for polynomials
- universality
- additivity
25 Persistence diagrams as discrete measures (I). Idea: treat diagrams as measures and take their densities as feature vectors (to build the feature map, from which the kernel itself is then derived). Discrete measure: µ_D := Σ_{x ∈ D} δ_x.
26 Persistence diagrams as discrete measures (I). Pb: µ_D is unstable (points close to the diagonal can appear or disappear). Weighting: µ_D^w := Σ_{x ∈ D} w(x) δ_x, with w(x) := arctan(c · d(x, Δ)^r) for c, r > 0, where d(x, Δ) is the distance of x to the diagonal.
27 Persistence diagrams as discrete measures (I). Convolution: µ_D^w ∗ N(0, σ). Def: φ(D) is the density function of µ_D^w ∗ N(0, σ) w.r.t. the Lebesgue measure:

φ(D)(u) := (1 / (2πσ²)) Σ_{x ∈ D} arctan(c · d(x, Δ)^r) · exp(−‖u − x‖² / (2σ²))

and k(D, D') := ⟨φ(D), φ(D')⟩_{L²}.
28 Persistence diagrams as discrete measures (I). Prop. [Kusano, Fukumizu, Hiraoka]: ‖φ(D) − φ(D')‖_H ≤ C · d_p(D, D'); moreover φ is injective and exp(k) is universal. Pb: the convolution reduces discriminativity (points/modes start mixing up together), and the weighting also reduces discriminativity. Idea: use the discrete measure itself instead.
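Since φ(D) is a weighted sum of Gaussian bumps, the L² inner product of two such densities reduces to a double sum over the points of the two diagrams, so the kernel can be evaluated without discretizing the plane. A sketch under two assumptions not made explicit on the slide: the integral is taken over all of R² rather than the half-plane, and the normalization constant 1/(4πσ²) follows from that choice.

```python
import numpy as np

def diag_dist(x):
    """Euclidean distance of a diagram point (birth, death) to the diagonal."""
    return abs(x[1] - x[0]) / np.sqrt(2)

def pwg_kernel(D1, D2, sigma=0.5, c=1.0, r=1.0):
    """Weighted-Gaussian-convolution kernel between diagrams, via the closed-form double sum
    k(D, D') = (1/(4*pi*sigma^2)) * sum_{x in D} sum_{y in D'} w(x) w(y) exp(-|x-y|^2 / (4 sigma^2)),
    with weights w(x) = arctan(c * d(x, diagonal)^r).  Integration over all of R^2 is assumed."""
    D1, D2 = np.atleast_2d(np.asarray(D1, float)), np.atleast_2d(np.asarray(D2, float))
    w1 = np.arctan(c * np.array([diag_dist(x) for x in D1]) ** r)
    w2 = np.arctan(c * np.array([diag_dist(y) for y in D2]) ** r)
    d2 = ((D1[:, None, :] - D2[None, :, :]) ** 2).sum(-1)
    return (w1[:, None] * w2[None, :] * np.exp(-d2 / (4 * sigma ** 2))).sum() / (4 * np.pi * sigma ** 2)
```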
29 Persistence diagrams as discrete measures (II). Smoothing into densities might be the reason for the lack of discriminativity, since it requires making compromises; instead, we now work with the discrete measure µ_D := Σ_{x ∈ D} δ_x directly. Pb: d_p(D, D') ≠ W_p(µ_D, µ_D') (W_p does not even make sense here), because the two discrete measures may have different masses.
30 Persistence diagrams as discrete measures (II). Given D, D', let

µ̃_D := Σ_{x ∈ D} δ_x + Σ_{y ∈ D'} δ_{π_Δ(y)},    µ̃_{D'} := Σ_{y ∈ D'} δ_y + Σ_{x ∈ D} δ_{π_Δ(x)},

where π_Δ is the orthogonal projection onto the diagonal. Then d_p(D, D') ≤ W_p(µ̃_D, µ̃_{D'}) ≤ 2 d_p(D, D'): this is a quasi-isometric embedding.
31 Persistence diagrams as discrete measures (II). Pb: µ̃_D depends on D', which is bad in practice, because the measure has to be rebuilt for every pair of diagrams.
32 Persistence diagrams as discrete measures (II). Solution: transfer the mass negatively to µ_D: µ̄_D := Σ_{x ∈ D} δ_x − Σ_{x ∈ D} δ_{π_Δ(x)} ∈ M_0(R²), the space of signed (discrete) measures of total mass zero.
33 Persistence diagrams as discrete measures (II). Q: what metric should we use on M_0(R²)? A: the Kantorovich norm ‖·‖_K (the Wasserstein metric is indeed a certain Kantorovich norm).
34 Persistence diagrams as discrete measures (II). Hahn decomposition thm.: for any µ ∈ M_0(X, Σ), there exist measurable sets P, N such that: (i) P ∪ N = X and P ∩ N = ∅; (ii) µ(B) ≥ 0 for every measurable set B ⊆ P; (iii) µ(B) ≤ 0 for every measurable set B ⊆ N. Moreover, the decomposition is essentially unique (up to µ-null sets). P is a positive set for µ, N is a negative set for µ.
35 Persistence diagrams as discrete measures (II). For B ∈ Σ, let µ⁺(B) := µ(B ∩ P) and µ⁻(B) := −µ(B ∩ N); µ⁺ is the positive part of µ, µ⁻ its negative part, and µ⁺(X) = µ⁻(X) since µ has total mass zero. Def.: ‖µ‖_K := W_1(µ⁺, µ⁻). Prop.: for all µ, ν ∈ M_0(X), W_1(µ⁺ + ν⁻, ν⁺ + µ⁻) = ‖µ − ν‖_K.
36 Persistence diagrams as discrete measures (II). For persistence diagrams: W_1(µ̃_D, µ̃_{D'}) = ‖µ̄_D − µ̄_{D'}‖_K, so the space of PDs can be identified with a subset of (M_0(R²), ‖·‖_K).
37 A Wasserstein Gaussian kernel for PDs? Recall Thm. [Berg, Christensen, Ressel 1984]: if d is symmetric and cnsd, then k(x, y) := exp(−d(x, y) / (2σ²)) is positive semidefinite. Pb: W_1 is not cnsd, and neither is d_1 (it is easy to generate counterexamples by random sampling). Solutions: relax the measures (e.g. convolution), the path taken by [Agueh, Carlier] and [Oh...]; or relax the metric (e.g. regularization, slicing).
38 Sliced Wasserstein metric. Special case: X = R, µ, ν discrete measures of mass n: µ := Σ_{i=1}^n δ_{x_i}, ν := Σ_{i=1}^n δ_{y_i}. Sort the atoms of µ, ν along the real line, so that x_i ≤ x_{i+1} and y_i ≤ y_{i+1} for all i (one can then see µ and ν as n-dimensional vectors). Then:

W_1(µ, ν) = Σ_{i=1}^n |x_i − y_i| = ‖(x_1, ..., x_n) − (y_1, ..., y_n)‖_1.
39 Sliced Wasserstein metric. In this one-dimensional case, W_1 is cnsd and easy to compute (and the same holds for ‖·‖_K on signed measures).
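A sketch of this one-dimensional special case: W_1 between two empirical measures with the same number of unit-mass atoms is just the ℓ1 distance between the sorted atom vectors.

```python
import numpy as np

def w1_on_line(xs, ys):
    """W_1 between two discrete measures on R with the same number of unit-mass atoms:
    sort both lists and sum the coordinate-wise absolute differences."""
    xs, ys = np.sort(np.asarray(xs, float)), np.sort(np.asarray(ys, float))
    assert len(xs) == len(ys)
    return np.abs(xs - ys).sum()

print(w1_on_line([0.0, 1.0, 2.0], [0.5, 1.5, 2.5]))   # 1.5
```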
40 Sliced Wasserstein metric. Def (sliced Wasserstein distance): for µ, ν ∈ M_+(R²),

SW_1(µ, ν) := (1 / 2π) ∫_{θ ∈ S¹} W_1(π_θ#µ, π_θ#ν) dθ,

where π_θ is the orthogonal projection onto the line through the origin with angle θ, and # denotes the pushforward. Note: a natural alternative would be to integrate over all lines of the plane, i.e. over RP¹.
41 Sliced Wasserstein metric. Properties (inherited from W_1 over R) [Rabin, Peyré, Delon, Bernot 2011]: satisfies the axioms of a metric; well-defined barycenters, fast to compute via stochastic gradient descent, etc.; conditionally negative semidefinite.
42 Sliced Wasserstein kernel. Def: given σ > 0, for any µ, ν ∈ M_+(R²): k_SW(µ, ν) := exp(−SW_1(µ, ν) / (2σ²)). Corollary [Kolouri, Zou, Rohde]: k_SW is positive semidefinite (from SW_1 being cnsd and Berg's theorem).
43 Sliced Wasserstein kernel. Application to persistence diagrams: µ_D := Σ_{x ∈ D} δ_x and µ̄_D := µ_D − π_Δ#µ_D. Define

SW_1(D, D') := ∫_{θ ∈ S¹} ‖π_θ#µ̄_D − π_θ#µ̄_{D'}‖_K dθ

(this is the same as SW_1(µ_D + π_Δ#µ_{D'}, µ_{D'} + π_Δ#µ_D)), and k_SW(D, D') := exp(−SW_1(D, D') / (2σ²)).
44 Sliced Wasserstein kernel. Properties: positive semidefinite; simple and fast to compute (essentially projections, sorting, and vector norms).
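Putting the pieces together, SW_1 between two diagrams can be approximated by sampling directions: for each angle, project both augmented point sets onto the line, sort, and take the ℓ1 distance, then average over the angles and exponentiate. A sketch; the number of directions and the uniform averaging over [0, π) are choices made here, not prescriptions from the talk.

```python
import numpy as np

def sliced_wasserstein_pd(D1, D2, n_directions=50):
    """Approximate SW_1 between two persistence diagrams, using the augmentation
    SW_1(D, D') = mean_theta W_1(pi_theta#(mu_D + pi_Delta#mu_D'), pi_theta#(mu_D' + pi_Delta#mu_D))."""
    D1, D2 = np.atleast_2d(np.asarray(D1, float)), np.atleast_2d(np.asarray(D2, float))
    proj = lambda Z: np.column_stack([(Z[:, 0] + Z[:, 1]) / 2] * 2)   # projection onto the diagonal
    A = np.vstack([D1, proj(D2)])      # mu_D  + pi_Delta # mu_D'
    B = np.vstack([D2, proj(D1)])      # mu_D' + pi_Delta # mu_D
    thetas = np.linspace(0, np.pi, n_directions, endpoint=False)
    sw = 0.0
    for t in thetas:
        u = np.array([np.cos(t), np.sin(t)])
        a, b = np.sort(A @ u), np.sort(B @ u)   # 1-d projections; W_1 on R is a sorted L1 distance
        sw += np.abs(a - b).sum()
    return sw / n_directions

def sw_kernel(D1, D2, sigma=1.0, n_directions=50):
    """Sliced Wasserstein kernel k_SW(D, D') = exp(-SW_1(D, D') / (2 sigma^2))."""
    return np.exp(-sliced_wasserstein_pd(D1, D2, n_directions) / (2 * sigma ** 2))
```

The resulting Gram matrix can then be fed to an SVM with a precomputed kernel, as sketched earlier.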
45 Sliced Wasserstein kernel. Thm. [C., Cuturi, Oudot]: the metrics d_1 and SW_1 on the space D_N of persistence diagrams of size bounded by N are strongly equivalent, namely: for D, D' ∈ D_N,

d_1(D, D') / (N(2N − 1)) ≤ SW_1(D, D') ≤ 2√2 · d_1(D, D').
46 Sliced Wasserstein kernel. Corollary: the feature map φ associated with k_SW is weakly metric-preserving: there exist g, h, nonzero except at 0, such that g ∘ d_1 ≤ ‖φ(·) − φ(·)‖_H ≤ h ∘ d_1.
47 Sliced Wasserstein kernel. Thm. [C., Bauer]: no feature map φ : D_N → R^d can preserve the metric, whatever d and N are, i.e. either the lower bound is 0 or the upper bound is +∞.
48 Metric distortion in practice. [Figure.]
49 Application to supervised orbits classification. Goal: classify orbits of the linked twisted map, modelling fluid flow dynamics. This is a discrete dynamical system; the orbits (depending on the parameter r) are described by:

x_{n+1} = x_n + r·y_n(1 − y_n) mod 1
y_{n+1} = y_n + r·x_{n+1}(1 − x_{n+1}) mod 1

[Figure: example orbits for five parameter values, labels 1 to 5.]
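A sketch of how such orbits can be generated from the recurrence above (the initial point, orbit length, and the listed r values are illustrative choices; in the experiment each class corresponds to one value of r):

```python
import numpy as np

def linked_twisted_map_orbit(r, n_points=1000, x0=0.3, y0=0.4):
    """Generate an orbit of the linked twisted map:
        x_{n+1} = x_n + r * y_n * (1 - y_n)         mod 1
        y_{n+1} = y_n + r * x_{n+1} * (1 - x_{n+1}) mod 1
    Initial point and orbit length are illustrative choices."""
    pts = np.empty((n_points, 2))
    x, y = x0, y0
    for i in range(n_points):
        x = (x + r * y * (1 - y)) % 1.0
        y = (y + r * x * (1 - x)) % 1.0
        pts[i] = (x, y)
    return pts

# One orbit per parameter value (illustrative r values, one per class).
orbits = {r: linked_twisted_map_orbit(r) for r in (2.5, 3.5, 4.0, 4.1, 4.3)}
```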
50 Application to supervised orbits classification. Accuracies (%) using only TDA descriptors (kernels on barcodes), PDs treated as discrete measures:

        k_PSS      k_PWG   k_SW
Orbit   64.0 ± …   …       … ± 1.1
51 Application to supervised orbits classification. Running times (in seconds, over an N-sized kernel-parameter grid, from 100 orbits):

        k_PSS            k_PWG            k_SW
Orbit   N · (… ± 65.6)   N · (69.2 ± …)   …

(φ(·) is recomputed for each σ for k_PSS and k_PWG, whereas SW_1 is computed only once for k_SW)
52 Application to supervised shape segmentation. Goal: segment 3d shapes based on examples. Approach: train a (multiclass) classifier on PDs extracted from the training shapes, then apply the classifier to PDs extracted from the query shape. [Figure: training shapes with labeled parts (head, torso, hand, foot) and a test shape with unknown labels.]
53 Application to supervised shape segmentation. Accuracies (%) using TDA descriptors (kernels on barcodes):

        Human      Airplane  Ant  FourLeg  Octopus  Bird  Fish
k_PSS   68.5 ± …   …         …    …        …        …     … ± 1.6
k_PWG   64.2 ± …   …         …    …        …        …     … ± 0.5
k_SW    74.0 ± …   …         …    …        …        …     …
54 Recap. Pipeline: input data → domain / filter → persistence diagram (0- and 1-dimensional homology generators) → signed measure. Kernels for persistence diagrams: stable; additive, universal, etc.; easy to compute; (approximately) distance-preserving; an (approximate) Gaussian kernel.
55 Recap. Theoretical framework: quasi-isometric embedding of diagrams as signed measures; Wasserstein metric relaxation (sliced Wasserstein), which is cnsd. Thank you!