Kernel methods for Bayesian inference
1 Kernel methods for Bayesian inference. Arthur Gretton, Gatsby Computational Neuroscience Unit. Lancaster, Nov. 2014.
2 Motivating Example: Bayesian inference without a model. 3600 downsampled frames of RGB pixels ($Y_t \in [0,1]^{1200}$); 1800 training frames, the remainder for test; Gaussian noise added to $Y_t$. Challenges: no parametric model of camera dynamics (only samples); no parametric model of the map from camera angle to image (only samples). We want to do filtering: Bayesian inference.
3–5 ABC: an approach to Bayesian inference without a model. Bayes' rule:
$$P(y|x) = \frac{P(x|y)\,\pi(y)}{\int P(x|y)\,\pi(y)\,dy},$$
where $P(x|y)$ is the likelihood and $\pi(y)$ is the prior. One approach: Approximate Bayesian Computation (ABC). ABC generates a sample from $p(y|x^*)$ as follows:
1. generate a sample $y_t$ from the prior $\pi$,
2. generate a sample $x_t$ from $P(X|y_t)$,
3. if $D(x^*, x_t) < \tau$, accept $y = y_t$; otherwise reject,
4. go to (1).
In step (3), $D$ is a distance measure and $\tau$ is a tolerance parameter.
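ABC is easy to sketch in code. Below is a minimal rejection sampler for a hypothetical one-dimensional Gaussian model; the prior, likelihood, distance $D$, and tolerance $\tau$ are illustrative choices, not the talk's setup:

```python
# Minimal ABC rejection sampler (hypothetical 1-D Gaussian example:
# prior pi = N(0, 1) on the mean y, likelihood X | y ~ N(y, 0.5^2);
# the distance D and tolerance tau are illustrative choices).
import numpy as np

rng = np.random.default_rng(0)
x_star = 1.3           # observed data point
tau = 0.1              # tolerance parameter
n_accept = 2000

posterior_samples = []
while len(posterior_samples) < n_accept:
    y_t = rng.normal(0.0, 1.0)          # 1. sample y_t from the prior pi
    x_t = rng.normal(y_t, 0.5)          # 2. sample x_t from P(X | y_t)
    if abs(x_star - x_t) < tau:         # 3. accept if D(x*, x_t) < tau
        posterior_samples.append(y_t)   # ... otherwise reject and repeat

print(np.mean(posterior_samples))      # approximates the posterior mean E[Y | x*]
```

Smaller $\tau$ gives a better approximation of the posterior but a lower acceptance rate, which is the cost the kernel approach below avoids.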
6 Motivating example 2: simple Gaussian case. $p(x, y)$ is $N\big((\mathbf{0}, \mathbf{1}_d^\top)^\top, V\big)$ with $V$ a randomly generated covariance. Posterior mean on $x$: ABC vs the kernel approach. [Plot: CPU time (sec) vs mean square error, 6 dimensions, comparing KBI, COND, and ABC.]
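For this jointly Gaussian benchmark the posterior mean is available in closed form, which is what the ABC and kernel estimates can be scored against. A sketch, with illustrative dimensions and an assumed reading of the slide's setup (which block is observed, and the exact mean/covariance arrangement, may differ in the talk):

```python
# Closed-form Gaussian conditioning: for (X, Y) ~ N(m, V) with blocks
# V_xx, V_xy, V_yy, we have E[X | Y = y] = m_x + V_xy V_yy^{-1} (y - m_y).
# The dimension d and the random covariance below are illustrative.
import numpy as np

rng = np.random.default_rng(1)
d = 6
A = rng.normal(size=(2 * d, 2 * d))
V = A @ A.T + 0.1 * np.eye(2 * d)               # a randomly generated covariance
m = np.concatenate([np.zeros(d), np.ones(d)])   # mean (0, 1_d) as transcribed

y_obs = rng.normal(size=d)                      # an observed y
V_yy, V_xy = V[d:, d:], V[:d, d:]
post_mean_x = m[:d] + V_xy @ np.linalg.solve(V_yy, y_obs - m[d:])
print(post_mean_x)   # ground truth against which ABC / kernel estimates are scored
```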
7–8 Overview. Introduction to reproducing kernel Hilbert spaces: Hilbert space; kernels and feature spaces; reproducing property; mapping probabilities to feature space. Nonparametric Bayesian inference: learning conditional probabilities (smooth regression to an RKHS); kernelized Bayesian inference.
9 Functions in a reproducing kernel Hilbert space Kernel methods can control smoothness and avoid overfitting/underfitting.
10–12 Hilbert space. Inner product: let $\mathcal{H}$ be a vector space over $\mathbb{R}$. A function $\langle\cdot,\cdot\rangle_{\mathcal{H}} : \mathcal{H} \times \mathcal{H} \to \mathbb{R}$ is an inner product on $\mathcal{H}$ if
1. Linear: $\langle \alpha_1 f_1 + \alpha_2 f_2, g\rangle_{\mathcal{H}} = \alpha_1 \langle f_1, g\rangle_{\mathcal{H}} + \alpha_2 \langle f_2, g\rangle_{\mathcal{H}}$
2. Symmetric: $\langle f, g\rangle_{\mathcal{H}} = \langle g, f\rangle_{\mathcal{H}}$
3. $\langle f, f\rangle_{\mathcal{H}} \ge 0$, and $\langle f, f\rangle_{\mathcal{H}} = 0$ if and only if $f = 0$.
Norm induced by the inner product: $\|f\|_{\mathcal{H}} := \sqrt{\langle f, f\rangle_{\mathcal{H}}}$. Hilbert space: an inner product space containing the limits of its Cauchy sequences.
13 Kernel. Let $\mathcal{X}$ be a non-empty set. A function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a kernel if there exists an $\mathbb{R}$-Hilbert space $\mathcal{H}$ and a map $\varphi : \mathcal{X} \to \mathcal{H}$ such that $\forall x, x' \in \mathcal{X}$, $k(x, x') := \langle \varphi_x, \varphi_{x'}\rangle_{\mathcal{H}}$. Almost no conditions on $\mathcal{X}$ (e.g., $\mathcal{X}$ itself doesn't need an inner product; it could be a set of documents). A single kernel can correspond to several possible feature maps. A trivial example for $\mathcal{X} := \mathbb{R}$: $\varphi^{(1)}_x = x$ and $\varphi^{(2)}_x = \left[x/\sqrt{2},\; x/\sqrt{2}\right]$.
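The non-uniqueness of the features is easy to check numerically; a minimal sketch of the trivial example above:

```python
# Two different feature maps realizing the same kernel k(x, x') = x * x'
# on X = R (the trivial example on the slide), checked numerically.
import numpy as np

def phi1(x):
    return np.array([x])                                # phi^(1)_x = x

def phi2(x):
    return np.array([x / np.sqrt(2), x / np.sqrt(2)])   # phi^(2)_x

x, xp = 1.7, -0.4
print(phi1(x) @ phi1(xp))   # -0.68
print(phi2(x) @ phi2(xp))   # -0.68: same kernel, different feature space
```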
14 Finite dim. RKHS with polynomial features. Example: a three-dimensional space of features of points in $\mathbb{R}^2$: for $x = [x_1, x_2]^\top$,
$$\varphi : \mathbb{R}^2 \to \mathbb{R}^3, \qquad \varphi_x = \begin{bmatrix} x_1 \\ x_2 \\ x_1 x_2 \end{bmatrix},$$
with kernel
$$k(x, y) = x_1 y_1 + x_2 y_2 + x_1 x_2\, y_1 y_2$$
(the standard inner product in $\mathbb{R}^3$ between features). Denote this feature space by $\mathcal{H}$.
15–16 Finite dim. RKHS with polynomial features. Define a linear function of the inputs $x_1, x_2$ and their product $x_1 x_2$:
$$f(x) = f_1 x_1 + f_2 x_2 + f_3 x_1 x_2.$$
$f$ lives in a space of functions mapping from $\mathcal{X} = \mathbb{R}^2$ to $\mathbb{R}$. Equivalent representation for $f$:
$$f(\cdot) = [f_1 \;\; f_2 \;\; f_3]^\top.$$
$f(\cdot)$ refers to the function as an object (here a vector in $\mathbb{R}^3$); $f(x) \in \mathbb{R}$ is the function evaluated at a point (a real number). Then
$$f(x) = f(\cdot)^\top \varphi_x = \langle f(\cdot), \varphi_x\rangle_{\mathcal{H}}:$$
evaluation of $f$ at $x$ is an inner product in feature space (here the standard inner product in $\mathbb{R}^3$), and $\mathcal{H}$ is a space of functions mapping $\mathbb{R}^2$ to $\mathbb{R}$.
17–18 Finite dim. RKHS with polynomial features. $\varphi_y$ is the image of $y$ under the feature map $\mathbb{R}^2 \to \mathbb{R}^3$, and it also parametrizes a function mapping $\mathbb{R}^2$ to $\mathbb{R}$. Write
$$k(\cdot, y) := [y_1 \;\; y_2 \;\; y_1 y_2]^\top = \varphi_y.$$
Given $y$, there is a vector $k(\cdot, y)$ in $\mathcal{H}$ such that
$$\langle k(\cdot, y), \varphi_x \rangle_{\mathcal{H}} = a x_1 + b x_2 + c\, x_1 x_2,$$
where $a = y_1$, $b = y_2$, and $c = y_1 y_2$. Due to symmetry,
$$\langle k(\cdot, x), \varphi_y \rangle_{\mathcal{H}} = u y_1 + v y_2 + w\, y_1 y_2 = k(x, y),$$
with $u = x_1$, $v = x_2$, $w = x_1 x_2$. We can therefore write $\varphi_x = k(\cdot, x)$ and $\varphi_y = k(\cdot, y)$ without ambiguity: the canonical feature map.
19 The reproducing property. This example illustrates the two defining features of an RKHS. The reproducing property: $\forall x \in \mathcal{X}$, $\forall f(\cdot) \in \mathcal{H}$,
$$\langle f(\cdot), k(\cdot, x)\rangle_{\mathcal{H}} = f(x)$$
(or in shorter notation, $\langle f, \varphi_x\rangle_{\mathcal{H}}$). In particular, for any $x, y \in \mathcal{X}$, $k(x, y) = \langle k(\cdot, x), k(\cdot, y)\rangle_{\mathcal{H}}$. Note: the feature map of every point is in the feature space: $\forall x \in \mathcal{X}$, $k(\cdot, x) = \varphi_x \in \mathcal{H}$.
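A small numerical check of these properties in the three-dimensional polynomial RKHS from the running example (the coefficients and points are arbitrary):

```python
# Numerical check of the reproducing property in the 3-dimensional RKHS
# with features phi_x = (x1, x2, x1*x2) from the running example.
import numpy as np

def phi(x):
    return np.array([x[0], x[1], x[0] * x[1]])

f = np.array([2.0, -1.0, 0.5])   # f(.) as a vector: f(x) = 2x1 - x2 + 0.5 x1 x2
x = np.array([0.3, 1.2])
y = np.array([-0.7, 0.4])

# f(x) = <f(.), phi_x>_H : evaluation is an inner product
print(f @ phi(x), 2 * x[0] - x[1] + 0.5 * x[0] * x[1])   # both -0.42

# k(x, y) = <k(., x), k(., y)>_H with k(., x) = phi_x
print(phi(x) @ phi(y))
```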
20–21 Infinite dimensional feature space. Reproducing property for a function built from a Gaussian kernel:
$$f(x) := \sum_{i=1}^m \alpha_i\, k(x_i, x) = \left\langle \sum_{i=1}^m \alpha_i\, \varphi_{x_i},\; \varphi_x \right\rangle_{\mathcal{H}}.$$
[Plot: an example function $f(x)$ against $x$.] What do the features $\varphi_x$ look like? (There are infinitely many of them, and they are not unique!) What do these features have to do with smoothness?
22 Infinite dimensional feature space. Gaussian kernel, $k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$. Its eigenexpansion has eigenvalues and eigenfunctions
$$\lambda_k \propto b^k \;\; (b < 1), \qquad e_k(x) \propto \exp\!\big(-(c - a)x^2\big)\, H_k\!\big(x\sqrt{2c}\big),$$
where $a, b, c$ are functions of $\sigma$, and $H_k$ is the $k$th order Hermite polynomial. Then
$$k(x, x') = \sum_{i=1}^\infty \lambda_i\, e_i(x)\, e_i(x') = \sum_{i=1}^\infty \underbrace{\left(\sqrt{\lambda_i}\, e_i(x)\right)}_{\varphi_x}\underbrace{\left(\sqrt{\lambda_i}\, e_i(x')\right)}_{\varphi_{x'}}.$$
23 Infinite dimensional feature space (Mercer). Let $\mathcal{X}$ be a compact metric space, $k$ a continuous kernel, and $\mu$ a finite Borel measure with $\mathrm{supp}\{\mu\} = \mathcal{X}$. Then the convergence of
$$k(x, y) = \sum_j \lambda_j\, e_j(x)\, e_j(y)$$
is absolute and uniform ($e_j$ is the continuous element of the $L_2$ equivalence class). The feature map is
$$\varphi_x = \left[\dots\;\; \sqrt{\lambda_i}\, e_i(x)\;\; \dots\right].$$
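An empirical analogue: eigendecomposing the kernel matrix on a finite sample plays the role of the Mercer expansion (this is the idea behind Nystrom approximations). A sketch with an illustrative Gaussian kernel and sample:

```python
# Empirical analogue of the Mercer expansion: eigendecompose the kernel
# matrix on a sample and check that sum_i lambda_i u_i u_i^T reconstructs
# K exactly on the sample (a Nystrom-style sketch; bandwidth and sample
# are illustrative).
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=50)
sigma = 1.0
K = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * sigma ** 2))

lam, U = np.linalg.eigh(K)          # columns of U ~ eigenfunctions at sample points
K_rec = (U * lam) @ U.T             # sum_i lam_i u_i u_i^T
print(np.max(np.abs(K - K_rec)))    # ~ 1e-15: expansion exact on the sample

print(np.sort(lam)[::-1][:5])       # eigenvalues decay rapidly (cf. lambda_k ~ b^k)
```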
24 Infinite dimensional feature space (Mercer RKHS) (Steinwart and Christmann, Theorem 4.51). Under the assumptions of Mercer's theorem,
$$\mathcal{H} := \left\{ \sum_i a_i \sqrt{\lambda_i}\, e_i \;:\; (a_i)_i \in \ell^2 \right\} \tag{1}$$
is an RKHS with kernel $k$. Given two functions in the RKHS,
$$f(x) := \sum_i a_i \sqrt{\lambda_i}\, e_i(x), \qquad g(x) := \sum_i b_i \sqrt{\lambda_i}\, e_i(x),$$
the inner product is $\langle f, g\rangle_{\mathcal{H}} = \sum_i a_i b_i$.
25–26 Infinite dimensional feature space. Proof: there are two aspects requiring care:
1. Is $k(x, \cdot) \in \mathcal{H}$ for all $x \in \mathcal{X}$? (Requires Mercer's theorem.)
2. Does the reproducing property hold, $\langle f, k(\cdot, x)\rangle_{\mathcal{H}} = f(x)$?
First part: by the definition of $\mathcal{H}$, the function in $\mathcal{H}$ indexed by $x$ is
$$k(x, \cdot) = \sum_i \left(\sqrt{\lambda_i}\, e_i(x)\right)\left(\sqrt{\lambda_i}\, e_i(\cdot)\right).$$
Is this function in the RKHS? Yes, if the $\ell^2$ norm of $\left(\sqrt{\lambda_i}\, e_i(x)\right)_i$ is bounded. This is due to Mercer: $\forall x \in \mathcal{X}$,
$$\|k(x, \cdot)\|^2_{\mathcal{H}} = \left\|\left(\sqrt{\lambda_i}\, e_i(x)\right)_i\right\|^2_{\ell^2} = k(x, x) < \infty.$$
27 Infinite dimensional feature space. Proof (continued): second part: the reproducing property holds. Using the inner product definition,
$$\langle f, k(x, \cdot)\rangle_{\mathcal{H}} = \sum_i f_i \left(\sqrt{\lambda_i}\, e_i(x)\right) = f(x),$$
which is always well defined since both $(f_i)_i \in \ell^2$ and $\left(\sqrt{\lambda_i}\, e_i(x)\right)_i \in \ell^2$.
28 Infinite dimensional feature space. Example RKHS function, Gaussian kernel:
$$f(x) := \sum_{i=1}^m \alpha_i\, k(x_i, x) = \sum_{j=1}^\infty \underbrace{\left[\sum_{i=1}^m \alpha_i \sqrt{\lambda_j}\, e_j(x_i)\right]}_{f_j} \sqrt{\lambda_j}\, e_j(x),$$
where $f_j = \sum_{i=1}^m \alpha_i \sqrt{\lambda_j}\, e_j(x_i)$. [Plot: $f(x)$ against $x$.] NOTE that this enforces smoothing: the $\lambda_j$ decay as the $e_j$ become rougher, and the $f_j$ decay since $\|f\|^2_{\mathcal{H}} = \sum_j f_j^2 < \infty$.
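In matrix form, $\|f\|^2_{\mathcal{H}} = \alpha^\top K \alpha$ for $f = \sum_i \alpha_i k(x_i, \cdot)$, which is exactly the quantity a smoothness penalty controls. A sketch with illustrative centres and coefficients:

```python
# An RKHS function f(x) = sum_i alpha_i k(x_i, x) with a Gaussian kernel,
# and its squared RKHS norm ||f||_H^2 = alpha^T K alpha. Points and
# coefficients below are illustrative.
import numpy as np

rng = np.random.default_rng(3)
sigma = 0.5
X = rng.uniform(-2, 2, size=8)             # centres x_i
alpha = rng.normal(size=8)                 # coefficients alpha_i

def k(a, b):
    return np.exp(-(a - b) ** 2 / (2 * sigma ** 2))

def f(x):
    return sum(a * k(xi, x) for a, xi in zip(alpha, X))

K = k(X[:, None], X[None, :])
print(alpha @ K @ alpha)                   # ||f||_H^2: finite, so the f_j must decay
print(f(0.3))                              # a pointwise evaluation
```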
29 Reproducing kernel Hilbert space. Let $\mathcal{H}$ be a Hilbert space of $\mathbb{R}$-valued functions on a non-empty set $\mathcal{X}$. A function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a reproducing kernel of $\mathcal{H}$, and $\mathcal{H}$ is a reproducing kernel Hilbert space, if
$$\forall x \in \mathcal{X},\;\; k(\cdot, x) \in \mathcal{H}; \qquad \forall x \in \mathcal{X},\; \forall f \in \mathcal{H},\;\; \langle f(\cdot), k(\cdot, x)\rangle_{\mathcal{H}} = f(x) \;\;\text{(the reproducing property)}.$$
In particular, for any $x, y \in \mathcal{X}$,
$$k(x, y) = \langle k(\cdot, x), k(\cdot, y)\rangle_{\mathcal{H}}. \tag{2}$$
Original definition: a kernel is an inner product between feature maps; then $\varphi_x = k(\cdot, x)$ is a valid feature map.
30–31 Probabilities in feature space: the mean trick.
The kernel trick: given $x \in \mathcal{X}$ for some set $\mathcal{X}$, define the feature map $\varphi_x \in \mathcal{H}$,
$$\varphi_x = \left[\dots\;\; \sqrt{\lambda_i}\, e_i(x)\;\; \dots\right] \in \ell^2.$$
For positive definite $k(x, x')$, $k(x, x') = \langle \varphi_x, \varphi_{x'}\rangle_{\mathcal{H}}$. The kernel trick: $\forall f \in \mathcal{H}$, $f(x) = \langle f, \varphi_x\rangle_{\mathcal{H}}$.
The mean trick: given $P$ a Borel probability measure on $\mathcal{X}$, define the feature map $\mu_P \in \mathcal{H}$,
$$\mu_P = \left[\dots\;\; \sqrt{\lambda_i}\, \mathbb{E}_P[e_i(X)]\;\; \dots\right] \in \ell^2.$$
For positive definite $k(x, x')$, $\mathbb{E}_{P,Q}\, k(X, Y) = \langle \mu_P, \mu_Q\rangle_{\mathcal{H}}$ for $X \sim P$ and $Y \sim Q$. The mean trick (we call $\mu_P$ a mean/distribution embedding): $\mathbb{E}_P\, f(X) = \mathbb{E}_P \langle \varphi_X, f\rangle_{\mathcal{H}} = \langle \mu_P, f\rangle_{\mathcal{H}}$.
32 Feature embeddings of probabilities. The kernel trick: $f(x) = \langle f, \varphi_x\rangle_{\mathcal{H}}$. The mean trick: $\mathbb{E}_P\, f(X) = \langle \mu_P, f\rangle_{\mathcal{H}}$. Empirical mean embedding:
$$\hat{\mu}_P = \frac{1}{m} \sum_{i=1}^m \varphi_{x_i}, \qquad x_i \;\text{i.i.d.}\sim P.$$
$\mu_P$ gives you expectations of all RKHS functions. When $k$ is characteristic, $\mu_P$ is unique to $P$ (e.g. Gaussian, Laplace, ...).
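In practice, both tricks reduce to averages of kernel evaluations. A sketch with illustrative distributions $P$ and $Q$ and an arbitrary RKHS function:

```python
# Mean trick in practice: with the empirical embedding mu_P = (1/m) sum_i phi_{x_i},
# E_P f(X) ~ <mu_P, f>_H becomes a weighted sum of kernel evaluations, and
# <mu_P, mu_Q>_H ~ the mean of the cross kernel matrix. P, Q are illustrative.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(0.0, 1.0, size=500)        # x_i i.i.d. from P
Y = rng.normal(0.5, 1.0, size=500)        # y_j i.i.d. from Q
sigma = 1.0

def k(a, b):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

# E_{P,Q} k(X, Y) ~ <mu_P, mu_Q>_H: the mean of the cross kernel matrix
print(k(X, Y).mean())

# Expectation under P of an RKHS function f = sum_i beta_i k(z_i, .):
Z = np.array([-1.0, 0.0, 1.0])
beta = np.array([0.2, -0.5, 0.3])
print(beta @ k(Z, X).mean(axis=1))        # <f, mu_P>_H = sum_i beta_i (1/m) sum_j k(z_i, x_j)
```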
33 Nonparametric Bayesian inference using distribution embeddings
34–35 Bayes again. Bayes' rule:
$$P(y|x) = \frac{P(x|y)\,\pi(y)}{\int P(x|y)\,\pi(y)\,dy},$$
where $P(x|y)$ is the likelihood and $\pi$ is the prior. How would this look with kernel embeddings? Define an RKHS $\mathcal{G}$ on $\mathcal{Y}$ with feature map $\psi_y$ and kernel $l(y, \cdot)$. We need a conditional mean embedding: for all $g \in \mathcal{G}$,
$$\mathbb{E}_{Y|x^*}\, g(Y) = \langle g, \mu_{P(y|x^*)}\rangle_{\mathcal{G}}.$$
This will be obtained by RKHS-valued ridge regression.
36–38 Ridge regression and the conditional feature mean. Ridge regression from $\mathcal{X} := \mathbb{R}^d$ to a finite vector output $\mathcal{Y} := \mathbb{R}^{d'}$ (these could be $d'$ nonlinear features of $y$). Define training data
$$X = [x_1 \;\dots\; x_m] \in \mathbb{R}^{d \times m}, \qquad Y = [y_1 \;\dots\; y_m] \in \mathbb{R}^{d' \times m}.$$
Solve
$$\check{A} = \operatorname*{arg\,min}_{A \in \mathbb{R}^{d' \times d}} \left( \|Y - AX\|^2 + \lambda \|A\|^2_{\mathrm{HS}} \right), \qquad \|A\|^2_{\mathrm{HS}} = \operatorname{tr}(A^\top A) = \sum_{i=1}^{\min\{d, d'\}} \gamma^2_{A,i},$$
where the $\gamma_{A,i}$ are the singular values of $A$. Solution:
$$\check{A} = C_{YX}\,(C_{XX} + m\lambda I)^{-1}.$$
39–40 Ridge regression and the conditional feature mean. Prediction at a new point $x$:
$$y^* = \check{A}x = C_{YX}\,(C_{XX} + m\lambda I)^{-1} x = \sum_{i=1}^m \beta_i(x)\, y_i,$$
where
$$\beta(x) = (K + \lambda m I)^{-1} \left[ k(x_1, x) \;\dots\; k(x_m, x) \right]^\top,$$
$K := X^\top X$, and $k(x_1, x) = x_1^\top x$. What if we do everything in kernel space?
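The primal and dual forms of the ridge prediction agree exactly, which the following sketch verifies on random data (dimensions and the regularization constant are illustrative):

```python
# Ridge regression prediction in primal and dual form: the weights
# beta_i(x) = [(K + lambda m I)^{-1} k_X(x)]_i reproduce
# A x = C_YX (C_XX + m lambda I)^{-1} x exactly. Data are illustrative.
import numpy as np

rng = np.random.default_rng(5)
d, dp, m, lam = 4, 3, 20, 0.1
X = rng.normal(size=(d, m))                # columns x_i
Y = rng.normal(size=(dp, m))               # columns y_i
x = rng.normal(size=d)                     # new test point

# Primal: A = C_YX (C_XX + m lam I)^{-1}, with C_YX = Y X^T, C_XX = X X^T
A = Y @ X.T @ np.linalg.inv(X @ X.T + m * lam * np.eye(d))
y_primal = A @ x

# Dual: y = sum_i beta_i(x) y_i, with K = X^T X and k_X(x) = X^T x
K = X.T @ X
beta = np.linalg.solve(K + lam * m * np.eye(m), X.T @ x)
y_dual = Y @ beta

print(np.allclose(y_primal, y_dual))       # True
```

The dual form only touches the data through inner products $K$ and $k_X(x)$, which is what licenses the move to kernel space on the next slides.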
41 Ridge regression and the conditional feature mean. Recall our setup. Given training pairs $(x_i, y_i) \sim P_{XY}$: $\mathcal{F}$ on $\mathcal{X}$ with feature map $\varphi_x$ and kernel $k(x, \cdot)$; $\mathcal{G}$ on $\mathcal{Y}$ with feature map $\psi_y$ and kernel $l(y, \cdot)$. We define the covariance operators between feature maps,
$$C_{XX} = \mathbb{E}_X(\varphi_X \otimes \varphi_X), \qquad C_{XY} = \mathbb{E}_{XY}(\varphi_X \otimes \psi_Y),$$
and matrices of feature-mapped training data,
$$X = [\varphi_{x_1} \;\dots\; \varphi_{x_m}], \qquad Y := [\psi_{y_1} \;\dots\; \psi_{y_m}].$$
42 Ridge regression and the conditional feature mean. Objective [Weston et al. (2003), Micchelli and Pontil (2005), Caponnetto and De Vito (2007), Grunewalder et al. (2012, 2013)]:
$$\check{A} = \operatorname*{arg\,min}_{A \in \mathrm{HS}(\mathcal{F}, \mathcal{G})} \left( \mathbb{E}_{XY} \|\psi_Y - A\varphi_X\|^2_{\mathcal{G}} + \lambda \|A\|^2_{\mathrm{HS}} \right), \qquad \|A\|^2_{\mathrm{HS}} = \sum_i \gamma^2_{A,i}.$$
Solution, same as the vector case:
$$\check{A} = C_{YX}\,(C_{XX} + m\lambda I)^{-1}.$$
Prediction at a new $x$ using kernels:
$$\check{A}\varphi_x = [\psi_{y_1} \;\dots\; \psi_{y_m}]\,(K + \lambda m I)^{-1} \left[ k(x_1, x) \;\dots\; k(x_m, x) \right]^\top = \sum_{i=1}^m \beta_i(x)\, \psi_{y_i},$$
where $K_{ij} = k(x_i, x_j)$.
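Putting the pieces together, the empirical conditional mean embedding estimates $\mathbb{E}[g(Y)\,|\,X = x]$ by evaluating $g$ at the training outputs and reweighting by $\beta_i(x)$. A sketch on a toy regression problem (the data, kernel, and the choice $g(y) = \cos y$ are illustrative):

```python
# Empirical conditional mean embedding: E[g(Y) | X = x] is estimated as
# g_y^T (K + lambda m I)^{-1} k_X(x), i.e. g evaluated at the training y_i,
# reweighted by beta_i(x). Toy data: Y = sin(X) + noise (illustrative).
import numpy as np

rng = np.random.default_rng(6)
m, lam, sigma = 200, 1e-2, 0.5
X = rng.uniform(-3, 3, size=m)
Y = np.sin(X) + 0.1 * rng.normal(size=m)

def k(a, b):  # Gaussian kernel on X
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

K = k(X, X)
x_test = np.array([1.0])
beta = np.linalg.solve(K + lam * m * np.eye(m), k(X, x_test))   # beta_i(x)

g_y = np.cos(Y)                  # g(Y) evaluated at the training points
print((g_y @ beta).item())       # estimate of E[cos(Y) | X = 1.0]
print(np.cos(np.sin(1.0)))       # rough sanity value when the noise is small
```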
43–45 Ridge regression and the conditional feature mean. How is the loss $\|\psi_Y - A\varphi_X\|^2_{\mathcal{G}}$ relevant to a conditional expectation $\mathbb{E}_{Y|x}\, g(Y)$? Define [Song et al. (2009), Grunewalder et al. (2013)]
$$\mu_{Y|x} := A\varphi_x.$$
We need $A$ to have the property
$$\mathbb{E}_{Y|x}\, g(Y) \approx \langle g, \mu_{Y|x}\rangle_{\mathcal{G}} = \langle g, A\varphi_x\rangle_{\mathcal{G}} = \langle A^* g, \varphi_x\rangle_{\mathcal{F}} = (A^* g)(x).$$
Natural risk function for the conditional mean:
$$L(A, P_{XY}) := \sup_{\|g\|\le 1}\; \mathbb{E}_X \Big[ \underbrace{\big(\mathbb{E}_{Y|X}\, g(Y)\big)(X)}_{\text{Target}} - \underbrace{(A^* g)(X)}_{\text{Estimator}} \Big]^2.$$
46–52 Ridge regression and the conditional feature mean. The squared-loss risk provides an upper bound on the natural risk:
$$L(A, P_{XY}) \le \mathbb{E}_{XY}\, \|\psi_Y - A\varphi_X\|^2_{\mathcal{G}}.$$
Proof (Jensen and Cauchy-Schwarz):
$$\begin{aligned}
L(A, P_{XY}) &:= \sup_{\|g\|\le 1} \mathbb{E}_X \left[ \big(\mathbb{E}_{Y|X}\, g(Y)\big)(X) - (A^* g)(X) \right]^2 \\
&\le \mathbb{E}_{XY} \sup_{\|g\|\le 1} \left[ g(Y) - (A^* g)(X) \right]^2 \\
&= \mathbb{E}_{XY} \sup_{\|g\|\le 1} \left[ \langle g, \psi_Y\rangle_{\mathcal{G}} - \langle g, A\varphi_X\rangle_{\mathcal{G}} \right]^2 \\
&= \mathbb{E}_{XY} \sup_{\|g\|\le 1} \langle g,\, \psi_Y - A\varphi_X\rangle^2_{\mathcal{G}} \\
&\le \mathbb{E}_{XY}\, \|\psi_Y - A\varphi_X\|^2_{\mathcal{G}}.
\end{aligned}$$
If we assume $\mathbb{E}_Y[g(Y)\,|\,X = x] \in \mathcal{F}$, then the upper bound is tight (next slide).
53–54 Conditions for ridge regression = conditional mean. Claim: the conditional mean is obtained by ridge regression when $\mathbb{E}_Y[g(Y)\,|\,X = x] \in \mathcal{F}$. Given a function $g \in \mathcal{G}$, assume $\mathbb{E}_{Y|X}[g(Y)\,|\,X = \cdot] \in \mathcal{F}$. Then
$$C_{XX}\, \mathbb{E}_{Y|X}[g(Y)\,|\,X = \cdot] = C_{XY}\, g.$$
Proof [Fukumizu et al., 2004]: for all $f \in \mathcal{F}$, by definition of $C_{XX}$,
$$\left\langle f,\; C_{XX}\, \mathbb{E}_{Y|X}[g(Y)\,|\,X = \cdot] \right\rangle_{\mathcal{F}} = \mathbb{E}_X\big( f(X)\, \mathbb{E}_{Y|X}[g(Y)\,|\,X] \big) = \mathbb{E}_{XY}\big( f(X)\, g(Y) \big) = \langle f, C_{XY}\, g\rangle_{\mathcal{F}},$$
the last step by definition of $C_{XY}$.
55–56 Kernel Bayes' law. Prior: $Y \sim \pi(y)$. Likelihood: $(X|y) \sim P(x|y)$, with some joint $P(x, y)$. Joint distribution: $Q(x, y) = P(x|y)\,\pi(y)$. Warning: $Q \ne P$; this is a change of measure from $P(y)$ to $\pi(y)$. Marginal for $x$: $Q(x) := \int P(x|y)\,\pi(y)\,dy$. Bayes' law:
$$Q(y|x) = \frac{P(x|y)\,\pi(y)}{Q(x)}.$$
57–58 Kernel Bayes' law. Posterior embedding via the usual conditional update:
$$\mu_{Q(y|x)} = C_{Q(y,x)}\, C^{-1}_{Q(x,x)}\, \varphi_x.$$
Given the mean embedding of the prior, $\mu_{\pi(y)}$:
Define the marginal covariance: $C_{Q(x,x)} = \int (\varphi_x \otimes \varphi_x)\, P(x|y)\,\pi(y)\,dx\,dy = C_{(XX)Y}\, C^{-1}_{YY}\, \mu_{\pi(y)}$.
Define the cross-covariance: $C_{Q(y,x)} = \int (\psi_y \otimes \varphi_x)\, P(x|y)\,\pi(y)\,dx\,dy = C_{(YX)Y}\, C^{-1}_{YY}\, \mu_{\pi(y)}$.
59 Kernel Bayes' law: consistency result. How to compute the posterior expectation from data? Given samples $\{(x_i, y_i)\}_{i=1}^n$ from $P_{XY}$ and $\{u_j\}_{j=1}^n$ from the prior $\pi$, we want to compute $\mathbb{E}[g(Y)\,|\,X = x]$ for $g \in \mathcal{G}$. For any $x \in \mathcal{X}$,
$$\left| \mathbf{g}_y^\top R_{Y|X}\, \mathbf{k}_X(x) - \mathbb{E}[g(Y)\,|\,X = x] \right| = O_p\big(n^{-4/27}\big) \qquad (n \to \infty),$$
where $\mathbf{g}_y = (g(y_1), \dots, g(y_n))^\top \in \mathbb{R}^n$, $\mathbf{k}_X(x) = (k(x_1, x), \dots, k(x_n, x))^\top \in \mathbb{R}^n$, and $R_{Y|X}$ is learned from the samples (it contains the $u_j$). Smoothness assumptions: $\pi/p_Y \in \mathcal{R}(C^{1/2}_{YY})$, where $p_Y$ is the p.d.f. of $P_Y$, and $\mathbb{E}[g(Y)\,|\,X = \cdot] \in \mathcal{R}(C^2_{Q(xx)})$.
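The estimator $R_{Y|X}$ can be sketched in a few lines. The structure below follows the empirical kernel Bayes' rule of Fukumizu, Song, and Gretton (2013): first a change of measure reweights the $y$-sample toward the prior, then a Tikhonov-type squared regularization inverts the resulting (possibly indefinite) weighted Gram matrix. The placement of regularization constants, the hyperparameters, and the toy model are assumptions here, not the paper's exact recipe:

```python
# Sketch of the empirical kernel Bayes' rule estimator (structure follows
# Fukumizu, Song & Gretton 2013; regularization placement/constants are
# assumptions, and the toy model is illustrative).
# Model: X | y ~ N(y, 0.3^2); joint sample drawn under P(y) != prior pi.
import numpy as np

rng = np.random.default_rng(7)
n, eps, delta = 300, 0.1, 0.1
sx = sy = 0.3                               # kernel bandwidths (illustrative)

Yp = rng.normal(0.0, 1.0, size=n)           # y_i from P_Y (not the prior)
Xp = Yp + 0.3 * rng.normal(size=n)          # x_i | y_i ~ P(x | y)
U = rng.normal(1.0, 0.5, size=n)            # u_j from the prior pi = N(1, 0.5^2)

def gram(a, b, s):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * s ** 2))

G_Y, G_X = gram(Yp, Yp, sy), gram(Xp, Xp, sx)

# Step 1: change of measure from P_Y to pi -- weights mu on the y-sample
# representing the prior embedding (cf. C_.. C_YY^{-1} mu_pi on the slides).
mu = np.linalg.solve(G_Y + n * eps * np.eye(n), gram(Yp, U, sy).mean(axis=1))

# Step 2: posterior weight matrix R_{Y|X} via the squared Tikhonov inverse;
# the squared form handles the possibly negative entries of mu.
L = np.diag(mu) @ G_X
R = L @ np.linalg.solve(L @ L + n * delta * np.eye(n), np.diag(mu))

# Posterior expectation E[g(Y) | X = x] ~ g_y^T R k_X(x), here with g(y) = y.
x_star = np.array([0.8])
w = R @ gram(Xp, x_star, sx)
print((Yp @ w).item())                      # KBR estimate of E[Y | X = 0.8]

# Exact posterior mean for this conjugate Gaussian toy model, for comparison:
v = 1.0 / (1 / 0.5**2 + 1 / 0.3**2)
print(v * (1.0 / 0.5**2 + 0.8 / 0.3**2))
```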
60–62 Experiment: Kernel Bayes' law vs EKF. Compare with the extended Kalman filter (EKF) on the camera orientation task: 3600 downsampled frames of RGB pixels ($Y_t \in [0,1]^{1200}$); 1800 training frames, the remainder for test; Gaussian noise added to $Y_t$. [Table: average MSE and standard errors over 10 runs for KBR (Gauss), KBR (Tr), Kalman (9 dim.), and Kalman (Quat.), at two noise levels $\sigma^2$; most numeric values were lost in transcription, with only the standard errors $\pm 0.023$ and $\pm 0.022$ surviving for the last column.]
63 Overview. Introduction to reproducing kernel Hilbert spaces: Hilbert space; kernels and feature spaces; reproducing property. Nonparametric Bayesian inference: mapping probabilities to feature space; learning conditional probabilities (smooth regression to an RKHS); kernelized Bayesian inference.
64 Co-authors. From UCL: Luca Baldassarre, Steffen Grunewalder, Guy Lever, Sam Patterson, Massimiliano Pontil. External: Kenji Fukumizu (ISM), Bernhard Schoelkopf (MPI), Alex Smola (Google/CMU), Le Song (Georgia Tech), Bharath Sriperumbudur (Penn State).
65 Questions?
66 Selected references.
Characteristic kernels and mean embeddings:
Smola, A., Gretton, A., Song, L., Schoelkopf, B. (2007). A Hilbert space embedding for distributions. ALT.
Sriperumbudur, B., Gretton, A., Fukumizu, K., Schoelkopf, B., Lanckriet, G. (2010). Hilbert space embeddings and metrics on probability measures. JMLR.
Gretton, A., Borgwardt, K., Rasch, M., Schoelkopf, B., Smola, A. (2012). A kernel two-sample test. JMLR.
Sejdinovic, D., Sriperumbudur, B., Gretton, A., Fukumizu, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Annals of Statistics.
Two-sample, independence, conditional independence tests:
Gretton, A., Fukumizu, K., Teo, C., Song, L., Schoelkopf, B., Smola, A. (2008). A kernel statistical test of independence. NIPS.
Fukumizu, K., Gretton, A., Sun, X., Schoelkopf, B. (2008). Kernel measures of conditional dependence. NIPS.
Gretton, A., Fukumizu, K., Harchaoui, Z., Sriperumbudur, B. (2009). A fast, consistent kernel two-sample test. NIPS.
Gretton, A., Borgwardt, K., Rasch, M., Schoelkopf, B., Smola, A. (2012). A kernel two-sample test. JMLR.
67 Selected references (continued).
Conditional mean embedding, RKHS-valued regression:
Weston, J., Chapelle, O., Elisseeff, A., Schölkopf, B., Vapnik, V. (2003). Kernel dependency estimation. NIPS.
Micchelli, C., Pontil, M. (2005). On learning vector-valued functions. Neural Computation.
Caponnetto, A., De Vito, E. (2007). Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics.
Song, L., Huang, J., Smola, A., Fukumizu, K. (2009). Hilbert space embeddings of conditional distributions. ICML.
Grunewalder, S., Lever, G., Baldassarre, L., Patterson, S., Gretton, A., Pontil, M. (2012). Conditional mean embeddings as regressors. ICML.
Grunewalder, S., Gretton, A., Shawe-Taylor, J. (2013). Smooth operators. ICML.
Kernel Bayes' rule:
Song, L., Fukumizu, K., Gretton, A. (2013). Kernel embeddings of conditional distributions: a unified kernel framework for nonparametric inference in graphical models. IEEE Signal Processing Magazine.
Fukumizu, K., Song, L., Gretton, A. (2013). Kernel Bayes' rule: Bayesian inference with positive definite kernels. JMLR.
69 References.
K. Fukumizu, F. R. Bach, and M. I. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 5:73-99, 2004.
L. Song, J. Huang, A. J. Smola, and K. Fukumizu. Hilbert space embeddings of conditional distributions. In Proceedings of the International Conference on Machine Learning, 2009.