Distribution Regression: A Simple Technique with Minimax-optimal Guarantee


1 Distribution Regression: A Simple Technique with Minimax-optimal Guarantee. Zoltán Szabó (CMAP, École Polytechnique). Joint work with Bharath K. Sriperumbudur (Department of Statistics, PSU), Barnabás Póczos (ML Department, CMU), Arthur Gretton (Gatsby Unit, UCL). Parisian Statistics Seminar, March 27, 2017.

2 Example: sustainability. Goal: aerosol prediction, a quantity central to air pollution and climate.

3 Example: sustainability. Goal: aerosol prediction, a quantity central to air pollution and climate. Prediction using labelled bags: bag := multi-spectral satellite measurements over an area, label := local aerosol value.

4 Example: existing methods Multi-instance learning: [Haussler, 1999, Gärtner et al., 2002] (set kernel):

5 Example: existing methods. Multi-instance learning: [Haussler, 1999, Gärtner et al., 2002] (set kernel). Sensible methods in regression are few: 1. they rely on restrictive technical conditions, 2. super-high resolution satellite images would be needed.

6 One-page summary Contributions: 1 Practical: state-of-the-art accuracy (aerosol). 2 Theoretical: General bags: graphs, time series, texts,... Consistency of set kernel in regression (17-year-old open problem). How many samples/bag?

8 Objects in the bags. Examples: time-series modelling: user = set of time series; computer vision: image = collection of patch vectors; NLP: corpus = bag of documents; network analysis: group of people = bag of friendship graphs; ...

9 Objects in the bags. Examples: time-series modelling: user = set of time series; computer vision: image = collection of patch vectors; NLP: corpus = bag of documents; network analysis: group of people = bag of friendship graphs; ... Wider context (statistics): point estimation tasks.

10 Regression on labelled bags. Given: labelled bags $\hat{z} = \{(\hat{P}_i, y_i)\}_{i=1}^{l}$, where $\hat{P}_i$ is a bag sampled from $P_i$ and $N := |\hat{P}_i|$; test bag: $\hat{P}$.

11 Regression on labelled bags. Given: labelled bags $\hat{z} = \{(\hat{P}_i, y_i)\}_{i=1}^{l}$, where $\hat{P}_i$ is a bag sampled from $P_i$ and $N := |\hat{P}_i|$; test bag: $\hat{P}$. Estimator: $f_{\hat{z}}^{\lambda} = \arg\min_{f \in H} \frac{1}{l} \sum_{i=1}^{l} \left[ f(\mu_{\hat{P}_i}) - y_i \right]^2 + \lambda \|f\|_H^2$, where $\mu_{\hat{P}_i}$ is the feature of $\hat{P}_i$.

12 Regression on labelled bags. Given: labelled bags $\hat{z} = \{(\hat{P}_i, y_i)\}_{i=1}^{l}$, where $\hat{P}_i$ is a bag sampled from $P_i$ and $N := |\hat{P}_i|$; test bag: $\hat{P}$. Estimator: $f_{\hat{z}}^{\lambda} = \arg\min_{f \in H(K)} \frac{1}{l} \sum_{i=1}^{l} \left[ f(\mu_{\hat{P}_i}) - y_i \right]^2 + \lambda \|f\|_H^2$. Prediction: $\hat{y}(\hat{P}) = g^T (G + l\lambda I)^{-1} y$, with $g = [K(\mu_{\hat{P}}, \mu_{\hat{P}_i})]$, $G = [K(\mu_{\hat{P}_i}, \mu_{\hat{P}_j})]$, $y = [y_i]$.

13 Regression on labelled bags. Given: labelled bags $\hat{z} = \{(\hat{P}_i, y_i)\}_{i=1}^{l}$, where $\hat{P}_i$ is a bag sampled from $P_i$ and $N := |\hat{P}_i|$; test bag: $\hat{P}$. Estimator: $f_{\hat{z}}^{\lambda} = \arg\min_{f \in H(K)} \frac{1}{l} \sum_{i=1}^{l} \left[ f(\mu_{\hat{P}_i}) - y_i \right]^2 + \lambda \|f\|_H^2$. Prediction: $\hat{y}(\hat{P}) = g^T (G + l\lambda I)^{-1} y$, with $g = [K(\mu_{\hat{P}}, \mu_{\hat{P}_i})]$, $G = [K(\mu_{\hat{P}_i}, \mu_{\hat{P}_j})]$, $y = [y_i]$. Challenges: 1. inner product of distributions: $K(\mu_{\hat{P}_i}, \mu_{\hat{P}_j}) = ?$ 2. how many samples/bag?
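As a concrete illustration of the prediction formula above, here is a minimal NumPy sketch, assuming the Gram matrix G over the training bag embeddings and the test-versus-training vector g have already been computed with some kernel K on mean embeddings; all function and variable names are illustrative, not taken from the paper's code.

```python
import numpy as np

def dr_predict(G, g, y, lam):
    """Distribution-regression prediction y_hat(P_test) = g^T (G + l*lam*I)^{-1} y.

    G   : (l, l) array, G[i, j] = K(mu_hat_Pi, mu_hat_Pj) over training bags
    g   : (l,)   array, g[i]    = K(mu_hat_Ptest, mu_hat_Pi)
    y   : (l,)   array of bag labels
    lam : ridge regularization parameter lambda > 0
    """
    l = G.shape[0]
    # Solve (G + l*lam*I) alpha = y rather than forming an explicit inverse.
    alpha = np.linalg.solve(G + l * lam * np.eye(l), y)
    return float(g @ alpha)
```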

14 Regression on labelled bags: similarity. Let us define an inner product on distributions [$K(P, Q)$]: 1. Set kernel: $A = \{a_i\}_{i=1}^{N}$, $B = \{b_j\}_{j=1}^{N}$, $K(A, B) = \frac{1}{N^2} \sum_{i,j=1}^{N} k(a_i, b_j) = \left\langle \frac{1}{N}\sum_{i=1}^{N} \varphi(a_i), \frac{1}{N}\sum_{j=1}^{N} \varphi(b_j) \right\rangle$, where $\frac{1}{N}\sum_{i=1}^{N} \varphi(a_i)$ is the feature of bag $A$.

15 Regression on labelled bags: similarity. Let us define an inner product on distributions [$K(P, Q)$]: 1. Set kernel: $A = \{a_i\}_{i=1}^{N}$, $B = \{b_j\}_{j=1}^{N}$, $K(A, B) = \frac{1}{N^2} \sum_{i,j=1}^{N} k(a_i, b_j) = \left\langle \frac{1}{N}\sum_{i=1}^{N} \varphi(a_i), \frac{1}{N}\sum_{j=1}^{N} \varphi(b_j) \right\rangle$, where $\frac{1}{N}\sum_{i=1}^{N} \varphi(a_i)$ is the feature of bag $A$. 2. Taking the limit [Berlinet and Thomas-Agnan, 2004, Altun and Smola, 2006, Smola et al., 2007]: for $a \sim P$, $b \sim Q$, $K(P, Q) = \mathbb{E}_{a,b}\, k(a, b) = \langle \mathbb{E}_a \varphi(a), \mathbb{E}_b \varphi(b) \rangle$, where $\mu_P := \mathbb{E}_a \varphi(a)$ is the feature of distribution $P$. Example (Gaussian kernel): $k(a, b) = e^{-\|a - b\|_2^2/(2\sigma^2)}$.
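A short sketch of the empirical similarity above with a Gaussian base kernel k; the bandwidth name sigma and the vectorized implementation are illustrative choices, and for bags of unequal sizes the normalization $1/N^2$ becomes $1/(N_A N_B)$.

```python
import numpy as np

def gauss_gram(A, B, sigma):
    """Base-kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * sigma**2))

def set_kernel(A, B, sigma):
    """K(A, B) = (1/(N_A N_B)) * sum_{i,j} k(a_i, b_j)
               = <mean_i phi(a_i), mean_j phi(b_j)>, the inner product of
               the empirical mean embeddings of the two bags."""
    return float(gauss_gram(A, B, sigma).mean())
```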

16 RKHS definition(s). Given: a set $D$. Kernel: $k(a, b) = \langle \varphi(a), \varphi(b) \rangle_F$, $F$: Hilbert space.

17 RKHS definition(s). Given: a set $D$. Kernel: $k(a, b) = \langle \varphi(a), \varphi(b) \rangle_F$, $F$: Hilbert space. RKHS: $H \subseteq \mathbb{R}^D$ Hilbert space such that $\delta_b(f) = f(b)$ is continuous ($\forall b$).

18 RKHS definition(s). Given: a set $D$. Kernel: $k(a, b) = \langle \varphi(a), \varphi(b) \rangle_F$, $F$: Hilbert space. RKHS: $H \subseteq \mathbb{R}^D$ Hilbert space such that $\delta_b(f) = f(b)$ is continuous ($\forall b$). Reproducing kernel of an $H \subseteq \mathbb{R}^D$ Hilbert space: 1. $k(\cdot, b) \in H$,

19 RKHS definition(s). Given: a set $D$. Kernel: $k(a, b) = \langle \varphi(a), \varphi(b) \rangle_F$, $F$: Hilbert space. RKHS: $H \subseteq \mathbb{R}^D$ Hilbert space such that $\delta_b(f) = f(b)$ is continuous ($\forall b$). Reproducing kernel of an $H \subseteq \mathbb{R}^D$ Hilbert space: 1. $k(\cdot, b) \in H$, 2. $\langle f, k(\cdot, b) \rangle_H = f(b)$. Note: $k(a, b) = \langle k(\cdot, a), k(\cdot, b) \rangle_H$.

20 RKHS definition(s). Given: a set $D$. Kernel: $k(a, b) = \langle \varphi(a), \varphi(b) \rangle_F$, $F$: Hilbert space. RKHS: $H \subseteq \mathbb{R}^D$ Hilbert space such that $\delta_b(f) = f(b)$ is continuous ($\forall b$). Reproducing kernel of an $H \subseteq \mathbb{R}^D$ Hilbert space: 1. $k(\cdot, b) \in H$, 2. $\langle f, k(\cdot, b) \rangle_H = f(b)$. Note: $k(a, b) = \langle k(\cdot, a), k(\cdot, b) \rangle_H$. $k : D \times D \to \mathbb{R}$ symmetric is positive definite (pd) if $G = [k(x_i, x_j)]_{i,j=1}^{n} \succeq 0$ ($\forall n$, $\forall x_i$).
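A quick numerical illustration of the positive-definiteness condition (a check on one sampled Gram matrix, which of course does not prove it for all point sets); the Gaussian kernel and all names here are assumptions for the example.

```python
import numpy as np

def gram_is_psd(k, points, tol=1e-10):
    """Build G = [k(x_i, x_j)] and check its eigenvalues are (numerically) non-negative."""
    G = np.array([[k(xi, xj) for xj in points] for xi in points])
    return bool(np.linalg.eigvalsh(G).min() >= -tol)  # G is symmetric, so eigvalsh applies

rng = np.random.default_rng(0)
pts = rng.normal(size=(20, 2))
k_gauss = lambda a, b: np.exp(-np.sum((a - b) ** 2) / 2.0)
print(gram_is_psd(k_gauss, pts))  # expected: True
```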

21 Kernel examples on $D = \mathbb{R}^d$, $\theta > 0$: Gaussian $k_G(a,b) = e^{-\|a-b\|_2^2/(2\theta^2)}$; exponential $k_e(a,b) = e^{-\|a-b\|_2/(2\theta^2)}$; Cauchy $k_C(a,b) = \left(1 + \|a-b\|_2^2/\theta^2\right)^{-1}$; t-Student $k_t(a,b) = \left(1 + \|a-b\|_2^{\theta}\right)^{-1}$; polynomial $k_p(a,b) = (\langle a, b \rangle + \theta)^p$; rational quadratic $k_r(a,b) = 1 - \frac{\|a-b\|_2^2}{\|a-b\|_2^2 + \theta}$; inverse multiquadric $k_i(a,b) = \left(\|a-b\|_2^2 + \theta^2\right)^{-1/2}$; Matérn 3/2 $k_{M,3/2}(a,b) = \left(1 + \frac{\sqrt{3}\,\|a-b\|_2}{\theta}\right) e^{-\frac{\sqrt{3}\,\|a-b\|_2}{\theta}}$; Matérn 5/2 $k_{M,5/2}(a,b) = \left(1 + \frac{\sqrt{5}\,\|a-b\|_2}{\theta} + \frac{5\|a-b\|_2^2}{3\theta^2}\right) e^{-\frac{\sqrt{5}\,\|a-b\|_2}{\theta}}$.
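For concreteness, a sketch of three of the kernels above, written directly from the displayed formulas (the function and parameter names are mine):

```python
import numpy as np

def k_gauss(a, b, theta):
    """Gaussian: exp(-||a-b||_2^2 / (2 theta^2))."""
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * theta**2))

def k_cauchy(a, b, theta):
    """Cauchy: 1 / (1 + ||a-b||_2^2 / theta^2)."""
    return 1.0 / (1.0 + np.sum((a - b) ** 2) / theta**2)

def k_matern32(a, b, theta):
    """Matern 3/2: (1 + sqrt(3) r / theta) * exp(-sqrt(3) r / theta), with r = ||a-b||_2."""
    r = np.linalg.norm(a - b)
    return (1.0 + np.sqrt(3.0) * r / theta) * np.exp(-np.sqrt(3.0) * r / theta)
```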

22 Regression on labelled bags: baseline. Quality of an estimator: $R(f) = \mathbb{E}_{(\mu_P, y) \sim \rho} [f(\mu_P) - y]^2$; $f_\rho$ = best regressor. How many samples/bag are needed to achieve the accuracy of $f_\rho$? Is this possible at all? Assume (for a moment): $f_\rho \in H(K)$.

23 Our result: how many samples/bag. Known [Caponnetto and De Vito, 2007]: best/realized rate $R(f_z^{\lambda}) - R(f_\rho) = O\!\left(l^{-\frac{bc}{bc+1}}\right)$, where $b$ captures the size of the input space and $c$ the smoothness of $f_\rho$.

24 Our result: how many samples/bag. Known [Caponnetto and De Vito, 2007]: best/realized rate $R(f_z^{\lambda}) - R(f_\rho) = O\!\left(l^{-\frac{bc}{bc+1}}\right)$, where $b$ captures the size of the input space and $c$ the smoothness of $f_\rho$. Let $N = \tilde{O}(l^a)$, with $N$: size of the bags, $l$: number of bags. Our result: if $2 \le a$, then $f_{\hat{z}}^{\lambda}$ attains the best achievable rate.

25 Our result: how many samples/bag. Known [Caponnetto and De Vito, 2007]: best/realized rate $R(f_z^{\lambda}) - R(f_\rho) = O\!\left(l^{-\frac{bc}{bc+1}}\right)$, where $b$ captures the size of the input space and $c$ the smoothness of $f_\rho$. Let $N = \tilde{O}(l^a)$, with $N$: size of the bags, $l$: number of bags. Our result: if $2 \le a$, then $f_{\hat{z}}^{\lambda}$ attains the best achievable rate. In fact, $a = \frac{b(c+1)}{bc+1} < 2$ is enough. Consequence: regression with the set kernel is consistent.
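For instance, a worked instance of the exponents above with $b = c = 1$:

$$b = c = 1:\quad R(f_{\hat{z}}^{\lambda}) - R(f_\rho) = O\!\left(l^{-1/2}\right), \qquad a = \frac{b(c+1)}{bc+1} = 1, \text{ i.e. bags of size } N = \tilde{O}(l) \text{ already suffice.}$$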

26 Well-specified case: computational & statistical tradeoff. Let $N = \tilde{O}(l^a)$. Our result: if $\frac{b(c+1)}{bc+1} \le a$, then $R(f_{\hat{z}}^{\lambda}) - R(f_\rho) = O\!\left(l^{-\frac{bc}{bc+1}}\right)$.

27 Well-specified case: computational & statistical tradeoff. Let $N = \tilde{O}(l^a)$. Our result: if $\frac{b(c+1)}{bc+1} \le a$, then $R(f_{\hat{z}}^{\lambda}) - R(f_\rho) = O\!\left(l^{-\frac{bc}{bc+1}}\right)$. If $a \le \frac{b(c+1)}{bc+1}$, then $R(f_{\hat{z}}^{\lambda}) - R(f_\rho) = O\!\left(l^{-\frac{ac}{c+1}}\right)$. Meaning: smaller $a$ gives computational savings, but reduced statistical efficiency.

28 Well-specified case: computational & statistical tradeoff. Let $N = \tilde{O}(l^a)$. Our result: if $\frac{b(c+1)}{bc+1} \le a$, then $R(f_{\hat{z}}^{\lambda}) - R(f_\rho) = O\!\left(l^{-\frac{bc}{bc+1}}\right)$. If $a \le \frac{b(c+1)}{bc+1}$, then $R(f_{\hat{z}}^{\lambda}) - R(f_\rho) = O\!\left(l^{-\frac{ac}{c+1}}\right)$. Meaning: smaller $a$ gives computational savings, but reduced statistical efficiency. $c \mapsto \frac{b(c+1)}{bc+1}$ is decreasing: easier problems allow smaller bags.

29 Why can we get consistency/rates? Intuition. Convergence of the mean embedding: $\|\mu_P - \mu_{\hat{P}}\|_H = O\!\left(\frac{1}{\sqrt{N}}\right)$. Hölder property of $K$ ($0 < L$, $0 < h \le 1$): $\|K(\cdot, \mu_P) - K(\cdot, \mu_{\hat{P}})\|_H \le L\, \|\mu_P - \mu_{\hat{P}}\|_H^{h}$. Hence $f_{\hat{z}}^{\lambda}$ depends nicely on $\mu_{\hat{P}}$.
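A Monte Carlo sketch of the $O(1/\sqrt{N})$ behaviour: the RKHS distance between the empirical embeddings of two independent bags drawn from the same $P$ expands into pairwise base-kernel averages, so it is directly computable. The Gaussian base kernel and all names below are illustrative choices.

```python
import numpy as np

def gauss_gram(A, B, sigma=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * sigma**2))

def embedding_distance(A, B, sigma=1.0):
    """||mu_hat_A - mu_hat_B||_H = sqrt(mean k(a,a') + mean k(b,b') - 2 mean k(a,b))."""
    sq = gauss_gram(A, A, sigma).mean() + gauss_gram(B, B, sigma).mean() \
         - 2.0 * gauss_gram(A, B, sigma).mean()
    return np.sqrt(max(sq, 0.0))

rng = np.random.default_rng(0)
for N in [100, 400, 1600, 6400]:
    A, B = rng.normal(size=(N, 3)), rng.normal(size=(N, 3))  # two bags from the same P
    print(N, embedding_distance(A, B))  # distance roughly halves as N quadruples
```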

30 Valid similarities. Recall: $K(P, Q) = \langle \mu_P, \mu_Q \rangle$. Further valid choices: $K_G(P,Q) = e^{-\|\mu_P - \mu_Q\|^2/(2\theta^2)}$, $K_e(P,Q) = e^{-\|\mu_P - \mu_Q\|/(2\theta^2)}$, $K_C(P,Q) = \left(1 + \|\mu_P - \mu_Q\|^2/\theta^2\right)^{-1}$, $K_t(P,Q) = \left(1 + \|\mu_P - \mu_Q\|^{\theta}\right)^{-1}$, $K_i(P,Q) = \left(\|\mu_P - \mu_Q\|^2 + \theta^2\right)^{-1/2}$. All are functions of $\|\mu_P - \mu_Q\|$; their computation is similar to the set kernel.
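Each of these outer kernels is a plain function of the embedding distance $\|\mu_P - \mu_Q\|$, which can be estimated exactly as in the previous sketch; a minimal version taking that distance as input (all names are mine):

```python
import numpy as np

def K_gauss(mmd, theta):
    """K_G(P, Q) = exp(-||mu_P - mu_Q||^2 / (2 theta^2))."""
    return np.exp(-mmd**2 / (2.0 * theta**2))

def K_cauchy(mmd, theta):
    """K_C(P, Q) = 1 / (1 + ||mu_P - mu_Q||^2 / theta^2)."""
    return 1.0 / (1.0 + mmd**2 / theta**2)

def K_imq(mmd, theta):
    """K_i(P, Q) = (||mu_P - mu_Q||^2 + theta^2)^(-1/2)."""
    return 1.0 / np.sqrt(mmd**2 + theta**2)
```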

31 Extensions 1. Misspecified setting ($f_\rho \in L^2 \setminus H$): consistency: convergence to $\inf_{f \in H} \|f - f_\rho\|_{L^2}$; smoothness on $f_\rho$: computational & statistical tradeoff.

32 Extensions 2. Vector-valued output: $Y$: separable Hilbert space, $K(\mu_P, \mu_Q) \in L(Y)$.

33 Extensions 2. Vector-valued output: $Y$: separable Hilbert space, $K(\mu_P, \mu_Q) \in L(Y)$. Prediction on a test bag $\hat{P}$: $\hat{y}(\hat{P}) = g^T (G + l\lambda I)^{-1} y$, $g = [K(\mu_{\hat{P}}, \mu_{\hat{P}_i})]$, $G = [K(\mu_{\hat{P}_i}, \mu_{\hat{P}_j})]$, $y = [y_i]$. Specifically: $Y = \mathbb{R} \Rightarrow L(Y) = \mathbb{R}$; $Y = \mathbb{R}^d \Rightarrow L(Y) = \mathbb{R}^{d \times d}$.

34 Misspecified case: consistency. Our result: let $N = \tilde{O}(l)$, $l \to \infty$, $\lambda \to 0$, $\lambda \sqrt{l} \to \infty$. Then $R(f_{\hat{z}}^{\lambda}) - R(f_\rho) \to \inf_{f \in H} \|f - f_\rho\|_{L^2}$.

35 Misspecified case: s-smooth. Let $N = \tilde{O}(l^{2a})$ and let $f_\rho$ be $s$-smooth, $s \in (0, 1]$. Our result (computational & statistical tradeoff): if $\frac{s+1}{s+2} \le a$, then $R(f_{\hat{z}}^{\lambda}) - R(f_\rho) = O\!\left(l^{-\frac{2s}{s+2}}\right)$.

36 Misspecified case: s-smooth. Let $N = \tilde{O}(l^{2a})$ and let $f_\rho$ be $s$-smooth, $s \in (0, 1]$. Our result (computational & statistical tradeoff): if $\frac{s+1}{s+2} \le a$, then $R(f_{\hat{z}}^{\lambda}) - R(f_\rho) = O\!\left(l^{-\frac{2s}{s+2}}\right)$. If $a \le \frac{s+1}{s+2}$, then $R(f_{\hat{z}}^{\lambda}) - R(f_\rho) = O\!\left(l^{-\frac{2sa}{s+1}}\right)$. Meaning: smaller $a$ gives computational savings, but reduced statistical efficiency.

37 Misspecified case: s-smooth. Let $N = \tilde{O}(l^{2a})$ and let $f_\rho$ be $s$-smooth, $s \in (0, 1]$. Our result (computational & statistical tradeoff): if $\frac{s+1}{s+2} \le a$, then $R(f_{\hat{z}}^{\lambda}) - R(f_\rho) = O\!\left(l^{-\frac{2s}{s+2}}\right)$. If $a \le \frac{s+1}{s+2}$, then $R(f_{\hat{z}}^{\lambda}) - R(f_\rho) = O\!\left(l^{-\frac{2sa}{s+1}}\right)$. Meaning: smaller $a$ gives computational savings, but reduced statistical efficiency. Sensible choice: $a = \frac{s+1}{s+2}$, so the bag-size exponent satisfies $2a \le \frac{4}{3} < 2$!

38 Misspecified case: s-smooth. Let $N = \tilde{O}(l^{2a})$ and let $f_\rho$ be $s$-smooth, $s \in (0, 1]$. Our result (computational & statistical tradeoff): if $\frac{s+1}{s+2} \le a$, then $R(f_{\hat{z}}^{\lambda}) - R(f_\rho) = O\!\left(l^{-\frac{2s}{s+2}}\right)$. If $a \le \frac{s+1}{s+2}$, then $R(f_{\hat{z}}^{\lambda}) - R(f_\rho) = O\!\left(l^{-\frac{2sa}{s+1}}\right)$. Meaning: smaller $a$ gives computational savings, but reduced statistical efficiency. Sensible choice: $a = \frac{s+1}{s+2}$, so the bag-size exponent satisfies $2a \le \frac{4}{3} < 2$! $s \mapsto \frac{2s}{s+2}$ is increasing: easier task = better rate. $s \to 0$ ($f_\rho \in L^2$ only): arbitrarily slow rate; $s = 1$: $O(l^{-2/3})$ speed.

39 Misspecified case: optimality. Our rate: $r(l) = l^{-\frac{2s}{s+2}}$. One-stage sampled optimal rate: $r_o(l) = l^{-\frac{2s}{2s+1}}$ [Steinwart et al., 2009], under $s$-smoothness + an eigendecay constraint, with $D$: compact metric space, $Y = \mathbb{R}$.

40 Misspecified case: optimality. Our rate: $r(l) = l^{-\frac{2s}{s+2}}$. One-stage sampled optimal rate: $r_o(l) = l^{-\frac{2s}{2s+1}}$ [Steinwart et al., 2009], under $s$-smoothness + an eigendecay constraint, with $D$: compact metric space, $Y = \mathbb{R}$. [Figure: rate exponent versus smoothness $s$, comparing $-\log_l r_o(l) = \frac{2s}{2s+1}$ and $-\log_l r(l) = \frac{2s}{s+2}$.]

41 s-smoothness: intuition. Assumption: $f_\rho \in \mathrm{Im}(C^s)$, $s \in (0, 1]$, where $C$ is the uncentered covariance operator.

42 s-smoothness: intuition. Assumption: $f_\rho \in \mathrm{Im}(C^s)$, $s \in (0, 1]$, where $C$ is the uncentered covariance operator. Imagine: $C \in \mathbb{R}^{d \times d}$ is a symmetric matrix, $C = U \Lambda U^T$.

43 s-smoothness: intuition. Assumption: $f_\rho \in \mathrm{Im}(C^s)$, $s \in (0, 1]$, where $C$ is the uncentered covariance operator. Imagine: $C \in \mathbb{R}^{d \times d}$ is a symmetric matrix, $C = U \Lambda U^T$, so $Cv = \sum_{n=1}^{d} \lambda_n \langle u_n, v \rangle u_n$.

44 s-smoothness: intuition. Assumption: $f_\rho \in \mathrm{Im}(C^s)$, $s \in (0, 1]$, where $C$ is the uncentered covariance operator. Imagine: $C \in \mathbb{R}^{d \times d}$ is a symmetric matrix, $C = U \Lambda U^T$, so $Cv = \sum_{n=1}^{d} \lambda_n \langle u_n, v \rangle u_n$. General $C$: $C(v) = \sum_n \lambda_n \langle u_n, v \rangle u_n$, $C^s(v) = \sum_n \lambda_n^s \langle u_n, v \rangle u_n$, $\mathrm{Im}(C^s) = \left\{ \sum_n c_n u_n : \sum_n c_n^2 \lambda_n^{-2s} < \infty \right\}$. Larger $s$ means faster decay of the Fourier coefficients $c_n$.
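A finite-dimensional sketch of this analogy: applying $C^s$ through the eigendecomposition of a symmetric positive semi-definite matrix (the data, names, and dimensions are purely illustrative).

```python
import numpy as np

def apply_frac_power(C, v, s):
    """C^s v = sum_n lambda_n^s <u_n, v> u_n for a symmetric PSD matrix C."""
    lam, U = np.linalg.eigh(C)            # C = U diag(lam) U^T
    lam = np.clip(lam, 0.0, None)         # guard against tiny negative eigenvalues
    coeffs = U.T @ v                      # coefficients <u_n, v> in the eigenbasis
    return U @ (lam**s * coeffs)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
C = X.T @ X / len(X)                      # uncentered covariance, 5 x 5
v = rng.normal(size=5)
print(apply_frac_power(C, v, 0.5))
```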

45 Aerosol prediction result (100 × RMSE). We perform on par with the state-of-the-art, hand-engineered method. Baseline, hand-crafted features (Zhuang Wang, Liang Lan, Slobodan Vucetic, IEEE Transactions on Geoscience and Remote Sensing, 2012): (±). Our prediction accuracy: 7.81 (±1.64), with no expert knowledge. Code in ITE: #2 on mloss.

46 Summary. Problem: distribution regression. Contribution: computational & statistical tradeoff analysis; the set kernel gives a simple algorithm with a minimax-optimal rate. Reference: Learning Theory for Distribution Regression. Journal of Machine Learning Research, 17(152):1-40, 2016.

47 Thank you for your attention! Acknowledgments: This work was supported by the Gatsby Charitable Foundation and by NSF grants IIS and IIS. A part of the work was carried out while Bharath K. Sriperumbudur was a research fellow in the Statistical Laboratory, Department of Pure Mathematics and Mathematical Statistics at the University of Cambridge, UK.

48 References:
Altun, Y. and Smola, A. (2006). Unifying divergence minimization and statistical inference via convex duality. In Conference on Learning Theory (COLT).
Berlinet, A. and Thomas-Agnan, C. (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer.
Caponnetto, A. and De Vito, E. (2007). Optimal rates for regularized least-squares algorithm. Foundations of Computational Mathematics, 7.
Gärtner, T., Flach, P. A., Kowalczyk, A., and Smola, A. (2002). Multi-instance kernels. In International Conference on Machine Learning (ICML).

49 References (continued):
Haussler, D. (1999). Convolution kernels on discrete structures. Technical report, Department of Computer Science, University of California at Santa Cruz. ( convolutions.pdf)
Smola, A., Gretton, A., Song, L., and Schölkopf, B. (2007). A Hilbert space embedding for distributions. In Algorithmic Learning Theory (ALT).
Steinwart, I., Hush, D. R., and Scovel, C. (2009). Optimal rates for regularized least squares regression. In Conference on Learning Theory (COLT).
