Convergence Rates of Kernel Quadrature Rules


1 Convergence Rates of Kernel Quadrature Rules
Francis Bach, INRIA - Ecole Normale Supérieure, Paris, France
NIPS workshop on probabilistic integration - Dec. 2015

2 Outline
Introduction
  Quadrature rules
  Kernel quadrature rules (a.k.a. Bayes-Hermite quadrature)
Generic analysis of kernel quadrature rules
  Eigenvalues of the covariance operator
  Optimal sampling distribution
Extensions
  Link with random feature approximations
  Full function approximation
  Herding

3 Quadrature
Given a square-integrable function g : X → R and a probability measure dρ, approximate
  ∫_X h(x) g(x) dρ(x) ≈ Σ_{i=1}^n α_i h(x_i)
for all functions h : X → R in a certain function space F
Many applications
Main goal: choice of support points x_i ∈ X and weights α_i ∈ R
Control of the error decay as n grows, uniformly over h

4 Quadrature - Existing rules
Generic baseline: Monte-Carlo
  x_i sampled from dρ(x), α_i = g(x_i)/n, with error O(1/√n)

5 Quadrature - Existing rules
Generic baseline: Monte-Carlo
  x_i sampled from dρ(x), α_i = g(x_i)/n, with error O(1/√n)
One-dimensional integrals X = [0, 1]
  Trapezoidal or Simpson's rules: O(1/n²) for f with uniformly bounded second derivatives (Cruz-Uribe and Neugebauer, 2002)
  Gaussian quadrature (based on orthogonal polynomials): exact on polynomials of degree 2n − 1 (Hildebrand, 1987)
  Quasi-Monte Carlo: O(1/n) for functions with bounded variation (Morokoff and Caflisch, 1994)
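
To make the rate comparison concrete, here is a minimal numerical sketch (not from the talk): plain Monte-Carlo against the trapezoidal rule on [0, 1], with dρ uniform, g = 1, and an arbitrary smooth test integrand h = cos. The observed errors decay roughly as 1/√n and 1/n², respectively.

```python
import numpy as np

h = np.cos                       # smooth test integrand (illustrative choice)
exact = np.sin(1.0)              # int_0^1 cos(x) dx

rng = np.random.default_rng(0)
for n in [10, 100, 1000, 10000]:
    # Monte-Carlo: x_i ~ uniform on [0, 1], weights 1/n, error O(1/sqrt(n))
    x_mc = rng.random(n)
    err_mc = abs(h(x_mc).mean() - exact)

    # Trapezoidal rule on a regular grid of n points, error O(1/n^2) for smooth h
    x_tr = np.linspace(0.0, 1.0, n)
    w = np.full(n, 1.0 / (n - 1))
    w[0] *= 0.5
    w[-1] *= 0.5
    err_tr = abs(w @ h(x_tr) - exact)

    print(f"n={n:6d}   Monte-Carlo error={err_mc:.2e}   trapezoid error={err_tr:.2e}")
```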

6 Quadrature - Existing rules
Generic baseline: Monte-Carlo
  x_i sampled from dρ(x), α_i = g(x_i)/n, with error O(1/√n)
One-dimensional integrals X = [0, 1]
Multi-dimensional X = [0,1]^d
  All uni-dimensional methods above generalize for small d
  Bayes-Hermite quadrature (O'Hagan, 1991)
  Kernel quadrature (Smola et al., 2007)

7 Quadrature - Existing rules
Generic baseline: Monte-Carlo
  x_i sampled from dρ(x), α_i = g(x_i)/n, with error O(1/√n)
One-dimensional integrals X = [0, 1]
Multi-dimensional X = [0,1]^d
  All uni-dimensional methods above generalize for small d
  Bayes-Hermite quadrature (O'Hagan, 1991)
  Kernel quadrature (Smola et al., 2007)
Extensions to less-standard sets X
  Only require a positive-definite kernel on X

8 Quadrature - Existing theoretical results
Key reference: Novak (1988)
Sobolev space on [0,1] (f^(s) square-integrable)
  Minimax error decay: O(n^{-s})
Sobolev space on [0,1]^d
  All partial derivatives with total order ≤ s are square-integrable
  Minimax error decay: O(n^{-s/d})
Sobolev space on the hypersphere in d+1 dimensions
  Minimax error decay: O(n^{-2s/d})
A single result for all situations?

9 Quadrature - Existing theoretical results
Key reference: Novak (1988)
Sobolev space on [0,1] (f^(s) square-integrable)
  Minimax error decay: O(n^{-s})
Sobolev space on [0,1]^d
  All partial derivatives with total order ≤ s are square-integrable
  Minimax error decay: O(n^{-s/d})
Sobolev space on the hypersphere in d+1 dimensions
  Minimax error decay: O(n^{-2s/d})
A single result for all situations?

10 Kernels and reproducing kernel Hilbert spaces
Input space X
Positive definite kernel k : X × X → R
Reproducing kernel Hilbert space (RKHS) F
  Space of functions f : X → R spanned by Φ(x) = k(·, x), x ∈ X
  Reproducing properties: ⟨f, k(·, x)⟩_F = f(x) and ⟨k(·, y), k(·, x)⟩_F = k(x, y)
Example: Sobolev spaces, e.g., k(x, y) = exp(−|x − y|)
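
As a concrete illustration of these definitions, the following minimal Python sketch (assumptions made here for illustration: X = [0, 1], the exponential kernel above, randomly drawn points, and a sine target) builds the Gram matrix, checks positive definiteness numerically, and evaluates the minimum-norm RKHS interpolant f = Σ_j c_j k(·, x_j) with c = K^{-1} y.

```python
import numpy as np

def k(x, y):
    """Exponential kernel k(x, y) = exp(-|x - y|), a Sobolev-type kernel on the line."""
    return np.exp(-np.abs(x[:, None] - y[None, :]))

rng = np.random.default_rng(0)
x = np.sort(rng.random(15))              # support points in X = [0, 1]
K = k(x, x)                              # Gram matrix K_ij = k(x_i, x_j)

# Positive definiteness: the eigenvalues of the Gram matrix are nonnegative
print("smallest eigenvalue of K:", np.linalg.eigvalsh(K).min())

# Minimum-norm RKHS interpolant of a target function at the points x:
# f(.) = sum_j c_j k(., x_j) with c = K^{-1} y, so that f(x_i) = y_i exactly.
y = np.sin(2 * np.pi * x)
c = np.linalg.solve(K, y)
x_test = np.linspace(0.0, 1.0, 5)
f_test = k(x_test, x) @ c                # evaluate the interpolant off the grid
print("interpolant:", np.round(f_test, 3))
print("target:     ", np.round(np.sin(2 * np.pi * x_test), 3))
```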

11 Quadrature in RKHS
Goal: given a distribution dρ on X and g ∈ L2(dρ), estimation of ∫_X h(x) g(x) dρ(x) by Σ_{j=1}^n α_j h(x_j) for any h ∈ F

12 Quadrature in RKHS
Goal: given a distribution dρ on X and g ∈ L2(dρ), estimation of ∫_X h(x) g(x) dρ(x) by Σ_{j=1}^n α_j h(x_j) for any h ∈ F
Equivalently, estimation of ∫_X ⟨k(·, x), h⟩_F g(x) dρ(x) by Σ_{j=1}^n α_j ⟨k(·, x_j), h⟩_F for any h ∈ F
Error = ⟨h, ∫_X k(·, x) g(x) dρ(x) − Σ_{j=1}^n α_j k(·, x_j)⟩_F ≤ ‖h‖_F · ‖∫_X k(·, x) g(x) dρ(x) − Σ_{j=1}^n α_j k(·, x_j)‖_F

13 Quadrature in RKHS
Goal: given a distribution dρ on X and g ∈ L2(dρ), estimation of ∫_X h(x) g(x) dρ(x) by Σ_{j=1}^n α_j h(x_j) for any h ∈ F
Worst-case bound: sup_{‖h‖_F ≤ 1} | ∫_X h(x) g(x) dρ(x) − Σ_{j=1}^n α_j h(x_j) | is equal to (Smola et al., 2007)
  ‖ ∫_X k(·, x) g(x) dρ(x) − Σ_{j=1}^n α_j k(·, x_j) ‖_F

14 Quadrature in RKHS
Goal: find x_1,...,x_n such that ‖ ∫_X k(·, x) g(x) dρ(x) − Σ_{j=1}^n α_j k(·, x_j) ‖_F is as small as possible
Computation of weights α ∈ R^n given x_i ∈ X, i = 1,...,n
  Need precise evaluations of µ(y) = ∫_X k(x, y) g(x) dρ(x) (Smola et al., 2007; Oates and Girolami, 2015)
  Minimize −2 Σ_{i=1}^n µ(x_i) α_i + Σ_{i,j=1}^n α_i α_j k(x_i, x_j)
Choice of support points x_i
  Optimization (Chen et al., 2010; Bach et al., 2012)
  Sampling

15 Quadrature in RKHS
Goal: find x_1,...,x_n such that ‖ ∫_X k(·, x) g(x) dρ(x) − Σ_{j=1}^n α_j k(·, x_j) ‖_F is as small as possible
Computation of weights α ∈ R^n given x_i ∈ X, i = 1,...,n
  Need precise evaluations of µ(y) = ∫_X k(x, y) g(x) dρ(x) (Smola et al., 2007; Oates and Girolami, 2015)
  Minimize −2 Σ_{i=1}^n µ(x_i) α_i + Σ_{i,j=1}^n α_i α_j k(x_i, x_j)
Choice of support points x_i
  Optimization (Chen et al., 2010; Bach et al., 2012)
  Sampling
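
A minimal sketch of the weight computation described above, under illustrative assumptions (X = [0, 1], dρ uniform, g = 1, exponential kernel): in this case µ(y) = ∫_0^1 e^{−|x−y|} dx = 2 − e^{−y} − e^{y−1} is available in closed form, and setting the gradient of the quadratic to zero gives the linear system K α = µ.

```python
import numpy as np

def kernel(x, y):
    return np.exp(-np.abs(x[:, None] - y[None, :]))

def mu(y):
    # mean element mu(y) = int_0^1 exp(-|x - y|) dx for g = 1 and uniform d rho
    return 2.0 - np.exp(-y) - np.exp(y - 1.0)

rng = np.random.default_rng(0)
x = np.sort(rng.random(30))                                   # support points x_1, ..., x_n
K = kernel(x, x)

# Minimizing -2 sum_i mu(x_i) alpha_i + sum_ij alpha_i alpha_j k(x_i, x_j)
# gives K alpha = mu(x); a small jitter is added for numerical stability.
alpha = np.linalg.solve(K + 1e-10 * np.eye(len(x)), mu(x))

# Use the rule on a test function h: int_0^1 h(x) g(x) dx ~ sum_i alpha_i h(x_i)
h = np.cos
print("quadrature:", alpha @ h(x), "  exact:", np.sin(1.0))
```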

16 Outline
Introduction
  Quadrature rules
  Kernel quadrature rules (a.k.a. Bayes-Hermite quadrature)
Generic analysis of kernel quadrature rules
  Eigenvalues of the covariance operator
  Optimal sampling distribution
Extensions
  Link with random feature approximations
  Full function approximation
  Herding

17 Generic analysis of kernel quadrature rules
Support points x_i sampled i.i.d. from a density q w.r.t. dρ
Importance-weighted quadrature and error bound:
  ‖ Σ_{i=1}^n β_i q(x_i)^{-1/2} k(·, x_i) − ∫_X k(·, x) g(x) dρ(x) ‖²_F
Robustness to noise: ‖β‖_2² small

18 Generic analysis of kernel quadrature rules
Support points x_i sampled i.i.d. from a density q w.r.t. dρ
Importance-weighted quadrature and error bound:
  ‖ Σ_{i=1}^n β_i q(x_i)^{-1/2} k(·, x_i) − ∫_X k(·, x) g(x) dρ(x) ‖²_F
Robustness to noise: ‖β‖_2² small
Approximation of a function µ = ∫_X k(·, x) g(x) dρ(x) by random elements from an RKHS
Classical tool: eigenvalues of the covariance operator
  Mercer decomposition (Mercer, 1909): k(x, y) = Σ_{m≥1} µ_m e_m(x) e_m(y)

19 Upper-bound
Assumptions
  x_1,...,x_n ∈ X sampled i.i.d. with density q(x) ∝ Σ_{m≥1} [µ_m/(µ_m + λ)] e_m(x)²
  Degrees of freedom d(λ) = Σ_{m≥1} µ_m/(µ_m + λ)

20 Upper-bound
Assumptions
  x_1,...,x_n ∈ X sampled i.i.d. with density q(x) ∝ Σ_{m≥1} [µ_m/(µ_m + λ)] e_m(x)²
  Degrees of freedom d(λ) = Σ_{m≥1} µ_m/(µ_m + λ)
Bound: for any δ > 0, if n ≥ 4 + 6 d(λ) log(4 d(λ)/δ), with probability greater than 1 − δ, we have
  sup_{‖g‖_{L2(dρ)} ≤ 1} inf_{β ∈ R^n} ‖ Σ_{i=1}^n β_i q(x_i)^{-1/2} k(·, x_i) − ∫_X k(·, x) g(x) dρ(x) ‖²_F ≤ 4λ

21 Upper-bound
Assumptions
  x_1,...,x_n ∈ X sampled i.i.d. with density q(x) ∝ Σ_{m≥1} [µ_m/(µ_m + λ)] e_m(x)²
  Degrees of freedom d(λ) = Σ_{m≥1} µ_m/(µ_m + λ)
Bound: for any δ > 0, if n ≥ 4 + 6 d(λ) log(4 d(λ)/δ), with probability greater than 1 − δ, we have
  sup_{‖g‖_{L2(dρ)} ≤ 1} inf_{β ∈ R^n} ‖ Σ_{i=1}^n β_i q(x_i)^{-1/2} k(·, x_i) − ∫_X k(·, x) g(x) dρ(x) ‖²_F ≤ 4λ
Proof technique (Bach, 2013; El Alaoui and Mahoney, 2014)
  Explicit β obtained by regularizing by λ‖β‖_2²
  Concentration inequalities in Hilbert space
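
The following sketch mimics the regularized construction used in the proof; it is an illustration, not the paper's exact estimator. Sample x_i i.i.d. from q, then take β minimizing ‖ Σ_i β_i q(x_i)^{-1/2} k(·, x_i) − µ ‖²_F + λ‖β‖_2². Assumptions made here: X = [0, 1], dρ uniform, g = 1, exponential kernel, and q uniform (the optimized density for Sobolev spaces on [0, 1], see slide 26); the scaling of the regularization parameter λ is also a choice made for this sketch. With q ≡ 1 the normal equations reduce to (K + λI) β = µ(x).

```python
import numpy as np

def kernel(x, y):
    return np.exp(-np.abs(x[:, None] - y[None, :]))

def mu(y):
    # mean element for g = 1 and uniform d rho on [0, 1]
    return 2.0 - np.exp(-y) - np.exp(y - 1.0)

rng = np.random.default_rng(1)
n, lam = 50, 1e-4
x = rng.random(n)                              # x_i sampled i.i.d. from q = uniform on [0, 1]
K = kernel(x, x)                               # Gram matrix of the features k(., x_i); q = 1 here
beta = np.linalg.solve(K + lam * np.eye(n), mu(x))   # regularized least-squares solution
alpha = beta                                   # quadrature weights alpha_i = beta_i / sqrt(q(x_i)) = beta_i

print("||beta||_2^2 =", beta @ beta)           # robustness check from slide 17
print("quadrature for h = cos:", alpha @ np.cos(x), "  exact:", np.sin(1.0))
```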

22 Upper-bound
Assumptions
  x_1,...,x_n ∈ X sampled i.i.d. with density q(x) ∝ Σ_{m≥1} [µ_m/(µ_m + λ)] e_m(x)²
  Degrees of freedom d(λ) = Σ_{m≥1} µ_m/(µ_m + λ)
Bound: for any δ > 0, if n ≥ 5 d(λ) log(16 d(λ)/δ), with probability greater than 1 − δ, we have
  sup_{‖g‖_{L2(dρ)} ≤ 1} inf_{β ∈ R^n} ‖ Σ_{i=1}^n β_i q(x_i)^{-1/2} k(·, x_i) − ∫_X k(·, x) g(x) dρ(x) ‖²_F ≤ 4λ
Matching lower bound (for any family of x_i)
Key features: (a) degrees of freedom d(λ) and (b) distribution q

23 Degrees of freedom
Degrees of freedom d(λ) = Σ_{m≥1} µ_m/(µ_m + λ)
  Traditional quantity for the analysis of kernel methods (Hastie and Tibshirani, 1990)
If eigenvalues decay as µ_m ∝ m^{-α}, α > 1, then d(λ) ≈ #{m : µ_m ≥ λ} ≈ λ^{-1/α}
  Sobolev spaces in dimension d and order s: α = 2s/d
Take-home: need ≈ #{m : µ_m ≥ λ} features for squared error λ
Take-home: after n sampled points, squared error ≈ µ_n

24 Degrees of freedom
Degrees of freedom d(λ) = Σ_{m≥1} µ_m/(µ_m + λ)
  Traditional quantity for the analysis of kernel methods (Hastie and Tibshirani, 1990)
If eigenvalues decay as µ_m ∝ m^{-α}, α > 1, then d(λ) ≈ #{m : µ_m ≥ λ} ≈ λ^{-1/α}
  Sobolev spaces in dimension d and order s: α = 2s/d
Take-home: need ≈ #{m : µ_m ≥ λ} features for squared error λ
Take-home: after n sampled points, squared error ≈ µ_n
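
A quick numerical check of these take-home messages, under the illustrative assumption that the eigenvalues are exactly µ_m = m^{−α}: the degrees of freedom d(λ) and the count #{m : µ_m ≥ λ} are both of order λ^{−1/α}.

```python
import numpy as np

alpha = 2.0                                  # eigenvalue decay exponent (illustrative)
m = np.arange(1, 10**6 + 1)
mu = m ** (-alpha)                           # mu_m = m^{-alpha}

for lam in [1e-2, 1e-3, 1e-4]:
    d = np.sum(mu / (mu + lam))              # degrees of freedom d(lambda)
    count = np.sum(mu >= lam)                # #{m : mu_m >= lambda}
    print(f"lambda={lam:.0e}  d(lambda)={d:8.1f}  count={count:6d}  "
          f"lambda^(-1/alpha)={lam ** (-1.0 / alpha):8.1f}")
```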

25 Optimized sampling distribution
Density q(x) ∝ Σ_{m≥1} [µ_m/(µ_m + λ)] e_m(x)²
  Relationship to leverage scores (Mahoney, 2011)
  Hard to compute in generic situations
  Possible approximations (Drineas et al., 2012)

26 Optimized sampling distribution
Density q(x) ∝ Σ_{m≥1} [µ_m/(µ_m + λ)] e_m(x)²
  Relationship to leverage scores (Mahoney, 2011)
  Hard to compute in generic situations
  Possible approximations (Drineas et al., 2012)
Sobolev spaces on [0,1]^d or the hypersphere
  Equal to the uniform distribution
  Matches known minimax rates

27 Outline
Introduction
  Quadrature rules
  Kernel quadrature rules (a.k.a. Bayes-Hermite quadrature)
Generic analysis of kernel quadrature rules
  Eigenvalues of the covariance operator
  Optimal sampling distribution
Extensions
  Link with random feature approximations
  Full function approximation
  Herding

28 Link with random feature expansions
Some kernels are naturally expressed as expectations:
  k(x, y) = E_v[ϕ(v, x) ϕ(v, y)] ≈ (1/n) Σ_{i=1}^n ϕ(v_i, x) ϕ(v_i, y)
  Neural networks with infinitely many random hidden units (Neal, 1995)
  Fourier features (Rahimi and Recht, 2007)
Main question: minimal number n of features for a given approximation quality

29 Link with random feature expansions
Some kernels are naturally expressed as expectations:
  k(x, y) = E_v[ϕ(v, x) ϕ(v, y)] ≈ (1/n) Σ_{i=1}^n ϕ(v_i, x) ϕ(v_i, y)
  Neural networks with infinitely many random hidden units (Neal, 1995)
  Fourier features (Rahimi and Recht, 2007)
Main question: minimal number n of features for a given approximation quality
Kernel quadrature is a subcase of random feature expansions
  Mercer decomposition: k(x, y) = Σ_{m≥1} µ_m e_m(x) e_m(y)
  ϕ(v, x) = Σ_{m≥1} µ_m^{1/2} e_m(x) e_m(v), with v ∈ X sampled from dρ
  E_v[ϕ(v, x) ϕ(v, y)] = Σ_{m,m′≥1} (µ_m µ_{m′})^{1/2} e_m(x) e_{m′}(y) E[e_m(v) e_{m′}(v)]
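
A standard random Fourier feature sketch (Rahimi and Recht, 2007) illustrating the expectation form of a kernel; the Gaussian kernel k(x, y) = exp(−‖x − y‖²/2) and the feature ϕ(v, x) = √2 cos(v⊤x + b) with v ∼ N(0, I) and b ∼ Uniform[0, 2π] are the standard choices used here for illustration, not specific to the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 2000                               # input dimension, number of random features
x, y = rng.standard_normal(d), rng.standard_normal(d)

v = rng.standard_normal((n, d))              # random frequencies v_i ~ N(0, I)
b = rng.uniform(0.0, 2.0 * np.pi, n)         # random phases b_i ~ Uniform[0, 2 pi]
phi_x = np.sqrt(2.0) * np.cos(v @ x + b)     # phi(v_i, x)
phi_y = np.sqrt(2.0) * np.cos(v @ y + b)     # phi(v_i, y)

approx = phi_x @ phi_y / n                   # (1/n) sum_i phi(v_i, x) phi(v_i, y)
exact = np.exp(-np.sum((x - y) ** 2) / 2.0)  # Gaussian kernel value
print("approx:", approx, "  exact:", exact)
```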

30 Full function approximation
Fact: given support points x_i, the quadrature rule for ∫_X h(x) g(x) dρ(x) has weights which are linear in g, that is α_i = ⟨a_i, g⟩_{L2(dρ)}
  ∫_X h(x) g(x) dρ(x) − Σ_{i=1}^n α_i h(x_i) = ⟨g, h − Σ_{i=1}^n h(x_i) a_i⟩_{L2(dρ)}

31 Full function approximation
Fact: given support points x_i, the quadrature rule for ∫_X h(x) g(x) dρ(x) has weights which are linear in g, that is α_i = ⟨a_i, g⟩_{L2(dρ)}
  ∫_X h(x) g(x) dρ(x) − Σ_{i=1}^n α_i h(x_i) = ⟨g, h − Σ_{i=1}^n h(x_i) a_i⟩_{L2(dρ)}
Uniform bound for ‖g‖_{L2(dρ)} ≤ 1 ⇒ approximation of h in L2(dρ)
  Recovers the result from Novak (1988)
  Approximation in the RKHS norm is not possible
  Approximation in the L∞-norm incurs a loss of performance of n

32 Full function approximation
Fact: given support points x_i, the quadrature rule for ∫_X h(x) g(x) dρ(x) has weights which are linear in g, that is α_i = ⟨a_i, g⟩_{L2(dρ)}
  ∫_X h(x) g(x) dρ(x) − Σ_{i=1}^n α_i h(x_i) = ⟨g, h − Σ_{i=1}^n h(x_i) a_i⟩_{L2(dρ)}
Uniform bound for ‖g‖_{L2(dρ)} ≤ 1 ⇒ approximation of h in L2(dρ)
  Recovers the result from Novak (1988)
  Approximation in the RKHS norm is not possible
  Approximation in the L∞-norm incurs a loss of performance of n
Adaptivity to smoother functions
  If h is (a bit) smoother, then the rate is still optimal
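
A small numerical illustration of the linearity fact, under the same illustrative assumptions as the earlier sketches (X = [0, 1], dρ uniform, exponential kernel, weights obtained from K α = µ_g): since µ_g(y) = ∫ k(x, y) g(x) dρ(x) is linear in g and α = K^{-1} µ_g, the map g ↦ α is linear.

```python
import numpy as np

def kernel(x, y):
    return np.exp(-np.abs(x[:, None] - y[None, :]))

def weights(g, x, grid):
    """Quadrature weights alpha = K^{-1} mu_g, with mu_g approximated on a fine grid."""
    K = kernel(x, x) + 1e-10 * np.eye(len(x))
    mu_g = kernel(x, grid) @ g(grid) / len(grid)     # Riemann approximation of mu_g(x_i)
    return np.linalg.solve(K, mu_g)

rng = np.random.default_rng(0)
x, grid = np.sort(rng.random(20)), np.linspace(0.0, 1.0, 10**4)
g1, g2 = np.sin, np.exp

# Linearity: weights for g1 + g2 equal the sum of the weights for g1 and g2
alpha_sum = weights(lambda t: g1(t) + g2(t), x, grid)
assert np.allclose(alpha_sum, weights(g1, x, grid) + weights(g2, x, grid))
print("weights are linear in g; first three weights:", np.round(alpha_sum[:3], 4))
```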

33 Sobolev spaces on [0,1]
Quadrature rule obtained from order t, applied to test functions of order s
[Figure: four panels of log10(squared error) vs. log10(n), for test functions of smoothness s = 1, 2, 3, 4; each panel reports the empirical slopes for quadrature rules built from kernels of order t = 1, 2, 3, 4.]

34 Herding
Choosing points through optimization (Chen et al., 2010)
  Interpretation as Frank-Wolfe optimization (Bach, Lacoste-Julien, and Obozinski, 2012)
  Convex weights α
  Extra-projection step (Briol et al., 2015)

35 Herding
Choosing points through optimization (Chen et al., 2010)
  Interpretation as Frank-Wolfe optimization (Bach, Lacoste-Julien, and Obozinski, 2012)
  Convex weights α
  Extra-projection step (Briol et al., 2015)
Open problem: no true convergence rate (Bach et al., 2012)
  In finite dimension: exponential convergence rate depending on the existence of a certain constant c > 0
  In infinite dimension: the constant c is provably equal to zero (an exponential rate would contradict the lower bounds)

36 Conclusion
Sharp analysis of kernel quadrature rules
  Spectrum of the covariance operator (µ_m)
  n points sampled i.i.d. from a well-chosen distribution
  Error of order µ_n
  Applies to all X with a positive definite kernel

37 Conclusion
Sharp analysis of kernel quadrature rules
  Spectrum of the covariance operator (µ_m)
  n points sampled i.i.d. from a well-chosen distribution
  Error of order µ_n
  Applies to all X with a positive definite kernel
Extensions
  Computationally efficient ways to sample from the optimized distribution (Drineas et al., 2012)
  Anytime sampling
  From quadrature to maximization (Novak, 1988)
  Quasi-random sampling (Yang et al., 2014; Oates and Girolami, 2015)

38-39 References
F. Bach. Sharp analysis of low-rank kernel matrix approximations. In Proceedings of the International Conference on Learning Theory (COLT), 2013.
F. Bach, S. Lacoste-Julien, and G. Obozinski. On the equivalence between herding and conditional gradient algorithms. arXiv preprint, 2012.
F.-X. Briol, C. J. Oates, M. Girolami, and M. A. Osborne. Frank-Wolfe Bayesian quadrature: Probabilistic integration with theoretical guarantees. In Advances in Neural Information Processing Systems (NIPS), 2015.
Y. Chen, M. Welling, and A. Smola. Super-samples from kernel herding. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2010.
D. Cruz-Uribe and C. J. Neugebauer. Sharp error bounds for the trapezoidal rule and Simpson's rule. Journal of Inequalities in Pure and Applied Mathematics, 3(4), 2002.
P. Drineas, M. Magdon-Ismail, M. W. Mahoney, and D. P. Woodruff. Fast approximation of matrix coherence and statistical leverage. Journal of Machine Learning Research, 13, 2012.
A. El Alaoui and M. W. Mahoney. Fast randomized kernel methods with statistical guarantees. Technical report, arXiv, 2014.
T. J. Hastie and R. J. Tibshirani. Generalized Additive Models. Chapman & Hall, 1990.
F. B. Hildebrand. Introduction to Numerical Analysis. Courier Dover Publications, 1987.
M. W. Mahoney. Randomized algorithms for matrices and data. Foundations and Trends in Machine Learning, 3(2), 2011.
J. Mercer. Functions of positive and negative type, and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London, Series A, 209, 1909.
W. J. Morokoff and R. E. Caflisch. Quasi-random sequences and their discrepancies. SIAM Journal on Scientific Computing, 15(6), 1994.
R. M. Neal. Bayesian Learning for Neural Networks. PhD thesis, University of Toronto, 1995.
E. Novak. Deterministic and Stochastic Error Bounds in Numerical Analysis. Springer-Verlag, 1988.
C. J. Oates and M. Girolami. Variance reduction for quasi-Monte-Carlo. Technical report, arXiv, 2015.
A. O'Hagan. Bayes-Hermite quadrature. Journal of Statistical Planning and Inference, 29(3), 1991.
A. Rahimi and B. Recht. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems, volume 20, 2007.
A. Smola, A. Gretton, L. Song, and B. Schölkopf. A Hilbert space embedding for distributions. In Algorithmic Learning Theory. Springer, 2007.
J. Yang, V. Sindhwani, H. Avron, and M. Mahoney. Quasi-Monte Carlo feature maps for shift-invariant kernels. In Proceedings of the International Conference on Machine Learning (ICML), 2014.
