Learning Interpretable Features to Compare Distributions


1 Learning Interpretable Features to Compare Distributions. Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London. Theory of Big Data. 1/41

2 Goal of this talk Given: Two collections of samples from unknown distributions P and Q. Goal: Learn distinguishing features that indicate how P and Q differ. 2/41

4 Goal of this talk Have: two collections of samples X, Y from unknown distributions P and Q. Goal: learn distinguishing features that indicate how P and Q differ. [Figures: MNIST samples; samples from a GAN] Significant difference in GAN and MNIST? 3/41

5 Divergences 4/41

9 Divergences Sriperumbudur, Fukumizu, G, Schoelkopf, Lanckriet (2012) 8/41

10 Overview
- The maximum mean discrepancy: how to compute and interpret the MMD; training the MMD to maximize test power; application to troubleshooting GANs.
- The ME test statistic: informative, linear-time features for comparing distributions; how to learn these features.
- The maximum Stein discrepancy: divergence between a sample and a model; only need the model up to a normalizing constant.
9/41

11 Feature mean difference
Simple example: two Gaussians with different means. Answer: t-test.
[Figure: densities of two Gaussians P_X and Q_X with different means] 10/41

12-13 Feature mean difference
Two Gaussians with the same means but different variance.
Idea: look at the difference in means of features of the RVs. In the Gaussian case: second-order features, of the form phi(x) = x^2.
[Figures: densities of the two Gaussians P_X and Q_X; densities of the feature X^2 under P and Q] 11/41

14 Feature mean difference
Gaussian and Laplace distributions: same mean and same variance.
Difference in means using higher-order features... RKHS.
[Figure: Gaussian and Laplace densities P_X and Q_X] 12/41

15-16 The maximum mean discrepancy
The reproducing property (kernel trick): given x in X for some set X, define the feature map phi(x) in F,
phi(x) = [... phi_i(x) ...] in l^2.
For positive definite k(x, x'), k(x, x') = <phi(x), phi(x')>_F, e.g. k(x, x') = exp(-||x - x'||^2).
The mean trick: given a Borel probability measure P on X, define the feature map mu_P in F,
mu_P = [... E_P[phi_i(X)] ...].
For positive definite k(x, x'), E_{P,Q} k(X, Y) = <mu_P, mu_Q>_F for X ~ P and Y ~ Q. 13/41

17-18 The maximum mean discrepancy
The maximum mean discrepancy is the distance between feature means:
MMD^2(P, Q) = ||mu_P - mu_Q||_F^2
            = <mu_P, mu_P>_F + <mu_Q, mu_Q>_F - 2 <mu_P, mu_Q>_F
            = E_P k(X, X') + E_Q k(Y, Y') - 2 E_{P,Q} k(X, Y),
where (a) E_P k(X, X') + E_Q k(Y, Y') is the within-distribution similarity and (b) E_{P,Q} k(X, Y) is the cross-distribution similarity.
Unbiased empirical estimate of the first term (quadratic time):
E_P k(X, X') is estimated by (1/(m(m-1))) sum_{i=1}^m sum_{j != i} k(x_i, x_j). 14/41
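This estimator is direct to implement. A minimal NumPy sketch (the Gaussian kernel and its bandwidth sigma are illustrative choices here, not fixed by the slides):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased quadratic-time estimate of MMD^2(P, Q) from samples X, Y."""
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    # Diagonal terms are dropped so the within-sample averages are unbiased.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()
```

For samples drawn from the same distribution the estimate fluctuates around zero (and, being unbiased, can even dip slightly negative); for samples from different distributions it concentrates on the positive population value.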

19 Illustration of MMD
Dogs (= P) and fish (= Q) example revisited. Each entry is one of k(dog_i, dog_j), k(dog_i, fish_j), or k(fish_i, fish_j). 15/41

20 Illustration of MMD
The empirical maximum mean discrepancy:
\widehat{MMD}^2 = (1/(n(n-1))) sum_{i != j} k(dog_i, dog_j) + (1/(n(n-1))) sum_{i != j} k(fish_i, fish_j) - (2/n^2) sum_{i,j} k(dog_i, fish_j). 16/41

21 Witness function interpretation of MMD
Are P and Q different? [Figure: samples from P(x) and Q(y)] 17/41

23 Witness function interpretation of MMD
Observe X = {x_1, ..., x_n} ~ P and Y = {y_1, ..., y_n} ~ Q. 18/41

24 Witness function interpretation of MMD
Place a Gaussian kernel on each x_i and on each y_i. 18/41

25 Witness function interpretation of MMD
mu_hat_P(v): mean embedding of the empirical P; mu_hat_Q(v): mean embedding of the empirical Q, with
mu_hat_P(v) := (1/m) sum_{i=1}^m k(x_i, v). 18/41

26 Witness function interpretation of MMD
witness(v) = mu_hat_P(v) - mu_hat_Q(v). 18/41
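The empirical witness is just a difference of two kernel averages; a sketch, again assuming a Gaussian kernel (the bandwidth is an illustrative choice):

```python
import numpy as np

def witness(v, X, Y, sigma=1.0):
    """Empirical witness at location v: mu_hat_P(v) - mu_hat_Q(v)."""
    k = lambda S: np.exp(-np.sum((S - v) ** 2, axis=1) / (2 * sigma**2))
    return k(X).mean() - k(Y).mean()
```

Evaluating witness on a grid of v values reproduces the curves sketched on these slides: positive where P puts more mass, negative where Q does.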

27-28 Function showing difference in distributions
Maximum mean discrepancy: a smooth function witnessing P vs Q,
MMD(P, Q; F) := sup_{f in F} [E_P f(X) - E_Q f(Y)].
MMD(P, Q; F) = 0 iff P = Q, when F is the unit ball in a characteristic RKHS.
Example revisited: Laplace P vs Gauss Q. [Figure: the witness f for the Gauss and Laplace densities] 19/41

29-30 Asymptotics of MMD
The empirical MMD:
\widehat{MMD}^2 = (1/(n(n-1))) sum_{i != j} k(x_i, x_j) + (1/(n(n-1))) sum_{i != j} k(y_i, y_j) - (2/n^2) sum_{i,j} k(x_i, y_j),
but how to choose the kernel?
Perspective from statistical hypothesis testing: when P = Q, \widehat{MMD}^2 is close to zero; when P != Q, \widehat{MMD}^2 is far from zero. A threshold c_alpha for \widehat{MMD}^2 gives the false positive rate alpha. 20/41

31-33 A statistical test
[Figure: densities of n \widehat{MMD}^2 under P = Q and under P != Q; c_alpha = the 1 - alpha quantile when P = Q; the mass of the P != Q density below c_alpha gives the false negatives]
The best kernel gives the lowest false negative rate (= highest power)... but can you train for this? 21/41

34 Asymptotics of MMD
When P != Q, the statistic is asymptotically normal:
(\widehat{MMD}^2 - MMD^2(P, Q)) / sqrt(V_n(P, Q)) -> N(0, 1) in distribution,
where MMD^2(P, Q) is the population MMD and V_n(P, Q) = O(n^{-1}).
[Figure: empirical PDF of the statistic under H1, with Gaussian fit] 22/41

35 Asymptotics of MMD
When P = Q, the statistic has the asymptotic distribution
n \widehat{MMD}^2 -> sum_{l=1}^infty lambda_l [z_l^2 - 2], with z_l ~ N(0, 2) i.i.d.,
where the lambda_i solve lambda_i phi_i(x') = int_X ktilde(x, x') phi_i(x) dP(x), and ktilde is the centred kernel.
[Figure: MMD density under H0, empirical PDF vs. the chi^2 sum] 23/41
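The eigenvalues lambda_l are rarely computed explicitly; in practice the null quantile c_alpha is commonly estimated by permuting the pooled sample. A sketch of that stand-in (the permutation approach is a standard alternative, not the spectral expansion on this slide):

```python
import numpy as np

def permutation_threshold(X, Y, mmd2_fn, alpha=0.05, n_perm=200, seed=0):
    """Estimate the 1-alpha quantile of the null MMD^2 distribution by
    randomly re-assigning the pooled samples to the two groups."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X, Y])
    m = len(X)
    null_stats = []
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        null_stats.append(mmd2_fn(pooled[idx[:m]], pooled[idx[m:]]))
    return np.quantile(null_stats, 1 - alpha)
```

Any MMD^2 estimator can be passed as mmd2_fn; rejecting when the observed statistic exceeds the returned threshold gives (approximately) the desired false positive rate alpha.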

36-39 Optimizing test power
The power of our test (Pr_1 denotes probability under P != Q):
Pr_1( n \widehat{MMD}^2 > \hat{c}_alpha ) -> Phi( MMD^2(P, Q) / sqrt(V_n(P, Q)) - c_alpha / (n sqrt(V_n(P, Q))) ),
where Phi is the CDF of the standard normal distribution and \hat{c}_alpha is an estimate of the test threshold c_alpha.
The first term is O(n^{1/2}) and the second is O(n^{-3/2}), so the second term is asymptotically negligible.
To maximize test power, maximize MMD^2(P, Q) / sqrt(V_n(P, Q)).
(Sutherland, Tung, Strathmann, De, Ramdas, Smola, G., ICLR 2017) Code: github.com/dougalsutherland/opt-mmd 24/41
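A rough sketch of this kernel-selection principle: among candidate bandwidths, pick the one maximizing the ratio MMD^2 / sqrt(V). Here V is estimated by subsampling, which is a crude stand-in for the closed-form variance estimator used in the paper:

```python
import numpy as np

def mmd2_u(X, Y, sigma):
    """Unbiased MMD^2 with a Gaussian kernel of bandwidth sigma."""
    def k(A, B):
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-sq / (2 * sigma**2))
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1)) - 2 * Kxy.mean())

def select_bandwidth(X, Y, sigmas, n_boot=50, seed=0):
    """Pick the bandwidth maximizing the power proxy MMD^2 / sqrt(V),
    with V estimated by subsampling half-size datasets."""
    rng = np.random.default_rng(seed)
    best, best_ratio = None, -np.inf
    for s in sigmas:
        stats = []
        for _ in range(n_boot):
            ix = rng.choice(len(X), len(X) // 2, replace=False)
            iy = rng.choice(len(Y), len(Y) // 2, replace=False)
            stats.append(mmd2_u(X[ix], Y[iy], s))
        ratio = mmd2_u(X, Y, s) / (np.std(stats) + 1e-12)
        if ratio > best_ratio:
            best, best_ratio = s, ratio
    return best
```

In the linked code, selection is done on held-out data and the variance has a closed form; this toy version only conveys the shape of the criterion.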

40 Troubleshooting for generative adversarial networks MNIST samples Samples from a GAN 25/41

41 Troubleshooting for generative adversarial networks
[Figures: MNIST samples; ARD map; samples from a GAN]
Power for the optimized ARD kernel: 1.00 at alpha = 0.01. Power for the optimized RBF kernel: 0.57 at alpha = 0.01. 25/41

42 Benchmarking generative adversarial networks 26/41

43 The ME statistic and test 27/41

44 Distinguishing Feature(s)
mu_hat_P(v): mean embedding of P; mu_hat_Q(v): mean embedding of Q; witness(v) = mu_hat_P(v) - mu_hat_Q(v). 28/41

45 Distinguishing Feature(s)
witness^2(v): take the square of the witness (only the amplitude matters). 29/41

46 Distinguishing Feature(s)
New test statistic: witness^2 at a single v. Linear time in the number n of samples... but how to choose the best feature v? 29/41

47 Distinguishing Feature(s)
Best feature = the v that maximizes witness^2(v)?? 29/41

48-52 Distinguishing Feature(s)
[Figures: the empirical witness^2(v) for sample sizes n = 3, n = 50, and larger n; the population witness^2 function for P(x) and Q(y), with its maximizers v*] 30/41

53-59 Variance of witness function
Variance at v = variance of X at v + variance of Y at v.
ME statistic: lambda_hat_n(v) := n * witness^2(v) / variance at v.
[Figures: witness^2(v) shown against variance_X(v), variance_Y(v), and their sum]
The best location is the v that maximizes lambda_hat_n. Improve performance using multiple locations {v_j}_{j=1}^J. 31/41
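A simplified single-location sketch of this statistic (Gaussian kernel assumed; the actual test regularizes a covariance matrix and optimizes several locations jointly, which this toy version skips):

```python
import numpy as np

def me_statistic(v, X, Y, sigma=1.0):
    """Simplified one-location ME statistic: n * witness^2(v) / variance at v."""
    kx = np.exp(-np.sum((X - v) ** 2, axis=1) / (2 * sigma**2))
    ky = np.exp(-np.sum((Y - v) ** 2, axis=1) / (2 * sigma**2))
    wit = kx.mean() - ky.mean()
    var = kx.var() + ky.var() + 1e-12   # small regularizer guards near-zero variance
    return len(X) * wit**2 / var

def best_location(candidates, X, Y, sigma=1.0):
    """Pick the candidate v maximizing the ME statistic."""
    return max(candidates, key=lambda v: me_statistic(v, X, Y, sigma))
```

The normalization by the variance is what distinguishes this from simply maximizing witness^2: locations where the witness is large but noisy are penalized.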

60-65 Distinguishing Positive/Negative Emotions
+ : happy, neutral, surprised; - : afraid, angry, disgusted.
35 females and 35 males (Lundqvist et al., 1998); 1632 dimensions, pixel features; sample size 402.
[Figures: test power of a random feature, the proposed test, and the quadratic-time MMD on the + vs. - problem; the learned discriminative feature]
The proposed test achieves maximum test power in time O(n). Informative features: differences at the nose, and smile lines. Code: 32/41

66 Testing against a probabilistic model 33/41

67 Statistical model criticism
MMD(P, Q) = sup_{||f||_F <= 1} [E_Q f - E_P f]; f*(x) is the witness function.
[Figure: p(x), q(x), and the witness f*(x)]
Can we compute the MMD with samples from Q and a model P? Problem: usually can't compute E_P f in closed form. 34/41

68 Stein idea
To get rid of E_p f in sup_{||f||_F <= 1} [E_q f - E_p f], define the Stein operator
T_p f = ∂_x f + f (∂_x log p).
Then E_p T_p f = 0, subject to appropriate boundary conditions. (Oates, Girolami, Chopin, 2016) 35/41

69 Proof of Stein idea
Consider the class G = { ∂_x f + f (∂_x log p) | f in F }. Given g in G, then (integration by parts)
E_p g(X) = E_p [∂_x f(X) + f(X) ∂_x log p(X)]
         = int [∂_x f(x) p(x) + f(x) ∂_x p(x)] dx
         = int ∂_x (f(x) p(x)) dx
         = [f(x) p(x)]_{x = -infty}^{x = infty} = 0. 36/41

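The identity is easy to check by Monte Carlo. For p = N(0, 1) the score is ∂_x log p(x) = -x; f(x) = sin(x) is an illustrative choice of test function:

```python
import numpy as np

# For p = N(0, 1), the score is d/dx log p(x) = -x.
# Stein identity: E_p[ f'(X) + f(X) * score(X) ] = 0 for well-behaved f.
rng = np.random.default_rng(0)
x = rng.normal(size=200_000)
f, df = np.sin(x), np.cos(x)          # f(x) = sin(x), f'(x) = cos(x)
stein_mean = (df + f * (-x)).mean()   # Monte Carlo estimate of E_p[T_p f]
print(stein_mean)                     # close to 0, up to Monte Carlo error
```

Here both E[cos X] and E[X sin X] equal exp(-1/2) for a standard normal, so the two terms cancel in expectation, exactly as the integration-by-parts argument predicts.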
73-77 Maximum Stein Discrepancy
Stein operator: T_p f = ∂_x f + f ∂_x(log p).
Maximum Stein Discrepancy (MSD):
MSD(p, q; F) = sup_{||g||_F <= 1} [E_q T_p g - E_p T_p g] = sup_{||g||_F <= 1} E_q T_p g.
[Figure: p(x), q(x), and the optimal Stein witness g*(x)] 37/41

78 Maximum Stein discrepancy
Closed-form expression for the MSD: for Z, Z' ~ q independently,
MSD(p, q; F) = E_q h_p(Z, Z'),
where k is the RKHS kernel for F and
h_p(x, y) = ∂_x log p(x) ∂_y log p(y) k(x, y) + ∂_y log p(y) ∂_x k(x, y) + ∂_x log p(x) ∂_y k(x, y) + ∂_x ∂_y k(x, y).
This depends only on the kernel and on ∂_x log p(x): no need to normalize p, or to sample from it.
(Chwialkowski, Strathmann, G., 2016) (Liu, Lee, Jordan, 2016) 38/41
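A one-dimensional sketch of this closed form, for p = N(0, 1) (so only the score -x is needed, no normalizer) with an RBF kernel; the pair average below is a simple V-statistic, a slightly biased but convenient estimator choice:

```python
import numpy as np

def ksd_vstat(Z, score, sigma=1.0):
    """V-statistic estimate of the Stein discrepancy between a 1-d sample Z
    and the model with score function s(x) = d/dx log p(x), RBF kernel."""
    z = np.asarray(Z).ravel()
    d = z[:, None] - z[None, :]                        # x - y for every pair
    K = np.exp(-d**2 / (2 * sigma**2))
    s = score(z)
    sx, sy = s[:, None], s[None, :]
    dxK = -d / sigma**2 * K                            # ∂_x k(x, y)
    dyK = d / sigma**2 * K                             # ∂_y k(x, y)
    dxdyK = (1.0 / sigma**2 - d**2 / sigma**4) * K     # ∂_x ∂_y k(x, y)
    H = sx * sy * K + sy * dxK + sx * dyK + dxdyK      # h_p(x, y) on every pair
    return H.mean()

# Unnormalized-model setting: for p = N(0, 1) the score is simply -x.
score = lambda x: -x
```

For samples actually drawn from p the estimate is near zero; for samples from a shifted distribution it is clearly positive, without ever evaluating a normalizing constant.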

79 Statistical model criticism
Test the hypothesis that a Gaussian process model, learned from data, is a good fit for the test data (example from Lloyd and Ghahramani, 2015).
[Figure: normalised solar activity vs. year] Code: 39/41

80 Statistical model criticism
[Figure: histogram of the bootstrapped null statistic B_n, with the observed test statistic V_n marked]
Test the hypothesis that a Gaussian process model, learned from data, is a good fit for the test data. 40/41

81 Co-authors Students and postdocs: Kacper Chwialkowski (at Voleon) Wittawat Jitkrittum Heiko Strathmann Dougal Sutherland Collaborators Kenji Fukumizu Krikamol Muandet Bernhard Schoelkopf Alex Smola Bharath Sriperumbudur Zoltan Szabo Questions? 41/41


More information

Recovering Distributions from Gaussian RKHS Embeddings

Recovering Distributions from Gaussian RKHS Embeddings Motonobu Kanagawa Graduate University for Advanced Studies kanagawa@ism.ac.jp Kenji Fukumizu Institute of Statistical Mathematics fukumizu@ism.ac.jp Abstract Recent advances of kernel methods have yielded

More information

Energy-Based Generative Adversarial Network

Energy-Based Generative Adversarial Network Energy-Based Generative Adversarial Network Energy-Based Generative Adversarial Network J. Zhao, M. Mathieu and Y. LeCun Learning to Draw Samples: With Application to Amoritized MLE for Generalized Adversarial

More information

Distribution Regression

Distribution Regression Zoltán Szabó (École Polytechnique) Joint work with Bharath K. Sriperumbudur (Department of Statistics, PSU), Barnabás Póczos (ML Department, CMU), Arthur Gretton (Gatsby Unit, UCL) Dagstuhl Seminar 16481

More information

Active and Semi-supervised Kernel Classification

Active and Semi-supervised Kernel Classification Active and Semi-supervised Kernel Classification Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London Work done in collaboration with Xiaojin Zhu (CMU), John Lafferty (CMU),

More information

MMD GAN: Towards Deeper Understanding of Moment Matching Network

MMD GAN: Towards Deeper Understanding of Moment Matching Network MMD GAN: Towards Deeper Understanding of Moment Matching Network Chun-Liang Li 1, Wei-Cheng Chang 1, Yu Cheng 2 Yiming Yang 1 Barnabás Póczos 1 1 Carnegie Mellon University, 2 AI Foundations, IBM Research

More information

Non-parametric Estimation of Integral Probability Metrics

Non-parametric Estimation of Integral Probability Metrics Non-parametric Estimation of Integral Probability Metrics Bharath K. Sriperumbudur, Kenji Fukumizu 2, Arthur Gretton 3,4, Bernhard Schölkopf 4 and Gert R. G. Lanckriet Department of ECE, UC San Diego,

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Approximation Theoretical Questions for SVMs

Approximation Theoretical Questions for SVMs Ingo Steinwart LA-UR 07-7056 October 20, 2007 Statistical Learning Theory: an Overview Support Vector Machines Informal Description of the Learning Goal X space of input samples Y space of labels, usually

More information

Computer Vision Group Prof. Daniel Cremers. 4. Gaussian Processes - Regression

Computer Vision Group Prof. Daniel Cremers. 4. Gaussian Processes - Regression Group Prof. Daniel Cremers 4. Gaussian Processes - Regression Definition (Rep.) Definition: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.

More information

Kernel Embeddings of Conditional Distributions

Kernel Embeddings of Conditional Distributions Kernel Embeddings of Conditional Distributions Le Song, Kenji Fukumizu and Arthur Gretton Georgia Institute of Technology The Institute of Statistical Mathematics University College London Abstract Many

More information

Generalized clustering via kernel embeddings

Generalized clustering via kernel embeddings Generalized clustering via kernel embeddings Stefanie Jegelka 1, Arthur Gretton 2,1, Bernhard Schölkopf 1, Bharath K. Sriperumbudur 3, and Ulrike von Luxburg 1 1 Max Planck Institute for Biological Cybernetics,

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Is the test error unbiased for these programs?

Is the test error unbiased for these programs? Is the test error unbiased for these programs? Xtrain avg N o Preprocessing by de meaning using whole TEST set 2017 Kevin Jamieson 1 Is the test error unbiased for this program? e Stott see non for f x

More information

Generative classifiers: The Gaussian classifier. Ata Kaban School of Computer Science University of Birmingham

Generative classifiers: The Gaussian classifier. Ata Kaban School of Computer Science University of Birmingham Generative classifiers: The Gaussian classifier Ata Kaban School of Computer Science University of Birmingham Outline We have already seen how Bayes rule can be turned into a classifier In all our examples

More information

Introduction to Smoothing spline ANOVA models (metamodelling)

Introduction to Smoothing spline ANOVA models (metamodelling) Introduction to Smoothing spline ANOVA models (metamodelling) M. Ratto DYNARE Summer School, Paris, June 215. Joint Research Centre www.jrc.ec.europa.eu Serving society Stimulating innovation Supporting

More information

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Qiang Liu and Dilin Wang NIPS 2016 Discussion by Yunchen Pu March 17, 2017 March 17, 2017 1 / 8 Introduction Let x R d

More information

Wasserstein GAN. Juho Lee. Jan 23, 2017

Wasserstein GAN. Juho Lee. Jan 23, 2017 Wasserstein GAN Juho Lee Jan 23, 2017 Wasserstein GAN (WGAN) Arxiv submission Martin Arjovsky, Soumith Chintala, and Léon Bottou A new GAN model minimizing the Earth-Mover s distance (Wasserstein-1 distance)

More information

9.520: Class 20. Bayesian Interpretations. Tomaso Poggio and Sayan Mukherjee

9.520: Class 20. Bayesian Interpretations. Tomaso Poggio and Sayan Mukherjee 9.520: Class 20 Bayesian Interpretations Tomaso Poggio and Sayan Mukherjee Plan Bayesian interpretation of Regularization Bayesian interpretation of the regularizer Bayesian interpretation of quadratic

More information

An Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM

An Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM An Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM Wei Chu Chong Jin Ong chuwei@gatsby.ucl.ac.uk mpeongcj@nus.edu.sg S. Sathiya Keerthi mpessk@nus.edu.sg Control Division, Department

More information

arxiv: v1 [stat.ml] 14 May 2015

arxiv: v1 [stat.ml] 14 May 2015 Training generative neural networks via aximum ean Discrepancy optimization Gintare Karolina Dziugaite University of Cambridge Daniel. Roy University of Toronto Zoubin Ghahramani University of Cambridge

More information

10-701/ Recitation : Kernels

10-701/ Recitation : Kernels 10-701/15-781 Recitation : Kernels Manojit Nandi February 27, 2014 Outline Mathematical Theory Banach Space and Hilbert Spaces Kernels Commonly Used Kernels Kernel Theory One Weird Kernel Trick Representer

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning 12. Gaussian Processes Alex Smola Carnegie Mellon University http://alex.smola.org/teaching/cmu2013-10-701 10-701 The Normal Distribution http://www.gaussianprocess.org/gpml/chapters/

More information

Support Vector Clustering

Support Vector Clustering Support Vector Clustering Asa Ben-Hur, David Horn, Hava T. Siegelmann, Vladimir Vapnik Wittawat Jitkrittum Gatsby tea talk 3 July 25 / Overview Support Vector Clustering Asa Ben-Hur, David Horn, Hava T.

More information

Learning to Sample Using Stein Discrepancy

Learning to Sample Using Stein Discrepancy Learning to Sample Using Stein Discrepancy Dilin Wang Yihao Feng Qiang Liu Department of Computer Science Dartmouth College Hanover, NH 03755 {dilin.wang.gr, yihao.feng.gr, qiang.liu}@dartmouth.edu Abstract

More information

A Kernel Between Sets of Vectors

A Kernel Between Sets of Vectors A Kernel Between Sets of Vectors Risi Kondor Tony Jebara Columbia University, New York, USA. 1 A Kernel between Sets of Vectors In SVM, Gassian Processes, Kernel PCA, kernel K de nes feature map Φ : X

More information

Gaussian processes for inference in stochastic differential equations

Gaussian processes for inference in stochastic differential equations Gaussian processes for inference in stochastic differential equations Manfred Opper, AI group, TU Berlin November 6, 2017 Manfred Opper, AI group, TU Berlin (TU Berlin) inference in SDE November 6, 2017

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

Imitation Learning via Kernel Mean Embedding

Imitation Learning via Kernel Mean Embedding Imitation Learning via Kernel Mean Embedding Kee-Eung Kim School of Computer Science KAIST kekim@cs.kaist.ac.kr Hyun Soo Park Department of Computer Science and Engineering University of Minnesota hspark@umn.edu

More information

The Variational Gaussian Approximation Revisited

The Variational Gaussian Approximation Revisited The Variational Gaussian Approximation Revisited Manfred Opper Cédric Archambeau March 16, 2009 Abstract The variational approximation of posterior distributions by multivariate Gaussians has been much

More information

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157 Lecture 6: Gaussian Channels Copyright G. Caire (Sample Lectures) 157 Differential entropy (1) Definition 18. The (joint) differential entropy of a continuous random vector X n p X n(x) over R is: Z h(x

More information

On Integral Probability Metrics, φ-divergences and Binary Classification

On Integral Probability Metrics, φ-divergences and Binary Classification On Integral Probability etrics, φ-divergences and Binary Classification Bharath K. Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf and Gert R. G. Lanckriet arxiv:090.698v4 [cs.it] 3 Oct

More information

Gaussian processes and bayesian optimization Stanisław Jastrzębski. kudkudak.github.io kudkudak

Gaussian processes and bayesian optimization Stanisław Jastrzębski. kudkudak.github.io kudkudak Gaussian processes and bayesian optimization Stanisław Jastrzębski kudkudak.github.io kudkudak Plan Goal: talk about modern hyperparameter optimization algorithms Bayes reminder: equivalent linear regression

More information

Computer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression

Computer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression Group Prof. Daniel Cremers 9. Gaussian Processes - Regression Repetition: Regularized Regression Before, we solved for w using the pseudoinverse. But: we can kernelize this problem as well! First step:

More information

Kernel methods and the exponential family

Kernel methods and the exponential family Kernel methods and the exponential family Stéphane Canu 1 and Alex J. Smola 2 1- PSI - FRE CNRS 2645 INSA de Rouen, France St Etienne du Rouvray, France Stephane.Canu@insa-rouen.fr 2- Statistical Machine

More information

Reproducing Kernel Hilbert Spaces

Reproducing Kernel Hilbert Spaces Reproducing Kernel Hilbert Spaces Lorenzo Rosasco 9.520 Class 03 February 9, 2011 About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert

More information

Roots and Coefficients Polynomials Preliminary Maths Extension 1

Roots and Coefficients Polynomials Preliminary Maths Extension 1 Preliminary Maths Extension Question If, and are the roots of x 5x x 0, find the following. (d) (e) Question If p, q and r are the roots of x x x 4 0, evaluate the following. pq r pq qr rp p q q r r p

More information

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.) Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori

More information

A Kernel Method for the Two-Sample-Problem

A Kernel Method for the Two-Sample-Problem A Kernel Method for the Two-Sample-Problem Arthur Gretton MPI for Biological Cybernetics Tübingen, Germany arthur@tuebingen.mpg.de Karsten M. Borgwardt Ludwig-Maximilians-Univ. Munich, Germany kb@dbs.ifi.lmu.de

More information

Quantifying mismatch in Bayesian optimization

Quantifying mismatch in Bayesian optimization Quantifying mismatch in Bayesian optimization Eric Schulz University College London e.schulz@cs.ucl.ac.uk Maarten Speekenbrink University College London m.speekenbrink@ucl.ac.uk José Miguel Hernández-Lobato

More information

Stein Points. Abstract. 1. Introduction. Wilson Ye Chen 1 Lester Mackey 2 Jackson Gorham 3 François-Xavier Briol Chris. J.

Stein Points. Abstract. 1. Introduction. Wilson Ye Chen 1 Lester Mackey 2 Jackson Gorham 3 François-Xavier Briol Chris. J. Wilson Ye Chen Lester Mackey 2 Jackson Gorham 3 François-Xavier Briol 4 5 6 Chris. J. Oates 7 6 Abstract An important task in computational statistics and machine learning is to approximate a posterior

More information

Metric Embedding for Kernel Classification Rules

Metric Embedding for Kernel Classification Rules Metric Embedding for Kernel Classification Rules Bharath K. Sriperumbudur University of California, San Diego (Joint work with Omer Lang & Gert Lanckriet) Bharath K. Sriperumbudur (UCSD) Metric Embedding

More information

Kernel Learning via Random Fourier Representations

Kernel Learning via Random Fourier Representations Kernel Learning via Random Fourier Representations L. Law, M. Mider, X. Miscouridou, S. Ip, A. Wang Module 5: Machine Learning L. Law, M. Mider, X. Miscouridou, S. Ip, A. Wang Kernel Learning via Random

More information

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Qiang Liu Dilin Wang Department of Computer Science Dartmouth College Hanover, NH 03755 {qiang.liu, dilin.wang.gr}@dartmouth.edu

More information