Lecture 2: Mappings of Probabilities to RKHS and Applications

1 Lecture 2: Mappings of Probabilities to RKHS and Applications Lille, 2014 Arthur Gretton Gatsby Unit, CSML, UCL

2 Detecting differences in brain signals The problem: Do local field potential (LFP) signals change when measured near a spike burst? [Figure: LFP amplitude vs. time, near a spike burst and without a spike burst]

5 Detecting statistical dependence How do you detect dependence... in a discrete domain? [Read and Cressie, 1988]

9 Detecting statistical dependence How do you detect dependence... in a discrete domain? [Read and Cressie, 1988] X_1: Honourable senators, I have a question for the Leader of the Government in the Senate with regard to the support funding to farmers that has been announced. Most farmers have not received any money yet. X_2: No doubt there is great pressure on provincial and municipal governments in relation to the issue of child care, but the reality is that there have been no cuts to child care funding from the federal government to the provinces. In fact, we have increased federal investments for early childhood development. ? Y_1: Honorables sénateurs, ma question s'adresse au leader du gouvernement au Sénat et concerne l'aide financière qu'on a annoncée pour les agriculteurs. La plupart des agriculteurs n'ont encore rien reçu de cet argent. Y_2: Il est évident que les ordres de gouvernements provinciaux et municipaux subissent de fortes pressions en ce qui concerne les services de garde, mais le gouvernement n'a pas réduit le financement qu'il verse aux provinces pour les services de garde. Au contraire, nous avons augmenté le financement fédéral pour le développement des jeunes enfants. Are the French text extracts translations of the English ones?

10 Detecting statistical dependence, continuous domains How do you detect dependence... in a continuous domain? [Figure: sample from a dependent P_XY vs. the independent product P_X P_Y]

11 Detecting statistical dependence, continuous domains How do you detect dependence... in a continuous domain? [Figure: sample from P_XY, with the discretized empirical P_XY vs. the discretized empirical P_X P_Y]

13 Detecting statistical dependence, continuous domains How do you detect dependence... in a continuous domain? Problem: fails even in low dimensions! [NIPS07a, ALT08] X and Y in R^4, statistic = power divergence, samples = 1024, cases where dependence detected = 0: too few points per bin

14 Detecting statistical dependence, continuous domains How do you detect dependence... in a continuous domain? Problem: fails even in low dimensions! [NIPS07a, ALT08] X and Y in R^4, statistic = power divergence, samples = 1024, cases where dependence detected = 0: too few points per bin Can we represent and compare distributions in high dimensions?

15 Further motivating questions Compare distributions with high dimension / low sample size / complex structure Microarray data (aggregation problem) Neuroscience: naturalistic stimulus, complex response Images and text on the web (kernels on structured data)

16 Outline Kernel metric on the space of probability measures Function revealing differences in distributions Distance between means in space of features (RKHS) Characteristic kernels: feature space mappings of probabilities unique

17 Outline Kernel metric on the space of probability measures Function revealing differences in distributions Distance between means in space of features (RKHS) Characteristic kernels: feature space mappings of probabilities unique Dependence detection Covariance and Correlation in feature space

18 Outline Kernel metric on the space of probability measures Function revealing differences in distributions Distance between means in space of features (RKHS) Characteristic kernels: feature space mappings of probabilities unique Dependence detection Covariance and Correlation in feature space Advanced topics Testing for big data Bayesian inference without models Three way interactions Energy distance/distance covariance is a special case of RKHS distances

19 Kernel distance between distributions

20 Feature mean difference Simple example: 2 Gaussians with different means Answer: t-test [Figure: two Gaussian densities P_X and Q_X with different means]

21 Feature mean difference Two Gaussians with same means, different variance Idea: look at difference in means of features of the RVs In the Gaussian case: second order features of form ϕ(x) = x² [Figure: two Gaussian densities P_X and Q_X with different variances]

22 Feature mean difference Two Gaussians with same means, different variance Idea: look at difference in means of features of the RVs In the Gaussian case: second order features of form ϕ(x) = x² [Figure: densities of X and of the feature X² under P_X and Q_X]

23 Feature mean difference Gaussian and Laplace distributions Same mean and same variance Difference in means using higher order features... RKHS [Figure: Gaussian and Laplace densities P_X and Q_X]

24 Reminder: feature maps and the RKHS Feature map of x ∈ R², written ϕ(x): ϕ^(p)(x) = [x_1², x_2², √2 x_1 x_2], ϕ^(g)(x) = [... √λ_i e_i(x) ...] ∈ ℓ_2

25 Reminder: feature maps and the RKHS Feature map of x ∈ R², written ϕ(x): ϕ^(p)(x) = [x_1², x_2², √2 x_1 x_2], ϕ^(g)(x) = [... √λ_i e_i(x) ...] ∈ ℓ_2 Inner product between feature maps: ⟨ϕ^(p)(x), ϕ^(p)(y)⟩_F = ⟨x, y⟩², ⟨ϕ^(g)(x), ϕ^(g)(y)⟩_F = exp(−λ‖x − y‖²)

26 Reminder: feature maps and the RKHS Feature map of x ∈ R², written ϕ(x): ϕ^(p)(x) = [x_1², x_2², √2 x_1 x_2], ϕ^(g)(x) = [... √λ_i e_i(x) ...] ∈ ℓ_2 Inner product between feature maps: ⟨ϕ^(p)(x), ϕ^(p)(y)⟩_F = ⟨x, y⟩², ⟨ϕ^(g)(x), ϕ^(g)(y)⟩_F = exp(−λ‖x − y‖²) In general, ⟨ϕ_{x_1}, ϕ_{x_2}⟩_F = k(x_1, x_2) for a positive definite k(x, y). Kernels are inner products of feature maps

27 Reminder: feature view of RKHS functions Reproducing property: f(x) = Σ_{i=1}^m α_i k(x_i, x) = ⟨Σ_{i=1}^m α_i ϕ(x_i), ϕ(x)⟩_F = ⟨f(·), ϕ(x)⟩_F, where f(·) = Σ_{i=1}^m α_i ϕ(x_i) [Figure: an RKHS function f(x) plotted against x]

28 The mean in feature space For finite dimensional feature spaces, we can define expectations in terms of inner products. φ(x) = k(·, x) = [x, x²], f(·) = [a, b]. Then f(x) = ax + bx² = ⟨f(·), φ(x)⟩_F.

29 The mean in feature space For finite dimensional feature spaces, we can define expectations in terms of inner products. φ(x) = k(·, x) = [x, x²], f(·) = [a, b]. Then f(x) = ax + bx² = ⟨f(·), φ(x)⟩_F. Consider a random variable x ∼ P: E_P f(x) = ⟨[a, b], [E_P(x), E_P(x²)]⟩ =: ⟨[a, b], µ_P⟩. Does this reasoning translate to infinite dimensions?
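
This finite-dimensional computation is easy to check numerically. The sketch below is our own illustration (not from the slides): it fixes φ(x) = [x, x²] and f(·) = [a, b] as above, draws x from an arbitrarily chosen Gaussian P, and confirms that E_P f(x) matches ⟨f, µ_P⟩ up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)   # samples from P (arbitrary choice)

a, b = 0.7, -0.3                                   # f(.) = [a, b]
f_of_x = a * x + b * x**2                          # f(x) = a*x + b*x^2

mu_P = np.array([x.mean(), (x**2).mean()])         # mu_P = [E_P x, E_P x^2]
print(f_of_x.mean())                               # E_P f(x)
print(np.array([a, b]) @ mu_P)                     # <f, mu_P>: agrees up to sampling error
```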

30 Does the feature space mean exist? Does there exist an element µ_P ∈ F such that E_P f(x) = ⟨f(·), µ_P(·)⟩_F for all f ∈ F?

31 Does the feature space mean exist? Does there exist an element µ_P ∈ F such that E_P f(x) = ⟨f(·), µ_P(·)⟩_F for all f ∈ F? Recall the concept of a bounded operator: a linear operator A : F → R is bounded when |Af| ≤ λ_A ‖f‖_F for all f ∈ F. Riesz representation theorem: in a Hilbert space F, every bounded linear operator A can be written Af = ⟨f(·), g_A(·)⟩_F for some g_A ∈ F

32 Does the feature space mean exist? Existence of the mean embedding: if E_P √k(x, x) < ∞ then µ_P ∈ F. Proof: the linear operator T_P f := E_P f(x) for all f ∈ F is bounded under the assumption, since |T_P f| ≤ E_P |f(x)| = E_P |⟨f(·), φ(x)⟩_F| ≤ (E_P √k(x, x)) ‖f‖_F. Hence by Riesz (with λ_{T_P} = E_P √k(x, x)), there exists µ_P ∈ F such that T_P f = ⟨f(·), µ_P(·)⟩_F.

33 µ_P is the feature map of a probability Embedding of P to feature space: mean embedding µ_P ∈ F with ⟨µ_P(·), f(·)⟩_F = E_P f(x). What does the probability feature map look like? µ_P(x) = ⟨µ_P(·), ϕ(x)⟩_F = ⟨µ_P(·), k(·, x)⟩_F = E_P k(x', x). Expectation of the kernel! Empirical estimate: µ̂_P(x) = (1/m) Σ_{i=1}^m k(x_i, x), x_i ∼ P

34 µ_P is the feature map of a probability Embedding of P to feature space: mean embedding µ_P ∈ F with ⟨µ_P(·), f(·)⟩_F = E_P f(x). What does the probability feature map look like? µ_P(x) = ⟨µ_P(·), ϕ(x)⟩_F = ⟨µ_P(·), k(·, x)⟩_F = E_P k(x', x). Expectation of the kernel! Empirical estimate: µ̂_P(x) = (1/m) Σ_{i=1}^m k(x_i, x), x_i ∼ P [Figure: histogram of samples from P, and the corresponding embedding µ̂_P]
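
The empirical estimate is a few lines of code. Below is a minimal sketch of ours (a Gaussian kernel with bandwidth σ is assumed; kernel and parameters are arbitrary choices) that evaluates µ̂_P on a grid, producing the smooth curve that stands in for the histogram in the figure.

```python
import numpy as np

def empirical_mean_embedding(X, x_eval, sigma=1.0):
    # mu_hat_P(x) = (1/m) * sum_i k(x_i, x), with a Gaussian kernel (assumed)
    sq_dists = (X[:, None] - x_eval[None, :]) ** 2    # shape (m, len(x_eval))
    return np.exp(-sq_dists / (2 * sigma**2)).mean(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=500)                  # x_i ~ P
xs = np.linspace(-4.0, 4.0, 200)
mu_hat = empirical_mean_embedding(X, xs)  # evaluate the embedding on a grid
```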

35 Function Showing Difference in Distributions Are P and Q different? [Figure: samples from P and Q]

37 Function Showing Difference in Distributions Maximum mean discrepancy: smooth function for P vs Q MMD(P, Q; F) := sup_{‖f‖_F ≤ 1} [E_P f(x) − E_Q f(y)] [Figure: a smooth witness function f(x)]

39 Function Showing Difference in Distributions What if the function is not smooth? MMD(P, Q; F) := sup_{‖f‖_F ≤ 1} [E_P f(x) − E_Q f(y)] [Figure: a bounded continuous witness function f(x)]

41 Function Showing Difference in Distributions Maximum mean discrepancy: smooth function for P vs Q MMD(P, Q; F) := sup_{‖f‖_F ≤ 1} [E_P f(x) − E_Q f(y)] Gauss P vs Laplace Q [Figure: witness f for Gauss and Laplace densities]

42 Function Showing Difference in Distributions Maximum mean discrepancy: smooth function for P vs Q MMD(P, Q; F) := sup_{‖f‖_F ≤ 1} [E_P f(x) − E_Q f(y)] Classical results: MMD(P, Q; F) = 0 iff P = Q, when F = bounded continuous [Dudley, 2002]; F = bounded variation 1 (Kolmogorov metric) [Müller, 1997]; F = bounded Lipschitz (earth mover's distance) [Dudley, 2002]

43 Function Showing Difference in Distributions Maximum mean discrepancy: smooth function for P vs Q MMD(P, Q; F) := sup_{‖f‖_F ≤ 1} [E_P f(x) − E_Q f(y)] Classical results: MMD(P, Q; F) = 0 iff P = Q, when F = bounded continuous [Dudley, 2002]; F = bounded variation 1 (Kolmogorov metric) [Müller, 1997]; F = bounded Lipschitz (earth mover's distance) [Dudley, 2002] MMD(P, Q; F) = 0 iff P = Q when F = the unit ball in a characteristic RKHS F [ISMB06, NIPS06a, NIPS07b, NIPS08a, JMLR10]

44 Function Showing Difference in Distributions Maximum mean discrepancy: smooth function for P vs Q MMD(P, Q; F) := sup_{‖f‖_F ≤ 1} [E_P f(x) − E_Q f(y)] Classical results: MMD(P, Q; F) = 0 iff P = Q, when F = bounded continuous [Dudley, 2002]; F = bounded variation 1 (Kolmogorov metric) [Müller, 1997]; F = bounded Lipschitz (earth mover's distance) [Dudley, 2002] MMD(P, Q; F) = 0 iff P = Q when F = the unit ball in a characteristic RKHS F [ISMB06, NIPS06a, NIPS07b, NIPS08a, JMLR10] How do smooth functions relate to feature maps?

45 Function view vs feature mean view The (kernel) MMD: [ISMB06, NIPS06a] MMD²(P, Q; F) = (sup_{‖f‖_F ≤ 1} [E_P f(x) − E_Q f(y)])² [Figure: witness f for Gauss and Laplace densities]

46 Function view vs feature mean view The (kernel) MMD: [ISMB06, NIPS06a] MMD²(P, Q; F) = (sup_{‖f‖_F ≤ 1} [E_P f(x) − E_Q f(y)])², use E_P(f(x)) =: ⟨µ_P, f⟩_F

47 Function view vs feature mean view The (kernel) MMD: [ISMB06, NIPS06a] MMD²(P, Q; F) = (sup_{‖f‖_F ≤ 1} [E_P f(x) − E_Q f(y)])² = (sup_{‖f‖_F ≤ 1} ⟨f, µ_P − µ_Q⟩_F)², use E_P(f(x)) =: ⟨µ_P, f⟩_F

48 Function view vs feature mean view The (kernel) MMD: [ISMB06, NIPS06a] MMD²(P, Q; F) = (sup_{‖f‖_F ≤ 1} [E_P f(x) − E_Q f(y)])² = (sup_{‖f‖_F ≤ 1} ⟨f, µ_P − µ_Q⟩_F)² = ‖µ_P − µ_Q‖²_F, using ‖θ‖_F = sup_{‖f‖_F ≤ 1} ⟨f, θ⟩_F. Function view and feature view equivalent

49 Empirical estimate of MMD An unbiased empirical estimate: for {x_i}_{i=1}^m ∼ P and {y_i}_{i=1}^m ∼ Q, MMD² = 1/(m(m−1)) Σ_{i=1}^m Σ_{j≠i} [k(x_i, x_j) + k(y_i, y_j)] − (1/m²) Σ_{i=1}^m Σ_{j=1}^m [k(y_i, x_j) + k(x_i, y_j)]

50 Empirical estimate of MMD An unbiased empirical estimate: for {x_i}_{i=1}^m ∼ P and {y_i}_{i=1}^m ∼ Q, MMD² = 1/(m(m−1)) Σ_{i=1}^m Σ_{j≠i} [k(x_i, x_j) + k(y_i, y_j)] − (1/m²) Σ_{i=1}^m Σ_{j=1}^m [k(y_i, x_j) + k(x_i, y_j)] Proof: ‖µ_P − µ_Q‖²_F = ⟨µ_P − µ_Q, µ_P − µ_Q⟩_F = ⟨µ_P, µ_P⟩ + ⟨µ_Q, µ_Q⟩ − 2⟨µ_P, µ_Q⟩

52 Empirical estimate of MMD An unbiased empirical estimate: for {x_i}_{i=1}^m ∼ P and {y_i}_{i=1}^m ∼ Q, MMD² = 1/(m(m−1)) Σ_{i=1}^m Σ_{j≠i} [k(x_i, x_j) + k(y_i, y_j)] − (1/m²) Σ_{i=1}^m Σ_{j=1}^m [k(y_i, x_j) + k(x_i, y_j)] Proof: ‖µ_P − µ_Q‖²_F = ⟨µ_P − µ_Q, µ_P − µ_Q⟩_F = ⟨µ_P, µ_P⟩ + ⟨µ_Q, µ_Q⟩ − 2⟨µ_P, µ_Q⟩ = ⟨E_P ϕ(x), E_P ϕ(x')⟩ + ...

53 Empirical estimate of MMD An unbiased empirical estimate: for {x_i}_{i=1}^m ∼ P and {y_i}_{i=1}^m ∼ Q, MMD² = 1/(m(m−1)) Σ_{i=1}^m Σ_{j≠i} [k(x_i, x_j) + k(y_i, y_j)] − (1/m²) Σ_{i=1}^m Σ_{j=1}^m [k(y_i, x_j) + k(x_i, y_j)] Proof: ‖µ_P − µ_Q‖²_F = ⟨µ_P − µ_Q, µ_P − µ_Q⟩_F = ⟨µ_P, µ_P⟩ + ⟨µ_Q, µ_Q⟩ − 2⟨µ_P, µ_Q⟩ = ⟨E_P ϕ(x), E_P ϕ(x')⟩ + ... = E_P ⟨ϕ(x), ϕ(x')⟩ + ...

54 Empirical estimate of MMD An unbiased empirical estimate: for {x_i}_{i=1}^m ∼ P and {y_i}_{i=1}^m ∼ Q, MMD² = 1/(m(m−1)) Σ_{i=1}^m Σ_{j≠i} [k(x_i, x_j) + k(y_i, y_j)] − (1/m²) Σ_{i=1}^m Σ_{j=1}^m [k(y_i, x_j) + k(x_i, y_j)] Proof: ‖µ_P − µ_Q‖²_F = ⟨µ_P − µ_Q, µ_P − µ_Q⟩_F = ⟨µ_P, µ_P⟩ + ⟨µ_Q, µ_Q⟩ − 2⟨µ_P, µ_Q⟩ = ⟨E_P ϕ(x), E_P ϕ(x')⟩ + ... = E_P ⟨ϕ(x), ϕ(x')⟩ + ... = E_P k(x, x') + E_Q k(y, y') − 2 E_{P,Q} k(x, y)

55 Empirical estimate of MMD An unbiased empirical estimate: for {x_i}_{i=1}^m ∼ P and {y_i}_{i=1}^m ∼ Q, MMD² = 1/(m(m−1)) Σ_{i=1}^m Σ_{j≠i} [k(x_i, x_j) + k(y_i, y_j)] − (1/m²) Σ_{i=1}^m Σ_{j=1}^m [k(y_i, x_j) + k(x_i, y_j)] Proof: ‖µ_P − µ_Q‖²_F = ⟨µ_P − µ_Q, µ_P − µ_Q⟩_F = ⟨µ_P, µ_P⟩ + ⟨µ_Q, µ_Q⟩ − 2⟨µ_P, µ_Q⟩ = ⟨E_P ϕ(x), E_P ϕ(x')⟩ + ... = E_P ⟨ϕ(x), ϕ(x')⟩ + ... = E_P k(x, x') + E_Q k(y, y') − 2 E_{P,Q} k(x, y) Then Ê k(x, x') = 1/(m(m−1)) Σ_{i=1}^m Σ_{j≠i} k(x_i, x_j)
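
Putting the estimator together in code: the sketch below is ours (a Gaussian kernel is an assumption), computing the unbiased MMD² for two equal-size samples, with off-diagonal averages for the within-sample terms exactly as in the formula above.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Gram matrix k(a_i, b_j) for rows of A (shape (m, d)) and B (shape (n, d))
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    # Unbiased MMD^2 estimate for equal sample sizes m = len(X) = len(Y)
    m = len(X)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))  # within-sample, i != j
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()            # cross term over all i, j
```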

56 MMD for independence Dependence measure: [ALT05, NIPS07a, ALT07, ALT08, JMLR10] (sup_{‖f‖ ≤ 1} [E_{P_XY} f − E_{P_X P_Y} f])² = (sup_{‖f‖ ≤ 1} ⟨f, µ_{P_XY} − µ_{P_X P_Y}⟩)² = ‖µ_{P_XY} − µ_{P_X P_Y}‖²_{F⊗G} =: HSIC(P_XY, P_X P_Y) [Figure: dependence witness and sample]

57 MMD for independence Dependence measure: [ALT05, NIPS07a, ALT07, ALT08, JMLR10] (sup_{‖f‖ ≤ 1} [E_{P_XY} f − E_{P_X P_Y} f])² = (sup_{‖f‖ ≤ 1} ⟨f, µ_{P_XY} − µ_{P_X P_Y}⟩)² = ‖µ_{P_XY} − µ_{P_X P_Y}‖²_{F⊗G} =: HSIC(P_XY, P_X P_Y) Product kernel on pairs: κ((x, y), (x', y')) = k(x, x') l(y, y')

58 Characteristic kernels (Version 1: Via Universality)

59 Characteristic Kernels (via universality) Characteristic: MMD a metric (MMD = 0 iff P = Q) [NIPS07b, COLT08]

60 Characteristic Kernels (via universality) Characteristic: MMD a metric (MMD = 0 iff P = Q) [NIPS07b, COLT08] Classical result: P = Q if and only if E_P(f(x)) = E_Q(f(y)) for all f ∈ C(X), the space of bounded continuous functions on X [Dudley, 2002]

61 Characteristic Kernels (via universality) Characteristic: MMD a metric (MMD = 0 iff P = Q) [NIPS07b, COLT08] Classical result: P = Q if and only if E_P(f(x)) = E_Q(f(y)) for all f ∈ C(X), the space of bounded continuous functions on X [Dudley, 2002] Universal RKHS: k(x, x') continuous, X compact, and F dense in C(X) with respect to L∞ [Steinwart, 2001]

62 Characteristic Kernels (via universality) Characteristic: MMD a metric (MMD = 0 iff P = Q) [NIPS07b, COLT08] Classical result: P = Q if and only if E_P(f(x)) = E_Q(f(y)) for all f ∈ C(X), the space of bounded continuous functions on X [Dudley, 2002] Universal RKHS: k(x, x') continuous, X compact, and F dense in C(X) with respect to L∞ [Steinwart, 2001] If F universal, then MMD{P, Q; F} = 0 iff P = Q

63 Characteristic Kernels (via universality) Proof: First, it is clear that P = Q implies MMD{P, Q; F} is zero. Converse: by the universality of F, for any given ε > 0 and f ∈ C(X) there exists g ∈ F with ‖f − g‖∞ ≤ ε.

64 Characteristic Kernels (via universality) Proof: First, it is clear that P = Q implies MMD{P, Q; F} is zero. Converse: by the universality of F, for any given ε > 0 and f ∈ C(X) there exists g ∈ F with ‖f − g‖∞ ≤ ε. We next make the expansion |E_P f(x) − E_Q f(y)| ≤ |E_P f(x) − E_P g(x)| + |E_P g(x) − E_Q g(y)| + |E_Q g(y) − E_Q f(y)|. The first and third terms satisfy |E_P f(x) − E_P g(x)| ≤ E_P |f(x) − g(x)| ≤ ε.

65 Characteristic Kernels (via universality) Proof (continued): Next, write E_P g(x) − E_Q g(y) = ⟨g(·), µ_P − µ_Q⟩_F = 0, since MMD{P, Q; F} = 0 implies µ_P = µ_Q. Hence |E_P f(x) − E_Q f(y)| ≤ 2ε for all f ∈ C(X) and ε > 0, which implies P = Q.

66 Characteristic kernels (Version 2: Via Fourier)

67 Characteristic Kernels (via Fourier) Reminder: Fourier series Function on [−π, π] with periodic boundary: f(x) = Σ_{l=−∞}^∞ f̂_l exp(ılx) = Σ_{l=−∞}^∞ f̂_l (cos(lx) + ı sin(lx)) [Figure: top hat function f(x) and its Fourier series coefficients f̂_l]

68 Characteristic Kernels (via Fourier) Reminder: Fourier series of a kernel k(x, y) = k(x − y) = k(z): k(z) = Σ_{l=−∞}^∞ k̂_l exp(ılz). E.g. Gaussian: k(x) = (1/√(2πσ²)) exp(−x²/(2σ²)), with k̂_l = (1/√(2π)) exp(−σ²l²/2) [Figure: Gaussian kernel k(x) and its Fourier series coefficients k̂_l]

69 Characteristic Kernels (via Fourier) Maximum mean embedding via Fourier series: Fourier series for P is the characteristic function φ_P. Fourier series for the mean embedding is the product of Fourier series! (convolution theorem) µ_P(x) = E_P k(x − x') = ∫_{−π}^{π} k(x − t) dP(t), so µ̂_{P,l} = k̂_l φ_{P,l}

70 Characteristic Kernels (via Fourier) Maximum mean embedding via Fourier series: Fourier series for P is the characteristic function φ_P. Fourier series for the mean embedding is the product of Fourier series! (convolution theorem) µ_P(x) = E_P k(x − x') = ∫_{−π}^{π} k(x − t) dP(t), so µ̂_{P,l} = k̂_l φ_{P,l} MMD can be written in terms of Fourier series: MMD(P, Q; F) := ‖Σ_{l=−∞}^∞ (φ_{P,l} − φ_{Q,l}) k̂_l exp(ılx)‖_F. Characteristic: MMD a metric (MMD = 0 iff P = Q) [NIPS07b, COLT08, JMLR10]

71 Example Example: P differs from Q at one frequency [Figure: densities P(x) and Q(x)]

72 Characteristic Kernels (2) Example: P differs from Q at (roughly) one frequency [Figure: P(x) and Q(x) together with their Fourier coefficient magnitudes |φ_{P,l}| and |φ_{Q,l}|]

73 Characteristic Kernels (2) Example: P differs from Q at (roughly) one frequency [Figure: P(x), Q(x), their Fourier coefficients, and the characteristic function difference |φ_{P,l} − φ_{Q,l}|]

74 Example Is the Gaussian kernel characteristic? [Figure: Gaussian kernel and its Fourier series coefficients] MMD(P, Q; F) := ‖Σ_{l=−∞}^∞ (φ_{P,l} − φ_{Q,l}) k̂_l exp(ılx)‖_F

75 Example Is the Gaussian kernel characteristic? YES [Figure: Gaussian kernel and its Fourier series coefficients] MMD(P, Q; F) := ‖Σ_{l=−∞}^∞ (φ_{P,l} − φ_{Q,l}) k̂_l exp(ılx)‖_F

76 Example Is the triangle kernel characteristic? [Figure: triangle kernel and its Fourier series coefficients] MMD(P, Q; F) := ‖Σ_{l=−∞}^∞ (φ_{P,l} − φ_{Q,l}) k̂_l exp(ılx)‖_F

77 Example Is the triangle kernel characteristic? NO [Figure: triangle kernel and its Fourier series coefficients] MMD(P, Q; F) := ‖Σ_{l=−∞}^∞ (φ_{P,l} − φ_{Q,l}) k̂_l exp(ılx)‖_F

78 Characteristic Kernels (via Fourier) Can we prove characteristic on R^d?

79 Characteristic Kernels (via Fourier) Can we prove characteristic on R^d? Characteristic function of P via the Fourier transform: φ_P(ω) = ∫_{R^d} e^{i x^⊤ ω} dP(x)

80 Characteristic Kernels (via Fourier) Can we prove characteristic on R^d? Characteristic function of P via the Fourier transform: φ_P(ω) = ∫_{R^d} e^{i x^⊤ ω} dP(x) Translation invariant kernels: k(x, y) = k(x − y) = k(z). Bochner's theorem: k(z) = ∫_{R^d} e^{−i z^⊤ ω} dΛ(ω), where Λ is a finite non-negative Borel measure

82 Characteristic Kernels (via Fourier) Fourier representation of MMD: MMD(P, Q; F) := ‖[(φ_P(ω) − φ_Q(ω)) Λ(ω)]^∨‖_F, where φ_P is the characteristic function of P, ^ denotes the Fourier transform and ∨ the inverse Fourier transform, and µ_P := ∫ k(·, x) dP(x)

83 Example Example: P differs from Q at (roughly) one frequency [Figure: densities P(X) and Q(X)]

84 Example Example: P differs from Q at (roughly) one frequency [Figure: P(X), Q(X) and their characteristic functions |φ_P| and |φ_Q|]

85 Example Example: P differs from Q at (roughly) one frequency [Figure: P(X), Q(X), their characteristic functions, and the characteristic function difference |φ_P − φ_Q|]

86 Example Example: P differs from Q at (roughly) one frequency [Figure: Gaussian kernel spectrum overlaid on the difference |φ_P − φ_Q|, as a function of frequency]

87 Example Example: P differs from Q at (roughly) one frequency: Characteristic [Figure: Gaussian kernel spectrum vs. frequency]

88 Example Example: P differs from Q at (roughly) one frequency [Figure: sinc kernel spectrum overlaid on the difference |φ_P − φ_Q|, as a function of frequency]

89 Example Example: P differs from Q at (roughly) one frequency: NOT characteristic [Figure: sinc kernel spectrum vs. frequency]

90 Example Example: P differs from Q at (roughly) one frequency [Figure: triangle (B-spline) kernel spectrum overlaid on the difference |φ_P − φ_Q|, as a function of frequency]

91 Example Example: P differs from Q at (roughly) one frequency: ??? [Figure: B-spline kernel spectrum vs. frequency]

92 Example Example: P differs from Q at (roughly) one frequency: Characteristic [Figure: B-spline kernel spectrum vs. frequency]

93 Choosing the kernel Gaussian kernel example [Figure: perturbed densities Q(X) at increasing perturbing frequencies, and MMD(P, Q; F) as a function of the perturbing frequency]

94 Why does MMD decay with increasing perturbation freq.? Fourier series argument (notationally easier, for periodic domains only): MMD(P, Q; F) := ‖Σ_{l=−∞}^∞ (φ_{P,l} − φ_{Q,l}) k̂_l exp(ılx)‖_F. The squared norm of a function f in F is ‖f‖²_F = ⟨f, f⟩_F = Σ_{l=−∞}^∞ |f̂_l|² / k̂_l. The squared MMD is MMD²(P, Q; F) = Σ_{l=−∞}^∞ [|φ_{P,l} − φ_{Q,l}| k̂_l]² / k̂_l = Σ_{l=−∞}^∞ |φ_{P,l} − φ_{Q,l}|² k̂_l
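
The decay is easy to see numerically from the last identity. The toy sketch below is our own construction: it assumes a Gaussian-type spectrum k̂_l = exp(−σ²l²/2), with P and Q differing only in the Fourier coefficients at frequencies ±l₀ by an amount ε; the printed MMD² shrinks rapidly as l₀ grows.

```python
import numpy as np

sigma, eps = 1.0, 0.1
ls = np.arange(-50, 51)
k_hat = np.exp(-sigma**2 * ls**2 / 2.0)        # Gaussian-type kernel spectrum (assumed)

def mmd2_from_fourier(delta_phi):
    # MMD^2 = sum_l |phi_P,l - phi_Q,l|^2 * k_hat_l
    return float(np.sum(np.abs(delta_phi) ** 2 * k_hat))

for l0 in (1, 3, 5, 10):
    delta = np.zeros(ls.shape, dtype=complex)
    delta[ls == l0] = eps                      # P and Q differ only at +/- l0
    delta[ls == -l0] = eps
    print(l0, mmd2_from_fourier(delta))        # decays fast as l0 increases
```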

95 Choosing the kernel B-spline kernel example [Figure: perturbed densities Q(X) at increasing perturbing frequencies, and MMD(P, Q; F) as a function of the perturbing frequency]

96 Summary: Characteristic Kernels Characteristic kernel: (MMD = 0 iff P = Q) [NIPS07b, COLT08] Main theorem: k characteristic for prob. measures on R^d if and only if supp(Λ) = R^d [COLT08, JMLR10]

97 Summary: Characteristic Kernels Characteristic kernel: (MMD = 0 iff P = Q) [NIPS07b, COLT08] Main theorem: k characteristic for prob. measures on R^d if and only if supp(Λ) = R^d [COLT08, JMLR10] Corollary: continuous, compactly supported k characteristic

98 Summary: Characteristic Kernels Characteristic kernel: (MMD = 0 iff P = Q) [NIPS07b, COLT08] Main theorem: k characteristic for prob. measures on R^d if and only if supp(Λ) = R^d [COLT08, JMLR10] Corollary: continuous, compactly supported k characteristic Similar reasoning wherever extensions of Bochner's theorem exist: [NIPS08a] Locally compact Abelian groups (periodic domains, as we saw); compact, non-Abelian groups (orthogonal matrices); the semigroup R_+^n (histograms)

99 Statistical hypothesis testing

100 Reminder: detecting differences in brain signals The problem: Do local field potential (LFP) signals change when measured near a spike burst? [Figure: LFP amplitude vs. time, near a spike burst and without a spike burst]

103 Statistical test using MMD (1) Two hypotheses: H_0: null hypothesis (P = Q); H_1: alternative hypothesis (P ≠ Q)

104 Statistical test using MMD (1) Two hypotheses: H_0: null hypothesis (P = Q); H_1: alternative hypothesis (P ≠ Q) Observe samples x := {x_1, ..., x_n} from P and y from Q. If the empirical MMD(x, y; F) is far from zero: reject H_0; close to zero: accept H_0

105 Statistical test using MMD (2) Far from zero vs. close to zero: what threshold? One answer: the asymptotic distribution of MMD(x, y; F)

106 Statistical test using MMD (2) Far from zero vs. close to zero: what threshold? One answer: the asymptotic distribution of MMD(x, y; F). An unbiased empirical estimate (quadratic cost): MMD²(x, y; F) = 1/(n(n−1)) Σ_{i≠j} [k(x_i, x_j) − k(x_i, y_j) − k(y_i, x_j) + k(y_i, y_j)], the bracketed term being h((x_i, y_i), (x_j, y_j))

107 Statistical test using MMD (2) Far from zero vs. close to zero: what threshold? One answer: the asymptotic distribution of MMD(x, y; F). An unbiased empirical estimate (quadratic cost): MMD²(x, y; F) = 1/(n(n−1)) Σ_{i≠j} [k(x_i, x_j) − k(x_i, y_j) − k(y_i, x_j) + k(y_i, y_j)], the bracketed term being h((x_i, y_i), (x_j, y_j)) When P ≠ Q, asymptotically normal [Hoeffding, 1948, Serfling, 1980]. Expression for the variance, with z_i := (x_i, y_i): σ²_u = (4/n) (E_z[(E_{z'} h(z, z'))²] − [E_{z,z'} h(z, z')]²) + O(n^{−2})

108 Statistical test using MMD (3) Example: Laplace distributions with different variances [Figure: two Laplace densities with different variances; the empirical MMD distribution under H_1 with a Gaussian fit]

109 Statistical test using MMD (4) When P = Q, the U-statistic is degenerate: E_z[h(z, z')] = 0 [Anderson et al., 1994]. The distribution is n·MMD(x, y; F) → Σ_{l=1}^∞ λ_l [z_l² − 2], where z_l ∼ N(0, 2) i.i.d. and ∫_X k̃(x, x') ψ_i(x) dP(x) = λ_i ψ_i(x'), with k̃ the centred kernel

110 Statistical test using MMD (4) When P = Q, the U-statistic is degenerate: E_z[h(z, z')] = 0 [Anderson et al., 1994]. The distribution is n·MMD(x, y; F) → Σ_{l=1}^∞ λ_l [z_l² − 2], where z_l ∼ N(0, 2) i.i.d. and ∫_X k̃(x, x') ψ_i(x) dP(x) = λ_i ψ_i(x'), with k̃ the centred kernel [Figure: empirical MMD density under H_0]

111 Statistical test using MMD (5) Given P = Q, want a threshold T such that P(MMD > T) ≤ 0.05

112 Statistical test using MMD (5) Given P = Q, want a threshold T such that P(MMD > T) ≤ 0.05 Permutation for the empirical CDF [Arcones and Giné, 1992]; Pearson curves by matching the first four moments [Johnson et al., 1994]; large deviation bounds [Hoeffding, 1963, McDiarmid, 1989]; consistent test using the kernel eigenspectrum [NIPS09b]

113 Statistical test using MMD (5) Given P = Q, want a threshold T such that P(MMD > T) ≤ 0.05 Permutation for the empirical CDF [Arcones and Giné, 1992]; Pearson curves by matching the first four moments [Johnson et al., 1994]; large deviation bounds [Hoeffding, 1963, McDiarmid, 1989]; consistent test using the kernel eigenspectrum [NIPS09b] (a code sketch of the permutation approach follows below) [Figure: CDF of the MMD and Pearson fit]
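
Of these thresholds, the permutation approach is the simplest to sketch. The helper below is hypothetical (it reuses the mmd2_unbiased function from the earlier sketch; α = 0.05 and the permutation count are arbitrary choices): it pools the samples, recomputes the statistic under random relabellings, and takes the (1 − α) quantile as the threshold.

```python
import numpy as np

def mmd_permutation_test(X, Y, sigma=1.0, n_perm=500, alpha=0.05, seed=0):
    # Returns (observed MMD^2, permutation threshold, reject H0?)
    rng = np.random.default_rng(seed)
    m = len(X)
    observed = mmd2_unbiased(X, Y, sigma)   # from the earlier sketch
    pooled = np.concatenate([X, Y])
    null_stats = []
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))  # random relabelling, valid under H0
        null_stats.append(mmd2_unbiased(pooled[idx[:m]], pooled[idx[m:]], sigma))
    threshold = np.quantile(null_stats, 1.0 - alpha)
    return observed, threshold, observed > threshold
```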

114 Consistent test w/o bootstrap (not examinable) Maximum mean discrepancy (MMD): distance between P and Q, MMD(P, Q; F) := ‖µ_P − µ_Q‖²_F. Is MMD significantly > 0? When P = Q, null distribution of MMD: n·MMD →_D Σ_{l=1}^∞ λ_l (z_l² − 2), where λ_l is the l-th eigenvalue of the centred kernel k̃(x_i, x_j). [Figure: Type II error for P ≠ Q (neuro data), spectral vs. permutation thresholds, as a function of sample size m] Use the Gram matrix spectrum for λ̂_l: a consistent test without bootstrap
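
As a rough sketch of the spectral idea (our simplification, not the exact [NIPS09b] recipe): estimate λ̂_l from the centred Gram matrix of the pooled sample, then simulate Σ_l λ̂_l (z_l² − 2) directly.

```python
import numpy as np

def spectral_null_samples(K_pooled, n_sim=5000, seed=0):
    # Draws from sum_l lambda_hat_l * (z_l^2 - 2), z_l ~ N(0, 2) i.i.d.,
    # with lambda_hat_l estimated from the centred Gram matrix of the
    # pooled sample (a simplification of the [NIPS09b] construction).
    rng = np.random.default_rng(seed)
    m = K_pooled.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    Kc = H @ K_pooled @ H                  # centred Gram matrix
    lam = np.linalg.eigvalsh(Kc) / m       # empirical eigenvalue estimates
    lam = lam[lam > 1e-12]                 # keep the numerically nonzero part
    z = rng.normal(0.0, np.sqrt(2.0), size=(n_sim, lam.size))
    return (z**2 - 2.0) @ lam              # samples from the limiting law
```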

115 Kernel dependence measures

116 Reminder: MMD can be used as a dependence measure Dependence measure: [ALT05, NIPS07a, ALT07, ALT08, JMLR10] (sup_{‖f‖ ≤ 1} [E_{P_XY} f − E_{P_X P_Y} f])² = (sup_{‖f‖ ≤ 1} ⟨f, µ_{P_XY} − µ_{P_X P_Y}⟩)² = ‖µ_{P_XY} − µ_{P_X P_Y}‖²_{F⊗G} =: HSIC(P_XY, P_X P_Y) [Figure: dependence witness and sample]

117 Reminder: MMD can be used as a dependence measure Dependence measure: [ALT05, NIPS07a, ALT07, ALT08, JMLR10] (sup_{‖f‖ ≤ 1} [E_{P_XY} f − E_{P_X P_Y} f])² = (sup_{‖f‖ ≤ 1} ⟨f, µ_{P_XY} − µ_{P_X P_Y}⟩)² = ‖µ_{P_XY} − µ_{P_X P_Y}‖²_{F⊗G} =: HSIC(P_XY, P_X P_Y) Product kernel on pairs: κ((x, y), (x', y')) = k(x, x') l(y, y')

118 Distribution of HSIC at independence (Biased) empirical HSIC, a V-statistic: HSIC_b = (1/n²) trace(KHLH). Statistical testing: how do we find when this is large enough that the null hypothesis P = P_x P_y is unlikely? Formally: given P = P_x P_y, what is the threshold T such that P(HSIC > T) < α for small α?
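
A direct transcription of the statistic into code (our sketch; Gaussian kernels for k and l, and their bandwidths, are assumptions):

```python
import numpy as np

def gram(Z, sigma=1.0):
    # Gaussian Gram matrix for rows of Z with shape (n, d)
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma**2))

def hsic_b(X, Y, sigma_x=1.0, sigma_y=1.0):
    # Biased empirical HSIC: trace(K H L H) / n^2
    n = len(X)
    K, L = gram(X, sigma_x), gram(Y, sigma_y)
    H = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    return float(np.trace(K @ H @ L @ H)) / n**2
```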

119 Distribution of HSIC at independence (Biased) empirical HSIC, a V-statistic: HSIC_b = (1/n²) trace(KHLH). The associated U-statistic is degenerate when P = P_x P_y [Serfling, 1980]: n·HSIC_b →_D Σ_{l=1}^∞ λ_l z_l², z_l ∼ N(0, 1) i.i.d., where λ_l ψ_l(z_j) = ∫ h_{ijqr} ψ_l(z_i) dF_{i,q,r} and h_{ijqr} = (1/4!) Σ_{(t,u,v,w)}^{(i,j,q,r)} [k_{tu} l_{tu} + k_{tu} l_{vw} − 2 k_{tu} l_{tv}]

120 Distribution of HSIC at independence (Biased) empirical HSIC, a V-statistic: HSIC_b = (1/n²) trace(KHLH). The associated U-statistic is degenerate when P = P_x P_y [Serfling, 1980]: n·HSIC_b →_D Σ_{l=1}^∞ λ_l z_l², z_l ∼ N(0, 1) i.i.d., where λ_l ψ_l(z_j) = ∫ h_{ijqr} ψ_l(z_i) dF_{i,q,r} and h_{ijqr} = (1/4!) Σ_{(t,u,v,w)}^{(i,j,q,r)} [k_{tu} l_{tu} + k_{tu} l_{vw} − 2 k_{tu} l_{tv}] First two moments [NIPS07b]: E(HSIC_b) = (1/n) Tr(C_xx) Tr(C_yy), var(HSIC_b) = (2(n−4)(n−5)/(n)_4) ‖C_xx‖²_HS ‖C_yy‖²_HS + O(n^{−3}), where (n)_4 = n(n−1)(n−2)(n−3)

121 Statistical testing with HSIC Given P = P_x P_y, what is the threshold T such that P(HSIC > T) < α for small α? Null distribution via permutation [Feuerverger, 1993]: compute HSIC for {x_i, y_{π(i)}}_{i=1}^n for a random permutation π of the indices {1, ..., n}; this gives HSIC for independent variables. Repeat for many different permutations to get an empirical CDF. The threshold T is the (1 − α) quantile of the empirical CDF (see the sketch below)
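
Given the hsic_b sketch above, the permutation recipe takes a few lines (again, α and the permutation count are arbitrary choices):

```python
import numpy as np

def hsic_permutation_threshold(X, Y, n_perm=500, alpha=0.05, seed=0):
    # (1 - alpha) quantile of HSIC_b over random permutations of Y
    rng = np.random.default_rng(seed)
    null_stats = [hsic_b(X, Y[rng.permutation(len(Y))]) for _ in range(n_perm)]
    return float(np.quantile(null_stats, 1.0 - alpha))
```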

122 Statistical testing with HSIC Given P = P_x P_y, what is the threshold T such that P(HSIC > T) < α for small α? Null distribution via permutation [Feuerverger, 1993]: compute HSIC for {x_i, y_{π(i)}}_{i=1}^n for a random permutation π of the indices {1, ..., n}; this gives HSIC for independent variables. Repeat for many different permutations to get an empirical CDF; the threshold T is the (1 − α) quantile. Approximate null distribution via moment matching [Kankainen, 1995]: n·HSIC_b(Z) ≈ x^{α−1} e^{−x/β} / (β^α Γ(α)), where α = (E(HSIC_b))² / var(HSIC_b) and β = n·var(HSIC_b) / E(HSIC_b)

123 Experiment: dependence testing for translation (Biased) empirical HSIC: HSIC_b = (1/n²) trace(KHLH). Translation example: [NIPS07b] Canadian Hansard (agriculture), 5-line extracts, k-spectrum kernel with k = 10, repetitions = 300, sample size 10. k-spectrum kernel: average Type II error 0 (α = 0.05)

124 Experiment: dependence testing for translation (Biased) empirical HSIC: HSIC_b = (1/n²) trace(KHLH). Translation example: [NIPS07b] Canadian Hansard (agriculture), 5-line extracts, k-spectrum kernel with k = 10, repetitions = 300, sample size 10. k-spectrum kernel: average Type II error 0 (α = 0.05) Bag of words kernel: average Type II error 0.18

125 Advanced topics Energy distance and the MMD Two-sample testing for big data Testing three-way interactions Bayesian inference without models

126 Summary MMD: a distance between distributions [ISMB06, NIPS06a, JMLR10, JMLR12a]; handles high dimensionality and non-Euclidean data (strings, graphs). Nonparametric hypothesis tests. Measure and test independence [ALT05, NIPS07a, NIPS07b, ALT08, JMLR10, JMLR12a]. Characteristic RKHS: MMD a metric [NIPS07b, COLT08, NIPS08a]. Easy to check: does the spectrum cover R^d?

127 Co-authors From UCL: Luca Baldassarre Steffen Grunewalder Guy Lever Sam Patterson Massimiliano Pontil Dino Sejdinovic External: Karsten Borgwardt, MPI Wicher Bergsma, LSE Kenji Fukumizu, ISM Zaid Harchaoui, INRIA Bernhard Schoelkopf, MPI Alex Smola, CMU/Google Le Song, Georgia Tech Bharath Sriperumbudur, Cambridge

128 Selected references Characteristic kernels and mean embeddings: Smola, A., Gretton, A., Song, L., Schoelkopf, B. (2007). A Hilbert space embedding for distributions. ALT. Sriperumbudur, B., Gretton, A., Fukumizu, K., Schoelkopf, B., Lanckriet, G. (2010). Hilbert space embeddings and metrics on probability measures. JMLR. Gretton, A., Borgwardt, K., Rasch, M., Schoelkopf, B., Smola, A. (2012). A kernel two-sample test. JMLR. Two-sample, independence, conditional independence tests: Gretton, A., Fukumizu, K., Teo, C., Song, L., Schoelkopf, B., Smola, A. (2008). A kernel statistical test of independence. NIPS. Fukumizu, K., Gretton, A., Sun, X., Schoelkopf, B. (2008). Kernel measures of conditional dependence. NIPS. Gretton, A., Fukumizu, K., Harchaoui, Z., Sriperumbudur, B. (2009). A fast, consistent kernel two-sample test. NIPS. Gretton, A., Borgwardt, K., Rasch, M., Schoelkopf, B., Smola, A. (2012). A kernel two-sample test. JMLR. Energy distance, relation to kernel distances: Sejdinovic, D., Sriperumbudur, B., Gretton, A., Fukumizu, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Annals of Statistics. Three way interaction: Sejdinovic, D., Gretton, A., and Bergsma, W. (2013). A Kernel Test for Three-Variable Interactions. NIPS.

129 Selected references (continued) Conditional mean embedding, RKHS-valued regression: Weston, J., Chapelle, O., Elisseeff, A., Schölkopf, B., and Vapnik, V. (2003). Kernel Dependency Estimation. NIPS. Micchelli, C., and Pontil, M. (2005). On Learning Vector-Valued Functions. Neural Computation. Caponnetto, A., and De Vito, E. (2007). Optimal Rates for the Regularized Least-Squares Algorithm. Foundations of Computational Mathematics. Song, L., Huang, J., Smola, A., and Fukumizu, K. (2009). Hilbert Space Embeddings of Conditional Distributions. ICML. Grunewalder, S., Lever, G., Baldassarre, L., Patterson, S., Gretton, A., Pontil, M. (2012). Conditional mean embeddings as regressors. ICML. Grunewalder, S., Gretton, A., Shawe-Taylor, J. (2013). Smooth operators. ICML. Kernel Bayes rule: Song, L., Fukumizu, K., Gretton, A. (2013). Kernel embeddings of conditional distributions: A unified kernel framework for nonparametric inference in graphical models. IEEE Signal Processing Magazine. Fukumizu, K., Song, L., Gretton, A. (2013). Kernel Bayes rule: Bayesian inference with positive definite kernels. JMLR.

131 References N. Anderson, P. Hall, and D. Titterington. Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates. Journal of Multivariate Analysis, 50:41–54, 1994. M. Arcones and E. Giné. On the bootstrap of U and V statistics. The Annals of Statistics, 20(2), 1992. C. Baker. Joint measures and cross-covariance operators. Transactions of the American Mathematical Society, 186, 1973. L. Baringhaus and C. Franz. On a new multivariate two-sample test. J. Multivariate Anal., 88:190–206, 2004. C. Berg, J. P. R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups. Springer, New York, 1984. R. M. Dudley. Real analysis and probability. Cambridge University Press, Cambridge, UK, 2002. A. Feuerverger. A consistent test for bivariate dependence. International Statistical Review, 61(3), 1993. W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963. W. Hoeffding. A class of statistics with asymptotically normal distribution. The Annals of Mathematical Statistics, 19(3):293–325, 1948. N. L. Johnson, S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions. Volume 1. John Wiley and Sons, 2nd edition, 1994. A. Kankainen. Consistent Testing of Total Independence Based on the Empirical Characteristic Function. PhD thesis, University of Jyväskylä, 1995. R. Lyons. Distance covariance in metric spaces. arXiv preprint, to appear in Ann. Probab., June 2011. C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics. Cambridge University Press, 1989. A. Müller. Integral probability metrics and their generating classes of functions. Advances in Applied Probability, 29(2):429–443, 1997.

132 T. Read and N. Cressie. Goodness-Of-Fit Statistics for Discrete Multivariate Analysis. Springer-Verlag, New York, 1988. R. Serfling. Approximation Theorems of Mathematical Statistics. Wiley, New York, 1980. I. Steinwart. On the influence of the kernel on the consistency of support vector machines. Journal of Machine Learning Research, 2:67–93, 2001. G. Székely and M. Rizzo. Testing for equal distributions in high dimension. InterStat, 5, 2004. G. Székely and M. Rizzo. A new test for multivariate normality. J. Multivariate Anal., 93:58–80, 2005. G. Székely and M. Rizzo. Brownian distance covariance. Annals of Applied Statistics, 4(3), 2009. G. Székely, M. Rizzo, and N. Bakirov. Measuring and testing dependence by correlation of distances. Ann. Stat., 35(6), 2007.


More information

RKHS, Mercer s theorem, Unbounded domains, Frames and Wavelets Class 22, 2004 Tomaso Poggio and Sayan Mukherjee

RKHS, Mercer s theorem, Unbounded domains, Frames and Wavelets Class 22, 2004 Tomaso Poggio and Sayan Mukherjee RKHS, Mercer s theorem, Unbounded domains, Frames and Wavelets 9.520 Class 22, 2004 Tomaso Poggio and Sayan Mukherjee About this class Goal To introduce an alternate perspective of RKHS via integral operators

More information

On the Noise Model of Support Vector Machine Regression. Massimiliano Pontil, Sayan Mukherjee, Federico Girosi

On the Noise Model of Support Vector Machine Regression. Massimiliano Pontil, Sayan Mukherjee, Federico Girosi MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES A.I. Memo No. 1651 October 1998

More information

Kernel-Based Contrast Functions for Sufficient Dimension Reduction

Kernel-Based Contrast Functions for Sufficient Dimension Reduction Kernel-Based Contrast Functions for Sufficient Dimension Reduction Michael I. Jordan Departments of Statistics and EECS University of California, Berkeley Joint work with Kenji Fukumizu and Francis Bach

More information

Kernels A Machine Learning Overview

Kernels A Machine Learning Overview Kernels A Machine Learning Overview S.V.N. Vishy Vishwanathan vishy@axiom.anu.edu.au National ICT of Australia and Australian National University Thanks to Alex Smola, Stéphane Canu, Mike Jordan and Peter

More information

Efficient Complex Output Prediction

Efficient Complex Output Prediction Efficient Complex Output Prediction Florence d Alché-Buc Joint work with Romain Brault, Alex Lambert, Maxime Sangnier October 12, 2017 LTCI, Télécom ParisTech, Institut-Mines Télécom, Université Paris-Saclay

More information

Statistical Optimality of Stochastic Gradient Descent through Multiple Passes

Statistical Optimality of Stochastic Gradient Descent through Multiple Passes Statistical Optimality of Stochastic Gradient Descent through Multiple Passes Francis Bach INRIA - Ecole Normale Supérieure, Paris, France ÉCOLE NORMALE SUPÉRIEURE Joint work with Loucas Pillaud-Vivien

More information

Stochastic optimization in Hilbert spaces

Stochastic optimization in Hilbert spaces Stochastic optimization in Hilbert spaces Aymeric Dieuleveut Aymeric Dieuleveut Stochastic optimization Hilbert spaces 1 / 48 Outline Learning vs Statistics Aymeric Dieuleveut Stochastic optimization Hilbert

More information

Spline Density Estimation and Inference with Model-Based Penalities

Spline Density Estimation and Inference with Model-Based Penalities Spline Density Estimation and Inference with Model-Based Penalities December 7, 016 Abstract In this paper we propose model-based penalties for smoothing spline density estimation and inference. These

More information

Kernel Method: Data Analysis with Positive Definite Kernels

Kernel Method: Data Analysis with Positive Definite Kernels Kernel Method: Data Analysis with Positive Definite Kernels 2. Positive Definite Kernel and Reproducing Kernel Hilbert Space Kenji Fukumizu The Institute of Statistical Mathematics. Graduate University

More information

Non-parametric Estimation of Integral Probability Metrics

Non-parametric Estimation of Integral Probability Metrics Non-parametric Estimation of Integral Probability Metrics Bharath K. Sriperumbudur, Kenji Fukumizu 2, Arthur Gretton 3,4, Bernhard Schölkopf 4 and Gert R. G. Lanckriet Department of ECE, UC San Diego,

More information

Understanding the relationship between Functional and Structural Connectivity of Brain Networks

Understanding the relationship between Functional and Structural Connectivity of Brain Networks Understanding the relationship between Functional and Structural Connectivity of Brain Networks Sashank J. Reddi Machine Learning Department, Carnegie Mellon University SJAKKAMR@CS.CMU.EDU Abstract Background.

More information

Robust Support Vector Machines for Probability Distributions

Robust Support Vector Machines for Probability Distributions Robust Support Vector Machines for Probability Distributions Andreas Christmann joint work with Ingo Steinwart (Los Alamos National Lab) ICORS 2008, Antalya, Turkey, September 8-12, 2008 Andreas Christmann,

More information

CIS 520: Machine Learning Oct 09, Kernel Methods

CIS 520: Machine Learning Oct 09, Kernel Methods CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed

More information

On Testing Independence and Goodness-of-fit in Linear Models

On Testing Independence and Goodness-of-fit in Linear Models On Testing Independence and Goodness-of-fit in Linear Models arxiv:1302.5831v2 [stat.me] 5 May 2014 Arnab Sen and Bodhisattva Sen University of Minnesota and Columbia University Abstract We consider a

More information

22 : Hilbert Space Embeddings of Distributions

22 : Hilbert Space Embeddings of Distributions 10-708: Probabilistic Graphical Models 10-708, Spring 2014 22 : Hilbert Space Embeddings of Distributions Lecturer: Eric P. Xing Scribes: Sujay Kumar Jauhar and Zhiguang Huo 1 Introduction and Motivation

More information

Gaussian random variables inr n

Gaussian random variables inr n Gaussian vectors Lecture 5 Gaussian random variables inr n One-dimensional case One-dimensional Gaussian density with mean and standard deviation (called N, ): fx x exp. Proposition If X N,, then ax b

More information

41903: Introduction to Nonparametrics

41903: Introduction to Nonparametrics 41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions

More information

On non-parametric robust quantile regression by support vector machines

On non-parametric robust quantile regression by support vector machines On non-parametric robust quantile regression by support vector machines Andreas Christmann joint work with: Ingo Steinwart (Los Alamos National Lab) Arnout Van Messem (Vrije Universiteit Brussel) ERCIM

More information

Introduction to Gaussian Processes

Introduction to Gaussian Processes Introduction to Gaussian Processes Neil D. Lawrence GPSS 10th June 2013 Book Rasmussen and Williams (2006) Outline The Gaussian Density Covariance from Basis Functions Basis Function Representations Constructing

More information

Kernel methods, kernel SVM and ridge regression

Kernel methods, kernel SVM and ridge regression Kernel methods, kernel SVM and ridge regression Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Collaborative Filtering 2 Collaborative Filtering R: rating matrix; U: user factor;

More information

Mathematical Methods for Data Analysis

Mathematical Methods for Data Analysis Mathematical Methods for Data Analysis Massimiliano Pontil Istituto Italiano di Tecnologia and Department of Computer Science University College London Massimiliano Pontil Mathematical Methods for Data

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information