Hilbert Space Representations of Probability Distributions
1 Hilbert Space Representations of Probability Distributions Arthur Gretton joint work with Karsten Borgwardt, Kenji Fukumizu, Malte Rasch, Bernhard Schölkopf, Alex Smola, Le Song, Choon Hui Teo Max Planck Institute for Biological Cybernetics, Tübingen, Germany
2 Overview
- The two-sample problem: are samples {x_1, ..., x_m} and {y_1, ..., y_n} generated from the same distribution?
- Kernel independence testing: given a sample of m pairs {(x_1, y_1), ..., (x_m, y_m)}, are the random variables x and y independent?
3 Kernels, feature maps
4-6 A very short introduction to kernels
- Hilbert space F of functions f : X → R
- RKHS: the evaluation operator δ_x : F → R, f ↦ f(x), is continuous
- Riesz: unique representer of evaluation k(x, ·) ∈ F: f(x) = ⟨f, k(x, ·)⟩_F
- k(x, ·) is the feature map; k : X × X → R is the kernel function
- Inner product between two feature maps: ⟨k(x_1, ·), k(x_2, ·)⟩_F = k(x_1, x_2)
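A small numerical sketch of the last identity (not from the talk): for the homogeneous second-order polynomial kernel on R^2 the feature map is finite-dimensional, so ⟨k(x_1, ·), k(x_2, ·)⟩ = k(x_1, x_2) can be checked directly. The kernel choice and test points are illustrative assumptions.

```python
import numpy as np

# Sketch: the polynomial kernel k(x, y) = (x . y)^2 on R^2 has the explicit
# finite-dimensional feature map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2),
# so <phi(x1), phi(x2)> = k(x1, x2) can be verified numerically.

def poly2_kernel(x, y):
    return float(np.dot(x, y)) ** 2

def poly2_feature_map(x):
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, -1.0])

lhs = float(np.dot(poly2_feature_map(x1), poly2_feature_map(x2)))
rhs = poly2_kernel(x1, x2)
print(lhs, rhs)  # both equal (1*3 + 2*(-1))^2 = 1.0
```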
7 A Kernel Method for the Two Sample Problem
8-10 The two-sample problem
- Given: m samples x := {x_1, ..., x_m} drawn i.i.d. from P; samples y drawn from Q
- Determine: are P and Q different?
- Applications: microarray data aggregation; speaker/author identification; schema matching
- Where is our test useful? High dimensionality; low sample size; structured data (strings and graphs): currently the only method
11-13 Overview (two-sample problem)
- How to detect P ≠ Q? Distance between means in a space of features; a function revealing differences in distributions. Same thing: the MMD [Gretton et al., 2007; Borgwardt et al., 2006]
- Hypothesis test using the MMD: asymptotic distribution of the MMD; large deviation bounds
- Experiments
14 Mean discrepancy (1)
- Simple example: two Gaussians with different means
- Answer: t-test
[Figure: densities of two Gaussians P_X, Q_X with different means]
15-16 Mean discrepancy (2)
- Two Gaussians with the same means but different variance
- Idea: look at the difference in means of features of the RVs
- In the Gaussian case: second-order features of the form x^2
[Figure: densities of two Gaussians P_X, Q_X with different variances, and the densities of the feature x^2]
17 Mean discrepancy (3)
- Gaussian and Laplace distributions with the same mean and the same variance
- Difference in means using higher-order features
[Figure: Gaussian and Laplace densities P_X, Q_X]
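A numerical sketch of this point (the Laplace scale 1/sqrt(2), chosen to give unit variance, is an illustrative assumption): the first two moments of the two distributions match, but a fourth-order feature separates them, since E[x^4] = 3 for the standard Gaussian and 6 for the unit-variance Laplace.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200_000

gauss = rng.normal(0.0, 1.0, m)
# Laplace with scale b = 1/sqrt(2): variance 2 b^2 = 1 matches the Gaussian.
laplace = rng.laplace(0.0, 1.0 / np.sqrt(2.0), m)

# Means and variances agree (~0 and ~1 respectively)...
print(gauss.mean(), laplace.mean())
print(gauss.var(), laplace.var())
# ...but the fourth moments differ: E[x^4] = 3 (Gaussian) vs 6 (this Laplace).
g4 = (gauss ** 4).mean()
l4 = (laplace ** 4).mean()
print(g4, l4)
```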
18-21 Function revealing difference in distributions (1)
- Idea: avoid density estimation when testing P ≠ Q [Fortet and Mourier, 1953]:
  D(P, Q; F) := sup_{f ∈ F} [E_P f(x) − E_Q f(y)]
- D(P, Q; F) = 0 iff P = Q, when F = bounded continuous functions [Dudley, 2002]
- D(P, Q; F) = 0 iff P = Q, when F = the unit ball in a universal RKHS F [via Steinwart, 2001]
- Examples: Gaussian, Laplace kernels [see also Fukumizu et al., 2004]
22 Function revealing difference in distributions (2)
- Gauss vs Laplace revisited
[Figure: witness function f for the Gaussian and Laplace densities]
23-28 Function revealing difference in distributions (3)
The (kernel) MMD:
  MMD(P, Q; F) = ( sup_{f ∈ F} [E_P f(x) − E_Q f(y)] )^2
               = ( sup_{f ∈ F} ⟨f, μ_x − μ_y⟩_F )^2      using E_P f(x) = E_P ⟨φ(x), f⟩_F =: ⟨μ_x, f⟩_F
               = ‖μ_x − μ_y‖_F^2                          using ‖μ‖_F = sup_{f ∈ F} ⟨f, μ⟩_F
               = ⟨μ_x − μ_y, μ_x − μ_y⟩_F
               = E_{P,P} k(x, x') + E_{Q,Q} k(y, y') − 2 E_{P,Q} k(x, y)
where x' is a R.V. independent of x with distribution P, and y' is a R.V. independent of y with distribution Q.
Kernel between measures [Hein and Bousquet, 2005]: K(P, Q) = E_{P,Q} k(x, y)
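The population expression MMD^2 = E k(x,x') + E k(y,y') − 2 E k(x,y) suggests a plug-in estimate using Gram-matrix means. A minimal sketch with a Gaussian kernel (the bandwidth, sample sizes, and mean shift below are illustrative assumptions, not values from the talk):

```python
import numpy as np

def gaussian_gram(A, B, sigma=1.0):
    # K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def mmd2_biased(X, Y, sigma=1.0):
    # Plug-in (biased) estimate of E k(x,x') + E k(y,y') - 2 E k(x,y);
    # it equals a squared norm, so it is always >= 0.
    return (gaussian_gram(X, X, sigma).mean()
            + gaussian_gram(Y, Y, sigma).mean()
            - 2.0 * gaussian_gram(X, Y, sigma).mean())

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, (1000, 1))
Y_same = rng.normal(0.0, 1.0, (1000, 1))   # same distribution as X
Y_shift = rng.normal(0.5, 1.0, (1000, 1))  # mean shifted by 0.5

mmd_same = mmd2_biased(X, Y_same)
mmd_shift = mmd2_biased(X, Y_shift)
print(mmd_same)   # near zero
print(mmd_shift)  # clearly larger
```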
29-32 Statistical test using MMD (1)
- Two hypotheses: H_0, the null hypothesis (P = Q); H_1, the alternative hypothesis (P ≠ Q)
- Observe samples x := {x_1, ..., x_m} from P and y from Q
- If the empirical MMD(x, y; F) is far from zero: reject H_0; close to zero: accept H_0
- How good is a test? Type I error: we reject H_0 although it is true; Type II error: we accept H_0 although it is false
- A good test has a low Type II error for a user-defined Type I error
33-37 Statistical test using MMD (2)
- "Far from zero" vs "close to zero": where is the threshold?
- One answer: the asymptotic distribution of MMD(x, y; F)
- An unbiased empirical estimate (quadratic cost):
  MMD(x, y; F) = (1 / (m(m−1))) Σ_{i ≠ j} [ k(x_i, x_j) + k(y_i, y_j) − k(x_i, y_j) − k(y_i, x_j) ]
  where the bracketed term is h((x_i, y_i), (x_j, y_j))
- When P ≠ Q, this is asymptotically normal [Hoeffding, 1948; Serfling, 1980]
- Expression for the variance, with z_i := (x_i, y_i):
  σ_u^2 = (4/m) ( E_z[ (E_{z'} h(z, z'))^2 ] − [E_{z,z'} h(z, z')]^2 ) + O(m^{−2})
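The unbiased quadratic-cost estimator can be sketched directly from Gram matrices (Gaussian kernel assumed; note that, unlike the biased plug-in estimate, this one can dip slightly below zero under P = Q):

```python
import numpy as np

def gaussian_gram(A, B, sigma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def mmd2_unbiased(X, Y, sigma=1.0):
    # (1/(m(m-1))) sum_{i != j} h(z_i, z_j), with
    # h(z_i, z_j) = k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(y_i, x_j).
    m = X.shape[0]
    Kxy = gaussian_gram(X, Y, sigma)
    H = gaussian_gram(X, X, sigma) + gaussian_gram(Y, Y, sigma) - Kxy - Kxy.T
    np.fill_diagonal(H, 0.0)  # the U-statistic excludes i == j terms
    return H.sum() / (m * (m - 1))

rng = np.random.default_rng(2)
X = rng.normal(0.0, 1.0, (500, 1))
Y = rng.normal(0.0, 1.0, (500, 1))
print(mmd2_unbiased(X, Y))  # fluctuates around zero when P = Q
```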
38 Statistical test using MMD (3)
- Example: Laplace distributions with different variance
[Figure: two Laplace densities P_X, Q_X with different variances; empirical MMD distribution under H_1 with a Gaussian fit]
39-40 Statistical test using MMD (4)
- When P = Q, the U-statistic is degenerate: E_{z'}[h(z, z')] = 0 [Anderson et al., 1994]
- The asymptotic distribution is
  m · MMD(x, y; F) → Σ_{l=1}^∞ λ_l [z_l^2 − 2]
  where the z_l ~ N(0, 2) are i.i.d. and the λ_i solve
  ∫_X k̃(x, x') ψ_i(x) dP(x) = λ_i ψ_i(x')
  with k̃ the centred kernel
[Figure: empirical MMD density under H_0]
41-43 Statistical test using MMD (5)
- Given P = Q, want a threshold T such that P(MMD > T) ≤ 0.05
- Bootstrap for the empirical CDF [Arcones and Giné, 1992]
- Pearson curves by matching the first four moments [Johnson et al., 1994]
- Large deviation bounds [Hoeffding, 1963; McDiarmid, 1989]
- Other...
[Figure: CDF of the MMD and Pearson fit]
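The bootstrap option for setting T can be sketched as a permutation procedure on the pooled sample: under H_0 the pooled points are exchangeable, so random re-splits simulate the null distribution. The bandwidth, sample size, and number of resamples below are illustrative assumptions.

```python
import numpy as np

def gaussian_gram(A, B, sigma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def mmd2_biased(X, Y, sigma=1.0):
    return (gaussian_gram(X, X, sigma).mean()
            + gaussian_gram(Y, Y, sigma).mean()
            - 2.0 * gaussian_gram(X, Y, sigma).mean())

def null_threshold(X, Y, n_resamples=200, alpha=0.05, seed=0):
    # Re-split the pooled sample at random; T is the (1 - alpha) quantile
    # of the resulting MMD values.
    rng = np.random.default_rng(seed)
    Z = np.concatenate([X, Y])
    m = X.shape[0]
    stats = np.empty(n_resamples)
    for b in range(n_resamples):
        idx = rng.permutation(Z.shape[0])
        stats[b] = mmd2_biased(Z[idx[:m]], Z[idx[m:]])
    return np.quantile(stats, 1.0 - alpha)

rng = np.random.default_rng(3)
X = rng.normal(0.0, 1.0, (200, 1))
Y = rng.normal(1.0, 1.0, (200, 1))  # mean shifted by 1: H_0 is false

T = null_threshold(X, Y)
observed = mmd2_biased(X, Y)
print(observed > T)  # True: reject H_0 for this strongly shifted pair
```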
44-47 Experiments
- Small sample size: Pearson more accurate than the bootstrap; large sample size: bootstrap faster
- Cancer subtype (m = 25, 2118 dimensions): for Pearson, Type I 3.5%, Type II 0%; for the bootstrap, Type I 0.9%, Type II 0%
- Neural spikes (m = 1, 1 dimensions): for Pearson, Type I 4.8%, Type II 3.4%; for the bootstrap, Type I 5.4%, Type II 3.3%
- Further experiments: comparison with the t-test, the Friedman-Rafsky tests [Friedman and Rafsky, 1979], the Biau-Györfi test [Biau and Györfi, 2005], and the Hall-Tajvidi test [Hall and Tajvidi, 2002]
48 Conclusions (two-sample problem)
- The MMD: distance between means in feature spaces
- When the feature spaces are universal RKHSs, MMD = 0 iff P = Q
- Statistical test of whether P ≠ Q using the asymptotic distribution: Pearson approximation for low sample size, bootstrap for large sample size
- Useful in high dimensions and for structured data
49 Dependence Detection with Kernels
50-52 Kernel dependence measures
- Independence testing. Given: m samples z := {(x_1, y_1), ..., (x_m, y_m)} from P. Determine: does P = P_x P_y?
- Kernel dependence measures: zero only at independence; take into account high-order moments; make sensible assumptions about smoothness
- Covariance operators in spaces of features: spectral norm (COCO) [Gretton et al., 2005c,d]; Hilbert-Schmidt norm (HSIC) [Gretton et al., 2005a]
53-56 Function revealing dependence (1)
- Idea: avoid density estimation when testing P = P_x P_y [Rényi, 1959]:
  COCO(P; F, G) := sup_{f ∈ F, g ∈ G} ( E_{x,y}[f(x)g(y)] − E_x[f(x)] E_y[g(y)] )
- COCO(P; F, G) = 0 iff x, y independent, when F and G are the respective unit balls in universal RKHSs F and G [via Steinwart, 2001]
- Examples: Gaussian, Laplace kernels [see also Bach and Jordan, 2002]
- In geometric terms, the covariance operator C_xy : G → F satisfies
  ⟨f, C_xy g⟩_F = E_{x,y}[f(x)g(y)] − E_x[f(x)] E_y[g(y)]
- COCO is the spectral norm of C_xy [Gretton et al., 2005c,d]: COCO(P; F, G) := ‖C_xy‖_S
57-59 Function revealing dependence (2)
- Ring-shaped density, correlation approx. zero [example from Fukumizu, Bach, and Gretton, 2005]
[Figure: sample from the ring-shaped density (correlation ≈ 0.00); dependence witness functions f(x) and g(y); the correlation of the witness features is ≈ 0.9 (COCO)]
60-61 Function revealing dependence (3)
- Empirical COCO(z; F, G): largest eigenvalue γ of the generalized eigenproblem (block-matrix notation, rows separated by semicolons)
  (1/m) [ 0, K̃L̃ ; L̃K̃, 0 ] [c ; d] = γ [ K̃, 0 ; 0, L̃ ] [c ; d]
- K̃ and L̃ are matrices of inner products between centred observations in the respective feature spaces: K̃ = HKH, where H = I − (1/m) 1 1ᵀ, k(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩_F, and l(y_i, y_j) = ⟨ψ(y_i), ψ(y_j)⟩_G
- Witness function for x: f(x) = Σ_{i=1}^m c_i [ k(x_i, x) − (1/m) Σ_{j=1}^m k(x_j, x) ]
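A numerical sketch of the empirical COCO: substituting a = K̃^{1/2} c and b = L̃^{1/2} d in the eigenproblem reduces the largest eigenvalue to γ = (1/m) sqrt(λ_max(K̃ L̃)). The kernel width and the ring-shaped example data below are illustrative assumptions.

```python
import numpy as np

def centred_gram(X, sigma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    m = X.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m   # centring matrix
    return H @ K @ H

def coco(X, Y, sigma=1.0):
    # gamma = (1/m) sqrt(lambda_max(K~ L~)): largest generalized eigenvalue.
    Kt = centred_gram(X, sigma)
    Lt = centred_gram(Y, sigma)
    m = X.shape[0]
    lam_max = np.linalg.eigvals(Kt @ Lt).real.max()  # spectrum is real, >= 0
    return np.sqrt(max(lam_max, 0.0)) / m

rng = np.random.default_rng(4)
m = 200
# Dependent pair: noisy ring, so corr(x, y) ~ 0 but x and y are dependent.
t = rng.uniform(0.0, 2.0 * np.pi, m)
x_dep = (np.cos(t) + 0.1 * rng.normal(size=m)).reshape(-1, 1)
y_dep = (np.sin(t) + 0.1 * rng.normal(size=m)).reshape(-1, 1)
# Independent pair for comparison.
x_ind = rng.normal(size=(m, 1))
y_ind = rng.normal(size=(m, 1))

print(coco(x_dep, y_dep))  # noticeably larger
print(coco(x_ind, y_ind))
```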
62-64 Function revealing dependence (4)
- Can we do better? A second example with zero correlation
[Figure: sample with correlation ≈ 0.00; first pair of dependence witnesses f(x), g(y), feature correlation ≈ 0.8 (COCO); second pair of witnesses f_2(x), g_2(y), feature correlation ≈ 0.37 (COCO_2)]
65-66 Hilbert-Schmidt Independence Criterion
- Given γ_i := COCO_i(z; F, G), define the Hilbert-Schmidt Independence Criterion (HSIC) [Gretton et al., 2005b]:
  HSIC(z; F, G) := Σ_{i=1}^m γ_i^2
- In the limit of infinite samples:
  HSIC(P; F, G) := ‖C_xy‖_HS^2 = ⟨C_xy, C_xy⟩_HS
    = E_{x,x',y,y'}[k(x, x') l(y, y')] + E_{x,x'}[k(x, x')] E_{y,y'}[l(y, y')] − 2 E_{x,y}[ E_{x'}[k(x, x')] E_{y'}[l(y, y')] ]
  where x' is an independent copy of x, and y' a copy of y
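In terms of the centred Gram matrices, the empirical HSIC is the trace form (1/m^2) tr(K̃ L̃), which sums the squared COCO spectrum. A minimal sketch (kernel width and data are illustrative assumptions):

```python
import numpy as np

def centred_gram(X, sigma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    m = X.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return H @ K @ H

def hsic(X, Y, sigma=1.0):
    # (1/m^2) tr(K~ L~) = sum_i gamma_i^2: squared singular values of the
    # empirical covariance operator. Nonnegative since K~, L~ are PSD.
    m = X.shape[0]
    return np.trace(centred_gram(X, sigma) @ centred_gram(Y, sigma)) / m ** 2

rng = np.random.default_rng(5)
m = 200
x = rng.normal(size=(m, 1))
y_dep = x + 0.25 * rng.normal(size=(m, 1))   # strongly dependent on x
y_ind = rng.normal(size=(m, 1))              # independent of x

print(hsic(x, y_dep))  # clearly larger
print(hsic(x, y_ind))
```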
67-69 Link between HSIC and MMD (1)
- Define the product space F ⊗ G with kernel
  ⟨Φ(x, y), Φ(x', y')⟩ = K((x, y), (x', y')) = k(x, x') l(y, y')
- Define the mean elements
  ⟨μ_xy, Φ(x, y)⟩ := E_{x',y'} ⟨Φ(x', y'), Φ(x, y)⟩ = E_{x',y'} [k(x, x') l(y, y')]
  ⟨μ_x μ_y, Φ(x, y)⟩ := E_{x'} E_{y'} ⟨Φ(x', y'), Φ(x, y)⟩ = E_{x'}[k(x, x')] E_{y'}[l(y, y')]
- The MMD between these two mean elements is
  MMD(P, P_x P_y; F ⊗ G) = ‖μ_xy − μ_x μ_y‖^2 = ⟨μ_xy − μ_x μ_y, μ_xy − μ_x μ_y⟩ = HSIC(P; F, G)
70 Link between HSIC and MMD (2)
- Witness function for HSIC
[Figure: dependence witness and sample]
71-73 Independence test: verifying ICA and ISA
- HSICp: null distribution via sampling (permutation)
- HSICg: null distribution via moment matching
- Compare with a contingency table test (PD) [Read and Cressie, 1988]
[Figure: mixture samples rotated by θ = π/8 and θ = π/4; % acceptance of H_0 vs rotation angle for PD, HSICp, and HSICg, at various sample sizes and dimensions]
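HSICp can be sketched as a permutation test: shuffling one sample destroys any dependence while preserving the marginals, which simulates the null. All parameters below (kernel width, sample size, permutation count) are illustrative assumptions.

```python
import numpy as np

def centred_gram(X, sigma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    m = X.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return H @ K @ H

def hsic(X, Y, sigma=1.0):
    m = X.shape[0]
    return np.trace(centred_gram(X, sigma) @ centred_gram(Y, sigma)) / m ** 2

def hsic_perm_test(X, Y, n_perm=200, alpha=0.05, seed=0):
    # Permuting Y simulates independence with the same marginals.
    rng = np.random.default_rng(seed)
    obs = hsic(X, Y)
    null = np.array([hsic(X, Y[rng.permutation(Y.shape[0])])
                     for _ in range(n_perm)])
    return obs > np.quantile(null, 1.0 - alpha)  # True => reject independence

rng = np.random.default_rng(6)
m = 100
x = rng.normal(size=(m, 1))
y = x + 0.25 * rng.normal(size=(m, 1))  # strongly dependent
print(hsic_perm_test(x, y))  # True: the test rejects independence
```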
74 Other applications of HSIC
- Feature selection [Song et al., 2007c,a]
- Clustering [Song et al., 2007b]
75 Conclusions (dependence measures)
- COCO and HSIC: norms of the covariance operator between feature spaces
- When the feature spaces are universal RKHSs, COCO = HSIC = 0 iff P = P_x P_y
- Statistical test possible using the asymptotic distribution
- Independent component analysis: high accuracy; less sensitive to initialisation
76 Questions?
78 Bibliography

References
- N. Anderson, P. Hall, and D. Titterington. Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates. Journal of Multivariate Analysis, 50:41-54, 1994.
- M. Arcones and E. Giné. On the bootstrap of U and V statistics. The Annals of Statistics, 20(2), 1992.
- F. R. Bach and M. I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1-48, 2002.
- G. Biau and L. Györfi. On the asymptotic properties of a nonparametric L1-test statistic of homogeneity. IEEE Transactions on Information Theory, 51(11), 2005.
- K. M. Borgwardt, A. Gretton, M. J. Rasch, H.-P. Kriegel, B. Schölkopf, and A. J. Smola. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics, 22(14):e49-e57, 2006.
- R. M. Dudley. Real Analysis and Probability. Cambridge University Press, Cambridge, UK, 2002.
- R. Fortet and E. Mourier. Convergence de la répartition empirique vers la répartition théorique. Ann. Scient. École Norm. Sup., 70, 1953.
- J. Friedman and L. Rafsky. Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests. The Annals of Statistics, 7(4), 1979.
- K. Fukumizu, F. R. Bach, and M. I. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 5:73-99, 2004.
- K. Fukumizu, F. Bach, and A. Gretton. Consistency of kernel canonical correlation analysis. Technical report.
79-80 Hard-to-detect dependence (1)
- COCO can be arbitrarily small for dependent RVs with highly non-smooth densities
- Reason: norms in the denominator:
  COCO(P; F, G) := sup_{f ∈ F, g ∈ G} cov(f(x), g(y)) / (‖f‖_F ‖g‖_G)
- Result: not detectable with finite sample size. More formally: see Ingster [1989]
81 Hard-to-detect dependence (2)
- The density takes the form P_{x,y} ∝ 1 + sin(ωx) sin(ωy)
[Figure: a smooth and a rough density of this form, with samples from each]
82 Hard-to-detect dependence (3)
- Example: sinusoids of increasing frequency ω
[Figure: empirical average of COCO vs the frequency ω of the non-constant density component, for ω = 1, ..., 6; COCO shrinks as ω grows]
83 Choosing kernel size (1)
- The RKHS norm of f is ‖f‖_H^2 := Σ_{i=1}^∞ f_i^2 / k_i, where the k_i are the kernel spectrum
- If the kernel decays quickly, its spectrum decays slowly: then non-smooth functions have smaller RKHS norm
- Example: spectra of two Gaussian kernels
[Figure: Gaussian kernels with σ^2 = 1/2 and σ^2 = 2, and their spectra]
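This trade-off can be sketched with the Gaussian kernel's analytic Fourier transform: exp(−x^2 / (2σ^2)) transforms to sqrt(2π) σ exp(−σ^2 ω^2 / 2), a Gaussian of inverse width, so the narrow kernel retains far more high-frequency energy. The widths below match the two on the slide; the evaluation frequency is an illustrative choice.

```python
import numpy as np

def gauss_kernel_spectrum(w, sigma):
    # Fourier transform of exp(-x^2 / (2 sigma^2)); its width in w is 1/sigma.
    return np.sqrt(2.0 * np.pi) * sigma * np.exp(-(sigma ** 2) * (w ** 2) / 2.0)

w = 3.0
# Fraction of the peak (w = 0) value remaining at frequency w, for each width.
narrow = gauss_kernel_spectrum(w, np.sqrt(0.5)) / gauss_kernel_spectrum(0.0, np.sqrt(0.5))
wide = gauss_kernel_spectrum(w, np.sqrt(2.0)) / gauss_kernel_spectrum(0.0, np.sqrt(2.0))
print(narrow, wide)  # exp(-2.25) vs exp(-9): the narrow kernel decays far slower in w
```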
84 Choosing kernel size (2)
- Could we just decrease the kernel size? Yes, but only up to a point
[Figure: empirical average of COCO vs log kernel size]
More informationGeneralized clustering via kernel embeddings
Generalized clustering via kernel embeddings Stefanie Jegelka 1, Arthur Gretton 2,1, Bernhard Schölkopf 1, Bharath K. Sriperumbudur 3, and Ulrike von Luxburg 1 1 Max Planck Institute for Biological Cybernetics,
More informationKernel Bayes Rule: Nonparametric Bayesian inference with kernels
Kernel Bayes Rule: Nonparametric Bayesian inference with kernels Kenji Fukumizu The Institute of Statistical Mathematics NIPS 2012 Workshop Confluence between Kernel Methods and Graphical Models December
More informationBIOINFORMATICS. Integrating structured biological data by Kernel Maximum Mean Discrepancy
BIOINFORMATICS Vol. 22 no. 14 2006, pages e49 e57 doi:10.1093/bioinformatics/btl242 Integrating structured biological data by Kernel Maximum Mean Discrepancy Karsten M. Borgwardt 1,, Arthur Gretton 2,
More informationConvergence Rates of Kernel Quadrature Rules
Convergence Rates of Kernel Quadrature Rules Francis Bach INRIA - Ecole Normale Supérieure, Paris, France ÉCOLE NORMALE SUPÉRIEURE NIPS workshop on probabilistic integration - Dec. 2015 Outline Introduction
More informationM-Statistic for Kernel Change-Point Detection
M-Statistic for Kernel Change-Point Detection Shuang Li, Yao Xie H. Milton Stewart School of Industrial and Systems Engineering Georgian Institute of Technology sli37@gatech.edu yao.xie@isye.gatech.edu
More informationEcon 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines
Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the
More informationLecture 4 February 2
4-1 EECS 281B / STAT 241B: Advanced Topics in Statistical Learning Spring 29 Lecture 4 February 2 Lecturer: Martin Wainwright Scribe: Luqman Hodgkinson Note: These lecture notes are still rough, and have
More informationUnderstanding the relationship between Functional and Structural Connectivity of Brain Networks
Understanding the relationship between Functional and Structural Connectivity of Brain Networks Sashank J. Reddi Machine Learning Department, Carnegie Mellon University SJAKKAMR@CS.CMU.EDU Abstract Background.
More informationTensor Product Kernels: Characteristic Property, Universality
Tensor Product Kernels: Characteristic Property, Universality Zolta n Szabo CMAP, E cole Polytechnique Joint work with: Bharath K. Sriperumbudur Hangzhou International Conference on Frontiers of Data Science
More informationDistribution Regression
Zoltán Szabó (École Polytechnique) Joint work with Bharath K. Sriperumbudur (Department of Statistics, PSU), Barnabás Póczos (ML Department, CMU), Arthur Gretton (Gatsby Unit, UCL) Dagstuhl Seminar 16481
More informationOn a Nonparametric Notion of Residual and its Applications
On a Nonparametric Notion of Residual and its Applications Bodhisattva Sen and Gábor Székely arxiv:1409.3886v1 [stat.me] 12 Sep 2014 Columbia University and National Science Foundation September 16, 2014
More informationDistribution Regression: A Simple Technique with Minimax-optimal Guarantee
Distribution Regression: A Simple Technique with Minimax-optimal Guarantee (CMAP, École Polytechnique) Joint work with Bharath K. Sriperumbudur (Department of Statistics, PSU), Barnabás Póczos (ML Department,
More informationA Kernelized Stein Discrepancy for Goodness-of-fit Tests
Qiang Liu QLIU@CS.DARTMOUTH.EDU Computer Science, Dartmouth College, NH, 3755 Jason D. Lee JASONDLEE88@EECS.BERKELEY.EDU Michael Jordan JORDAN@CS.BERKELEY.EDU Department of Electrical Engineering and Computer
More informationA Wild Bootstrap for Degenerate Kernel Tests
A Wild Bootstrap for Degenerate Kernel Tests Kacper Chwialkowski Department of Computer Science University College London London, Gower Street, WCE 6BT kacper.chwialkowski@gmail.com Dino Sejdinovic Gatsby
More informationColored Maximum Variance Unfolding
Colored Maximum Variance Unfolding Le Song, Alex Smola, Karsten Borgwardt and Arthur Gretton National ICT Australia, Canberra, Australia University of Cambridge, Cambridge, United Kingdom MPI for Biological
More informationDistribution Regression with Minimax-Optimal Guarantee
Distribution Regression with Minimax-Optimal Guarantee (Gatsby Unit, UCL) Joint work with Bharath K. Sriperumbudur (Department of Statistics, PSU), Barnabás Póczos (ML Department, CMU), Arthur Gretton
More informationTopics in kernel hypothesis testing
Topics in kernel hypothesis testing Kacper Chwialkowski A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy of University College London. Department
More informationLeast-Squares Independence Test
IEICE Transactions on Information and Sstems, vol.e94-d, no.6, pp.333-336,. Least-Squares Independence Test Masashi Sugiama Toko Institute of Technolog, Japan PRESTO, Japan Science and Technolog Agenc,
More informationRecovering Distributions from Gaussian RKHS Embeddings
Motonobu Kanagawa Graduate University for Advanced Studies kanagawa@ism.ac.jp Kenji Fukumizu Institute of Statistical Mathematics fukumizu@ism.ac.jp Abstract Recent advances of kernel methods have yielded
More informationKernels A Machine Learning Overview
Kernels A Machine Learning Overview S.V.N. Vishy Vishwanathan vishy@axiom.anu.edu.au National ICT of Australia and Australian National University Thanks to Alex Smola, Stéphane Canu, Mike Jordan and Peter
More informationKernel Methods. Outline
Kernel Methods Quang Nguyen University of Pittsburgh CS 3750, Fall 2011 Outline Motivation Examples Kernels Definitions Kernel trick Basic properties Mercer condition Constructing feature space Hilbert
More informationStochastic gradient descent and robustness to ill-conditioning
Stochastic gradient descent and robustness to ill-conditioning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France ÉCOLE NORMALE SUPÉRIEURE Joint work with Aymeric Dieuleveut, Nicolas Flammarion,
More information1 Exponential manifold by reproducing kernel Hilbert spaces
1 Exponential manifold by reproducing kernel Hilbert spaces 1.1 Introduction The purpose of this paper is to propose a method of constructing exponential families of Hilbert manifold, on which estimation
More informationKernel-Based Nonparametric Anomaly Detection
Kernel-Based Nonparaetric Anoaly Detection Shaofeng Zou Dept of EECS Syracuse University Eail: szou@syr.edu Yingbin Liang Dept of EECS Syracuse University Eail: yliang6@syr.edu H. Vincent Poor Dept of
More informationCS8803: Statistical Techniques in Robotics Byron Boots. Hilbert Space Embeddings
CS8803: Statistical Techniques in Robotics Byron Boots Hilbert Space Embeddings 1 Motivation CS8803: STR Hilbert Space Embeddings 2 Overview Multinomial Distributions Marginal, Joint, Conditional Sum,
More informationA spectral clustering algorithm based on Gram operators
A spectral clustering algorithm based on Gram operators Ilaria Giulini De partement de Mathe matiques et Applications ENS, Paris Joint work with Olivier Catoni 1 july 2015 Clustering task of grouping
More informationLecture 35: December The fundamental statistical distances
36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose
More informationMinimax Estimation of Kernel Mean Embeddings
Minimax Estimation of Kernel Mean Embeddings Bharath K. Sriperumbudur Department of Statistics Pennsylvania State University Gatsby Computational Neuroscience Unit May 4, 2016 Collaborators Dr. Ilya Tolstikhin
More informationStatistical Optimality of Stochastic Gradient Descent through Multiple Passes
Statistical Optimality of Stochastic Gradient Descent through Multiple Passes Francis Bach INRIA - Ecole Normale Supérieure, Paris, France ÉCOLE NORMALE SUPÉRIEURE Joint work with Loucas Pillaud-Vivien
More informationHilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems
with Applications to Dynamical Systems Le Song Jonathan Huang School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA Alex Smola Yahoo! Research, Santa Clara, CA 95051, USA Kenji
More informationTest Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics
Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics The candidates for the research course in Statistics will have to take two shortanswer type tests
More informationReproducing Kernel Hilbert Spaces
Reproducing Kernel Hilbert Spaces Lorenzo Rosasco 9.520 Class 03 February 9, 2011 About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert
More informationAdvances in kernel exponential families
Advances in kernel exponential families Arthur Gretton Gatsby Computational Neuroscience Unit, University College London NIPS, 2017 1/39 Outline Motivating application: Fast estimation of complex multivariate
More informationMinimum Hellinger Distance Estimation in a. Semiparametric Mixture Model
Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.
More informationKernel Method: Data Analysis with Positive Definite Kernels
Kernel Method: Data Analysis with Positive Definite Kernels 2. Positive Definite Kernel and Reproducing Kernel Hilbert Space Kenji Fukumizu The Institute of Statistical Mathematics. Graduate University
More informationKernel methods, kernel SVM and ridge regression
Kernel methods, kernel SVM and ridge regression Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Collaborative Filtering 2 Collaborative Filtering R: rating matrix; U: user factor;
More informationSequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process
Applied Mathematical Sciences, Vol. 4, 2010, no. 62, 3083-3093 Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Julia Bondarenko Helmut-Schmidt University Hamburg University
More informationHilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems
with Applications to Dynamical Systems Le Song Jonathan Huang School of Computer Science, Carnegie Mellon University Alex Smola Yahoo! Research, Santa Clara, CA, USA Kenji Fukumizu Institute of Statistical
More informationDependence Minimizing Regression with Model Selection for Non-Linear Causal Inference under Non-Gaussian Noise
Dependence Minimizing Regression with Model Selection for Non-Linear Causal Inference under Non-Gaussian Noise Makoto Yamada and Masashi Sugiyama Department of Computer Science, Tokyo Institute of Technology
More informationStat 710: Mathematical Statistics Lecture 31
Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:
More informationGeneralization Bounds in Machine Learning. Presented by: Afshin Rostamizadeh
Generalization Bounds in Machine Learning Presented by: Afshin Rostamizadeh Outline Introduction to generalization bounds. Examples: VC-bounds Covering Number bounds Rademacher bounds Stability bounds
More informationMinimax-optimal distribution regression
Zoltán Szabó (Gatsby Unit, UCL) Joint work with Bharath K. Sriperumbudur (Department of Statistics, PSU), Barnabás Póczos (ML Department, CMU), Arthur Gretton (Gatsby Unit, UCL) ISNPS, Avignon June 12,
More informationStrictly Positive Definite Functions on a Real Inner Product Space
Strictly Positive Definite Functions on a Real Inner Product Space Allan Pinkus Abstract. If ft) = a kt k converges for all t IR with all coefficients a k 0, then the function f< x, y >) is positive definite
More information22 : Hilbert Space Embeddings of Distributions
10-708: Probabilistic Graphical Models 10-708, Spring 2014 22 : Hilbert Space Embeddings of Distributions Lecturer: Eric P. Xing Scribes: Sujay Kumar Jauhar and Zhiguang Huo 1 Introduction and Motivation
More informationGeometric Analysis of Hilbert-Schmidt Independence Criterion Based ICA Contrast Function
Geometric Analysis of Hilbert-Schmidt Independence Criterion Based ICA Contrast Function NICTA Technical Report: PA006080 ISSN 1833-9646 Hao Shen 1, Stefanie Jegelka 2 and Arthur Gretton 2 1 Systems Engineering
More informationKernel-based Information Criterion
1 Kernel-based Information Criterion Somayeh Danafar, Kenji Fukumizu, Faustino Gomez IDSIA/SUPSI, Manno-Lugano, Switzerland Università della Svizzera Italiana, Lugano, Switzerland The Institue of Statistical
More informationReproducing Kernel Hilbert Spaces
9.520: Statistical Learning Theory and Applications February 10th, 2010 Reproducing Kernel Hilbert Spaces Lecturer: Lorenzo Rosasco Scribe: Greg Durrett 1 Introduction In the previous two lectures, we
More informationNonparametric Tree Graphical Models via Kernel Embeddings
Le Song, 1 Arthur Gretton, 1,2 Carlos Guestrin 1 1 School of Computer Science, Carnegie Mellon University; 2 MPI for Biological Cybernetics Abstract We introduce a nonparametric representation for graphical
More informationSpring 2012 Math 541B Exam 1
Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote
More informationMultiple Random Variables
Multiple Random Variables This Version: July 30, 2015 Multiple Random Variables 2 Now we consider models with more than one r.v. These are called multivariate models For instance: height and weight An
More informationSpline Density Estimation and Inference with Model-Based Penalities
Spline Density Estimation and Inference with Model-Based Penalities December 7, 016 Abstract In this paper we propose model-based penalties for smoothing spline density estimation and inference. These
More informationOslo Class 2 Tikhonov regularization and kernels
RegML2017@SIMULA Oslo Class 2 Tikhonov regularization and kernels Lorenzo Rosasco UNIGE-MIT-IIT May 3, 2017 Learning problem Problem For H {f f : X Y }, solve min E(f), f H dρ(x, y)l(f(x), y) given S n
More informationThe Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space Robert Jenssen, Deniz Erdogmus 2, Jose Principe 2, Torbjørn Eltoft Department of Physics, University of Tromsø, Norway
More informationUnsupervised Kernel Dimension Reduction Supplemental Material
Unsupervised Kernel Dimension Reduction Supplemental Material Meihong Wang Dept. of Computer Science U. of Southern California Los Angeles, CA meihongw@usc.edu Fei Sha Dept. of Computer Science U. of Southern
More informationA Bahadur Representation of the Linear Support Vector Machine
A Bahadur Representation of the Linear Support Vector Machine Yoonkyung Lee Department of Statistics The Ohio State University October 7, 2008 Data Mining and Statistical Learning Study Group Outline Support
More informationH 2 : otherwise. that is simply the proportion of the sample points below level x. For any fixed point x the law of large numbers gives that
Lecture 28 28.1 Kolmogorov-Smirnov test. Suppose that we have an i.i.d. sample X 1,..., X n with some unknown distribution and we would like to test the hypothesis that is equal to a particular distribution
More information