Tensor Product Kernels: Characteristic Property, Universality


1 Tensor Product Kernels: Characteristic Property, Universality. Zoltán Szabó (CMAP, École Polytechnique). Joint work with Bharath K. Sriperumbudur. Hangzhou International Conference on Frontiers of Data Science, May 19, 2018.

2-6 Motivation: Classical Information Theory

Kullback-Leibler divergence:
  KL(P, Q) = ∫_{R^d} p(x) log[p(x)/q(x)] dx.

Mutual information:
  I(P) = KL(P, ⊗_{m=1}^M P_m).

Properties:
  1. I(P) ≥ 0.
  2. I(P) = 0 ⇔ P = ⊗_{m=1}^M P_m.

Alternatives: Rényi, Tsallis, L2 divergence, ... Typically: X = R^d.
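These two definitions are easy to check numerically; the following is a minimal sketch (not from the talk) using a toy 2x2 discrete joint, where I(P) = KL(P, P_1 ⊗ P_2) reduces to a finite sum:

```python
import numpy as np

def kl(p, q):
    # Discrete KL(P, Q) = sum_x p(x) log(p(x)/q(x)); assumes supp(p) is in supp(q)
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

# Toy dependent joint on {1,2} x {1,2}
P = np.array([[0.4, 0.1],
              [0.1, 0.4]])
P1, P2 = P.sum(axis=1), P.sum(axis=0)   # marginals

# Mutual information as a KL divergence to the product of marginals
I = kl(P.ravel(), np.outer(P1, P2).ravel())
print(I > 0)  # True: the joint is dependent, so I(P) > 0
```

For this joint, I(P) = 0.8·log(1.6) + 0.2·log(0.4) ≈ 0.19, and it vanishes exactly when P equals the product of its marginals.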

7-10 From R^d to a Diverse Set of Domains: Kernel Examples

X = R^d, γ > 0; k(x, y) = ⟨φ(x), φ(y)⟩_H:
  polynomial:  k_p(x, y) = (⟨x, y⟩ + γ)^p,
  exponential: k_e(x, y) = e^{-γ‖x-y‖_2},
  Gaussian:    k_g(x, y) = e^{-γ‖x-y‖_2²},
  Cauchy:      k_C(x, y) = 1/(1 + γ‖x-y‖_2²).

X = strings, texts: r-spectrum kernel (# of common substrings of length ≤ r).
X = time series: dynamic time-warping.
X = graphs, sets, permutations, ...
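The four R^d kernels listed above are one-liners; a hedged sketch (parameter choices γ = 1, p = 2 are illustrative, not from the talk):

```python
import numpy as np

def k_poly(x, y, gamma=1.0, p=2):
    return (np.dot(x, y) + gamma) ** p               # polynomial: (<x,y> + gamma)^p

def k_exp(x, y, gamma=1.0):
    return np.exp(-gamma * np.linalg.norm(x - y))    # exponential: exp(-gamma ||x-y||_2)

def k_gauss(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))     # Gaussian: exp(-gamma ||x-y||_2^2)

def k_cauchy(x, y, gamma=1.0):
    return 1.0 / (1.0 + gamma * np.sum((x - y) ** 2))  # Cauchy: 1/(1 + gamma ||x-y||_2^2)

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
vals = [k(x, y) for k in (k_poly, k_exp, k_gauss, k_cauchy)]
```

Here ‖x-y‖_2² = 2, so e.g. k_gauss(x, y) = e^{-2} and k_cauchy(x, y) = 1/3.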

11-15 Distribution Representation

Mean embedding: P ↦ μ_P = ∫_X φ(x) dP(x).

Analogues:
  Cdf: P ↦ F_P(z) = E_{x~P} χ_{(-∞,z)}(x).
  Characteristic function: P ↦ c_P(z) = ∫ e^{i⟨z,x⟩} dP(x).
  Moment-generating function: P ↦ M_P(z) = ∫ e^{⟨z,x⟩} dP(x).

Trick: the feature map φ is available on any kernel-endowed domain!

16-18 Objects of Interest

KL divergence & mutual information on kernel-endowed domains.

Mean embedding:
  μ_k(P) := ∫_X φ(x) dP(x) = ∫_X k(·, x) dP(x) ∈ H_k.

Maximum mean discrepancy:
  MMD_k(P, Q) := ‖μ_k(P) - μ_k(Q)‖_{H_k}.

Hilbert-Schmidt independence criterion, with k = ⊗_{m=1}^M k_m:
  HSIC_k(P) := MMD_k(P, ⊗_{m=1}^M P_m).
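MMD has a simple plug-in estimate: replace μ_k(P) by the empirical embedding (1/n)Σ_i k(·, x_i), which yields a sum of Gram-matrix means. A sketch with a Gaussian kernel (γ = 1 and the sample sizes are illustrative assumptions):

```python
import numpy as np

def gauss_gram(X, Y, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||X_i - Y_j||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2_biased(X, Y, gamma=1.0):
    # Plug-in (biased) estimate of ||mu_k(P) - mu_k(Q)||^2_{H_k}
    return (gauss_gram(X, X, gamma).mean()
            - 2.0 * gauss_gram(X, Y, gamma).mean()
            + gauss_gram(Y, Y, gamma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(300, 1))
Y = rng.normal(1.0, 1.0, size=(300, 1))   # shifted distribution

same = mmd2_biased(X, X)   # identical samples: estimate is exactly 0
diff = mmd2_biased(X, Y)   # different distributions: clearly positive
```

Being a squared RKHS norm, the estimate is nonnegative, and it vanishes exactly when the two empirical embeddings coincide.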

19-22 RKHS Intuition (X := X_m, k := k_m)

Given: a set X and a H(ilbert space). Kernel:
  k(a, b) = ⟨φ(a), φ(b)⟩_H.

Reproducing kernel of H ⊂ R^X:
  k(·, b) ∈ H,   f(b) = ⟨f, k(·, b)⟩_H   (reproducing property)
  ⇒ (specializing f = k(·, a))   k(a, b) = ⟨k(·, a), k(·, b)⟩_H.

H_k = {Σ_{i=1}^n α_i k(·, x_i)}.
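One immediate consequence of k(a, b) = ⟨φ(a), φ(b)⟩_H is that every Gram matrix is positive semidefinite, since αᵀKα = ‖Σ_i α_i φ(x_i)‖²_H ≥ 0. A quick numerical sanity check for the Gaussian kernel (illustrative sketch only):

```python
import numpy as np

# Gram matrix of the Gaussian kernel on random points; it must be PSD
# because alpha^T K alpha = || sum_i alpha_i phi(x_i) ||_H^2 >= 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2)

min_eig = np.linalg.eigvalsh(K).min()
print(min_eig >= -1e-10)  # True: PSD up to numerical round-off
```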

23 Mean Embedding, MMD: Applications & Review

Applications: two-sample testing [Borgwardt et al., 2006, Gretton et al., 2012], domain adaptation [Zhang et al., 2013] and domain generalization [Blanchard et al., 2017], kernel Bayesian inference [Song et al., 2011, Fukumizu et al., 2013], approximate Bayesian computation [Park et al., 2016], probabilistic programming [Schölkopf et al., 2015], model criticism [Lloyd et al., 2014, Kim et al., 2016], goodness-of-fit testing [Balasubramanian et al., 2017], distribution classification [Muandet et al., 2011, Lopez-Paz et al., 2015, Zaheer et al., 2017], distribution regression [Szabó et al., 2016, Law et al., 2018], topological data analysis [Kusano et al., 2016]. Review: [Muandet et al., 2017]. Switching to HSIC...

24-25 MMD → HSIC

MMD with k = ⊗_{m=1}^M k_m:
  k(x, x') := ∏_{m=1}^M k_m(x_m, x_m'),
  HSIC_k(P) := MMD_k(P, ⊗_{m=1}^M P_m).

Applications: blind source separation [Gretton et al., 2005], feature selection [Song et al., 2012], post-selection inference [Yamada et al., 2018], independence testing [Gretton et al., 2008], causal inference [Mooij et al., 2016, Pfister et al., 2017, Strobl et al., 2017].
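For samples rather than known distributions, HSIC with M = 2 has a classic biased (V-statistic) estimate via centered Gram matrices, HSIC_b = tr(KHLH)/n² with H = I - (1/n)11ᵀ [Gretton et al., 2005]. A hedged sketch (bandwidths and sample sizes are illustrative assumptions):

```python
import numpy as np

def gauss_gram(X, gamma=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def hsic_biased(X, Y, gamma=1.0):
    # Biased V-statistic estimate of HSIC for M = 2:
    # HSIC_b = tr(K H L H) / n^2, with the centering matrix H = I - (1/n) 11^T
    n = X.shape[0]
    K, L = gauss_gram(X, gamma), gauss_gram(Y, gamma)
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / n ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
Y_dep = X + 0.1 * rng.normal(size=(200, 1))   # strongly dependent on X
Y_ind = rng.normal(size=(200, 1))             # independent of X
```

On these samples hsic_biased(X, Y_dep) is much larger than hsic_biased(X, Y_ind); the latter is small but positive, reflecting the O(1/n) bias of the V-statistic.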

26-29 Central in Applications: Characteristic Property

MMD: k is called characteristic [Fukumizu et al., 2008] if
  MMD_k(P, Q) = 0 ⇔ P = Q.
Injectivity of P ↦ μ_P on finite signed measures: universality [Steinwart, 2001].

HSIC: k = ⊗_{m=1}^M k_m will be called I-characteristic if
  HSIC_k(P) = 0 ⇔ P = ⊗_{m=1}^M P_m.

For ⊗_{m=1}^M k_m: universal ⇒ characteristic ⇒ I-characteristic.

Wanted: the characteristic properties of ⊗_{m=1}^M k_m in terms of the k_m's.

30-31 Well-Known: Shift-Invariant Kernels on R^d

Theorem ([Sriperumbudur et al., 2010]). k is characteristic iff supp(Λ) = R^d, where
  k(x, x') = k_0(x - x') = ∫_{R^d} e^{i⟨x-x', ω⟩} dΛ(ω).

Example on R:
  Gaussian:  k_0(x) = e^{-x²/(2σ²)},  k̂_0(ω) = σ e^{-σ²ω²/2},         supp(k̂_0) = R.
  Laplacian: k_0(x) = e^{-σ|x|},      k̂_0(ω) = √(2/π) σ/(σ² + ω²),   supp(k̂_0) = R.
  Sinc:      k_0(x) = sin(σx)/x,      k̂_0(ω) = √(π/2) χ_{[-σ,σ]}(ω), supp(k̂_0) = [-σ, σ].
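The spectral representation k_0(x - x') = ∫ e^{i⟨x-x', ω⟩} dΛ(ω) can be checked by Monte Carlo: for the Gaussian kernel k_0(u) = e^{-u²/2}, the spectral measure Λ is the standard normal, so averaging cos(ωu) over ω ~ N(0, 1) recovers k_0(u) (the imaginary part cancels by symmetry of Λ). A sketch with σ = 1 assumed:

```python
import numpy as np

# Bochner representation check: k0(u) = E_{omega ~ N(0,1)}[cos(omega * u)]
# for the Gaussian kernel k0(u) = exp(-u^2 / 2).
rng = np.random.default_rng(0)
omega = rng.normal(size=200_000)   # draws from the spectral measure Lambda

u = 1.3
mc = np.cos(omega * u).mean()      # Monte Carlo estimate of k0(u)
exact = np.exp(-u ** 2 / 2)

print(abs(mc - exact) < 1e-2)  # True: estimate matches k0(u)
```

This is exactly the random Fourier feature construction; the theorem above says the kernel is characteristic because this Λ has full support.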

32-34 Well-Known: M = 2

[Blanchard et al., 2011, Gretton, 2015]: k_1, k_2 universal ⇒ k_1 ⊗ k_2 universal (⇒ I-characteristic).
Distance covariance [Lyons, 2013, Sejdinovic et al., 2013]: k_1, k_2 characteristic ⇔ k_1 ⊗ k_2 I-characteristic.

Goal: extension to arbitrary M ≥ 2.

35 k_1, k_2, k_3 Characteristic ⇏ ⊗_{m=1}^3 k_m I-Characteristic

Example: X_m = {1, 2}, τ_{X_m} = P({1, 2}), k_m(x, x') = 2δ_{x,x'} - 1, M = 3.
Then (k_m)_{m=1}^3 are characteristic, but ⊗_{m=1}^3 k_m is not I-characteristic. Witness:
  p_{1,1,1} = 1/5,  p_{1,1,2} = 1/10, p_{1,2,1} = 1/10, p_{1,2,2} = 1/10,
  p_{2,1,1} = 1/5,  p_{2,1,2} = 1/10, p_{2,2,1} = 1/10, p_{2,2,2} = 1/10.
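This counterexample can be verified with a few lines of linear algebra: for a discrete P, HSIC_k(P)² = dᵀKd, where d = p - ⊗_m p_m and K is the Gram matrix of ⊗³k_m over the 8 joint states. One assumption in the sketch below: the slide's witness table is truncated at its last entry, so p_{2,2,2} = 1/10 is filled in by normalization.

```python
import numpy as np

# Witness distribution on {1,2}^3 (array index 0 <-> state 1, 1 <-> state 2);
# p[1,1,1] = 1/10 is completed by normalization (the slide's table is truncated).
p = np.zeros((2, 2, 2))
p[0, 0, 0], p[0, 0, 1], p[0, 1, 0], p[0, 1, 1] = 1/5, 1/10, 1/10, 1/10
p[1, 0, 0], p[1, 0, 1], p[1, 1, 0], p[1, 1, 1] = 1/5, 1/10, 1/10, 1/10

# Product of the marginals and the difference signed measure d = p - q
P1, P2, P3 = p.sum((1, 2)), p.sum((0, 2)), p.sum((0, 1))
q = np.einsum('i,j,k->ijk', P1, P2, P3)
d = p - q

# k_m(x, x') = 2*delta_{x,x'} - 1 has Gram matrix S; the tensor kernel's Gram
# matrix is S (x) S (x) S, and HSIC^2 = d^T (S (x) S (x) S) d.
S = np.array([[1.0, -1.0], [-1.0, 1.0]])
hsic_sq = np.einsum('ijk,ia,jb,kc,abc->', d, S, S, S, d)

print(abs(hsic_sq) < 1e-12, np.abs(d).max() > 1e-3)  # HSIC = 0, yet P is dependent
```

The two printed booleans are both True: HSIC vanishes even though P differs from the product of its marginals (e.g. p_{1,1,1} = 0.2 versus P_1(1)P_2(1)P_3(1) = 0.18).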

36-39 Non-I-Characteristicity: Analytical Solution

Parameter: z = (z_0, z_1, ..., z_5) ∈ [0, 1]^6.

Example:
p_{1,1,1} = z_2 + z_1 + z_4 + z_5 - 3z_2z_1 - 4z_2z_4 - 4z_1z_4 - z_2z_3 - 2z_2z_0 - 2z_1z_3 - 3z_2z_5 - 2z_4z_3 - z_1z_0 - 3z_1z_5 - 2z_4z_0 - 4z_4z_5 - z_3z_0 - z_3z_5 - z_0z_5 + 2z_2z_1² + 2z_2²z_1 + 4z_2z_4² + 2z_2²z_4 + 4z_1z_4² + 2z_1²z_4 + 2z_2²z_0 + 2z_1²z_3 + 2z_2z_5² + 2z_2²z_5 + 2z_4²z_3 + 2z_1z_5² + 2z_1²z_5 + 2z_4²z_0 + 2z_4z_5² + 4z_4²z_5 - z_2² - z_1² - 3z_4² + 2z_4³ - z_5² + 6z_2z_1z_4 + 2z_2z_1z_3 + 2z_2z_4z_3 + 2z_2z_1z_0 + 4z_2z_1z_5 + 4z_2z_4z_0 + 4z_1z_4z_3 + 6z_2z_4z_5 + 2z_1z_4z_0 + 6z_1z_4z_5 + 2z_2z_3z_0 + 2z_2z_3z_5 + 2z_1z_3z_0 + 2z_2z_0z_5 + 2z_1z_3z_5 + 2z_4z_3z_0 + 2z_4z_3z_5 + 2z_1z_0z_5 + 2z_4z_0z_5.

2z_2z_1 - z_1 - 2z_4z_3 - z_0 - 2z_5z_2 + 2z_2z_4 + 2z_1z_4 + 2z_2z_0 + 2z_1z_3 + 2z_2z_5 + 2z_4z_3 + 2z_1z_5 + 2z_4z_0 + 4z_4z_5 + 2z_3z_0 + 2z_3z_5 + 2z_0z_5 + 2z_4² + 2z_5².

We chose: z = (1/10, 1/10, 1/10, 1/10, 1/10, 1/10).

Universality: does it help?

40 k_1, k_2 Universal, k_3 Characteristic ⇏ ⊗_{m=1}^3 k_m I-Characteristic

Example: X_m = {1, 2}, τ_{X_m} = P({1, 2}), M = 3.
  k_1(x, x') = k_2(x, x') = δ_{x,x'}: universal.
  k_3(x, x') = 2δ_{x,x'} - 1: characteristic.
Different constraints and P(z) solution; the same witness is useful:
  p_{1,1,1} = 1/5,  p_{1,1,2} = 1/10, p_{1,2,1} = 1/10, p_{1,2,2} = 1/10,
  p_{2,1,1} = 1/5,  p_{2,1,2} = 1/10, p_{2,2,1} = 1/10, p_{2,2,2} = 1/10.

41-42 Results [Szabó and Sriperumbudur, 2017]

Proposition (characteristic property).
  ⊗_{m=1}^M k_m characteristic ⇒ (k_m)_{m=1}^M are characteristic.
  The converse fails [|X_m| = 2, k_m(x, x') = 2δ_{x,x'} - 1].

Proposition (I-characteristic property).
  k_1, k_2 characteristic ⇒ k_1 ⊗ k_2 I-characteristic; ⇐ holds when |X_m| ≥ 2.
  k_1, k_2, k_3 characteristic ⇏ ⊗_{m=1}^3 k_m I-characteristic [example above].
  k_1, k_2 universal, k_3 characteristic ⇏ ⊗_{m=1}^3 k_m I-characteristic [example above].

43-44 Results, Continued [Szabó and Sriperumbudur, 2017]

Proposition (X_m = R^{d_m}; k_m continuous, shift-invariant, bounded).
  (k_m)_{m=1}^M are characteristic ⇔ ⊗_{m=1}^M k_m is I-characteristic ⇔ ⊗_{m=1}^M k_m is characteristic.

Proposition (universality).
  ⊗_{m=1}^M k_m universal ⇔ (k_m)_{m=1}^M are universal.

45-46 Summary

We studied the validity of HSIC.
HSIC ⇒ product structure:
  Space: X = ×_{m=1}^M X_m.
  Kernel: k = ⊗_{m=1}^M k_m.
Complete answer in terms of the k_m's.
ITE toolkit; preprint (with minor revision in JMLR).

47 Thank you for your attention!

Acknowledgements: part of this work was carried out while BKS was visiting ZSz at CMAP, École Polytechnique. BKS is supported by NSF-DMS.

48-57 References

Balasubramanian, K., Li, T., and Yuan, M. (2017). On the optimality of kernel-embedding based goodness-of-fit tests. Technical report.
Blanchard, G., Deshmukh, A. A., Dogan, U., Lee, G., and Scott, C. (2017). Domain generalization by marginal transfer learning. Technical report.
Blanchard, G., Lee, G., and Scott, C. (2011). Generalizing from several related classification tasks to a new unlabeled sample. In Advances in Neural Information Processing Systems (NIPS).
Borgwardt, K., Gretton, A., Rasch, M. J., Kriegel, H.-P., Schölkopf, B., and Smola, A. J. (2006). Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics, 22.
Fukumizu, K., Gretton, A., Sun, X., and Schölkopf, B. (2008). Kernel measures of conditional dependence. In Neural Information Processing Systems (NIPS).
Fukumizu, K., Song, L., and Gretton, A. (2013). Kernel Bayes' rule: Bayesian inference with positive definite kernels. Journal of Machine Learning Research, 14.
Gretton, A. (2015). A simpler condition for consistency of a kernel independence test. Technical report, University College London.
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., and Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13.
Gretton, A., Bousquet, O., Smola, A., and Schölkopf, B. (2005). Measuring statistical dependence with Hilbert-Schmidt norms. In Algorithmic Learning Theory (ALT).
Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., and Smola, A. J. (2008). A kernel statistical test of independence. In Neural Information Processing Systems (NIPS).
Kim, B., Khanna, R., and Koyejo, O. O. (2016). Examples are not enough, learn to criticize! Criticism for interpretability. In Advances in Neural Information Processing Systems (NIPS).
Kusano, G., Fukumizu, K., and Hiraoka, Y. (2016). Persistence weighted Gaussian kernel for topological data analysis. In International Conference on Machine Learning (ICML).
Law, H. C. L., Sutherland, D. J., Sejdinovic, D., and Flaxman, S. (2018). Bayesian approaches to distribution regression. In International Conference on Artificial Intelligence and Statistics (AISTATS).
Lloyd, J. R., Duvenaud, D., Grosse, R., Tenenbaum, J. B., and Ghahramani, Z. (2014). Automatic construction and natural-language description of nonparametric regression models. In AAAI Conference on Artificial Intelligence.
Lopez-Paz, D., Muandet, K., Schölkopf, B., and Tolstikhin, I. (2015). Towards a learning theory of cause-effect inference. In International Conference on Machine Learning (ICML; PMLR), volume 37.
Lyons, R. (2013). Distance covariance in metric spaces. The Annals of Probability, 41.
Mooij, J. M., Peters, J., Janzing, D., Zscheischler, J., and Schölkopf, B. (2016). Distinguishing cause from effect using observational data: Methods and benchmarks. Journal of Machine Learning Research, 17.
Muandet, K., Fukumizu, K., Dinuzzo, F., and Schölkopf, B. (2011). Learning from distributions via support measure machines. In Neural Information Processing Systems (NIPS).
Muandet, K., Fukumizu, K., Sriperumbudur, B., and Schölkopf, B. (2017). Kernel mean embedding of distributions: A review and beyond. Foundations and Trends in Machine Learning, 10(1-2).
Park, M., Jitkrittum, W., and Sejdinovic, D. (2016). K2-ABC: Approximate Bayesian computation with kernel embeddings. In International Conference on Artificial Intelligence and Statistics (AISTATS; PMLR), volume 51.
Pfister, N., Bühlmann, P., Schölkopf, B., and Peters, J. (2017). Kernel-based tests for joint independence. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Schölkopf, B., Muandet, K., Fukumizu, K., Harmeling, S., and Peters, J. (2015). Computing functions of random variables via reproducing kernel Hilbert space representations. Statistics and Computing, 25(4).
Sejdinovic, D., Sriperumbudur, B. K., Gretton, A., and Fukumizu, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Annals of Statistics, 41.
Song, L., Gretton, A., Bickson, D., Low, Y., and Guestrin, C. (2011). Kernel belief propagation. In International Conference on Artificial Intelligence and Statistics (AISTATS).
Song, L., Smola, A., Gretton, A., Bedo, J., and Borgwardt, K. (2012). Feature selection via dependence maximization. Journal of Machine Learning Research, 13.
Sriperumbudur, B. K., Gretton, A., Fukumizu, K., Schölkopf, B., and Lanckriet, G. R. (2010). Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11.
Steinwart, I. (2001). On the influence of the kernel on the consistency of support vector machines. Journal of Machine Learning Research.
Strobl, E. V., Visweswaran, S., and Zhang, K. (2017). Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. Technical report.
Szabó, Z. and Sriperumbudur, B. (2017). Characteristic and universal tensor product kernels. Technical report.
Szabó, Z., Sriperumbudur, B., Póczos, B., and Gretton, A. (2016). Learning theory for distribution regression. Journal of Machine Learning Research, 17(152):1-40.
Yamada, M., Umezu, Y., Fukumizu, K., and Takeuchi, I. (2018). Post selection inference with kernels. In International Conference on Artificial Intelligence and Statistics (AISTATS; PMLR), volume 84.
Zaheer, M., Kottur, S., Ravanbakhsh, S., Póczos, B., Salakhutdinov, R. R., and Smola, A. J. (2017). Deep sets. In Advances in Neural Information Processing Systems (NIPS).
Zhang, K., Schölkopf, B., Muandet, K., and Wang, Z. (2013). Domain adaptation under target and conditional shift. Journal of Machine Learning Research, 28(3).


A Linear-Time Kernel Goodness-of-Fit Test

A Linear-Time Kernel Goodness-of-Fit Test A Linear-Time Kernel Goodness-of-Fit Test Wittawat Jitkrittum Gatsby Unit, UCL wittawatj@gmail.com Wenkai Xu Gatsby Unit, UCL wenkaix@gatsby.ucl.ac.uk Zoltán Szabó CMAP, École Polytechnique zoltan.szabo@polytechnique.edu

More information

Lecture 35: December The fundamental statistical distances

Lecture 35: December The fundamental statistical distances 36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose

More information

Causal Inference on Discrete Data via Estimating Distance Correlations

Causal Inference on Discrete Data via Estimating Distance Correlations ARTICLE CommunicatedbyAapoHyvärinen Causal Inference on Discrete Data via Estimating Distance Correlations Furui Liu frliu@cse.cuhk.edu.hk Laiwan Chan lwchan@cse.euhk.edu.hk Department of Computer Science

More information

Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.

Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators. Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators. Michel Besserve MPI for Intelligent Systems, Tübingen michel.besserve@tuebingen.mpg.de Nikos K. Logothetis MPI

More information

Forefront of the Two Sample Problem

Forefront of the Two Sample Problem Forefront of the Two Sample Problem From classical to state-of-the-art methods Yuchi Matsuoka What is the Two Sample Problem? X ",, X % ~ P i. i. d Y ",, Y - ~ Q i. i. d Two Sample Problem P = Q? Example:

More information

Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models

Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models Kun Zhang Dept of Computer Science and HIIT University of Helsinki 14 Helsinki, Finland kun.zhang@cs.helsinki.fi Aapo Hyvärinen

More information

Nonparametric Tree Graphical Models via Kernel Embeddings

Nonparametric Tree Graphical Models via Kernel Embeddings Le Song, 1 Arthur Gretton, 1,2 Carlos Guestrin 1 1 School of Computer Science, Carnegie Mellon University; 2 MPI for Biological Cybernetics Abstract We introduce a nonparametric representation for graphical

More information

Gaussian Processes. Recommended reading: Rasmussen/Williams: Chapters 1, 2, 4, 5 Deisenroth & Ng (2015)[3] Data Analysis and Probabilistic Inference

Gaussian Processes. Recommended reading: Rasmussen/Williams: Chapters 1, 2, 4, 5 Deisenroth & Ng (2015)[3] Data Analysis and Probabilistic Inference Data Analysis and Probabilistic Inference Gaussian Processes Recommended reading: Rasmussen/Williams: Chapters 1, 2, 4, 5 Deisenroth & Ng (2015)[3] Marc Deisenroth Department of Computing Imperial College

More information

Manifold Regularization

Manifold Regularization Manifold Regularization Vikas Sindhwani Department of Computer Science University of Chicago Joint Work with Mikhail Belkin and Partha Niyogi TTI-C Talk September 14, 24 p.1 The Problem of Learning is

More information

A Kernel Test for Three-Variable Interactions

A Kernel Test for Three-Variable Interactions A Kernel Test for Three-Variable Interactions Dino Sejdinovic, Arthur Gretton Gatsby Unit, CSML, UCL, UK {dino.sejdinovic, arthur.gretton}@gmail.com Wicher Bergsma Department of Statistics, LSE, UK w.p.bergsma@lse.ac.uk

More information

Adaptive HMC via the Infinite Exponential Family

Adaptive HMC via the Infinite Exponential Family Adaptive HMC via the Infinite Exponential Family Arthur Gretton Gatsby Unit, CSML, University College London RegML, 2017 Arthur Gretton (Gatsby Unit, UCL) Adaptive HMC via the Infinite Exponential Family

More information

Maximum Mean Discrepancy

Maximum Mean Discrepancy Maximum Mean Discrepancy Thanks to Karsten Borgwardt, Malte Rasch, Bernhard Schölkopf, Jiayuan Huang, Arthur Gretton Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia

More information

Kernel Methods. Lecture 4: Maximum Mean Discrepancy Thanks to Karsten Borgwardt, Malte Rasch, Bernhard Schölkopf, Jiayuan Huang, Arthur Gretton

Kernel Methods. Lecture 4: Maximum Mean Discrepancy Thanks to Karsten Borgwardt, Malte Rasch, Bernhard Schölkopf, Jiayuan Huang, Arthur Gretton Kernel Methods Lecture 4: Maximum Mean Discrepancy Thanks to Karsten Borgwardt, Malte Rasch, Bernhard Schölkopf, Jiayuan Huang, Arthur Gretton Alexander J. Smola Statistical Machine Learning Program Canberra,

More information

Robust Support Vector Machines for Probability Distributions

Robust Support Vector Machines for Probability Distributions Robust Support Vector Machines for Probability Distributions Andreas Christmann joint work with Ingo Steinwart (Los Alamos National Lab) ICORS 2008, Antalya, Turkey, September 8-12, 2008 Andreas Christmann,

More information

Hypothesis Testing with Kernels

Hypothesis Testing with Kernels (Gatsby Unit, UCL) PRNI, Trento June 22, 2016 Motivation: detecting differences in AM signals Amplitude modulation: simple technique to transmit voice over radio. in the example: 2 songs. Fragments from

More information

Hilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems

Hilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems with Applications to Dynamical Systems Le Song Jonathan Huang School of Computer Science, Carnegie Mellon University Alex Smola Yahoo! Research, Santa Clara, CA, USA Kenji Fukumizu Institute of Statistical

More information

10-701/ Recitation : Kernels

10-701/ Recitation : Kernels 10-701/15-781 Recitation : Kernels Manojit Nandi February 27, 2014 Outline Mathematical Theory Banach Space and Hilbert Spaces Kernels Commonly Used Kernels Kernel Theory One Weird Kernel Trick Representer

More information

Distinguishing between Cause and Effect: Estimation of Causal Graphs with two Variables

Distinguishing between Cause and Effect: Estimation of Causal Graphs with two Variables Distinguishing between Cause and Effect: Estimation of Causal Graphs with two Variables Jonas Peters ETH Zürich Tutorial NIPS 2013 Workshop on Causality 9th December 2013 F. H. Messerli: Chocolate Consumption,

More information

Causal Inference via Kernel Deviance Measures

Causal Inference via Kernel Deviance Measures Causal Inference via Kernel Deviance Measures Jovana Mitrovic Department of Statistics University of Oxford Dino Sejdinovic Department of Statistics University of Oxford Yee Whye Teh Department of Statistics

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Hilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems

Hilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems with Applications to Dynamical Systems Le Song Jonathan Huang School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA Alex Smola Yahoo! Research, Santa Clara, CA 95051, USA Kenji

More information

A Kernelized Stein Discrepancy for Goodness-of-fit Tests

A Kernelized Stein Discrepancy for Goodness-of-fit Tests Qiang Liu QLIU@CS.DARTMOUTH.EDU Computer Science, Dartmouth College, NH, 3755 Jason D. Lee JASONDLEE88@EECS.BERKELEY.EDU Michael Jordan JORDAN@CS.BERKELEY.EDU Department of Electrical Engineering and Computer

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

Independent Subspace Analysis

Independent Subspace Analysis Independent Subspace Analysis Barnabás Póczos Supervisor: Dr. András Lőrincz Eötvös Loránd University Neural Information Processing Group Budapest, Hungary MPI, Tübingen, 24 July 2007. Independent Component

More information

Mathematische Methoden der Unsicherheitsquantifizierung

Mathematische Methoden der Unsicherheitsquantifizierung Mathematische Methoden der Unsicherheitsquantifizierung Oliver Ernst Professur Numerische Mathematik Sommersemester 2014 Contents 1 Introduction 1.1 What is Uncertainty Quantification? 1.2 A Case Study:

More information

arxiv: v2 [stat.ml] 3 Sep 2015

arxiv: v2 [stat.ml] 3 Sep 2015 Nonparametric Independence Testing for Small Sample Sizes arxiv:46.9v [stat.ml] 3 Sep 5 Aaditya Ramdas Dept. of Statistics and Machine Learning Dept. Carnegie Mellon University aramdas@cs.cmu.edu September

More information

Unsupervised Nonparametric Anomaly Detection: A Kernel Method

Unsupervised Nonparametric Anomaly Detection: A Kernel Method Fifty-second Annual Allerton Conference Allerton House, UIUC, Illinois, USA October - 3, 24 Unsupervised Nonparametric Anomaly Detection: A Kernel Method Shaofeng Zou Yingbin Liang H. Vincent Poor 2 Xinghua

More information

Learning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014

Learning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014 Learning with Noisy Labels Kate Niehaus Reading group 11-Feb-2014 Outline Motivations Generative model approach: Lawrence, N. & Scho lkopf, B. Estimating a Kernel Fisher Discriminant in the Presence of

More information

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute

More information

Undirected Graphical Models

Undirected Graphical Models Undirected Graphical Models 1 Conditional Independence Graphs Let G = (V, E) be an undirected graph with vertex set V and edge set E, and let A, B, and C be subsets of vertices. We say that C separates

More information

Causal Modeling with Generative Neural Networks

Causal Modeling with Generative Neural Networks Causal Modeling with Generative Neural Networks Michele Sebag TAO, CNRS INRIA LRI Université Paris-Sud Joint work: D. Kalainathan, O. Goudet, I. Guyon, M. Hajaiej, A. Decelle, C. Furtlehner https://arxiv.org/abs/1709.05321

More information

arxiv: v1 [cs.lg] 3 Jan 2017

arxiv: v1 [cs.lg] 3 Jan 2017 Deep Convolutional Neural Networks for Pairwise Causality Karamjit Singh, Garima Gupta, Lovekesh Vig, Gautam Shroff, and Puneet Agarwal TCS Research, New-Delhi, India January 4, 2017 arxiv:1701.00597v1

More information

Convergence Rates of Kernel Quadrature Rules

Convergence Rates of Kernel Quadrature Rules Convergence Rates of Kernel Quadrature Rules Francis Bach INRIA - Ecole Normale Supérieure, Paris, France ÉCOLE NORMALE SUPÉRIEURE NIPS workshop on probabilistic integration - Dec. 2015 Outline Introduction

More information

Curve Fitting Re-visited, Bishop1.2.5

Curve Fitting Re-visited, Bishop1.2.5 Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

Active and Semi-supervised Kernel Classification

Active and Semi-supervised Kernel Classification Active and Semi-supervised Kernel Classification Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London Work done in collaboration with Xiaojin Zhu (CMU), John Lafferty (CMU),

More information

Inference of Cause and Effect with Unsupervised Inverse Regression

Inference of Cause and Effect with Unsupervised Inverse Regression Inference of Cause and Effect with Unsupervised Inverse Regression Eleni Sgouritsa Dominik Janzing Philipp Hennig Bernhard Schölkopf Max Planck Institute for Intelligent Systems, Tübingen, Germany {eleni.sgouritsa,

More information

Lecture 2: Mappings of Probabilities to RKHS and Applications

Lecture 2: Mappings of Probabilities to RKHS and Applications Lecture 2: Mappings of Probabilities to RKHS and Applications Lille, 214 Arthur Gretton Gatsby Unit, CSML, UCL Detecting differences in brain signals The problem: Dolocalfieldpotential(LFP)signalschange

More information

Distributed Gaussian Processes

Distributed Gaussian Processes Distributed Gaussian Processes Marc Deisenroth Department of Computing Imperial College London http://wp.doc.ic.ac.uk/sml/marc-deisenroth Gaussian Process Summer School, University of Sheffield 15th September

More information

Kernel Adaptive Metropolis-Hastings

Kernel Adaptive Metropolis-Hastings Kernel Adaptive Metropolis-Hastings Arthur Gretton,?? Gatsby Unit, CSML, University College London NIPS, December 2015 Arthur Gretton (Gatsby Unit, UCL) Kernel Adaptive Metropolis-Hastings 12/12/2015 1

More information

Semi-Supervised Learning through Principal Directions Estimation

Semi-Supervised Learning through Principal Directions Estimation Semi-Supervised Learning through Principal Directions Estimation Olivier Chapelle, Bernhard Schölkopf, Jason Weston Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany {first.last}@tuebingen.mpg.de

More information

Foundations of Causal Inference

Foundations of Causal Inference Foundations of Causal Inference Causal inference is usually concerned with exploring causal relations among random variables X 1,..., X n after observing sufficiently many samples drawn from the joint

More information

Feature Selection via Dependence Maximization

Feature Selection via Dependence Maximization Journal of Machine Learning Research 1 (2007) xxx-xxx Submitted xx/07; Published xx/07 Feature Selection via Dependence Maximization Le Song lesong@it.usyd.edu.au NICTA, Statistical Machine Learning Program,

More information

arxiv: v1 [cs.ai] 22 Apr 2015

arxiv: v1 [cs.ai] 22 Apr 2015 Distinguishing Cause from Effect Based on Exogeneity [Extended Abstract] Kun Zhang MPI for Intelligent Systems 7276 Tübingen, Germany & Info. Sci. Inst., USC 4676 Admiralty Way, CA 9292 kzhang@tuebingen.mpg.de

More information

Importance Reweighting Using Adversarial-Collaborative Training

Importance Reweighting Using Adversarial-Collaborative Training Importance Reweighting Using Adversarial-Collaborative Training Yifan Wu yw4@andrew.cmu.edu Tianshu Ren tren@andrew.cmu.edu Lidan Mu lmu@andrew.cmu.edu Abstract We consider the problem of reweighting a

More information

Non-Linear Regression for Bag-of-Words Data via Gaussian Process Latent Variable Set Model

Non-Linear Regression for Bag-of-Words Data via Gaussian Process Latent Variable Set Model Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence Non-Linear Regression for Bag-of-Words Data via Gaussian Process Latent Variable Set Model Yuya Yoshikawa Nara Institute of Science

More information

Expectation Propagation Algorithm

Expectation Propagation Algorithm Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,

More information

Kernels for Multi task Learning

Kernels for Multi task Learning Kernels for Multi task Learning Charles A Micchelli Department of Mathematics and Statistics State University of New York, The University at Albany 1400 Washington Avenue, Albany, NY, 12222, USA Massimiliano

More information

Human-level concept learning through probabilistic program induction

Human-level concept learning through probabilistic program induction B.M Lake, R. Salakhutdinov, J.B. Tenenbaum Human-level concept learning through probabilistic program induction journal club at two aspects in which machine learning spectacularly lags behind human learning

More information

A Kernel Statistical Test of Independence

A Kernel Statistical Test of Independence A Kernel Statistical Test of Independence Arthur Gretton MPI for Biological Cybernetics Tübingen, Germany arthur@tuebingen.mpg.de Kenji Fukumizu Inst. of Statistical Mathematics Tokyo Japan fukumizu@ism.ac.jp

More information

Doubly Stochastic Inference for Deep Gaussian Processes. Hugh Salimbeni Department of Computing Imperial College London

Doubly Stochastic Inference for Deep Gaussian Processes. Hugh Salimbeni Department of Computing Imperial College London Doubly Stochastic Inference for Deep Gaussian Processes Hugh Salimbeni Department of Computing Imperial College London 29/5/2017 Motivation DGPs promise much, but are difficult to train Doubly Stochastic

More information

The Perturbed Variation

The Perturbed Variation The Perturbed Variation Maayan Harel Department of Electrical Engineering Technion, Haifa, Israel maayanga@tx.technion.ac.il Shie Mannor Department of Electrical Engineering Technion, Haifa, Israel shie@ee.technion.ac.il

More information

Series 7, May 22, 2018 (EM Convergence)

Series 7, May 22, 2018 (EM Convergence) Exercises Introduction to Machine Learning SS 2018 Series 7, May 22, 2018 (EM Convergence) Institute for Machine Learning Dept. of Computer Science, ETH Zürich Prof. Dr. Andreas Krause Web: https://las.inf.ethz.ch/teaching/introml-s18

More information

CS8803: Statistical Techniques in Robotics Byron Boots. Hilbert Space Embeddings

CS8803: Statistical Techniques in Robotics Byron Boots. Hilbert Space Embeddings CS8803: Statistical Techniques in Robotics Byron Boots Hilbert Space Embeddings 1 Motivation CS8803: STR Hilbert Space Embeddings 2 Overview Multinomial Distributions Marginal, Joint, Conditional Sum,

More information

Ecological inference with distribution regression

Ecological inference with distribution regression Ecological inference with distribution regression Seth Flaxman 10 May 2017 Department of Politics and International Relations Ecological inference I How to draw conclusions about individuals from aggregate-level

More information

Generalized Transfer Component Analysis for Mismatched Jpeg Steganalysis

Generalized Transfer Component Analysis for Mismatched Jpeg Steganalysis Generalized Transfer Component Analysis for Mismatched Jpeg Steganalysis Xiaofeng Li 1, Xiangwei Kong 1, Bo Wang 1, Yanqing Guo 1,Xingang You 2 1 School of Information and Communication Engineering Dalian

More information

Domain Adaptation with Conditional Transferable Components

Domain Adaptation with Conditional Transferable Components Mingming Gong 1 MINGMING.GONG@STUDENT.UTS.EDU.AU Kun Zhang 2,3 KUNZ1@CMU.EDU Tongliang Liu 1 TLIANG.LIU@GMAIL.COM Dacheng Tao 1 DACHENG.TAO@UTS.EDU.AU Clark Glymour 2 CG09@ANDREW.CMU.EDU Bernhard Schölkopf

More information