Tensor Product Kernels: Characteristic Property, Universality
1 Tensor Product Kernels: Characteristic Property, Universality

Zoltán Szabó (CMAP, École Polytechnique). Joint work with Bharath K. Sriperumbudur. Hangzhou International Conference on Frontiers of Data Science, May 19, 2018.
2 Motivation: Classical Information Theory

Kullback-Leibler divergence:
$$\mathrm{KL}(P, Q) = \int_{\mathbb{R}^d} p(x) \log\frac{p(x)}{q(x)}\, dx.$$

Mutual information:
$$I(P) = \mathrm{KL}\left(P, \otimes_{m=1}^M P_m\right).$$

Properties:
1. $I(P) \ge 0$.
2. $I(P) = 0 \Leftrightarrow P = \otimes_{m=1}^M P_m$.

Alternatives: Rényi, Tsallis, $L_2$ divergence... Typically: $X = \mathbb{R}^d$.
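To make the definitions concrete, here is a minimal numerical sketch (Python, not part of the talk) computing the mutual information of a discrete joint distribution as the KL divergence to the product of its marginals; the 2x2 joint pmf is a made-up example.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence between two discrete pmfs on a common support."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

P = np.array([[0.4, 0.1],
              [0.1, 0.4]])       # joint pmf of (X1, X2) on {0,1} x {0,1}
P1 = P.sum(axis=1)               # marginal of X1
P2 = P.sum(axis=0)               # marginal of X2

I = kl(P.ravel(), np.outer(P1, P2).ravel())  # I(P) = KL(P, P1 x P2)
print(I)                         # positive: X1 and X2 are dependent
```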
7 From R^d to a Diverse Set of Domains: Kernel Examples

$X = \mathbb{R}^d$, $\gamma > 0$: $k(x, y) = \langle \varphi(x), \varphi(y)\rangle_H$,
$$k_p(x, y) = (\langle x, y\rangle + \gamma)^p, \quad k_e(x, y) = e^{-\gamma\|x-y\|_2}, \quad k_g(x, y) = e^{-\gamma\|x-y\|_2^2}, \quad k_c(x, y) = \frac{1}{1 + \gamma\|x-y\|_2^2}.$$

X = strings, texts: r-spectrum kernel: # of common $\le r$-substrings.
X = time series: dynamic time warping.
X = graphs, sets, permutations, ...
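A short sketch of the four $\mathbb{R}^d$ kernels above (function names mirror the slide's $k_p, k_e, k_g, k_c$; the parameter values and test points are illustrative, not from the talk):

```python
import numpy as np

def k_poly(x, y, gamma=1.0, p=2):        # polynomial kernel
    return (np.dot(x, y) + gamma) ** p

def k_exp(x, y, gamma=1.0):              # exponential (Laplacian-type) kernel
    return np.exp(-gamma * np.linalg.norm(x - y))

def k_gauss(x, y, gamma=1.0):            # Gaussian kernel
    return np.exp(-gamma * np.linalg.norm(x - y) ** 2)

def k_cauchy(x, y, gamma=1.0):           # Cauchy kernel
    return 1.0 / (1.0 + gamma * np.linalg.norm(x - y) ** 2)

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(k_poly(x, y), k_exp(x, y), k_gauss(x, y), k_cauchy(x, y))
```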
11 Distribution Representation

Mean embedding: $P \mapsto \mu_P = \int_X \varphi(x)\, dP(x)$.
Cdf: $P \mapsto F_P(z) = \mathbb{E}_{x \sim P}\, \chi_{(-\infty, z)}(x)$.
Characteristic function: $P \mapsto c_P(z) = \int e^{i\langle z, x\rangle}\, dP(x)$.
Moment-generating function: $P \mapsto M_P(z) = \int e^{\langle z, x\rangle}\, dP(x)$.

Trick: the feature map $\varphi$ works on any kernel-endowed domain!
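The three classical representations have direct empirical counterparts; a sketch for a one-dimensional sample, with an arbitrary evaluation point z (the sample and z are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)            # sample from P (standard normal here)
z = 0.5                              # evaluation point

F_P = np.mean(x < z)                 # cdf:  E_P[chi_(-inf, z)(x)]
c_P = np.mean(np.exp(1j * z * x))    # characteristic function at z
M_P = np.mean(np.exp(z * x))         # moment-generating function at z
print(F_P, c_P, M_P)
```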
16 Objects of Interest

KL divergence & mutual information on kernel-endowed domains.

Mean embedding:
$$\mu_k(P) := \int_X \underbrace{\varphi(x)}_{k(\cdot, x)}\, dP(x) \in H_k.$$

Maximum mean discrepancy:
$$\mathrm{MMD}_k(P, Q) := \|\mu_k(P) - \mu_k(Q)\|_{H_k}.$$

Hilbert-Schmidt independence criterion, $k = \otimes_{m=1}^M k_m$:
$$\mathrm{HSIC}_k(P) := \mathrm{MMD}_k\left(P, \otimes_{m=1}^M P_m\right).$$
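A plug-in estimate of MMD replaces $\mu_k(P)$ and $\mu_k(Q)$ by empirical mean embeddings, which reduces to averages of Gram-matrix entries. A minimal sketch with a Gaussian kernel (samples and bandwidth are illustrative assumptions):

```python
import numpy as np

def gauss_gram(X, Y, gamma=1.0):
    """Gram matrix of the Gaussian kernel exp(-gamma * ||x - y||^2)."""
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T)
    return np.exp(-gamma * d2)

def mmd(X, Y, gamma=1.0):
    """Plug-in (biased) estimate of MMD_k(P, Q) from samples X ~ P, Y ~ Q."""
    m2 = (gauss_gram(X, X, gamma).mean()
          + gauss_gram(Y, Y, gamma).mean()
          - 2.0 * gauss_gram(X, Y, gamma).mean())
    return np.sqrt(max(m2, 0.0))     # guard against tiny negative rounding

rng = np.random.default_rng(0)
X  = rng.normal(0.0, 1.0, size=(500, 2))
X2 = rng.normal(0.0, 1.0, size=(500, 2))   # second sample from the same P
Y  = rng.normal(0.5, 1.0, size=(500, 2))   # mean-shifted distribution Q
print(mmd(X, X2))   # close to 0: P = Q
print(mmd(X, Y))    # clearly positive: P != Q
```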
19 RKHS Intuition ($X := X_m$, $k := k_m$)

Given: X set, H(ilbert space). Kernel: $k(a, b) = \langle \varphi(a), \varphi(b)\rangle_H$.

Reproducing kernel of an $H \subset \mathbb{R}^X$: $k(\cdot, b) \in H$, and $f(b) = \langle f, k(\cdot, b)\rangle_H$ (reproducing property); specializing to $f = k(\cdot, a)$ gives $k(a, b) = \langle k(\cdot, a), k(\cdot, b)\rangle_H$.

$H_k = \left\{\sum_{i=1}^n \alpha_i k(\cdot, x_i)\right\}$.
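To see $k(a, b) = \langle \varphi(a), \varphi(b)\rangle_H$ concretely: for the polynomial kernel with $p = 2$ on $\mathbb{R}^2$, one explicit feature map $\varphi$ realizes the kernel as a finite-dimensional inner product. A sketch ($\varphi$ below is one valid choice, not the unique one):

```python
import numpy as np

gamma = 1.0

def k(a, b):
    """Polynomial kernel (<a, b> + gamma)^2 on R^2."""
    return (np.dot(a, b) + gamma) ** 2

def phi(x):
    """Explicit 6-dimensional feature map realizing k."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2.0) * x[0] * x[1],
                     np.sqrt(2.0 * gamma) * x[0],
                     np.sqrt(2.0 * gamma) * x[1], gamma])

a, b = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(k(a, b), np.dot(phi(a), phi(b)))   # the two numbers agree (both 4.0)
```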
23 Mean Embedding, MMD: Applications & Review

Applications: two-sample testing [Borgwardt et al., 2006, Gretton et al., 2012], domain adaptation [Zhang et al., 2013], domain generalization [Blanchard et al., 2017], kernel Bayesian inference [Song et al., 2011, Fukumizu et al., 2013], approximate Bayesian computation [Park et al., 2016], probabilistic programming [Schölkopf et al., 2015], model criticism [Lloyd et al., 2014, Kim et al., 2016], goodness-of-fit [Balasubramanian et al., 2017], distribution classification [Muandet et al., 2011, Lopez-Paz et al., 2015, Zaheer et al., 2017], distribution regression [Szabó et al., 2016, Law et al., 2018], topological data analysis [Kusano et al., 2016].

Review: [Muandet et al., 2017]. Switching to HSIC...
24 MMD → HSIC

MMD with $k = \otimes_{m=1}^M k_m$:
$$k(x, x') := \prod_{m=1}^M k_m(x_m, x'_m), \qquad \mathrm{HSIC}_k(P) := \mathrm{MMD}_k\left(P, \otimes_{m=1}^M P_m\right).$$

Applications: blind source separation [Gretton et al., 2005], feature selection [Song et al., 2012], post-selection inference [Yamada et al., 2018], independence testing [Gretton et al., 2008], causal inference [Mooij et al., 2016, Pfister et al., 2017, Strobl et al., 2017].
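For $M = 2$, a standard biased plug-in estimator of $\mathrm{HSIC}_k(P)^2$ is $\mathrm{tr}(KHLH)/n^2$ with centering matrix $H = I - \mathbf{1}\mathbf{1}^\top/n$ [Gretton et al., 2005]. A sketch with Gaussian kernels on both components (data and bandwidth are illustrative):

```python
import numpy as np

def gauss_gram(X, gamma=1.0):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def hsic2(X, Y, gamma=1.0):
    """Biased estimate of HSIC_k(P)^2 for M = 2: trace(K H L H) / n^2."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    K, L = gauss_gram(X, gamma), gauss_gram(Y, gamma)
    return np.trace(K @ H @ L @ H) / n**2

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 1))
Y_indep = rng.normal(size=(300, 1))              # independent of X
Y_dep = X + 0.1 * rng.normal(size=(300, 1))      # strongly dependent on X
print(hsic2(X, Y_indep))   # close to 0
print(hsic2(X, Y_dep))     # clearly positive
```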
26 Central in Applications: Characteristic Property

MMD: $k$ is called characteristic [Fukumizu et al., 2008] if
$$\mathrm{MMD}_k(P, Q) = 0 \Leftrightarrow P = Q.$$
Injectivity of $P \mapsto \mu_P$ on finite signed measures: universality [Steinwart, 2001].

HSIC: $k = \otimes_{m=1}^M k_m$ will be called I-characteristic if
$$\mathrm{HSIC}_k(P) = 0 \Leftrightarrow P = \otimes_{m=1}^M P_m.$$

For $\otimes_{m=1}^M k_m$: universal $\Rightarrow$ characteristic $\Rightarrow$ I-characteristic.

Wanted: the characteristic properties of $\otimes_{m=1}^M k_m$ in terms of the $k_m$-s?
30 Well-known: Shift-invariant Kernels on $\mathbb{R}^d$

Theorem ([Sriperumbudur et al., 2010]). $k$ is characteristic iff $\mathrm{supp}(\Lambda) = \mathbb{R}^d$, where
$$k(x, x') = k_0(x - x') = \int_{\mathbb{R}^d} e^{-i\langle x - x', \omega\rangle}\, d\Lambda(\omega).$$

Example on $\mathbb{R}$:

kernel name | $k_0$ | $\hat{k}_0(\omega)$ | $\mathrm{supp}(\hat{k}_0)$
Gaussian | $e^{-\frac{x^2}{2\sigma^2}}$ | $\sigma e^{-\frac{\sigma^2\omega^2}{2}}$ | $\mathbb{R}$
Laplacian | $e^{-\sigma|x|}$ | $\sqrt{\frac{2}{\pi}}\,\frac{\sigma}{\sigma^2 + \omega^2}$ | $\mathbb{R}$
Sinc | $\frac{\sin(\sigma x)}{x}$ | $\sqrt{\frac{\pi}{2}}\,\chi_{[-\sigma, \sigma]}(\omega)$ | $[-\sigma, \sigma]$
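Bochner's representation can be checked by Monte Carlo: for the Gaussian $k_0(x) = e^{-x^2/(2\sigma^2)}$, the (normalized) spectral measure $\Lambda$ is $N(0, 1/\sigma^2)$, so averaging $\cos(\omega(x - y))$ over draws $\omega \sim \Lambda$ recovers the kernel. A sketch with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, x, y = 1.0, 0.3, 1.1

omega = rng.normal(0.0, 1.0 / sigma, size=100_000)   # draws from Lambda
approx = np.mean(np.cos(omega * (x - y)))            # Monte Carlo estimate
exact = np.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))   # closed-form kernel
print(approx, exact)   # the two values agree to Monte Carlo accuracy
```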
32 Well-known: M = 2

[Blanchard et al., 2011, Gretton, 2015]: $k_1$ & $k_2$ universal $\Rightarrow$ $k_1 \otimes k_2$ universal ($\Rightarrow$ I-characteristic).

Distance covariance [Lyons, 2013, Sejdinovic et al., 2013]: $k_1$ & $k_2$ characteristic $\Leftrightarrow$ $k_1 \otimes k_2$ I-characteristic.

Goal: extension to $M \ge 2$.
35 $k_1, k_2, k_3$ characteristic $\nRightarrow$ $\otimes_{m=1}^3 k_m$ I-characteristic

Example: $X_m = \{1, 2\}$, $\tau_{X_m} = \mathcal{P}(\{1, 2\})$, $k_m(x, x') = 2\delta_{x,x'} - 1$, $M = 3$. Then $(k_m)_{m=1}^3$ are characteristic, but $\otimes_{m=1}^3 k_m$ is not I-characteristic. Witness:
$$p_{1,1,1} = \tfrac{1}{5},\; p_{1,1,2} = \tfrac{1}{10},\; p_{1,2,1} = \tfrac{1}{10},\; p_{1,2,2} = \tfrac{1}{10},\; p_{2,1,1} = \tfrac{1}{5},\; p_{2,1,2} = \tfrac{1}{10},\; p_{2,2,1} = \tfrac{1}{10},\; p_{2,2,2} = \tfrac{1}{10}.$$
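This witness can be verified directly: with $k_m(x, x') = 2\delta_{x,x'} - 1$ each factor has the one-dimensional feature map $\varphi(1) = +1$, $\varphi(2) = -1$, so $\mathrm{HSIC}_k(P) = |\mathbb{E}[s_1 s_2 s_3] - \mathbb{E}[s_1]\mathbb{E}[s_2]\mathbb{E}[s_3]|$ with $s_m = \varphi(x_m)$. A sketch (the value $p_{2,2,2} = \tfrac{1}{10}$, cut off in the transcription, follows from normalization):

```python
import itertools

p = {(1, 1, 1): 0.2, (1, 1, 2): 0.1, (1, 2, 1): 0.1, (1, 2, 2): 0.1,
     (2, 1, 1): 0.2, (2, 1, 2): 0.1, (2, 2, 1): 0.1, (2, 2, 2): 0.1}
phi = {1: 1.0, 2: -1.0}                  # feature map of 2*delta - 1

# Mean embedding of P vs. that of the product of its marginals
# (both live in the 1-dimensional feature space spanned by s1*s2*s3).
joint = sum(q * phi[a] * phi[b] * phi[c] for (a, b, c), q in p.items())
marg = [sum(q * phi[x[m]] for x, q in p.items()) for m in range(3)]
print(abs(joint - marg[0] * marg[1] * marg[2]))  # 0 (up to rounding): HSIC_k(P) = 0

# ...although P is NOT the product of its marginals:
Pm = [{v: sum(q for x, q in p.items() if x[m] == v) for v in (1, 2)}
      for m in range(3)]
prod = {x: Pm[0][x[0]] * Pm[1][x[1]] * Pm[2][x[2]]
        for x in itertools.product((1, 2), repeat=3)}
print(max(abs(p[x] - prod[x]) for x in p))       # 0.02 > 0: P != P1 x P2 x P3
```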
36 Non-I-characteristicity: Analytical Solution

Parameter: $z = (z_0, z_1, \ldots, z_5) \in [0, 1]^6$. Example: each $p_{i,j,k}$ is an explicit polynomial $P(z)$ in $z_0, \ldots, z_5$ (the full expression for $p_{1,1,1}$ fills the slide). We chose $z = \left(\tfrac{1}{10}, \tfrac{1}{10}, \tfrac{1}{10}, \tfrac{1}{10}, \tfrac{1}{10}, \tfrac{1}{10}\right)$.

Universality: does it help?
40 $k_1, k_2$ universal, $k_3$ characteristic $\nRightarrow$ $\otimes_{m=1}^3 k_m$ I-characteristic

Example: $X_m = \{1, 2\}$, $\tau_{X_m} = \mathcal{P}(\{1, 2\})$, $M = 3$. $k_1(x, x') = k_2(x, x') = \delta_{x,x'}$: universal. $k_3(x, x') = 2\delta_{x,x'} - 1$: characteristic. Different constraints & $P(z)$ solution; the same witness is useful: $p_{1,1,1} = p_{2,1,1} = \tfrac{1}{5}$, all other $p_{i,j,k} = \tfrac{1}{10}$.
41 Results [Szabó and Sriperumbudur, 2017]

Proposition (characteristic property). $\otimes_{m=1}^M k_m$ characteristic $\Rightarrow$ $(k_m)_{m=1}^M$ are characteristic; the converse fails [$|X_m| = 2$, $k_m(x, x') = 2\delta_{x,x'} - 1$].

Proposition (I-characteristic property).
$k_1, k_2$ characteristic $\Rightarrow$ $k_1 \otimes k_2$ I-characteristic ($\Leftarrow$: $|X_m| \ge 2$).
$k_1, k_2, k_3$ characteristic $\nRightarrow$ $\otimes_{m=1}^3 k_m$ I-characteristic [example above].
$k_1, k_2$ universal, $k_3$ characteristic $\nRightarrow$ $\otimes_{m=1}^3 k_m$ I-characteristic [example above].
43 Results: Continued [Szabó and Sriperumbudur, 2017]

Proposition ($X_m = \mathbb{R}^{d_m}$, $k_m$ continuous, shift-invariant, bounded). $(k_m)_{m=1}^M$ are characteristic $\Leftrightarrow$ $\otimes_{m=1}^M k_m$ is I-characteristic $\Leftrightarrow$ $\otimes_{m=1}^M k_m$ is characteristic.

Proposition (universality). $\otimes_{m=1}^M k_m$ is universal $\Leftrightarrow$ $(k_m)_{m=1}^M$ are universal.
45 Summary

We studied the validity of HSIC. HSIC $\Rightarrow$ product structure:
Space: $X = \times_{m=1}^M X_m$.
Kernel: $k = \otimes_{m=1}^M k_m$.
Complete answer in terms of the $k_m$-s.

ITE toolkit; preprint (with minor revision at JMLR).
47 Thank you for your attention!

Acknowledgements: part of the work was carried out while BKS was visiting ZSz at CMAP, École Polytechnique. BKS is supported by NSF-DMS.
48 References

Balasubramanian, K., Li, T., and Yuan, M. (2017). On the optimality of kernel-embedding based goodness-of-fit tests. Technical report.
Blanchard, G., Deshmukh, A. A., Dogan, U., Lee, G., and Scott, C. (2017). Domain generalization by marginal transfer learning. Technical report.
Blanchard, G., Lee, G., and Scott, C. (2011). Generalizing from several related classification tasks to a new unlabeled sample. In Advances in Neural Information Processing Systems (NIPS).
Borgwardt, K., Gretton, A., Rasch, M. J., Kriegel, H.-P., Schölkopf, B., and Smola, A. J. (2006). Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics, 22.
Fukumizu, K., Gretton, A., Sun, X., and Schölkopf, B. (2008). Kernel measures of conditional dependence. In Neural Information Processing Systems (NIPS).
Fukumizu, K., Song, L., and Gretton, A. (2013). Kernel Bayes' rule: Bayesian inference with positive definite kernels. Journal of Machine Learning Research, 14.
Gretton, A. (2015). A simpler condition for consistency of a kernel independence test. Technical report, University College London.
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., and Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13.
Gretton, A., Bousquet, O., Smola, A., and Schölkopf, B. (2005). Measuring statistical dependence with Hilbert-Schmidt norms. In Algorithmic Learning Theory (ALT).
Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., and Smola, A. J. (2008). A kernel statistical test of independence. In Neural Information Processing Systems (NIPS).
Kim, B., Khanna, R., and Koyejo, O. O. (2016). Examples are not enough, learn to criticize! Criticism for interpretability. In Advances in Neural Information Processing Systems (NIPS).
Kusano, G., Fukumizu, K., and Hiraoka, Y. (2016). Persistence weighted Gaussian kernel for topological data analysis. In International Conference on Machine Learning (ICML).
Law, H. C. L., Sutherland, D. J., Sejdinovic, D., and Flaxman, S. (2018). Bayesian approaches to distribution regression. In International Conference on Artificial Intelligence and Statistics (AISTATS).
Lloyd, J. R., Duvenaud, D., Grosse, R., Tenenbaum, J. B., and Ghahramani, Z. (2014). Automatic construction and natural-language description of nonparametric regression models. In AAAI Conference on Artificial Intelligence.
Lopez-Paz, D., Muandet, K., Schölkopf, B., and Tolstikhin, I. (2015). Towards a learning theory of cause-effect inference. In International Conference on Machine Learning (ICML; PMLR), volume 37.
Lyons, R. (2013). Distance covariance in metric spaces. The Annals of Probability, 41.
Mooij, J. M., Peters, J., Janzing, D., Zscheischler, J., and Schölkopf, B. (2016). Distinguishing cause from effect using observational data: Methods and benchmarks. Journal of Machine Learning Research, 17.
Muandet, K., Fukumizu, K., Dinuzzo, F., and Schölkopf, B. (2011). Learning from distributions via support measure machines. In Neural Information Processing Systems (NIPS).
Muandet, K., Fukumizu, K., Sriperumbudur, B., and Schölkopf, B. (2017). Kernel mean embedding of distributions: A review and beyond. Foundations and Trends in Machine Learning, 10(1-2).
Park, M., Jitkrittum, W., and Sejdinovic, D. (2016). K2-ABC: Approximate Bayesian computation with kernel embeddings. In International Conference on Artificial Intelligence and Statistics (AISTATS; PMLR), volume 51.
Pfister, N., Bühlmann, P., Schölkopf, B., and Peters, J. (2017). Kernel-based tests for joint independence. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Schölkopf, B., Muandet, K., Fukumizu, K., Harmeling, S., and Peters, J. (2015). Computing functions of random variables via reproducing kernel Hilbert space representations. Statistics and Computing, 25(4).
Sejdinovic, D., Sriperumbudur, B. K., Gretton, A., and Fukumizu, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Annals of Statistics, 41.
Song, L., Gretton, A., Bickson, D., Low, Y., and Guestrin, C. (2011). Kernel belief propagation. In International Conference on Artificial Intelligence and Statistics (AISTATS).
Song, L., Smola, A., Gretton, A., Bedo, J., and Borgwardt, K. (2012). Feature selection via dependence maximization. Journal of Machine Learning Research, 13.
Sriperumbudur, B. K., Gretton, A., Fukumizu, K., Schölkopf, B., and Lanckriet, G. R. (2010). Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11.
Steinwart, I. (2001). On the influence of the kernel on the consistency of support vector machines. Journal of Machine Learning Research, 2:67-93.
Strobl, E. V., Visweswaran, S., and Zhang, K. (2017). Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. Technical report.
Szabó, Z. and Sriperumbudur, B. (2017). Characteristic and universal tensor product kernels. Technical report.
Szabó, Z., Sriperumbudur, B., Póczos, B., and Gretton, A. (2016). Learning theory for distribution regression. Journal of Machine Learning Research, 17(152):1-40.
Yamada, M., Umezu, Y., Fukumizu, K., and Takeuchi, I. (2018). Post selection inference with kernels. In International Conference on Artificial Intelligence and Statistics (AISTATS; PMLR), volume 84.
Zaheer, M., Kottur, S., Ravanbakhsh, S., Póczos, B., Salakhutdinov, R. R., and Smola, A. J. (2017). Deep sets. In Advances in Neural Information Processing Systems (NIPS).
Zhang, K., Schölkopf, B., Muandet, K., and Wang, Z. (2013). Domain adaptation under target and conditional shift. Journal of Machine Learning Research, 28(3).