Kernel-based Dependency Measures & Hypothesis Testing


1 Kernel-based Dependency Measures and Hypothesis Testing. Zoltán Szabó, CMAP, École Polytechnique. Carnegie Mellon University, November 27, 2017.

2 Outline. Motivation. Diverse set of domains: kernels. Dependency measures: kernel canonical correlation analysis, maximum mean discrepancy, Hilbert-Schmidt independence criterion. Hypothesis testing.

3 Dependency Measures as Objective Functions

4-5 Outlier-robust image registration [Kybic, 2004, Neemuchwala et al., 2007]. Given two images, goal: find the transformation which takes the right one to the left.

6 Outlier-robust image registration: equations. Reference image: $y_{\mathrm{ref}}$; test image: $y_{\mathrm{test}}$; possible transformations: $\Theta$. Objective: $J(\theta) = \underbrace{I(y_{\mathrm{ref}}, y_{\mathrm{test}}(\theta))}_{\text{similarity}} \to \max_{\theta \in \Theta}$. In the example: $I = \mathrm{KCCA}$.

7 Independent subspace analysis (ISA) [Cardoso, 1998]: extension of ICA. Cocktail party problem: independent groups of people / music bands; observation = mixed sources.

8 ISA equations. Observation: $x_t = A s_t$, $s = [s^1; \ldots; s^M]$. Goal: $\hat{s}$ from $\{x_1, \ldots, x_T\}$. Assumptions: independent groups, $I(s^1, \ldots, s^M) = 0$; the $s^m$-s are non-Gaussian; $A$ is invertible.

9 ISA solution. Find $W$ which makes the estimated components independent: $y = Wx = [y^1; \ldots; y^M]$, $J(W) = I(y^1, \ldots, y^M) \to \min_W$.

10 Distribution regression [Póczos et al., 2013, Szabó et al., 2016]: sustainability. Goal: aerosol prediction = air pollution → climate. Prediction using labelled bags: bag := multi-spectral satellite measurements over an area; label := local aerosol value.

11-12 Objects in the bags. Examples: time-series modelling: user = set of time-series; computer vision: image = collection of patch vectors; NLP: corpus = bag of documents; network analysis: group of people = bag of friendship graphs; ... Wider context (statistics): point estimation tasks.

13-16 Regression on labelled bags. Given: labelled bags $\hat{z} = \{(\hat{P}_i, y_i)\}_{i=1}^l$, where $\hat{P}_i$ is a bag of $N$ samples from $P_i$, and a test bag $\hat{P}$. Estimator: $f_{\lambda}^{\hat{z}} = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l} \sum_{i=1}^{l} \big[f(\mu_{\hat{P}_i}) - y_i\big]^2 + \lambda \|f\|_{\mathcal{H}_K}^2$, where $\mu_{\hat{P}_i}$ is the feature of $\hat{P}_i$. Prediction: $\hat{y}(\hat{P}) = g^T (G + l\lambda I)^{-1} y$ with $g = \big[K(\mu_{\hat{P}}, \mu_{\hat{P}_i})\big]$, $G = \big[K(\mu_{\hat{P}_i}, \mu_{\hat{P}_j})\big]$, $y = [y_i]$. Question: what is the inner product of distributions, $K(\mu_{\hat{P}_i}, \mu_{\hat{P}_j})$?
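
A minimal sketch of the estimator above in Python (NumPy assumed). It uses a Gaussian base kernel and, as the set kernel, the linear one, $K(\mu_{\hat{P}_i}, \mu_{\hat{P}_j}) = \langle \mu_{\hat{P}_i}, \mu_{\hat{P}_j} \rangle$, which reduces to the mean of the pairwise base-kernel values between two bags; gamma and lam are illustrative defaults, not tuned values.

```python
import numpy as np

def base_kernel(X, Y, gamma=1.0):
    # Gaussian base kernel k(x, y) = exp(-gamma * ||x - y||^2).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def set_gram(bags, gamma=1.0):
    # Linear set kernel: K(mu_Pi, mu_Pj) = mean_{x in Pi, x' in Pj} k(x, x').
    l = len(bags)
    G = np.zeros((l, l))
    for i in range(l):
        for j in range(l):
            G[i, j] = base_kernel(bags[i], bags[j], gamma).mean()
    return G

def predict(bags, y, test_bag, lam=1e-3, gamma=1.0):
    # Kernel ridge regression on embedded bags:
    # y_hat(P_test) = g^T (G + l * lam * I)^{-1} y.
    l = len(bags)
    G = set_gram(bags, gamma)
    g = np.array([base_kernel(test_bag, b, gamma).mean() for b in bags])
    return g @ np.linalg.solve(G + l * lam * np.eye(l), y)
```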

17 Feature selection. Goal: find the feature subset (# of rooms, crime rate, local taxes) most relevant for house price prediction ($y$).

18 Feature selection: equations. Features: $x_1, \ldots, x_F$. Subset: $S \subseteq \{1, \ldots, F\}$. MaxRelevance-MinRedundancy principle [Peng et al., 2005]: $J(S) = \frac{1}{|S|} \sum_{i \in S} I(x_i, y) - \frac{1}{|S|^2} \sum_{i,j \in S} I(x_i, x_j) \to \max_{S \subseteq \{1, \ldots, F\}}$.
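
As a concrete (hedged) illustration, below is the standard greedy incremental form of the mRMR criterion: at each step, pick the feature maximizing relevance minus mean redundancy with the already-selected set. The dependence estimator mi is a placeholder you must supply (e.g., a mutual-information or HSIC estimator between two 1-d sample vectors).

```python
import numpy as np

def mrmr(mi, X, y, n_select):
    # Greedy MaxRelevance-MinRedundancy feature selection.
    # X: (n, F) data matrix, y: (n,) target, mi: dependence estimator.
    F = X.shape[1]
    relevance = np.array([mi(X[:, i], y) for i in range(F)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for i in range(F):
            if i in selected:
                continue
            redundancy = np.mean([mi(X[:, i], X[:, j]) for j in selected])
            score = relevance[i] - redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```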

19 Hypothesis Testing

20-21 Example-1 (2-sample testing): NLP. Given: 2 categories of documents (Bayesian inference, neuroscience). Task: test their distinguishability; most discriminative words → interpretability. Do $\{x_i\}$ and $\{y_j\}$ come from the same distribution, i.e. is $P_x = P_y$?

22 Example-2 (2-sample testing): computer vision. Given: two sets of faces (happy, angry). Task: check whether they differ; determine the most discriminative features/regions.

23 Example-3 (2-sample testing): audio. Amplitude modulation: simple technique to transmit voice over radio; in the example: 2 songs. Fragments from song 1 $\sim P_x$, from song 2 $\sim P_y$.

24-26 Example: independence testing-1. We are given paired samples; task: test independence. Examples: (song, year of release) pairs; (video, caption) pairs. $\{(x_i, y_i)\}_{i=1}^n \xrightarrow{?} P_{xy} = P_x \otimes P_y$.

27-28 Example: independence testing-2. How do we detect dependency? (paired samples) x_1: Honourable senators, I have a question for the Leader of the Government in the Senate with regard to the support funding to farmers that has been announced. Most farmers have not received any money yet. x_2: No doubt there is great pressure on provincial and municipal governments in relation to the issue of child care, but the reality is that there have been no cuts to child care funding from the federal government to the provinces. In fact, we have increased federal investments for early childhood development. ... y_1: Honorables sénateurs, ma question s'adresse au leader du gouvernement au Sénat et concerne l'aide financière qu'on a annoncée pour les agriculteurs. La plupart des agriculteurs n'ont encore rien reçu de cet argent. y_2: Il est évident que les ordres de gouvernements provinciaux et municipaux subissent de fortes pressions en ce qui concerne les services de garde, mais le gouvernement n'a pas réduit le financement qu'il verse aux provinces pour les services de garde. Au contraire, nous avons augmenté le financement fédéral pour le développement des jeunes enfants. ... Are the French paragraphs translations of the English ones, or do they have nothing to do with them, i.e. is $P_{xy} = P_x \otimes P_y$?

29 Diverse Set of Domains

30-34 Classical information theory. Kullback-Leibler divergence: $\mathrm{KL}(P, Q) = \int_{\mathbb{R}^d} p(x) \log\left[\frac{p(x)}{q(x)}\right] \mathrm{d}x$. Mutual information: $I(P) = \mathrm{KL}\big(P, \otimes_{m=1}^{M} P_m\big)$. Properties: $I(P) \ge 0$; $I(P) = 0 \Leftrightarrow P = \otimes_{m=1}^{M} P_m$. Alternatives: Rényi, Tsallis, $L_2$ divergence, ... Typically: $\mathcal{X} = \mathbb{R}^d$.
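
To make the KL formula concrete, here is a small Monte Carlo check (NumPy assumed) for two 1-d Gaussians, a case where the integral also has a closed form; the choice P = N(0, 1), Q = N(1, 2^2) is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# KL(P, Q) = E_{x~P}[log p(x) - log q(x)], estimated by sampling from P.
x = rng.normal(0.0, 1.0, size=100_000)
log_p = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)                                  # N(0, 1)
log_q = -0.5 * ((x - 1.0) / 2.0) ** 2 - np.log(2.0) - 0.5 * np.log(2 * np.pi)  # N(1, 4)
kl_mc = np.mean(log_p - log_q)

# Closed form for Gaussians: log(s2/s1) + (s1^2 + (m1 - m2)^2) / (2 s2^2) - 1/2.
kl_exact = np.log(2.0) + (1.0 + 1.0) / (2 * 4.0) - 0.5
print(kl_mc, kl_exact)  # both approximately 0.443
```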

35-40 Euclidean space → inner product → kernel. Extension of $k(x, y) = x^T y$ leads to kernels. Why? Classification (SVM). Representation of distributions: $P \mapsto E_{x \sim P}\, \varphi(x)$; example: $\varphi(x) = x$ gives the mean.

41 Representations of distributions: $\mathrm{E}x$. Given: 2 Gaussians with different means. Solution: t-test. [Figure: pdfs of two Gaussian variables with different means, $P$ and $Q$.]

42-43 Representations of distributions: $\mathrm{E}x^2$. Setup: 2 Gaussians; same means, different variances. Idea: look at the 2nd-order features of the RVs: $\varphi(x) = x^2$ ⇒ difference in $\mathrm{E}x^2$. [Figures: pdfs of $P$ and $Q$; pdfs of $x^2$ under $P$ and $Q$.]

44 Representations of distributions: further moments. Setup: a Gaussian and a Laplacian distribution. Challenge: their means and variances are the same. Idea: look at higher-order features: $\varphi(x) = e^{i\langle \omega, x \rangle}$, the characteristic function, $\mathcal{X} = \mathbb{R}^d$. [Figure: pdfs of the Gaussian and Laplacian variables, $P$ and $Q$.]

45-50 Kernels: why? (continued) Idea: $P_{xy} \mapsto C_{xy}$. Covariance matrix: $C_{xy} = E_{xy}\big[(x - \mathrm{E}x)(y - \mathrm{E}y)^T\big]$, $S = \|C_{xy}\|_F$. But $S = 0$ only characterizes the absence of linear dependence. Covariance operator: take features of $x$ and $y$, $C_{xy} = E_{xy}\big[(\varphi(x) - E_x \varphi(x)) \otimes (\psi(y) - E_y \psi(y))\big]$ (centering in feature space), $S = \|C_{xy}\|_{\mathrm{HS}} =: \mathrm{HSIC}(P_{xy})$. We capture non-linear dependencies via $\varphi$, $\psi$!
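
A tiny numerical illustration of this step (NumPy assumed): for y = x^2 with x standard normal, the linear cross-covariance is about 0 even though y is a deterministic function of x; with the hand-picked feature map phi(x) = (x, x^2), a crude stand-in for an RKHS feature map, the dependence becomes visible again.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = x**2                         # fully dependent on x, yet uncorrelated with it

c_lin = np.cov(x, y)[0, 1]       # linear cross-covariance: ~0

# Feature maps phi(x) = (x, x^2), psi(y) = (y, y^2), centered:
X = np.column_stack([x, x**2])
Y = np.column_stack([y, y**2])
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
C = Xc.T @ Yc / n                # empirical covariance "operator" (2 x 2)
print(c_lin, np.linalg.norm(C))  # ~0 vs. clearly positive
```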

51-53 Kernel: similarity between features. Given: objects $x, x' \in \mathcal{X}$ (images, texts, ...). Question: how similar are they? Define features of the objects: $\varphi(x)$: features of $x$; $\varphi(x')$: features of $x'$. Kernel: inner product of these features, $k(x, x') := \langle \varphi(x), \varphi(x') \rangle_{\mathcal{H}}$.

54-58 RKHS: intuition. $k$ defines an RKHS: $\mathcal{H}_k \subset \{f : \mathcal{X} \to \mathbb{R} \text{ functions}\}$. Elements: $\varphi(x) = k(\cdot, x) \in \mathcal{H}_k$; $f = \sum_{i=1}^{n} c_i k(\cdot, x_i) \leftrightarrow c \in \mathbb{R}^n$ [SVM dual]. Inner product: $\langle k(\cdot, x), k(\cdot, x') \rangle_{\mathcal{H}_k} = k(x, x')$ [evaluation]. Practically: $f \in \mathcal{H}_k$ is identified with $f(c)$; knowing $k(x, x')$ is enough. Gram matrix: $G = [k(x_i, x_j)]_{i,j=1}^{n} \succeq 0$.

59-62 Diverse set of domains, kernel examples. $\mathcal{X} = \mathbb{R}^d$, $\gamma > 0$: $k_p(x, y) = (\langle x, y \rangle + \gamma)^p$, $k_e(x, y) = e^{-\gamma \|x - y\|_2}$, $k_g(x, y) = e^{-\gamma \|x - y\|_2^2}$, $k_C(x, y) = \frac{1}{1 + \gamma \|x - y\|_2^2}$. $\mathcal{X}$ = strings, texts: r-spectrum kernel, # of common substrings of length ≤ r. $\mathcal{X}$ = time-series: dynamic time-warping. $\mathcal{X}$ = trees, graphs, dynamical systems, sets, permutations, ...
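
For concreteness, a sketch of Gram-matrix computation for the four $\mathbb{R}^d$ kernels listed above (NumPy assumed; the string labels are local names for this sketch, not a library API).

```python
import numpy as np

def gram(X, Y, kind="gauss", gamma=1.0, p=2):
    # Pairwise kernel matrix [k(x_i, y_j)] for X (m, d) and Y (n, d).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    if kind == "poly":      # k_p(x, y) = (<x, y> + gamma)^p
        return (X @ Y.T + gamma) ** p
    if kind == "exp":       # k_e(x, y) = exp(-gamma ||x - y||_2)
        return np.exp(-gamma * np.sqrt(d2))
    if kind == "gauss":     # k_g(x, y) = exp(-gamma ||x - y||_2^2)
        return np.exp(-gamma * d2)
    if kind == "cauchy":    # k_C(x, y) = 1 / (1 + gamma ||x - y||_2^2)
        return 1.0 / (1.0 + gamma * d2)
    raise ValueError(kind)
```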

63-64 Independence measures. Given: random variable $(x, y) \in \mathcal{X} \times \mathcal{Y}$, $(x, y) \sim P_{xy}$. Goal: measure the dependence of $x$ and $y$. Desiderata for an independence measure $Q(P_{xy})$ [Rényi, 1959]: 1. $Q(P_{xy})$ is well-defined; 2. $Q(P_{xy}) \in [0, 1]$; 3. $Q(P_{xy}) = 0$ iff $x \perp y$; 4. $Q(P_{xy}) = 1$ if $y = f(x)$ or $x = g(y)$.

65 Independence measures. Rényi showed: $Q(P_{xy}) = \sup_{f, g} \mathrm{corr}(f(x), g(y))$ satisfies 1-4. Too ambitious: too many functions, computationally intractable.

66-67 Independence measures: restriction to continuous functions. $C_b(\mathcal{X}) = \{f : \mathcal{X} \to \mathbb{R} \text{ bounded, continuous}\}$ would also work, but is still too large! Idea: certain $\mathcal{H}_k$ function classes are dense in $C_b(\mathcal{X})$ and computationally tractable.

68 Wanted: independence measures, distances, and inner products on probability distributions that can be estimated without density estimation!

69 Kernel Canonical Correlation Analysis (KCCA)

70-71 KCCA: definition. Given: kernels $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, $l : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$, with associated RKHSs $\mathcal{H}_k$, $\mathcal{H}_l$. KCCA measure of $(x, y) \in \mathcal{X} \times \mathcal{Y}$: $\rho_{\mathrm{KCCA}}(x, y) = \sup_{f \in \mathcal{H}_k,\, g \in \mathcal{H}_l} \mathrm{corr}(f(x), g(y))$, where $\mathrm{corr}(f(x), g(y)) = \frac{\mathrm{cov}_{xy}(f(x), g(y))}{\sqrt{\mathrm{var}_x f(x)}\, \sqrt{\mathrm{var}_y g(y)}}$.

72-75 KCCA: estimation. Empirical estimate from $\{(x_i, y_i)\}_{i=1}^{n}$: $\hat{\rho}_{\mathrm{KCCA}}(x, y) = \sup_{c, d \in \mathbb{R}^n} \frac{c^T \tilde{G}_x \tilde{G}_y d}{\sqrt{c^T (\tilde{G}_x)^2 c}\, \sqrt{d^T (\tilde{G}_y)^2 d}}$. Centered Gram matrix: $G_x = [k(x_i, x_j)]_{i,j=1}^{n}$, $\tilde{G}_x = H_n G_x H_n$ (centering in feature space), $H_n = I_n - \frac{1_{n \times n}}{n}$. $c$, $d$ come from a generalized eigenvalue problem ($Av = \lambda B v$). A bit of regularization: $\tilde{G}_x \to \tilde{G}_x + \kappa I_n$, and similarly for $\tilde{G}_y$.
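
A compact sketch of this estimator (NumPy/SciPy assumed), solving the generalized eigenvalue problem in the standard Bach-Jordan block form; kappa = 1e-2 is an illustrative default, not a recommended setting.

```python
import numpy as np
from scipy.linalg import eigh

def kcca(Gx, Gy, kappa=1e-2):
    # Regularized empirical KCCA from two (uncentered) Gram matrices;
    # returns the estimated leading canonical correlation.
    n = Gx.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix H_n
    Gx, Gy = H @ Gx @ H, H @ Gy @ H           # center in feature space
    Rx, Ry = Gx + kappa * np.eye(n), Gy + kappa * np.eye(n)
    Z = np.zeros((n, n))
    # Generalized eigenproblem A v = rho B v with v = [c; d]:
    A = np.block([[Z, Gx @ Gy], [Gy @ Gx, Z]])
    B = np.block([[Rx @ Rx, Z], [Z, Ry @ Ry]])
    return eigh(A, B, eigvals_only=True)[-1]  # largest eigenvalue
```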

76-78 KCCA as an independence measure. If $x \perp y$, then $\rho_{\mathrm{KCCA}}(x, y; \kappa) = 0$. The opposite direction holds for rich $\mathcal{H}_k$, $\mathcal{H}_l$ [Bach and Jordan, 2002, Gretton et al., 2005]. Enough: a universal kernel, i.e. $\mathcal{H}_k$ dense in $C_b(\mathcal{X})$. Examples: Gaussian, Laplacian kernels.

79-80 ITE toolbox. Estimators for dependency measures (incl. KCCA), distances on distributions (incl. MMD), independence of random variables (incl. HSIC), and many more. Link:

81 Maximum Mean Discrepancy (MMD)

82-85 MMD estimator: intuition. [Figures: Gram-matrix blocks illustrating the estimator.] $\widehat{\mathrm{MMD}}^2(P, Q) = \bar{G}_{P,P} + \bar{G}_{Q,Q} - 2 \bar{G}_{P,Q}$, where $\bar{G}_{\cdot,\cdot}$ is the mean of the corresponding Gram-matrix block (without diagonals in $\bar{G}_{P,P}$, $\bar{G}_{Q,Q}$). $\widehat{\mathrm{MMD}}$ & $\widehat{\mathrm{HSIC}}$ illustration credit: Arthur Gretton.

86-89 MMD estimator. Feature of a distribution: $\mu_P = E_{x \sim P}\, \varphi(x)$. MMD = difference between feature means: $\mathrm{MMD}^2(P, Q) := \|\mu_P - \mu_Q\|_{\mathcal{H}_k}^2$; unbiased estimator $\widehat{\mathrm{MMD}}_u^2(P, Q) = \bar{G}_{P,P} + \bar{G}_{Q,Q} - 2 \bar{G}_{P,Q}$ using samples $\{x_i\}_{i=1}^{m} \sim P$, $\{y_j\}_{j=1}^{n} \sim Q$. Computational complexity: $O((m + n)^2)$, quadratic.
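
The unbiased quadratic-time estimator, as a short NumPy sketch with a Gaussian kernel (gamma is an illustrative choice): average the blocks $G_{P,P}$ and $G_{Q,Q}$ without their diagonals, and subtract twice the mean of the cross block.

```python
import numpy as np

def mmd2_u(X, Y, gamma=1.0):
    # Unbiased MMD^2 estimate with k(x, y) = exp(-gamma * ||x - y||^2).
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * Kxy.mean()
```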

90 Hilbert-Schmidt Independence Criterion (HSIC)

91 HSIC: intuition. $\mathcal{X}$: images, $\mathcal{Y}$: descriptions.

92-94 HSIC intuition: Gram matrices. Empirical estimate: $\widehat{\mathrm{HSIC}}^2 = \frac{1}{n^2} \langle \tilde{G}_x, \tilde{G}_y \rangle_F$. At the population level: $\mathrm{HSIC}(P_{xy}) = \mathrm{MMD}(P_{xy}, P_x \otimes P_y)$.
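
The Gram-matrix form of the (biased) empirical HSIC is only a few lines of NumPy; Gx, Gy below are the uncentered Gram matrices of the paired samples, e.g. computed with the gram sketch above.

```python
import numpy as np

def hsic2(Gx, Gy):
    # Biased empirical HSIC^2: (1/n^2) <H Gx H, H Gy H>_F.
    n = Gx.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.sum((H @ Gx @ H) * (H @ Gy @ H)) / n**2
```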

95 Cocktail party: HSIC demo

96-97 ISA reminder. $x = As$, $s = [s^1; \ldots; s^M]$, where the $s^m$-s are non-Gaussian and independent. Goal: from $\{x_t\}_{t=1}^{T}$, recover $W = A^{-1}$ and $\{s_t\}_{t=1}^{T}$. Objective function: $\hat{s} = Wx$, $J(W) = I(\hat{s}^1, \ldots, \hat{s}^M) \to \min_W$.

98-99 ISA: source, observation. [Figures: hidden sources $s$; observation $x$.]

100-101 ISA: estimated sources using HSIC, ambiguity. [Figures: estimated sources $\hat{s}$; performance ($\hat{W}A$), showing the ambiguity.]

102-106 Conjecture: ISA separation theorem [Cardoso, 1998]. ISA = ICA + permutation. [Figure: $\widehat{\mathrm{HSIC}}(\hat{s}_i, \hat{s}_j)$; here $\dim(s^m) = 3$.] Basis of the state-of-the-art ISA solvers. Sufficient conditions [Szabó et al., 2012]: $s^m$ spherical.

107-109 ISA separation theorem. For $\dim(s^m) = 2$, less is sufficient. Invariance to 90° rotation: $f(u_1, u_2) = f(-u_2, u_1) = f(-u_1, -u_2) = f(u_2, -u_1)$; invariance to permutation and sign: $f(\pm u_1, \pm u_2) = f(\pm u_2, \pm u_1)$; $L^p$-spherical: $f(u_1, u_2) = h\big(\sum_i |u_i|^p\big)$ ($p > 0$).

110 Mean embedding, MMD, HSIC: applications & review. Applications: domain adaptation [Zhang et al., 2013] and domain generalization [Blanchard et al., 2017], interpretable machine learning [Kim et al., 2016], kernel belief propagation [Song et al., 2011], kernel Bayes' rule [Fukumizu et al., 2013], model criticism [Lloyd et al., 2014], approximate Bayesian computation [Park et al., 2016], probabilistic programming [Schölkopf et al., 2015], distribution classification [Muandet et al., 2011] and distribution regression [Szabó et al., 2016], topological data analysis [Kusano et al., 2016], post selection inference [Yamada et al., 2016], causal inference [Mooij et al., 2016, Pfister et al., 2017, Strobl et al., 2017]. Mean embedding review: [Muandet et al., 2017].

111 Hypothesis Testing

112-113 Two-sample testing: recall. Given: $X = \{x_i\}_{i=1}^{n} \sim P$, $Y = \{y_j\}_{j=1}^{n} \sim Q$. Example: $x_i$ = i-th happy face, $y_j$ = j-th sad face. Problem: using $X$, $Y$, test $H_0 : P = Q$ vs $H_1 : P \neq Q$.

114-115 Ingredients of a two-sample test. Test statistic: $\hat{\lambda}_n = \hat{\lambda}_n(X, Y)$, random. Significance level: $\alpha$. Under $H_0$: $P_{H_0}(\hat{\lambda}_n \le T_\alpha) = 1 - \alpha$ (correctly accepting $H_0$). Under $H_1$: $P_{H_1}(T_\alpha < \hat{\lambda}_n) = P(\text{correctly rejecting } H_0) =:$ power. [Figure: null and alternative distributions of $\hat{\lambda}_n$, with threshold $T_\alpha$.]

116 Two-sample test using MMD, asymptotics under $H_0$ [Gretton et al., 2007, Gretton et al., 2012]. By degenerate U-statistic theory, $n \widehat{\mathrm{MMD}}_u^2(P, P) \xrightarrow{d} \sum_{i=1}^{\infty} \lambda_i (z_i^2 - 2)$, where $z_i \sim N(0, 2)$ i.i.d., $\int \tilde{k}(x, x') v_i(x)\, \mathrm{d}P(x) = \lambda_i v_i(x')$, and $\tilde{k}(x, x') = \langle \varphi(x) - \mu_P, \varphi(x') - \mu_P \rangle_{\mathcal{H}_k}$. [Figure: empirical null pdf of $\widehat{\mathrm{MMD}}_u^2$.]

117-121 Null approximations; test statistics: quadratic in time. Small sample size: permutation test. Medium sample size: gamma approximation or truncated spectral expansion [Gretton et al., 2009]. [Figure: gamma pdfs $p_{\alpha,\beta}$ for several $(\alpha, \beta)$ values.] Large sample size: online techniques [Gretton et al., 2012] (large variance); recent linear-time methods (soon).
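
A sketch of the permutation-test option (NumPy assumed): pool the two samples, reshuffle the labels, and recompute the statistic to build the null distribution. statistic can be any two-sample statistic, e.g. the mmd2_u sketch earlier; n_perm and alpha are illustrative defaults.

```python
import numpy as np

def permutation_test(X, Y, statistic, n_perm=500, alpha=0.05, seed=0):
    # Two-sample permutation test: p-value of statistic(X, Y) under H0: P = Q.
    rng = np.random.default_rng(seed)
    pooled, m = np.vstack([X, Y]), len(X)
    observed = statistic(X, Y)
    null = np.empty(n_perm)
    for b in range(n_perm):
        idx = rng.permutation(len(pooled))
        null[b] = statistic(pooled[idx[:m]], pooled[idx[m:]])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value, p_value < alpha
```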

122 Independence testing with HSIC. Similar story [Gretton et al., 2008, Pfister et al., 2016]. Null asymptotics: $\sum_{i=1}^{\infty} \lambda_i z_i^2$, $z_i \sim N(0, 1)$. In practice: permutation test / gamma approximation.

123-128 Related work. 2-sample testing: block-MMD [Zaremba et al., 2013] (variance ↓). 3-variable interaction [Sejdinovic et al., 2013]. Goodness-of-fit [Chwialkowski et al., 2016]. Time-series: independence (stationary → shift) [Chwialkowski and Gretton, 2014]; wild bootstrap [Chwialkowski et al., 2014, Rubenstein et al., 2016]. Block-HSIC [Zhang et al., 2017]: RFF acceleration. Conditional independence & RFF [Strobl et al., 2017].

129 Linear-time Tests

130-132 Linear-time 2-sample test [Chwialkowski et al., 2015]. Recall: $\mathrm{MMD}(P, Q) = \|\mu_P - \mu_Q\|_{\mathcal{H}_k}$. Idea: change this to $\rho(P, Q) := \sqrt{\frac{1}{J} \sum_{j=1}^{J} [\mu_P(v_j) - \mu_Q(v_j)]^2}$ with random test locations $\{v_j\}_{j=1}^{J}$. $\rho$ is a metric for the Gaussian $k$ (bounded, analytic, characteristic), almost surely.

133-136 Estimation. $\hat{\rho}^2(P, Q) = \frac{1}{J} \sum_{j=1}^{J} [\hat{\mu}_P(v_j) - \hat{\mu}_Q(v_j)]^2 = \frac{1}{J} \bar{z}_n^T \bar{z}_n$, where $\bar{z}_n = \frac{1}{n} \sum_{i=1}^{n} z_i$ and $z_i = [k(x_i, v_j) - k(y_i, v_j)]_{j=1}^{J} \in \mathbb{R}^J$. Good news: estimation is linear in $n$! Bad news: intractable null distribution, $n \hat{\rho}^2(P, P) \xrightarrow{d}$ sum of $J$ correlated $\chi^2$-s. Modified statistic (whitening): $\hat{\lambda}_n = n \bar{z}_n^T \hat{\Sigma}_n^{-1} \bar{z}_n$, $\hat{\Sigma}_n = \widehat{\mathrm{cov}}(\{z_i\}_{i=1}^{n})$. Under $H_0$: $\hat{\lambda}_n \xrightarrow{d} \chi^2(J)$ ⇒ easy to get the $(1 - \alpha)$-quantile!
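
A sketch of the whitened statistic (NumPy assumed; the small jitter on the covariance is a numerical-stability assumption, not part of the formula). Under $H_0$ the value is approximately $\chi^2(J)$, so a test would reject when it exceeds, e.g., scipy.stats.chi2.ppf(1 - alpha, J).

```python
import numpy as np

def me_statistic(X, Y, V, gamma=1.0):
    # Linear-time statistic lambda_n = n * zbar^T Sigma_n^{-1} zbar,
    # with z_i = [k(x_i, v_j) - k(y_i, v_j)]_j; X, Y: (n, d), V: (J, d).
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    Z = k(X, V) - k(Y, V)                     # (n, J), one z_i per row
    n, J = Z.shape
    zbar = Z.mean(axis=0)
    Sigma = np.cov(Z, rowvar=False) + 1e-8 * np.eye(J)
    return n * zbar @ np.linalg.solve(Sigma, zbar)
```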

137-139 Optimize locations and kernel. So far the test locations $\{v_j\}_{j=1}^{J}$ and kernel parameters were fixed. Idea [Jitkrittum et al., 2016]: optimize them for a power proxy, $\lambda_n = n \mu^T \Sigma^{-1} \mu$ (the population counterpart of $\hat{\lambda}_n$). In practice: power is similar to the $O(n^2)$ MMD test, but in $O(n)$ time! Extensions: independence testing [Jitkrittum et al., 2017a]; goodness-of-fit [Jitkrittum et al., 2017b]: Best Paper at NIPS 2017!

140 Demo-1 = NLP: most discriminative words. NIPS papers (...), TF-IDF representation, Bayes vs. Neuro categories. Most discriminative words: spike, markov, cortex, dropout, recurr, iii, gibb. The learned test locations are highly interpretable: markov, gibb (⇐ Gibbs) are Bayesian inference terms; spike, cortex are key terms in neuroscience.

141 Demo-1 = NLP: least discriminative words. Least discriminative ones: circumfer, bra, dominiqu, rhino, mitra, kid, impostor.

142 Demo-2: distinguish positive/negative emotions. Karolinska Directed Emotional Faces (KDEF) [Lundqvist et al., 1998]: 70 actors = 35 females and 35 males; grayscale images of size 48 × ...; pixel features. +: happy, neutral, surprised; −: afraid, angry, disgusted. Learned test location (averaged) = [image].

143-145 Summary. Dependency measures, distances: KCCA, HSIC, MMD. Tool: kernels. Applications: image registration, ISA, distribution regression, feature selection, hypothesis testing. MMD, HSIC: $O(n^2)$; linear-time tests ($\mathbb{R}^d$).

146 Thank you for your attention!

147-158 References

Bach, F. R. and Jordan, M. I. (2002). Kernel independent component analysis. Journal of Machine Learning Research, 3:1-48.

Blanchard, G., Deshmukh, A. A., Dogan, U., Lee, G., and Scott, C. (2017). Domain generalization by marginal transfer learning. Technical report.

Cardoso, J.-F. (1998). Multidimensional independent component analysis. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Chwialkowski, K. and Gretton, A. (2014). A kernel independence test for random processes. In International Conference on Machine Learning (ICML; JMLR W&CP), volume 32.

Chwialkowski, K., Ramdas, A., Sejdinovic, D., and Gretton, A. (2015). Fast two-sample testing with analytic representations of probability measures. In Neural Information Processing Systems (NIPS).

Chwialkowski, K., Sejdinovic, D., and Gretton, A. (2014). A wild bootstrap for degenerate kernel tests. In Neural Information Processing Systems (NIPS).

Chwialkowski, K., Strathmann, H., and Gretton, A. (2016). A kernel test of goodness of fit. In International Conference on Machine Learning (ICML).

Fukumizu, K., Song, L., and Gretton, A. (2013). Kernel Bayes' rule: Bayesian inference with positive definite kernels. Journal of Machine Learning Research, 14.

Gretton, A., Borgwardt, K. M., Rasch, M., Schölkopf, B., and Smola, A. J. (2007). A kernel method for the two-sample problem. In Neural Information Processing Systems (NIPS).

Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., and Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13.

Gretton, A., Fukumizu, K., Harchaoui, Z., and Sriperumbudur, B. K. (2009). A fast, consistent kernel two-sample test. In Neural Information Processing Systems (NIPS).

Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., and Smola, A. J. (2008). A kernel statistical test of independence. In Neural Information Processing Systems (NIPS).

Gretton, A., Herbrich, R., Smola, A., Bousquet, O., and Schölkopf, B. (2005). Kernel methods for measuring independence. Journal of Machine Learning Research, 6.

Jitkrittum, W., Szabó, Z., Chwialkowski, K., and Gretton, A. (2016). Interpretable distribution features with maximum testing power. In Neural Information Processing Systems (NIPS).

Jitkrittum, W., Szabó, Z., and Gretton, A. (2017a). An adaptive test of independence with analytic kernel embeddings. In International Conference on Machine Learning (ICML), volume 70. PMLR.

Jitkrittum, W., Xu, W., Szabó, Z., Fukumizu, K., and Gretton, A. (2017b). A linear-time kernel goodness-of-fit test. In Neural Information Processing Systems (NIPS).

Kim, B., Khanna, R., and Koyejo, O. O. (2016). Examples are not enough, learn to criticize! Criticism for interpretability. In Neural Information Processing Systems (NIPS).

Kusano, G., Fukumizu, K., and Hiraoka, Y. (2016). Persistence weighted Gaussian kernel for topological data analysis. In International Conference on Machine Learning (ICML).

Kybic, J. (2004). High-dimensional mutual information estimation for image registration. In IEEE International Conference on Image Processing (ICIP).

Lloyd, J. R., Duvenaud, D., Grosse, R., Tenenbaum, J. B., and Ghahramani, Z. (2014). Automatic construction and natural-language description of nonparametric regression models. In AAAI Conference on Artificial Intelligence.

Lundqvist, D., Flykt, A., and Öhman, A. (1998). The Karolinska Directed Emotional Faces - KDEF. Technical report.

Mooij, J. M., Peters, J., Janzing, D., Zscheischler, J., and Schölkopf, B. (2016). Distinguishing cause from effect using observational data: Methods and benchmarks. Journal of Machine Learning Research, 17:1-102.

Muandet, K., Fukumizu, K., Dinuzzo, F., and Schölkopf, B. (2011). Learning from distributions via support measure machines. In Neural Information Processing Systems (NIPS).

Muandet, K., Fukumizu, K., Sriperumbudur, B., and Schölkopf, B. (2017). Kernel mean embedding of distributions: A review and beyond. Technical report.

Neemuchwala, H., Hero, A., Zabuawala, S., and Carson, P. (2007). Image registration methods in high dimensional space. International Journal of Imaging Systems and Technology, 16.

Park, M., Jitkrittum, W., and Sejdinovic, D. (2016). K2-ABC: Approximate Bayesian computation with kernel embeddings. In International Conference on Artificial Intelligence and Statistics (AISTATS; PMLR), volume 51.

Peng, H., Long, F., and Ding, C. (2005). Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8).

Pfister, N., Bühlmann, P., Schölkopf, B., and Peters, J. (2016). Kernel-based tests for joint independence. Technical report.

Pfister, N., Bühlmann, P., Schölkopf, B., and Peters, J. (2017). Kernel-based tests for joint independence. Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Póczos, B., Singh, A., Rinaldo, A., and Wasserman, L. (2013). Distribution-free distribution regression. In International Conference on AI and Statistics (AISTATS; JMLR W&CP), volume 31.

Rényi, A. (1959). On measures of dependence. Acta Mathematica Academiae Scientiarum Hungaricae, 10.

Rubenstein, P. K., Chwialkowski, K. P., and Gretton, A. (2016). A kernel test for three-variable interactions with random processes. In Conference on Uncertainty in Artificial Intelligence (UAI).

Schölkopf, B., Muandet, K., Fukumizu, K., Harmeling, S., and Peters, J. (2015). Computing functions of random variables via reproducing kernel Hilbert space representations. Statistics and Computing, 25(4).

Sejdinovic, D., Gretton, A., and Bergsma, W. (2013). A kernel test for three-variable interactions. In Neural Information Processing Systems (NIPS).

Song, L., Gretton, A., Bickson, D., Low, Y., and Guestrin, C. (2011). Kernel belief propagation. In International Conference on Artificial Intelligence and Statistics (AISTATS).

Strobl, E. V., Visweswaran, S., and Zhang, K. (2017). Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. Technical report.

Szabó, Z., Póczos, B., and Lőrincz, A. (2012). Separation theorem for independent subspace analysis and its consequences. Pattern Recognition, 45(4).

Szabó, Z., Sriperumbudur, B. K., Póczos, B., and Gretton, A. (2016). Learning theory for distribution regression. Journal of Machine Learning Research, 17(152):1-40.

Yamada, M., Umezu, Y., Fukumizu, K., and Takeuchi, I. (2016). Post selection inference with kernels. Technical report.

Zaremba, W., Gretton, A., and Blaschko, M. (2013). B-tests: Low variance kernel two-sample tests. In Neural Information Processing Systems (NIPS).

Zhang, K., Schölkopf, B., Muandet, K., and Wang, Z. (2013). Domain adaptation under target and conditional shift. Journal of Machine Learning Research, 28(3).

Zhang, Q., Filippi, S., Gretton, A., and Sejdinovic, D. (2017). Large-scale kernel methods for independence testing. Statistics and Computing, pages 1-18.


More information

No. of dimensions 1. No. of centers

No. of dimensions 1. No. of centers Contents 8.6 Course of dimensionality............................ 15 8.7 Computational aspects of linear estimators.................. 15 8.7.1 Diagonalization of circulant andblock-circulant matrices......

More information

Feature-to-Feature Regression for a Two-Step Conditional Independence Test

Feature-to-Feature Regression for a Two-Step Conditional Independence Test Feature-to-Feature Regression for a Two-Step Conditional Independence Test Qinyi Zhang Department of Statistics University of Oxford qinyi.zhang@stats.ox.ac.uk Sarah Filippi School of Public Health Department

More information

Bayesian Hidden Markov Models and Extensions

Bayesian Hidden Markov Models and Extensions Bayesian Hidden Markov Models and Extensions Zoubin Ghahramani Department of Engineering University of Cambridge joint work with Matt Beal, Jurgen van Gael, Yunus Saatci, Tom Stepleton, Yee Whye Teh Modeling

More information

Independent Component Analysis and Unsupervised Learning

Independent Component Analysis and Unsupervised Learning Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent

More information

A Kernel Statistical Test of Independence

A Kernel Statistical Test of Independence A Kernel Statistical Test of Independence Arthur Gretton MPI for Biological Cybernetics Tübingen, Germany arthur@tuebingen.mpg.de Kenji Fukumizu Inst. of Statistical Mathematics Tokyo Japan fukumizu@ism.ac.jp

More information

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR

More information

Recovering Distributions from Gaussian RKHS Embeddings

Recovering Distributions from Gaussian RKHS Embeddings Motonobu Kanagawa Graduate University for Advanced Studies kanagawa@ism.ac.jp Kenji Fukumizu Institute of Statistical Mathematics fukumizu@ism.ac.jp Abstract Recent advances of kernel methods have yielded

More information

Advanced Machine Learning & Perception

Advanced Machine Learning & Perception Advanced Machine Learning & Perception Instructor: Tony Jebara Topic 1 Introduction, researchy course, latest papers Going beyond simple machine learning Perception, strange spaces, images, time, behavior

More information

Autoregressive Independent Process Analysis with Missing Observations

Autoregressive Independent Process Analysis with Missing Observations Autoregressive Independent Process Analysis with Missing Observations Zoltán Szabó Eötvös Loránd University - Department of Software Technology and Methodology Pázmány P. sétány 1/C, Budapest, H-1117 -

More information

Distributed Gaussian Processes

Distributed Gaussian Processes Distributed Gaussian Processes Marc Deisenroth Department of Computing Imperial College London http://wp.doc.ic.ac.uk/sml/marc-deisenroth Gaussian Process Summer School, University of Sheffield 15th September

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:

More information

NOTES ON SOME EXERCISES OF LECTURE 5, MODULE 2

NOTES ON SOME EXERCISES OF LECTURE 5, MODULE 2 NOTES ON SOME EXERCISES OF LECTURE 5, MODULE 2 MARCO VITTURI Contents 1. Solution to exercise 5-2 1 2. Solution to exercise 5-3 2 3. Solution to exercise 5-7 4 4. Solution to exercise 5-8 6 5. Solution

More information

ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning

ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning Topics Summary of Class Advanced Topics Dhruv Batra Virginia Tech HW1 Grades Mean: 28.5/38 ~= 74.9%

More information

Self Adaptive Particle Filter

Self Adaptive Particle Filter Self Adaptive Particle Filter Alvaro Soto Pontificia Universidad Catolica de Chile Department of Computer Science Vicuna Mackenna 4860 (143), Santiago 22, Chile asoto@ing.puc.cl Abstract The particle filter

More information

Machine Learning for Large-Scale Data Analysis and Decision Making A. Week #1

Machine Learning for Large-Scale Data Analysis and Decision Making A. Week #1 Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Week #1 Today Introduction to machine learning The course (syllabus) Math review (probability + linear algebra) The future

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Sparse Kernel Density Estimation Technique Based on Zero-Norm Constraint

Sparse Kernel Density Estimation Technique Based on Zero-Norm Constraint Sparse Kernel Density Estimation Technique Based on Zero-Norm Constraint Xia Hong 1, Sheng Chen 2, Chris J. Harris 2 1 School of Systems Engineering University of Reading, Reading RG6 6AY, UK E-mail: x.hong@reading.ac.uk

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

Second-Order Inference for Gaussian Random Curves

Second-Order Inference for Gaussian Random Curves Second-Order Inference for Gaussian Random Curves With Application to DNA Minicircles Victor Panaretos David Kraus John Maddocks Ecole Polytechnique Fédérale de Lausanne Panaretos, Kraus, Maddocks (EPFL)

More information

Causal Modeling with Generative Neural Networks

Causal Modeling with Generative Neural Networks Causal Modeling with Generative Neural Networks Michele Sebag TAO, CNRS INRIA LRI Université Paris-Sud Joint work: D. Kalainathan, O. Goudet, I. Guyon, M. Hajaiej, A. Decelle, C. Furtlehner https://arxiv.org/abs/1709.05321

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

Analysis of Spectral Kernel Design based Semi-supervised Learning

Analysis of Spectral Kernel Design based Semi-supervised Learning Analysis of Spectral Kernel Design based Semi-supervised Learning Tong Zhang IBM T. J. Watson Research Center Yorktown Heights, NY 10598 Rie Kubota Ando IBM T. J. Watson Research Center Yorktown Heights,

More information

Semi-Supervised Learning through Principal Directions Estimation

Semi-Supervised Learning through Principal Directions Estimation Semi-Supervised Learning through Principal Directions Estimation Olivier Chapelle, Bernhard Schölkopf, Jason Weston Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany {first.last}@tuebingen.mpg.de

More information

Cross-Entropy Optimization for Independent Process Analysis

Cross-Entropy Optimization for Independent Process Analysis Cross-Entropy Optimization for Independent Process Analysis Zoltán Szabó, Barnabás Póczos, and András Lőrincz Department of Information Systems Eötvös Loránd University, Budapest, Hungary Research Group

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models

Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models Kun Zhang Dept of Computer Science and HIIT University of Helsinki 14 Helsinki, Finland kun.zhang@cs.helsinki.fi Aapo Hyvärinen

More information