Hypothesis Testing with Kernel Embeddings on Interdependent Data


1 Hypothesis Testing with Kernel Embeddings on Interdependent Data

Dino Sejdinovic, Department of Statistics, University of Oxford

joint work with Kacper Chwialkowski and Arthur Gretton (Gatsby Unit, UCL)

9 April 2015, Dagstuhl

D. Sejdinovic (Statistics, Oxford) Testing with Kernels 9/4/15 1 / 19

2 Kernel Embedding

feature map: x ↦ k(·, x) ∈ H_k, instead of x ↦ (φ_1(x), ..., φ_s(x)) ∈ R^s

⟨k(·, x), k(·, y)⟩_{H_k} = k(x, y): inner products easily computed

3 Kernel Embedding

feature map: x ↦ k(·, x) ∈ H_k, instead of x ↦ (φ_1(x), ..., φ_s(x)) ∈ R^s

⟨k(·, x), k(·, y)⟩_{H_k} = k(x, y): inner products easily computed

embedding: P ↦ μ_k(P) = E_{X∼P} k(·, X) ∈ H_k, instead of P ↦ (Eφ_1(X), ..., Eφ_s(X)) ∈ R^s

⟨μ_k(P), μ_k(Q)⟩_{H_k} = E_{X,Y} k(X, Y): inner products easily estimated

4 Kernel Embedding

feature map: x ↦ k(·, x) ∈ H_k, instead of x ↦ (φ_1(x), ..., φ_s(x)) ∈ R^s

⟨k(·, x), k(·, y)⟩_{H_k} = k(x, y): inner products easily computed

embedding: P ↦ μ_k(P) = E_{X∼P} k(·, X) ∈ H_k, instead of P ↦ (Eφ_1(X), ..., Eφ_s(X)) ∈ R^s

⟨μ_k(P), μ_k(Q)⟩_{H_k} = E_{X,Y} k(X, Y): inner products easily estimated

μ_k(P) represents expectations w.r.t. P, i.e., E_X f(X) = E_X ⟨f, k(·, X)⟩_{H_k} = ⟨f, μ_k(P)⟩_{H_k} for all f ∈ H_k
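As a concrete illustration of "inner products easily estimated": ⟨μ_k(P̂), μ_k(Q̂)⟩ is just the average of all pairwise kernel evaluations between the two samples. A minimal numpy sketch, with illustrative function names and a Gaussian kernel that are my own choices, not from the slides:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), evaluated for all pairs
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def embedding_inner_product(X, Y, sigma=1.0):
    # <mu_k(P_hat), mu_k(Q_hat)>_{H_k} = (1/(n m)) sum_{i,j} k(x_i, y_j)
    return gaussian_kernel(X, Y, sigma).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
Y = rng.normal(size=(500, 1))
# With Y = X this is the squared RKHS norm of the empirical embedding.
print(embedding_inner_product(X, Y))
```

The same quantity underlies every statistic in the talk: MMD and HSIC are both built from such averages of kernel evaluations.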

5 Kernel MMD

Definition. Kernel metric (MMD) between P and Q:

MMD_k(P, Q) = ‖E_X k(·, X) − E_Y k(·, Y)‖²_{H_k} = E_{X,X'} k(X, X') + E_{Y,Y'} k(Y, Y') − 2 E_{X,Y} k(X, Y)

6 Kernel MMD

A polynomial kernel k(x, x') = (1 + x⊤x')^s on R^p captures the difference in the first s (mixed) moments only.

For a certain family of kernels (characteristic/universal), MMD_k(P, Q) = 0 if and only if P = Q: Gaussian exp(−‖z − z'‖²₂ / (2σ²)), Laplacian, inverse multiquadrics, B_{2n+1}-splines, ...

Under mild assumptions, the kernel MMD metrizes the weak* topology on probability measures (Sriperumbudur et al., 2010): MMD_k(P_n, P) → 0 ⟺ P_n ⇝ P

7 Nonparametric two-sample tests

Testing H_0: P = Q vs. H_A: P ≠ Q based on samples {x_i}_{i=1}^{n_x} ∼ P, {y_i}_{i=1}^{n_y} ∼ Q.

Test statistic is an estimate of MMD_k(P, Q) = E_{X,X'} k(X, X') + E_{Y,Y'} k(Y, Y') − 2 E_{X,Y} k(X, Y):

MMD̂_k = 1/(n_x(n_x − 1)) Σ_{i≠j} k(x_i, x_j) + 1/(n_y(n_y − 1)) Σ_{i≠j} k(y_i, y_j) − 2/(n_x n_y) Σ_{i,j} k(x_i, y_j)

Degenerate U-statistic: 1/√n-convergence to MMD under H_A, 1/n-convergence to 0 under H_0.

O(n²) to compute (n = n_x + n_y); various approximations (block-based, random features) trade computation for power.

Gretton et al (NIPS 2009, JMLR 2012, NIPS 2012)

D. Sejdinovic (Statistics, Oxford) Testing with Kernels 9/4/15 5 / 19
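The unbiased estimator above can be sketched in a few lines of numpy. This is an illustrative implementation under my own naming, with a Gaussian kernel and a fixed bandwidth; the diagonal is excluded from the within-sample sums to match the 1/(n(n−1)) normalisation:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_u(X, Y, sigma=1.0):
    # U-statistic estimate of MMD^2: within-sample sums exclude i = j.
    nx, ny = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (nx * (nx - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (ny * (ny - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(1)
X  = rng.normal(0, 1, size=(300, 1))
X2 = rng.normal(0, 1, size=(300, 1))
Y  = rng.normal(2, 1, size=(300, 1))   # mean-shifted alternative
print(mmd2_u(X, X2))  # near 0: same distribution
print(mmd2_u(X, Y))   # clearly positive: distributions differ
```

The O(n²) cost is visible in the full kernel matrices; the block-based and random-feature approximations mentioned above avoid materialising them.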

8 Test threshold

For i.i.d. data, under H_0: P = Q:

(n_x n_y)/(n_x + n_y) · MMD̂_k ⇝ Σ_{r=1}^∞ λ_r (Z_r² − 1), {Z_r}_{r=1}^∞ i.i.d. N(0, 1)

{λ_r} depend on both k and P: eigenvalues of T : L_2 → L_2, (Tf)(x) = ∫ f(x') k̄(x, x') dP(x'), where k̄ is the centred kernel.

Asymptotic null distribution typically estimated using a permutation test.

For interdependent samples, {Z_r}_{r=1}^∞ are correlated, with the correlation structure dependent on the correlation structure within the samples.
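The permutation approach mentioned above can be sketched generically: pool the two samples, repeatedly re-partition them at random, and take an empirical quantile of the recomputed statistic as the threshold. All names here are illustrative, and, as the following slides demonstrate, this null is only valid for i.i.d. observations:

```python
import numpy as np

def permutation_threshold(X, Y, stat, n_perm=200, alpha=0.05, seed=0):
    # Recompute the statistic on randomly re-partitioned pooled samples;
    # the empirical (1 - alpha) quantile of these values is the threshold.
    # Exchangeability under H_0 requires i.i.d. data, not time series.
    rng = np.random.default_rng(seed)
    Z = np.concatenate([X, Y])
    nx = len(X)
    null = []
    for _ in range(n_perm):
        perm = rng.permutation(len(Z))
        null.append(stat(Z[perm[:nx]], Z[perm[nx:]]))
    return np.quantile(null, 1 - alpha)

def mean_diff(A, B):
    # toy statistic standing in for the MMD estimate
    return abs(A.mean() - B.mean())

rng = np.random.default_rng(2)
X = rng.normal(size=200)
Y = rng.normal(size=200)
thr = permutation_threshold(X, Y, mean_diff)
print(mean_diff(X, Y) > thr)  # reject H_0?
```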

9 Nonparametric independence tests

H_0: X and Y independent vs. H_A: X and Y dependent

10 Nonparametric independence tests

H_0: X ⫫ Y, i.e. P_XY = P_X P_Y vs. H_A: P_XY ≠ P_X P_Y

Test statistic: HSIC(X, Y) = ‖μ_κ(P̂_XY) − μ_κ(P̂_X P̂_Y)‖²_{H_κ}, with κ = k ⊗ l

Gretton et al (2005, 2008); Smola et al (2007)

Related to distance covariance (dCov) in the statistics literature: Székely et al (AoS 2007, AoAS 2009); S. et al (AoS 2013)

11 HSIC computation

[Figure: a paired sample of text snippets describing terrier breeds, with a kernel k(·, ·) on the first coordinate and a kernel l(·, ·) on the second.]

12 HSIC computation

[Figure: the paired text snippets with their kernel matrices K and L.]

HSIC measures average similarity between the kernel matrices:

HSIC(X, Y) = (1/n²) ⟨HKH, HLH⟩, where H = I − (1/n)11⊤ (centering matrix)

13 HSIC computation

[Figure: the paired text snippets with their kernel matrices K and L.]

HSIC measures average similarity between the kernel matrices:

HSIC(X, Y) = (1/n²) ⟨HKH, HLH⟩, where H = I − (1/n)11⊤ (centering matrix)

Extensions: conditional independence testing (Fukumizu, Gretton, Sun and Schölkopf, 2008; Zhang, Peters, Janzing and Schölkopf, 2011), three-variable interaction / V-structure discovery (S., Gretton and Bergsma, 2013)
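The matrix formula above translates directly to numpy. A sketch with illustrative names, using a Gaussian kernel on both coordinates (any kernels k and l would do):

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return np.exp(-sq / (2 * sigma**2))

def hsic(X, Y, sigma=1.0):
    # HSIC(X, Y) = (1/n^2) <HKH, HLH>: average similarity between the
    # doubly-centred kernel matrices of the two coordinates.
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix I - (1/n) 11'
    K = gaussian_gram(X, sigma)
    L = gaussian_gram(Y, sigma)
    return np.sum((H @ K @ H) * (H @ L @ H)) / n**2

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 1))
Y_dep = X + 0.1 * rng.normal(size=(400, 1))   # strongly dependent
Y_ind = rng.normal(size=(400, 1))             # independent of X
print(hsic(X, Y_dep), hsic(X, Y_ind))
```

Under independence the statistic is small (of order 1/n); under dependence it stays bounded away from zero.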

14 Kernel tests on time series

Kacper Chwialkowski, Arthur Gretton

15 Test calibration for dependent observations

Is P the same distribution as Q?

16 Kernel MMD

Definition. Kernel metric (MMD) between P and Q:

MMD_k(P, Q) = ‖E_X k(·, X) − E_Y k(·, Y)‖²_{H_k} = E_{X,X'} k(X, X') + E_{Y,Y'} k(Y, Y') − 2 E_{X,Y} k(X, Y)

17-24 Permutation test on AR(1): X_{t+1} = a X_t + √(1 − a²) ε_t

[Figures, one per slide for a = 0.1, 0.2, ..., 0.8: the density of V_n against the histogram of the permuted statistic V_{n,p}, with the correct and the bootstrapped thresholds marked. As the AR coefficient a increases, the permutation threshold falls further below the correct one.]
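The AR(1) design above can be simulated as follows. The point of the construction is that the √(1 − a²) innovation scale keeps the stationary marginal N(0, 1) for every a, so two independent chains satisfy H_0: P = Q marginally while temporal dependence grows with a, which is exactly the regime where the permutation test miscalibrates. A sketch with illustrative names:

```python
import numpy as np

def ar1(a, n, seed=0):
    # X_{t+1} = a X_t + sqrt(1 - a^2) eps_t; the innovation scale
    # sqrt(1 - a^2) makes N(0, 1) the stationary marginal for |a| < 1.
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = rng.normal()
    eps = rng.normal(size=n)
    for t in range(n - 1):
        x[t + 1] = a * x[t] + np.sqrt(1 - a**2) * eps[t]
    return x

for a in (0.1, 0.4, 0.8):
    x = ar1(a, 50_000, seed=4)
    # marginal mean/variance stay near 0 and 1 regardless of a
    print(a, round(x.mean(), 3), round(x.var(), 3))
```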

25 Wild Bootstrap

Wild bootstrap process (Leucht and Neumann, 2013):

W̃_{t,n} = e^{−1/l_n} W̃_{t−1,n} + √(1 − e^{−2/l_n}) ε_t,

where W̃_{0,n}, ε_1, ..., ε_n are i.i.d. N(0, 1), and W_{t,n} = W̃_{t,n} − (1/n) Σ_{j=1}^n W̃_{j,n}.

MMD̂_{k,wb} := (1/n_x²) Σ_{i,j=1}^{n_x} W^{(x)}_{i,n_x} W^{(x)}_{j,n_x} k(x_i, x_j) + (1/n_y²) Σ_{i,j=1}^{n_y} W^{(y)}_{i,n_y} W^{(y)}_{j,n_y} k(y_i, y_j) − (2/(n_x n_y)) Σ_{i,j} W^{(x)}_{i,n_x} W^{(y)}_{j,n_y} k(x_i, y_j)

Theorem (Chwialkowski, S. and Gretton, 2014). Let k be bounded and Lipschitz continuous, and let {X_t} ∼ P and {Y_t} ∼ Q both be τ-dependent with τ(r) = O(r^{−6−ε}), but independent of each other. Then, under H_0: P = Q,

φ( (n_x n_y)/(n_x + n_y) · MMD̂_k , (n_x n_y)/(n_x + n_y) · MMD̂_{k,wb} ) → 0 in probability as n_x, n_y → ∞,

where φ is the Prokhorov metric.
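The bootstrap process and the reweighted statistic above can be sketched as follows; all names are illustrative, and l_n is the bandwidth of the bootstrap process (how to choose it is one of the open questions at the end of the talk):

```python
import numpy as np

def wild_bootstrap_weights(n, l_n, rng):
    # W~_{t,n} = exp(-1/l_n) W~_{t-1,n} + sqrt(1 - exp(-2/l_n)) eps_t:
    # a Gaussian AR(1) whose autocorrelation decays on scale l_n.
    # The weights are then centred: W_t = W~_t - mean(W~).
    a = np.exp(-1.0 / l_n)
    w = np.empty(n)
    w[0] = rng.normal()
    eps = rng.normal(size=n)
    for t in range(1, n):
        w[t] = a * w[t - 1] + np.sqrt(1 - a**2) * eps[t]
    return w - w.mean()

def mmd2_wild_bootstrap(Kxx, Kyy, Kxy, l_n, rng):
    # One wild-bootstrap replicate of MMD^2: each sample's kernel terms
    # are reweighted by its own correlated weight process.
    nx, ny = Kxx.shape[0], Kyy.shape[0]
    wx = wild_bootstrap_weights(nx, l_n, rng)
    wy = wild_bootstrap_weights(ny, l_n, rng)
    return (wx @ Kxx @ wx / nx**2 + wy @ Kyy @ wy / ny**2
            - 2 * wx @ Kxy @ wy / (nx * ny))

rng = np.random.default_rng(5)
w = wild_bootstrap_weights(2000, 20.0, rng)
# centred mean and lag-1 autocorrelation near exp(-1/l_n)
print(round(w.mean(), 12), np.corrcoef(w[:-1], w[1:])[0, 1])
```

Repeating `mmd2_wild_bootstrap` many times and taking an empirical quantile of the replicates gives the test threshold, in place of the (invalid) permutation null.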

26-33 Wild Bootstrap on AR(1): X_{t+1} = a X_t + √(1 − a²) ε_t

[Figures, one per slide for a = 0.1, 0.2, ..., 0.8: the density of V_n against the histogram of the wild bootstrap statistic V_{n,w}, with the correct and the bootstrapped thresholds marked. The wild bootstrap threshold tracks the correct one across the whole range of a.]

34 Test calibration for dependent observations

Two-sample test experiment: MCMC diagnostics.

[Table: rejection rates of the permutation and wild bootstrap tests in three settings, i.i.d. vs i.i.d. (H_0), i.i.d. vs Gibbs (H_0), and Gibbs vs Gibbs (H_0).]

[Figures: traces of the two time series; Type I error against the AR coefficient; Type II error against the shift and extinction rate, for variants V_b1 and V_b2.]

35 Time Series Coupled at a Lag

[Figures: Type II error rate and mean distance correlation against sample size and lag, comparing KCSD and HSIC.]

X_t = cos(φ_{t,1}), φ_{t,1} = φ_{t−1,1} + 0.1 ε_{1,t} + 2π f_1 T_s, ε_{1,t} i.i.d. N(0, 1)

Y_t = [2 + C sin(φ_{t,1})] cos(φ_{t,2}), φ_{t,2} = φ_{t−1,2} + 0.1 ε_{2,t} + 2π f_2 T_s, ε_{2,t} i.i.d. N(0, 1)

Parameters: C = 0.4, f_1 = 4 Hz, f_2 = 20 Hz, 1/T_s = 100 Hz.

M. Besserve, N.K. Logothetis, and B. Schölkopf. Statistical analysis of coupled time series with kernel cross-spectral density operators. NIPS 2013.

36 Summary

Interdependent data lead to incorrect Type I error control for kernel tests (too many false positives).

Consistency of a wild bootstrap procedure under weak long-range dependence (τ-mixing), applicable to both two-sample and independence tests.

Applications: MCMC diagnostics, time series dependence across multiple lags.

37 Open questions

Interdependent case: how to select the parameters of the wild bootstrap / block bootstrap? This may require estimating the mixing properties of the time series first.

Large-scale testing: trade-offs between computation and power.

How to interpret the discovered differences in distributions / discovered dependence? Do we really care about all possible differences between distributions?

Tuning parameters: kernels and hyperparameters can be selected to directly optimize the relative efficiency of the test, but how does this affect the trade-offs with interdependent data? There is a sensitive interplay between the kernel hyperparameter and the wild bootstrap parameters.

Multivariate interaction and graphical model selection: approximations?

38 References

K. Chwialkowski, DS and A. Gretton. A wild bootstrap for degenerate kernel tests. Advances in Neural Information Processing Systems (NIPS) 27, Dec 2014.

DS, B. Sriperumbudur, A. Gretton and K. Fukumizu. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Statist. 41(5), Oct 2013.

M. Besserve, N.K. Logothetis and B. Schölkopf. Statistical analysis of coupled time series with kernel cross-spectral density operators. Advances in Neural Information Processing Systems (NIPS) 26, Dec 2013.

A. Leucht and M.H. Neumann. Dependent wild bootstrap for degenerate U- and V-statistics. J. Multivar. Anal. 117:257-280, 2013.

A. Gretton, K.M. Borgwardt, M.J. Rasch, B. Schölkopf and A. Smola. A Kernel Two-Sample Test. J. Mach. Learn. Res. 13(Mar), 2012.

39 Kernels and characteristic functions

[Figures: samples (X, Y) from a smooth and a rough density; Type II error against frequency for distance kernels with q = 1/6, 1/3, 2/3, 1, 4/3 and a Gaussian kernel at the median heuristic; m = 512, α = .5.]

E-distance/dCov of Székely and Rizzo (2004, 2005), Székely et al (2007):

V²(X, Y) = E_{XY} E_{X'Y'} ‖X − X'‖₂ ‖Y − Y'‖₂ + E_X E_{X'} ‖X − X'‖₂ E_Y E_{Y'} ‖Y − Y'‖₂ − 2 E_{XY} [ E_{X'} ‖X − X'‖₂ E_{Y'} ‖Y − Y'‖₂ ]

DS, B. Sriperumbudur, A. Gretton and K. Fukumizu. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Annals of Statistics 41(5), 2013.

40 Embeddings in Mercer's Expansion

Mercer's Expansion. For a compact metric space X and a continuous kernel k,

k(x, y) = Σ_{r=1}^∞ λ_r Φ_r(x) Φ_r(y),

with {λ_r, Φ_r}_{r≥1} the eigenvalue-eigenfunction pairs of f ↦ ∫ f(x) k(·, x) dP(x) on L²(P).

In H_k: k(·, x) ↔ {√λ_r Φ_r(x)} ∈ ℓ², and μ_k(P) ↔ {√λ_r E Φ_r(X)} ∈ ℓ².

‖μ_k(P̂) − μ_k(Q̂)‖²_{H_k} = Σ_{r=1}^∞ λ_r ( (1/n_x) Σ_{t=1}^{n_x} Φ_r(X_t) − (1/n_y) Σ_{t=1}^{n_y} Φ_r(Y_t) )²

41 Wild Bootstrap

ρ_x = n_x/n, ρ_y = n_y/n; {W_{t,n}}_{1≤t≤n} with E W_{t,n} = 0 and E[W_{t,n} W_{t',n}] = ζ((t − t')/l_n), where lim_{u→0} ζ(u) = 1.

ρ_x ρ_y n MMD̂_k = Σ_{r=1}^∞ λ_r ( √ρ_y (1/√n_x) Σ_{t=1}^{n_x} Φ_r(X_t) − √ρ_x (1/√n_y) Σ_{t=1}^{n_y} Φ_r(Y_t) )²

ρ_x ρ_y n MMD̂_{k,wb} = Σ_{r=1}^∞ λ_r ( √ρ_y (1/√n_x) Σ_{t=1}^{n_x} Φ_r(X_t) W^{(x)}_{t,n_x} − √ρ_x (1/√n_y) Σ_{t=1}^{n_y} Φ_r(Y_t) W^{(y)}_{t,n_y} )²

E[Φ_r(X_1) W_{1,n} Φ_s(X_t) W_{t,n}] = E[Φ_r(X_1) Φ_s(X_t)] ζ((t − 1)/l_n) →_n E[Φ_r(X_1) Φ_s(X_t)] for all t, r, s,

provided the dependence between X_1 and X_t disappears fast enough (a τ-mixing condition).

42 ICML Workshop on Large-Scale Kernel Learning

Lille, France, 11 July 2015 (co-located with ICML 2015)

Foundational algorithmic techniques for large-scale kernel learning: matrix factorization, randomization and approximation, variational inference and sampling, inducing variables, random Fourier features, unifying frameworks

Interface between kernel methods and deep learning architectures

Trade-offs between statistical and computational efficiency in kernel methods

Stochastic gradient techniques with kernel methods

Large-scale multiple kernel learning

Large-scale representation learning with kernels

Large-scale kernel methods for complex data types beyond perceptual data

Confirmed speakers: Francis Bach, Neil Lawrence, Russ Salakhutdinov, Marius Kloft, Zaid Harchaoui

Deadline for Submissions: Friday, May 1st, 2015, 23:00 UTC.


Approximate Kernel PCA with Random Features Approximate Kernel PCA with Random Features (Computational vs. Statistical Tradeoff) Bharath K. Sriperumbudur Department of Statistics, Pennsylvania State University Journées de Statistique Paris May 28,

More information

Forefront of the Two Sample Problem

Forefront of the Two Sample Problem Forefront of the Two Sample Problem From classical to state-of-the-art methods Yuchi Matsuoka What is the Two Sample Problem? X ",, X % ~ P i. i. d Y ",, Y - ~ Q i. i. d Two Sample Problem P = Q? Example:

More information

Kernels for Automatic Pattern Discovery and Extrapolation

Kernels for Automatic Pattern Discovery and Extrapolation Kernels for Automatic Pattern Discovery and Extrapolation Andrew Gordon Wilson agw38@cam.ac.uk mlg.eng.cam.ac.uk/andrew University of Cambridge Joint work with Ryan Adams (Harvard) 1 / 21 Pattern Recognition

More information

Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models

Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models JMLR Workshop and Conference Proceedings 6:17 164 NIPS 28 workshop on causality Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models Kun Zhang Dept of Computer Science and HIIT University

More information

Generalized clustering via kernel embeddings

Generalized clustering via kernel embeddings Generalized clustering via kernel embeddings Stefanie Jegelka 1, Arthur Gretton 2,1, Bernhard Schölkopf 1, Bharath K. Sriperumbudur 3, and Ulrike von Luxburg 1 1 Max Planck Institute for Biological Cybernetics,

More information

Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models

Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models Kun Zhang Dept of Computer Science and HIIT University of Helsinki 14 Helsinki, Finland kun.zhang@cs.helsinki.fi Aapo Hyvärinen

More information

INDEPENDENCE MEASURES

INDEPENDENCE MEASURES INDEPENDENCE MEASURES Beatriz Bueno Larraz Máster en Investigación e Innovación en Tecnologías de la Información y las Comunicaciones. Escuela Politécnica Superior. Máster en Matemáticas y Aplicaciones.

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Divide and Conquer Kernel Ridge Regression. A Distributed Algorithm with Minimax Optimal Rates

Divide and Conquer Kernel Ridge Regression. A Distributed Algorithm with Minimax Optimal Rates : A Distributed Algorithm with Minimax Optimal Rates Yuchen Zhang, John C. Duchi, Martin Wainwright (UC Berkeley;http://arxiv.org/pdf/1305.509; Apr 9, 014) Gatsby Unit, Tea Talk June 10, 014 Outline Motivation.

More information

Colored Maximum Variance Unfolding

Colored Maximum Variance Unfolding Colored Maximum Variance Unfolding Le Song, Alex Smola, Karsten Borgwardt and Arthur Gretton National ICT Australia, Canberra, Australia University of Cambridge, Cambridge, United Kingdom MPI for Biological

More information

Feature-to-Feature Regression for a Two-Step Conditional Independence Test

Feature-to-Feature Regression for a Two-Step Conditional Independence Test Feature-to-Feature Regression for a Two-Step Conditional Independence Test Qinyi Zhang Department of Statistics University of Oxford qinyi.zhang@stats.ox.ac.uk Sarah Filippi School of Public Health Department

More information

CS8803: Statistical Techniques in Robotics Byron Boots. Hilbert Space Embeddings

CS8803: Statistical Techniques in Robotics Byron Boots. Hilbert Space Embeddings CS8803: Statistical Techniques in Robotics Byron Boots Hilbert Space Embeddings 1 Motivation CS8803: STR Hilbert Space Embeddings 2 Overview Multinomial Distributions Marginal, Joint, Conditional Sum,

More information

Generalizing Distance Covariance to Measure. and Test Multivariate Mutual Dependence

Generalizing Distance Covariance to Measure. and Test Multivariate Mutual Dependence Generalizing Distance Covariance to Measure and Test Multivariate Mutual Dependence arxiv:1709.02532v5 [math.st] 25 Feb 2018 Ze Jin, David S. Matteson February 27, 2018 Abstract We propose three new measures

More information

Causal Inference in Time Series via Supervised Learning

Causal Inference in Time Series via Supervised Learning Causal Inference in Time Series via Supervised Learning Yoichi Chikahara and Akinori Fujino NTT Communication Science Laboratories, Kyoto 619-0237, Japan chikahara.yoichi@lab.ntt.co.jp, fujino.akinori@lab.ntt.co.jp

More information

Learning gradients: prescriptive models

Learning gradients: prescriptive models Department of Statistical Science Institute for Genome Sciences & Policy Department of Computer Science Duke University May 11, 2007 Relevant papers Learning Coordinate Covariances via Gradients. Sayan

More information

Kernel-Based Contrast Functions for Sufficient Dimension Reduction

Kernel-Based Contrast Functions for Sufficient Dimension Reduction Kernel-Based Contrast Functions for Sufficient Dimension Reduction Michael I. Jordan Departments of Statistics and EECS University of California, Berkeley Joint work with Kenji Fukumizu and Francis Bach

More information

Two-stage Sampled Learning Theory on Distributions

Two-stage Sampled Learning Theory on Distributions Two-stage Sampled Learning Theory on Distributions Zoltán Szabó (Gatsby Unit, UCL) Joint work with Arthur Gretton (Gatsby Unit, UCL), Barnabás Póczos (ML Department, CMU), Bharath K. Sriperumbudur (Department

More information

Unsupervised dimensionality reduction

Unsupervised dimensionality reduction Unsupervised dimensionality reduction Guillaume Obozinski Ecole des Ponts - ParisTech SOCN course 2014 Guillaume Obozinski Unsupervised dimensionality reduction 1/30 Outline 1 PCA 2 Kernel PCA 3 Multidimensional

More information

Bickel Rosenblatt test

Bickel Rosenblatt test University of Latvia 28.05.2011. A classical Let X 1,..., X n be i.i.d. random variables with a continuous probability density function f. Consider a simple hypothesis H 0 : f = f 0 with a significance

More information

High-dimensional test for normality

High-dimensional test for normality High-dimensional test for normality Jérémie Kellner Ph.D Student University Lille I - MODAL project-team Inria joint work with Alain Celisse Rennes - June 5th, 2014 Jérémie Kellner Ph.D Student University

More information

Gaussian processes for inference in stochastic differential equations

Gaussian processes for inference in stochastic differential equations Gaussian processes for inference in stochastic differential equations Manfred Opper, AI group, TU Berlin November 6, 2017 Manfred Opper, AI group, TU Berlin (TU Berlin) inference in SDE November 6, 2017

More information

The Perturbed Variation

The Perturbed Variation The Perturbed Variation Maayan Harel Department of Electrical Engineering Technion, Haifa, Israel maayanga@tx.technion.ac.il Shie Mannor Department of Electrical Engineering Technion, Haifa, Israel shie@ee.technion.ac.il

More information

Bootstrapping high dimensional vector: interplay between dependence and dimensionality

Bootstrapping high dimensional vector: interplay between dependence and dimensionality Bootstrapping high dimensional vector: interplay between dependence and dimensionality Xianyang Zhang Joint work with Guang Cheng University of Missouri-Columbia LDHD: Transition Workshop, 2014 Xianyang

More information

The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space

The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space Robert Jenssen, Deniz Erdogmus 2, Jose Principe 2, Torbjørn Eltoft Department of Physics, University of Tromsø, Norway

More information

Deep Convolutional Neural Networks for Pairwise Causality

Deep Convolutional Neural Networks for Pairwise Causality Deep Convolutional Neural Networks for Pairwise Causality Karamjit Singh, Garima Gupta, Lovekesh Vig, Gautam Shroff, and Puneet Agarwal TCS Research, Delhi Tata Consultancy Services Ltd. {karamjit.singh,

More information

A Kernel Statistical Test of Independence

A Kernel Statistical Test of Independence A Kernel Statistical Test of Independence Arthur Gretton MPI for Biological Cybernetics Tübingen, Germany arthur@tuebingen.mpg.de Kenji Fukumizu Inst. of Statistical Mathematics Tokyo Japan fukumizu@ism.ac.jp

More information

Bivariate Paired Numerical Data

Bivariate Paired Numerical Data Bivariate Paired Numerical Data Pearson s correlation, Spearman s ρ and Kendall s τ, tests of independence University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

3 Comparison with Other Dummy Variable Methods

3 Comparison with Other Dummy Variable Methods Stats 300C: Theory of Statistics Spring 2018 Lecture 11 April 25, 2018 Prof. Emmanuel Candès Scribe: Emmanuel Candès, Michael Celentano, Zijun Gao, Shuangning Li 1 Outline Agenda: Knockoffs 1. Introduction

More information

Direct Learning: Linear Classification. Donglin Zeng, Department of Biostatistics, University of North Carolina

Direct Learning: Linear Classification. Donglin Zeng, Department of Biostatistics, University of North Carolina Direct Learning: Linear Classification Logistic regression models for classification problem We consider two class problem: Y {0, 1}. The Bayes rule for the classification is I(P(Y = 1 X = x) > 1/2) so

More information

STAT 518 Intro Student Presentation

STAT 518 Intro Student Presentation STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible

More information

Kernel Change-point Analysis

Kernel Change-point Analysis Kernel Change-point Analysis Zaïd Harchaoui LTCI, TELECOM ParisTech and CNRS 46, rue Barrault, 75634 Paris cedex 3, France zaid.harchaoui@enst.fr Francis Bach Willow Project, INRIA-ENS 45, rue d Ulm, 75230

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information

10-701/ Recitation : Kernels

10-701/ Recitation : Kernels 10-701/15-781 Recitation : Kernels Manojit Nandi February 27, 2014 Outline Mathematical Theory Banach Space and Hilbert Spaces Kernels Commonly Used Kernels Kernel Theory One Weird Kernel Trick Representer

More information

What is an RKHS? Dino Sejdinovic, Arthur Gretton. March 11, 2014

What is an RKHS? Dino Sejdinovic, Arthur Gretton. March 11, 2014 What is an RKHS? Dino Sejdinovic, Arthur Gretton March 11, 2014 1 Outline Normed and inner product spaces Cauchy sequences and completeness Banach and Hilbert spaces Linearity, continuity and boundedness

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 6220 - Section 2 - Spring 2017 Lecture 4 Jan-Willem van de Meent (credit: Yijun Zhao, Arthur Gretton Rasmussen & Williams, Percy Liang) Kernel Regression Basis function regression

More information

Hilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems

Hilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems with Applications to Dynamical Systems Le Song Jonathan Huang School of Computer Science, Carnegie Mellon University Alex Smola Yahoo! Research, Santa Clara, CA, USA Kenji Fukumizu Institute of Statistical

More information

Extreme inference in stationary time series

Extreme inference in stationary time series Extreme inference in stationary time series Moritz Jirak FOR 1735 February 8, 2013 1 / 30 Outline 1 Outline 2 Motivation The multivariate CLT Measuring discrepancies 3 Some theory and problems The problem

More information

Advanced Introduction to Machine Learning

Advanced Introduction to Machine Learning 10-715 Advanced Introduction to Machine Learning Homework Due Oct 15, 10.30 am Rules Please follow these guidelines. Failure to do so, will result in loss of credit. 1. Homework is due on the due date

More information

Kernel Principal Component Analysis

Kernel Principal Component Analysis Kernel Principal Component Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Probabilistic & Unsupervised Learning

Probabilistic & Unsupervised Learning Probabilistic & Unsupervised Learning Gaussian Processes Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College London

More information

Geometry on Probability Spaces

Geometry on Probability Spaces Geometry on Probability Spaces Steve Smale Toyota Technological Institute at Chicago 427 East 60th Street, Chicago, IL 60637, USA E-mail: smale@math.berkeley.edu Ding-Xuan Zhou Department of Mathematics,

More information

Basis Expansion and Nonlinear SVM. Kai Yu

Basis Expansion and Nonlinear SVM. Kai Yu Basis Expansion and Nonlinear SVM Kai Yu Linear Classifiers f(x) =w > x + b z(x) = sign(f(x)) Help to learn more general cases, e.g., nonlinear models 8/7/12 2 Nonlinear Classifiers via Basis Expansion

More information