Hypothesis Testing with Kernel Embeddings on Interdependent Data
1 Hypothesis Testing with Kernel Embeddings on Interdependent Data
Dino Sejdinovic, Department of Statistics, University of Oxford
Joint work with Kacper Chwialkowski and Arthur Gretton (Gatsby Unit, UCL)
9 April 2015, Dagstuhl
D. Sejdinovic (Statistics, Oxford), Testing with Kernels, 9/4/15, 1 / 19
2-4 Kernel Embedding
Feature map: x ↦ k(·, x) ∈ H_k, instead of x ↦ (φ_1(x), …, φ_s(x)) ∈ R^s.
⟨k(·, x), k(·, y)⟩_{H_k} = k(x, y): inner products easily computed.
Embedding: P ↦ μ_k(P) = E_{X∼P} k(·, X) ∈ H_k, instead of P ↦ (Eφ_1(X), …, Eφ_s(X)) ∈ R^s.
⟨μ_k(P), μ_k(Q)⟩_{H_k} = E_{X,Y} k(X, Y): inner products easily estimated.
μ_k(P) represents expectations w.r.t. P, i.e., E_X f(X) = E_X ⟨f, k(·, X)⟩_{H_k} = ⟨f, μ_k(P)⟩_{H_k} for all f ∈ H_k.
5 Kernel MMD
Definition. Kernel metric (MMD) between P and Q:
MMD²_k(P, Q) = ‖E_X k(·, X) − E_Y k(·, Y)‖²_{H_k} = E_{X,X'} k(X, X') + E_{Y,Y'} k(Y, Y') − 2 E_{X,Y} k(X, Y).
6 Kernel MMD
A polynomial kernel k(x, x') = (1 + x⊤x')^s on R^p captures the difference in the first s (mixed) moments only.
For a certain family of kernels (characteristic/universal), MMD_k(P, Q) = 0 iff P = Q: Gaussian exp(−‖z − z'‖²₂ / (2σ²)), Laplacian, inverse multiquadrics, B_{2n+1}-splines…
Under mild assumptions, kernel MMD metrizes the weak* topology on probability measures (Sriperumbudur, 2010): MMD_k(P_n, P) → 0 ⇔ P_n ⇝ P.
7 Nonparametric two-sample tests
Testing H_0: P = Q vs. H_A: P ≠ Q, based on samples {x_i}_{i=1}^{n_x} ∼ P, {y_i}_{i=1}^{n_y} ∼ Q.
Test statistic is an estimate of MMD²_k(P, Q) = E_{X,X'} k(X, X') + E_{Y,Y'} k(Y, Y') − 2 E_{X,Y} k(X, Y):
MMD̂²_k = 1/(n_x(n_x − 1)) Σ_{i≠j} k(x_i, x_j) + 1/(n_y(n_y − 1)) Σ_{i≠j} k(y_i, y_j) − 2/(n_x n_y) Σ_{i,j} k(x_i, y_j).
Degenerate U-statistic: 1/√n-convergence to MMD² under H_A, 1/n-convergence to 0 under H_0.
O(n²) to compute (n = n_x + n_y); various approximations (block-based, random features) trade computation for power.
Gretton et al (NIPS 2009, JMLR 2012, NIPS 2012)
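The quadratic-time estimator above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the Gaussian kernel and the fixed bandwidth are my choices, as are the helper names.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_unbiased(x, y, sigma=1.0):
    # Unbiased estimate of MMD^2_k(P, Q) from samples x ~ P, y ~ Q
    nx, ny = len(x), len(y)
    Kxx = gaussian_kernel(x, x, sigma)
    Kyy = gaussian_kernel(y, y, sigma)
    Kxy = gaussian_kernel(x, y, sigma)
    # within-sample sums run over i != j only (drop the diagonal)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (nx * (nx - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (ny * (ny - 1))
    return term_x + term_y - 2 * Kxy.mean()
```

On two samples from the same distribution the estimate fluctuates around zero; on samples from shifted distributions it concentrates around the positive population MMD².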
8 Test threshold
For i.i.d. data, under H_0: P = Q:
(n_x n_y / (n_x + n_y)) MMD̂²_k ⇝ Σ_{r=1}^∞ λ_r (Z_r² − 1), with {Z_r}_{r=1}^∞ i.i.d. N(0, 1).
{λ_r} depend on both k and P: eigenvalues of T: L₂ → L₂, (Tf)(x) = ∫ f(x') k̃(x, x') dP(x'), where k̃ is the centred kernel.
The asymptotic null distribution is typically estimated using a permutation test.
For interdependent samples, {Z_r}_{r=1}^∞ are correlated, with the correlation structure dependent on the correlation structure within the samples.
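For i.i.d. data, the permutation approach mentioned above can be sketched as follows. This is a toy illustration with names of my choosing (and a biased V-statistic variant of MMD² for brevity); under interdependent observations this is exactly the procedure that miscalibrates.

```python
import numpy as np

def mmd2_biased(x, y, sigma=1.0):
    # Biased (V-statistic) estimate of MMD^2 with a Gaussian kernel
    z = np.concatenate([x, y])
    sq = np.sum(z**2, 1)[:, None] + np.sum(z**2, 1)[None, :] - 2 * z @ z.T
    K = np.exp(-sq / (2 * sigma**2))
    nx = len(x)
    Kxx, Kyy, Kxy = K[:nx, :nx], K[nx:, nx:], K[:nx, nx:]
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

def permutation_threshold(x, y, alpha=0.05, n_perm=200, sigma=1.0, seed=0):
    # Shuffle the pooled sample, recompute the statistic on each random
    # split, and take the (1 - alpha) quantile as the test threshold.
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    nx = len(x)
    stats = []
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        stats.append(mmd2_biased(pooled[perm[:nx]], pooled[perm[nx:]], sigma))
    return np.quantile(stats, 1 - alpha)
```

The test rejects H_0 when `mmd2_biased(x, y)` exceeds `permutation_threshold(x, y)`; permuting is valid because, under H_0 with i.i.d. data, the pooled sample is exchangeable.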
9-10 Nonparametric independence tests
H_0: X ⊥ Y ⇔ P_{XY} = P_X P_Y; H_A: X ⊥̸ Y ⇔ P_{XY} ≠ P_X P_Y.
Test statistic: HSIC(X, Y) = ‖μ_κ(P̂_{XY}) − μ_κ(P̂_X P̂_Y)‖²_{H_κ}, with κ = k ⊗ l.
Gretton et al (2005, 2008); Smola et al (2007). Related to distance covariance (dCov) in the statistics literature: Székely et al (AoS 2007, AoAS 2009); S. et al (AoS 2013).
11-13 HSIC computation
[Figure: two paired text snippets (descriptions of terrier breeds) fed to kernels k(·, ·) and l(·, ·), giving kernel matrices K and L.]
HSIC measures average similarity between the kernel matrices:
HSIC(X, Y) = (1/n²) ⟨HKH, HLH⟩, H = I − (1/n) 1 1⊤ (centering matrix).
Extensions: conditional independence testing (Fukumizu, Gretton, Sun and Schölkopf, 2008; Zhang, Peters, Janzing and Schölkopf, 2011); three-variable interaction / V-structure discovery (S., Gretton and Bergsma, 2013).
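Given the two kernel matrices, the empirical statistic above is essentially a one-liner. A sketch assuming Gaussian kernels on real-valued data (rather than the slide's text kernels); the helper names are mine.

```python
import numpy as np

def gram(a, sigma=1.0):
    # Gaussian kernel matrix on a 1-d sample given as an (n, 1) column
    sq = (a - a.T) ** 2
    return np.exp(-sq / (2 * sigma**2))

def hsic(K, L):
    # HSIC(X, Y) = (1/n^2) <HKH, HLH>, H = I - (1/n) 1 1^T (centering matrix)
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.sum((H @ K @ H) * (H @ L @ H)) / n**2
```

Since HKH and HLH are positive semi-definite, the statistic is always nonnegative, and it is larger for dependent pairs than for independent ones.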
14 Kernel tests on time series
Kacper Chwialkowski, Arthur Gretton
15 Test calibration for dependent observations
Is P the same distribution as Q?
16 Kernel MMD (recap)
Definition. Kernel metric (MMD) between P and Q:
MMD²_k(P, Q) = ‖E_X k(·, X) − E_Y k(·, Y)‖²_{H_k} = E_{X,X'} k(X, X') + E_{Y,Y'} k(Y, Y') − 2 E_{X,Y} k(X, Y).
17-24 Permutation test on AR(1): X_{t+1} = a X_t + √(1 − a²) ε_t
[Figures, for a = 0.1, 0.2, …, 0.8: density of V_n vs. histogram of the permuted statistic V_{n,p}, with correct and bootstrapped thresholds. As a grows, the permutation (bootstrapped) threshold falls increasingly below the correct one.]
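The innovation scale √(1 − a²) in the recursion above keeps the stationary marginal at N(0, 1) for every a, so varying the autoregressive coefficient changes only the dependence within the sample, never the marginal distribution. A quick simulation sketch (my own, for illustration):

```python
import numpy as np

def ar1(n, a, seed=0):
    # X_{t+1} = a X_t + sqrt(1 - a^2) eps_t; stationary marginal is N(0, 1)
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = rng.normal()  # start in stationarity
    eps = rng.normal(size=n)
    for t in range(1, n):
        x[t] = a * x[t - 1] + np.sqrt(1 - a**2) * eps[t]
    return x
```

Whatever a is, the sample standard deviation stays near 1 while the lag-1 autocorrelation is near a, which is exactly why a permutation test (which sees only the marginals) cannot detect the miscalibration.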
25 Wild Bootstrap
Wild bootstrap process (Leucht and Neumann, 2013):
W_{t,n} = e^{−1/l_n} W_{t−1,n} + √(1 − e^{−2/l_n}) ε_t,
where W_{0,n}, ε_1, …, ε_n are i.i.d. N(0, 1), and W̃_{t,n} = W_{t,n} − (1/n) Σ_{j=1}^n W_{j,n}.
MMD̂²_{k,wb} := (1/n_x²) Σ_{i=1}^{n_x} Σ_{j=1}^{n_x} W̃^{(x)}_{i,n_x} W̃^{(x)}_{j,n_x} k(x_i, x_j) + (1/n_y²) Σ_{i=1}^{n_y} Σ_{j=1}^{n_y} W̃^{(y)}_{i,n_y} W̃^{(y)}_{j,n_y} k(y_i, y_j) − (2/(n_x n_y)) Σ_{i,j} W̃^{(x)}_{i,n_x} W̃^{(y)}_{j,n_y} k(x_i, y_j).
Theorem (Chwialkowski, S. and Gretton, 2014). Let k be bounded and Lipschitz continuous, and let {X_t} ∼ P and {Y_t} ∼ Q both be τ-dependent with τ(r) = O(r^{−6−ε}), but independent of each other. Then, under H_0: P = Q, φ( (n_x n_y)/(n_x + n_y) MMD̂²_k, (n_x n_y)/(n_x + n_y) MMD̂²_{k,wb} ) → 0 in probability as n_x, n_y → ∞, where φ is the Prokhorov metric.
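The bootstrap process W_{t,n} is itself just a Gaussian AR(1) with unit marginal variance and correlation length of order l_n. A sketch of generating the centred weights W̃ (function name mine); the bootstrapped statistic MMD̂²_{k,wb} then reweights the three kernel sums with these weights in place of uniform ones.

```python
import numpy as np

def wild_bootstrap_weights(n, l_n, seed=0):
    # W_{t,n} = e^{-1/l_n} W_{t-1,n} + sqrt(1 - e^{-2/l_n}) eps_t:
    # a stationary AR(1) with N(0, 1) marginals and lag-1 correlation
    # e^{-1/l_n}. Returned centred: W~_t = W_t - mean_j W_j.
    rng = np.random.default_rng(seed)
    rho = np.exp(-1.0 / l_n)
    w = np.empty(n)
    w[0] = rng.normal()
    eps = rng.normal(size=n)
    for t in range(1, n):
        w[t] = rho * w[t - 1] + np.sqrt(1 - rho**2) * eps[t]
    return w - w.mean()
```

The point of the correlated (rather than i.i.d.) weights is to mimic the temporal dependence of the data, which is what a plain permutation destroys.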
26-33 Wild Bootstrap on AR(1): X_{t+1} = a X_t + √(1 − a²) ε_t
[Figures, for a = 0.1, 0.2, …, 0.8: density of V_n vs. histogram of the wild-bootstrapped statistic V_{n,w}, with correct and bootstrapped thresholds. The wild-bootstrapped threshold tracks the correct one across all values of a.]
34 Test calibration for dependent observations
[Table: two-sample test rejection rates under H_0 (i.i.d. vs i.i.d., i.i.d. vs Gibbs, Gibbs vs Gibbs) for the permutation and wild bootstrap methods in MCMC diagnostics; numeric entries lost in transcription. Accompanying time-series figures: type I error vs. AR coefficient; type II error for V_{b1}, V_{b2}, Shift, Extinction rate.]
35 Time Series Coupled at a Lag
[Figures: type II error rate and mean distance correlation vs. sample size and lag, for KCSD and HSIC.]
X_t = cos(φ_{t,1}), φ_{t,1} = φ_{t−1,1} + 0.1 ε_{1,t} + 2π f_1 T_s, ε_{1,t} i.i.d. N(0, 1),
Y_t = [2 + C sin(φ_{t,1})] cos(φ_{t,2}), φ_{t,2} = φ_{t−1,2} + 0.1 ε_{2,t} + 2π f_2 T_s, ε_{2,t} i.i.d. N(0, 1).
Parameters: C = 0.4, f_1 = 4 Hz, f_2 = 20 Hz, 1/T_s = 100 Hz.
M. Besserve, N.K. Logothetis, and B. Schölkopf. Statistical analysis of coupled time series with kernel cross-spectral density operators. NIPS 2013.
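In this construction the dependence enters only through the amplitude modulation of Y by the phase of X, so the instantaneous linear correlation between the two series is essentially zero. A simulation sketch of my own (starting both phases at 0; the f_2 and sampling-rate values follow the parameter list above as reconstructed from the transcription):

```python
import numpy as np

def coupled_series(n, C=0.4, f1=4.0, f2=20.0, fs=100.0, seed=0):
    # X_t = cos(phi_{t,1}), phi_{t,1} = phi_{t-1,1} + 0.1 eps_{1,t} + 2 pi f1 Ts
    # Y_t = [2 + C sin(phi_{t,1})] cos(phi_{t,2}), phi_{t,2} likewise with f2
    rng = np.random.default_rng(seed)
    Ts = 1.0 / fs
    phi1 = np.cumsum(0.1 * rng.normal(size=n) + 2 * np.pi * f1 * Ts)
    phi2 = np.cumsum(0.1 * rng.normal(size=n) + 2 * np.pi * f2 * Ts)
    x = np.cos(phi1)
    y = (2 + C * np.sin(phi1)) * np.cos(phi2)
    return x, y
```

By construction |X_t| ≤ 1 and |Y_t| ≤ 2 + C; detecting the dependence requires looking beyond instantaneous correlation, e.g. across lags, which is the point of the KCSD/HSIC comparison.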
36 Summary
Interdependent data lead to incorrect Type I error control for kernel tests (too many false positives).
Consistency of a wild bootstrap procedure under weak long-range dependence (τ-mixing), applicable to both two-sample and independence tests.
Applications: MCMC diagnostics, time series dependence across multiple lags.
37 Open questions
- Interdependent case: how to select the parameters of the wild bootstrap / block bootstrap? Does this require first estimating the mixing properties of the time series?
- Large-scale testing: trade-offs between computation and power.
- How to interpret the discovered differences in distributions / discovered dependence? Do we really care about all possible differences between distributions?
- Tuning parameters: one can select kernels/hyperparameters to directly optimize the relative efficiency of the test, but how does this affect the trade-offs with interdependent data? There is a sensitive interplay between the kernel hyperparameter and the wild bootstrap parameters.
- Multivariate interaction and graphical model selection: approximations?
38 References
- K. Chwialkowski, D. Sejdinovic and A. Gretton. A wild bootstrap for degenerate kernel tests. Advances in Neural Information Processing Systems (NIPS) 27, Dec 2014.
- D. Sejdinovic, B. Sriperumbudur, A. Gretton and K. Fukumizu. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Statist. 41(5):2263-2291, Oct 2013.
- M. Besserve, N.K. Logothetis and B. Schölkopf. Statistical analysis of coupled time series with kernel cross-spectral density operators. Advances in Neural Information Processing Systems (NIPS) 26, Dec 2013.
- A. Leucht and M.H. Neumann. Dependent wild bootstrap for degenerate U- and V-statistics. J. Multivar. Anal. 117:257-280, 2013.
- A. Gretton, K.M. Borgwardt, M.J. Rasch, B. Schölkopf and A. Smola. A Kernel Two-Sample Test. J. Mach. Learn. Res. 13(Mar):723-773, 2012.
39 Kernels and characteristic functions
[Figures: scatter plots of Y against X for a smooth and a rough density; Type II error of distance kernels with exponents q = 1/6, 1/3, 2/3, 1, 4/3 vs. the Gaussian kernel (median heuristic), m = 512, α = 0.5.]
E-distance/dCov of Székely and Rizzo (2004, 2005), Székely et al (2007):
V²(X, Y) = E_{XY} E_{X'Y'} ‖X − X'‖₂ ‖Y − Y'‖₂ + E_X E_{X'} ‖X − X'‖₂ E_Y E_{Y'} ‖Y − Y'‖₂ − 2 E_{XY} [E_{X'} ‖X − X'‖₂ E_{Y'} ‖Y − Y'‖₂].
D. Sejdinovic, B. Sriperumbudur, A. Gretton and K. Fukumizu. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Annals of Statistics 41(5):2263-2291, 2013.
40 Embeddings in Mercer's Expansion
Mercer's Expansion. For a compact metric space X and a continuous kernel k,
k(x, y) = Σ_{r=1}^∞ λ_r Φ_r(x) Φ_r(y),
with {λ_r, Φ_r}_{r≥1} the eigenvalue/eigenfunction pairs of f ↦ ∫ f(x) k(·, x) dP(x) on L₂(P).
In H_k: k(·, x) ↔ {√λ_r Φ_r(x)}_r ∈ l₂ and μ_k(P) ↔ {√λ_r E Φ_r(X)}_r ∈ l₂, so
‖μ_k(P̂) − μ_k(Q̂)‖²_{H_k} = Σ_{r=1}^∞ λ_r ( (1/n_x) Σ_{t=1}^{n_x} Φ_r(X_t) − (1/n_y) Σ_{t=1}^{n_y} Φ_r(Y_t) )².
41 Wild Bootstrap
ρ_x = n_x/n, ρ_y = n_y/n.
{W_{t,n}}_{1≤t≤n}, with E W_{t,n} = 0 and E[W_{t,n} W_{t',n}] = ζ(|t − t'| / l_n), where lim_{u→0} ζ(u) = 1.
ρ_x ρ_y n MMD̂²_k = Σ_{r=1}^∞ λ_r ( √ρ_y (1/√n_x) Σ_{t=1}^{n_x} Φ_r(X_t) − √ρ_x (1/√n_y) Σ_{t=1}^{n_y} Φ_r(Y_t) )²
ρ_x ρ_y n MMD̂²_{k,wb} = Σ_{r=1}^∞ λ_r ( √ρ_y (1/√n_x) Σ_{t=1}^{n_x} Φ_r(X_t) W̃^{(x)}_{t,n_x} − √ρ_x (1/√n_y) Σ_{t=1}^{n_y} Φ_r(Y_t) W̃^{(y)}_{t,n_y} )²
E[Φ_r(X_1) W_{1,n} Φ_s(X_t) W_{t,n}] = E[Φ_r(X_1) Φ_s(X_t)] ζ((t − 1)/l_n) → E[Φ_r(X_1) Φ_s(X_t)] for all t, r, s,
provided the dependence between X_1 and X_t disappears fast enough (a τ-mixing condition).
42 ICML Workshop on Large-Scale Kernel Learning
Lille, France, 11 July 2015 (co-located with ICML 2015)
- Foundational algorithmic techniques for large-scale kernel learning: matrix factorization, randomization and approximation, variational inference and sampling, inducing variables, random Fourier features, unifying frameworks
- Interface between kernel methods and deep learning architectures
- Trade-offs between statistical and computational efficiency in kernel methods
- Stochastic gradient techniques with kernel methods
- Large-scale multiple kernel learning
- Large-scale representation learning with kernels
- Large-scale kernel methods for complex data types beyond perceptual data
Confirmed speakers: Francis Bach, Neil Lawrence, Russ Salakhutdinov, Marius Kloft, Zaid Harchaoui.
Deadline for submissions: Friday, 1 May 2015, 23:00 UTC.
Generalizing Distance Covariance to Measure and Test Multivariate Mutual Dependence arxiv:1709.02532v5 [math.st] 25 Feb 2018 Ze Jin, David S. Matteson February 27, 2018 Abstract We propose three new measures
More informationCausal Inference in Time Series via Supervised Learning
Causal Inference in Time Series via Supervised Learning Yoichi Chikahara and Akinori Fujino NTT Communication Science Laboratories, Kyoto 619-0237, Japan chikahara.yoichi@lab.ntt.co.jp, fujino.akinori@lab.ntt.co.jp
More informationLearning gradients: prescriptive models
Department of Statistical Science Institute for Genome Sciences & Policy Department of Computer Science Duke University May 11, 2007 Relevant papers Learning Coordinate Covariances via Gradients. Sayan
More informationKernel-Based Contrast Functions for Sufficient Dimension Reduction
Kernel-Based Contrast Functions for Sufficient Dimension Reduction Michael I. Jordan Departments of Statistics and EECS University of California, Berkeley Joint work with Kenji Fukumizu and Francis Bach
More informationTwo-stage Sampled Learning Theory on Distributions
Two-stage Sampled Learning Theory on Distributions Zoltán Szabó (Gatsby Unit, UCL) Joint work with Arthur Gretton (Gatsby Unit, UCL), Barnabás Póczos (ML Department, CMU), Bharath K. Sriperumbudur (Department
More informationUnsupervised dimensionality reduction
Unsupervised dimensionality reduction Guillaume Obozinski Ecole des Ponts - ParisTech SOCN course 2014 Guillaume Obozinski Unsupervised dimensionality reduction 1/30 Outline 1 PCA 2 Kernel PCA 3 Multidimensional
More informationBickel Rosenblatt test
University of Latvia 28.05.2011. A classical Let X 1,..., X n be i.i.d. random variables with a continuous probability density function f. Consider a simple hypothesis H 0 : f = f 0 with a significance
More informationHigh-dimensional test for normality
High-dimensional test for normality Jérémie Kellner Ph.D Student University Lille I - MODAL project-team Inria joint work with Alain Celisse Rennes - June 5th, 2014 Jérémie Kellner Ph.D Student University
More informationGaussian processes for inference in stochastic differential equations
Gaussian processes for inference in stochastic differential equations Manfred Opper, AI group, TU Berlin November 6, 2017 Manfred Opper, AI group, TU Berlin (TU Berlin) inference in SDE November 6, 2017
More informationThe Perturbed Variation
The Perturbed Variation Maayan Harel Department of Electrical Engineering Technion, Haifa, Israel maayanga@tx.technion.ac.il Shie Mannor Department of Electrical Engineering Technion, Haifa, Israel shie@ee.technion.ac.il
More informationBootstrapping high dimensional vector: interplay between dependence and dimensionality
Bootstrapping high dimensional vector: interplay between dependence and dimensionality Xianyang Zhang Joint work with Guang Cheng University of Missouri-Columbia LDHD: Transition Workshop, 2014 Xianyang
More informationThe Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space Robert Jenssen, Deniz Erdogmus 2, Jose Principe 2, Torbjørn Eltoft Department of Physics, University of Tromsø, Norway
More informationDeep Convolutional Neural Networks for Pairwise Causality
Deep Convolutional Neural Networks for Pairwise Causality Karamjit Singh, Garima Gupta, Lovekesh Vig, Gautam Shroff, and Puneet Agarwal TCS Research, Delhi Tata Consultancy Services Ltd. {karamjit.singh,
More informationA Kernel Statistical Test of Independence
A Kernel Statistical Test of Independence Arthur Gretton MPI for Biological Cybernetics Tübingen, Germany arthur@tuebingen.mpg.de Kenji Fukumizu Inst. of Statistical Mathematics Tokyo Japan fukumizu@ism.ac.jp
More informationBivariate Paired Numerical Data
Bivariate Paired Numerical Data Pearson s correlation, Spearman s ρ and Kendall s τ, tests of independence University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html
More information3 Comparison with Other Dummy Variable Methods
Stats 300C: Theory of Statistics Spring 2018 Lecture 11 April 25, 2018 Prof. Emmanuel Candès Scribe: Emmanuel Candès, Michael Celentano, Zijun Gao, Shuangning Li 1 Outline Agenda: Knockoffs 1. Introduction
More informationDirect Learning: Linear Classification. Donglin Zeng, Department of Biostatistics, University of North Carolina
Direct Learning: Linear Classification Logistic regression models for classification problem We consider two class problem: Y {0, 1}. The Bayes rule for the classification is I(P(Y = 1 X = x) > 1/2) so
More informationSTAT 518 Intro Student Presentation
STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible
More informationKernel Change-point Analysis
Kernel Change-point Analysis Zaïd Harchaoui LTCI, TELECOM ParisTech and CNRS 46, rue Barrault, 75634 Paris cedex 3, France zaid.harchaoui@enst.fr Francis Bach Willow Project, INRIA-ENS 45, rue d Ulm, 75230
More informationOn prediction and density estimation Peter McCullagh University of Chicago December 2004
On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating
More information10-701/ Recitation : Kernels
10-701/15-781 Recitation : Kernels Manojit Nandi February 27, 2014 Outline Mathematical Theory Banach Space and Hilbert Spaces Kernels Commonly Used Kernels Kernel Theory One Weird Kernel Trick Representer
More informationWhat is an RKHS? Dino Sejdinovic, Arthur Gretton. March 11, 2014
What is an RKHS? Dino Sejdinovic, Arthur Gretton March 11, 2014 1 Outline Normed and inner product spaces Cauchy sequences and completeness Banach and Hilbert spaces Linearity, continuity and boundedness
More informationData Mining Techniques
Data Mining Techniques CS 6220 - Section 2 - Spring 2017 Lecture 4 Jan-Willem van de Meent (credit: Yijun Zhao, Arthur Gretton Rasmussen & Williams, Percy Liang) Kernel Regression Basis function regression
More informationHilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems
with Applications to Dynamical Systems Le Song Jonathan Huang School of Computer Science, Carnegie Mellon University Alex Smola Yahoo! Research, Santa Clara, CA, USA Kenji Fukumizu Institute of Statistical
More informationExtreme inference in stationary time series
Extreme inference in stationary time series Moritz Jirak FOR 1735 February 8, 2013 1 / 30 Outline 1 Outline 2 Motivation The multivariate CLT Measuring discrepancies 3 Some theory and problems The problem
More informationAdvanced Introduction to Machine Learning
10-715 Advanced Introduction to Machine Learning Homework Due Oct 15, 10.30 am Rules Please follow these guidelines. Failure to do so, will result in loss of credit. 1. Homework is due on the due date
More informationKernel Principal Component Analysis
Kernel Principal Component Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationProbabilistic & Unsupervised Learning
Probabilistic & Unsupervised Learning Gaussian Processes Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College London
More informationGeometry on Probability Spaces
Geometry on Probability Spaces Steve Smale Toyota Technological Institute at Chicago 427 East 60th Street, Chicago, IL 60637, USA E-mail: smale@math.berkeley.edu Ding-Xuan Zhou Department of Mathematics,
More informationBasis Expansion and Nonlinear SVM. Kai Yu
Basis Expansion and Nonlinear SVM Kai Yu Linear Classifiers f(x) =w > x + b z(x) = sign(f(x)) Help to learn more general cases, e.g., nonlinear models 8/7/12 2 Nonlinear Classifiers via Basis Expansion
More information