Big Hypothesis Testing with Kernel Embeddings
1 Big Hypothesis Testing with Kernel Embeddings
Dino Sejdinovic, Department of Statistics, University of Oxford
9 January 2015, UCL Workshop on the Theory of Big Data
D. Sejdinovic (Statistics, Oxford) Big Testing with Kernels 09/01/15 1 / 22

2 Collaborators: Heiko Strathmann, Soumyajit De, Wojciech Zaremba, Matthew Blaschko, Arthur Gretton
3-5 Making Hard Inference Possible
- many dimensions
- low signal-to-noise ratio
- highly non-linear associations
- higher-order interactions
[diagram: variables X1, X2, Y]
We need an expressive model and a very large number of observations, and cannot use batch algorithms.
6-7 Overview
1 Kernel Embeddings and MMD
2 Scaling up Kernel Tests
3 Experiments
8-9 Kernel Embeddings and MMD: Kernel Embedding
Feature map: x ↦ k(·, x) ∈ H_k, instead of x ↦ (ϕ_1(x), ..., ϕ_s(x)) ∈ R^s.
⟨k(·, x), k(·, y)⟩_{H_k} = k(x, y): inner products easily computed.
Embedding: P ↦ µ_k(P) = E_{X∼P} k(·, X) ∈ H_k, instead of P ↦ (Eϕ_1(X), ..., Eϕ_s(X)) ∈ R^s.
⟨µ_k(P), µ_k(Q)⟩_{H_k} = E_{X,Y} k(X, Y): inner products easily estimated.
10 Kernel Embeddings and MMD: Kernel MMD (1)
Definition. Kernel metric (MMD) between P and Q:
MMD²_k(P, Q) = ‖E k(·, X) − E k(·, Y)‖²_{H_k} = E_{X,X′} k(X, X′) + E_{Y,Y′} k(Y, Y′) − 2 E_{X,Y} k(X, Y).
11 Kernel Embeddings and MMD: Kernel MMD (2)
A polynomial kernel k(z, z′) = (1 + z⊤z′)^s captures the difference in the first s moments only.
For a certain family of kernels (characteristic kernels), MMD_k(P, Q) = 0 if and only if P = Q: Gaussian exp(−‖z − z′‖²₂ / (2σ²)), Laplacian, inverse multiquadrics, B_{2n+1}-splines, ...
Under mild assumptions, the k-MMD metrizes the weak* topology on probability measures (Sriperumbudur, 2010): MMD_k(P_n, P) → 0 ⇔ P_n ⇝ P.
12 Kernel Embeddings and MMD: Nonparametric two-sample tests
Testing H₀: P = Q vs. H_A: P ≠ Q, based on samples {x_i}_{i=1}^{n_x} ∼ P and {y_i}_{i=1}^{n_y} ∼ Q.
The test statistic is an estimate of MMD²_k(P, Q) = E_{X,X′} k(X, X′) + E_{Y,Y′} k(Y, Y′) − 2 E_{X,Y} k(X, Y):
MMD̂ = 1/(n_x(n_x − 1)) Σ_{i≠j} k(x_i, x_j) + 1/(n_y(n_y − 1)) Σ_{i≠j} k(y_i, y_j) − 2/(n_x n_y) Σ_{i,j} k(x_i, y_j).
O(n²) to compute (n = n_x + n_y).
Degenerate U-statistic: 1/√n-convergence to MMD under H_A, 1/n-convergence to 0 under H₀. Gretton et al (2009, 2012).
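The unbiased estimator above fits in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the Gaussian kernel, the bandwidth σ = 1, and the sample sizes are illustrative choices.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel matrix: k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    # Unbiased quadratic-time estimate of MMD^2_k(P, Q): off-diagonal means
    # of K_xx and K_yy, minus twice the mean of the cross kernel matrix K_xy.
    nx, ny = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return ((Kxx.sum() - np.trace(Kxx)) / (nx * (nx - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (ny * (ny - 1))
            - 2.0 * Kxy.mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))
Y_same = rng.normal(0.0, 1.0, size=(500, 2))   # same distribution as X
Y_diff = rng.normal(1.0, 1.0, size=(500, 2))   # mean-shifted distribution
# The shifted pair yields a much larger statistic than the matched pair.
print(mmd2_unbiased(X, Y_same), mmd2_unbiased(X, Y_diff))
```

The O(n²) cost is visible directly: three n × n kernel matrices are formed and averaged.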
13-15 Kernel Embeddings and MMD: Nonparametric independence tests
H₀: X ⫫ Y, i.e., P_XY = P_X P_Y, vs. H_A: X and Y dependent, i.e., P_XY ≠ P_X P_Y.
Test statistic: HSIC(X, Y) = ‖µ_κ(P̂_XY) − µ_κ(P̂_X P̂_Y)‖²_{H_κ}, with κ = k ⊗ l. Gretton et al (2005, 2008); Smola et al (2007).
Extensions: conditional independence testing (Fukumizu, Gretton, Sun and Schölkopf, 2008; Zhang, Peters, Janzing and Schölkopf, 2011), three-variable interaction (DS, Gretton and Bergsma, 2013).
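For intuition, the empirical HSIC statistic with kernels k on X and l on Y reduces (up to the normalization convention) to a trace of centred kernel matrices. A minimal sketch of that standard identity; the Gaussian kernels, bandwidth, and toy data are illustrative choices.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel matrix: k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def hsic_biased(X, Y, sigma=1.0):
    # Biased (V-statistic) HSIC estimate with product kernel kappa = k (x) l:
    # HSIC = (1/n^2) trace(K H L H), where H = I - (1/n) 1 1^T centres features.
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    L = gaussian_kernel(Y, Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n**2

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 1))
Y_dep = np.sin(X) + 0.1 * rng.normal(size=(400, 1))   # Y depends on X
Y_ind = rng.normal(size=(400, 1))                     # Y independent of X
# Dependent data produces a clearly larger HSIC value than independent data.
print(hsic_biased(X, Y_dep), hsic_biased(X, Y_ind))
```

Under independence the statistic is small but not exactly zero (it is biased upward by O(1/n)), which is why the test calibrates against a null distribution rather than against zero.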
16 Scaling up Kernel Tests: Outline
1 Kernel Embeddings and MMD
2 Scaling up Kernel Tests
3 Experiments
17 Scaling up Kernel Tests: Test threshold
Under H₀: P = Q:
(n_x n_y / (n_x + n_y)) MMD̂_k ⇝ Σ_{r=1}^∞ λ_r (Z_r² − 1), with {Z_r} i.i.d. N(0, 1).
The {λ_r} depend on both k and P, so threshold computation is expensive:
- estimate the leading λ_r's (requires an eigendecomposition of the kernel matrix): O(n³), or
- permutation test: #shuffles × O(n²).
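The permutation route can be sketched directly: pool the two samples, reshuffle the labels, and recompute the statistic for each shuffle. A minimal illustration, not a production implementation; the kernel, bandwidth, shuffle count, and add-one p-value smoothing are assumptions of this sketch.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    # Unbiased quadratic-time MMD^2 estimate (as on the earlier slide).
    nx, ny = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return ((Kxx.sum() - np.trace(Kxx)) / (nx * (nx - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (ny * (ny - 1))
            - 2.0 * Kxy.mean())

def mmd_permutation_test(X, Y, sigma=1.0, n_perms=100, seed=0):
    # Permutation null: each shuffle of the pooled sample costs O(n^2).
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(X, Y, sigma)
    pooled = np.vstack([X, Y])
    nx = len(X)
    exceed = 0
    for _ in range(n_perms):
        idx = rng.permutation(len(pooled))
        if mmd2_unbiased(pooled[idx[:nx]], pooled[idx[nx:]], sigma) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perms)  # add-one smoothed p-value

rng = np.random.default_rng(4)
X = rng.normal(0.0, 1.0, size=(200, 2))
p_same = mmd_permutation_test(X, rng.normal(0.0, 1.0, size=(200, 2)))
p_diff = mmd_permutation_test(X, rng.normal(1.0, 1.0, size=(200, 2)))
print(p_same, p_diff)  # p_diff is tiny; p_same is typically large
```

The loop makes the cost structure concrete: #shuffles quadratic-time evaluations, which is exactly what the mini-batch approach on the next slides avoids.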
18 Scaling up Kernel Tests: Limited data, unlimited time
MMD²_k(P, Q) = E_{X,X′} k(X, X′) + E_{Y,Y′} k(Y, Y′) − 2 E_{X,Y} k(X, Y).
Estimate with MMD̂ = 1/(n_x(n_x − 1)) Σ_{i≠j} k(x_i, x_j) + 1/(n_y(n_y − 1)) Σ_{i≠j} k(y_i, y_j) − 2/(n_x n_y) Σ_{i,j} k(x_i, y_j).
Complexity: O(n²).
19 Scaling up Kernel Tests: Limited time, unlimited data
Process mini-batches of size B at a time (MMD̂_1, MMD̂_2, ...): η̂_k = (B/n) Σ_{b=1}^{n/B} MMD̂_{k,b}.
Complexity: O(nB). Provided B/n → 0: 1/√n-convergence to MMD if MMD ≠ 0; 1/√(nB)-convergence to 0 under H₀.
- A. Gretton, B. Sriperumbudur, DS, H. Strathmann, S. Balakrishnan, M. Pontil and K. Fukumizu, Optimal kernel choice for large-scale two-sample tests, NIPS 2012.
- W. Zaremba, A. Gretton, M. Blaschko, B-test: A Non-Parametric, Low Variance Kernel Two-Sample Test, NIPS 2013.
20 Scaling up Kernel Tests: Full statistic vs. mini-batch statistic

                      U-statistic                     mini-batch
time                  O(n²)                           O(nB)
storage               O(n²)                           O(B²)
null distribution     infinite sum of chi-squares     normal
computing p-value     O(n³) or #shuffles × O(n²)      O(nB)
convergence rate      1/n                             1/√(nB)

Under H₀: (n_x n_y / (n_x + n_y)^{3/2}) √B η̂_k ⇝ N(0, σ²_k).
σ²_k (depends on k and P) can be unbiasedly estimated on each block in O(B²) time.
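The mini-batch column of the table can be sketched as follows: average the unbiased MMD² statistic over n/B blocks, and use the normal null with the empirical per-block variance to get a p-value. A minimal illustration under those slide-level facts; the block size, kernel, bandwidth, and toy data are assumptions.

```python
import math
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    # Unbiased MMD^2 estimate on one block (quadratic only in the block size).
    nx, ny = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return ((Kxx.sum() - np.trace(Kxx)) / (nx * (nx - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (ny * (ny - 1))
            - 2.0 * Kxy.mean())

def block_mmd_test(X, Y, B=50, sigma=1.0):
    # B-test sketch: average per-block statistics; the average is
    # asymptotically normal under H0, so the p-value comes from a z-score
    # using the empirical std of the per-block statistics.
    n = min(len(X), len(Y))
    blocks = [mmd2_unbiased(X[i:i + B], Y[i:i + B], sigma)
              for i in range(0, n - B + 1, B)]
    eta = float(np.mean(blocks))
    se = float(np.std(blocks, ddof=1)) / math.sqrt(len(blocks))
    p_value = 0.5 * math.erfc((eta / se) / math.sqrt(2.0))  # one-sided tail
    return eta, p_value

rng = np.random.default_rng(2)
X = rng.normal(0.0, 1.0, size=(2000, 2))
Y_diff = rng.normal(1.0, 1.0, size=(2000, 2))
eta, p = block_mmd_test(X, Y_diff)
print(eta, p)  # large statistic, tiny p-value: H0 rejected
```

Storage is O(B²) because only one B × B kernel matrix exists at a time, and the whole test is a single pass over the data, matching the streaming setting.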
21-23 Scaling up Kernel Tests: Asymptotic efficiency criterion
- A. Gretton, B. Sriperumbudur, DS, H. Strathmann, S. Balakrishnan, M. Pontil and K. Fukumizu, Optimal kernel choice for large-scale two-sample tests, NIPS 2012.
Proposition. For given P and Q, let η_k = MMD²_k(P, Q), and let σ²_k be the asymptotic variance of the linear-time statistic η̂_k. Then k* = arg max_{k∈K} η_k/σ_k minimizes the asymptotic Type II error probability on K.
We only have estimates of η_k and σ_k! Will the kernel optimization using plug-in estimates be consistent? Yes!
Over what families of kernels can we perform such optimization efficiently? Linear combinations (MKL).
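A plug-in version of the criterion can be sketched over a simple grid of Gaussian bandwidths: estimate η_k and σ_k from per-block statistics on held-out data and pick the kernel with the best ratio. This is an assumption-laden simplification of the criterion above (the paper optimizes over linear combinations of kernels, not just a grid); block size, bandwidth grid, and data are illustrative.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    nx, ny = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return ((Kxx.sum() - np.trace(Kxx)) / (nx * (nx - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (ny * (ny - 1))
            - 2.0 * Kxy.mean())

def select_bandwidth(X_tr, Y_tr, bandwidths, B=50):
    # Plug-in criterion k* = argmax_k eta_hat_k / sigma_hat_k:
    # eta_hat_k is the mean of per-block statistics, sigma_hat_k their std.
    n = min(len(X_tr), len(Y_tr))
    ratios = {}
    for s in bandwidths:
        blocks = [mmd2_unbiased(X_tr[i:i + B], Y_tr[i:i + B], s)
                  for i in range(0, n - B + 1, B)]
        ratios[s] = np.mean(blocks) / (np.std(blocks, ddof=1) + 1e-12)
    return max(ratios, key=ratios.get)

rng = np.random.default_rng(3)
X = rng.normal(0.0, 1.0, size=(1000, 2))
Y = rng.normal(1.0, 1.0, size=(1000, 2))
best = select_bandwidth(X, Y, bandwidths=[0.01, 1.0, 100.0])
print(best)  # the degenerate tiny bandwidth sees no signal and is not chosen
```

In a proper test, the selection data must be disjoint from the testing data so that the subsequent normal-null p-value remains valid.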
24 Experiments: Outline
1 Kernel Embeddings and MMD
2 Scaling up Kernel Tests
3 Experiments
25-26 Experiments: Hard-to-detect differences: Gaussian blobs
Difficult problems: the lengthscale of the difference in distributions is not the same as that of the distributions themselves. We distinguish grids of Gaussian blobs with different covariances.
Figure: 3 × 3 grids of Gaussian blobs for P and Q; ratio ε = 3.2 of largest-to-smallest eigenvalues of the blobs in Q.
27-29 Experiments: Gaussian blobs (2)
Blobs with ε = 1.4. Linear-time statistic vs. quadratic-time statistic, fixed kernel.

            m per trial     Type II error    Trials
Quadratic   5,000           [0.7996, ...]    ...
            10,000          [0.5161, ...]    367
            > 10,000        Buy more RAM!
Linear      100,000,000     [0.2250, ...]    ...
            ...,000,000     [0.1873, ...]    ...
            ...,000,...     ... ± ...        ...
30 Experiments: Gaussian blobs (3)
Figure: Type II error vs. median ratio between eigenvalues, for kernel choices Opt, L2, MaxMMD, and Median; m = 10,000; family generated by Gaussian kernels with bandwidths {2^5, ..., 2^15}.
31-33 Experiments: Hard-to-detect differences: UCI HIGGS
- P. Baldi, P. Sadowski, and D. Whiteson, Searching for Exotic Particles in High-energy Physics with Deep Learning, Nature Communications 5, 2014.
A benchmark dataset for distinguishing a signature of the Higgs boson vs. background. Joint distributions of the azimuthal angular momenta ϕ for four particle jets: low-signal, low-level features. Do joint angular momenta carry any discriminating information?

sample size:         1e4    5e4    1e5    5e5    1e6
p-value (gauss-med): ...    ...    ...    ...    ...

train/test size:     2e3/8e3    1e4/4e4    2e4/8e4    1e5/4e5    2e5/8e5
p-value (gauss-opt): ...        ...        ...        ...e-...   ...e-18
34 Experiments: Experiment: Independence Test (sign)
X ∼ N(0, I_d), Y = √(2/d) Σ_{j=1}^{d/2} sign(X_{2j−1} X_{2j}) |Z_j| + Z_{d/2+1}, where Z ∼ N(0, I_{d/2+1}).
Figure: power (1 − Type II error) vs. number of samples on the sign dataset with the gauss-med kernel, for d = 50 and d = 100.
35 Experiments: Experiment: Independence Test (sine)
X₁, X₂ i.i.d. Unif[0, 2π], Y = sin(X₁ + X₂) + 10Z, with Z ∼ N(0, 1).

sine, B = 100           brown: q = 1    brown: opt    gauss: med    gauss: opt
N = 5e5, 1 − Type II    .277 ± ...      ... ± ...     ... ± ...     ... ± .061
N = 5e5, Type I         .035 ± ...      ... ± ...     ... ± ...     ... ± .027
N = 1e6, 1 − Type II    .460 ± ...      ... ± ...     ... ± ...     ... ± .041
N = 1e6, Type I         .055 ± ...      ... ± ...     ... ± ...     ... ± .033
36 Experiments: Shogun
Written in C++ with interfaces to Python, Matlab, Java, R. Google Summer of Code (2012, 2014).
37-40 Experiments: Summary
- Hypothesis testing based on kernel embeddings reveals hard-to-detect differences between distributions and non-linear, low-signal associations.
- A simple mini-batch procedure allows us to run the tests on large-scale problems and on streaming data.
- Kernel parameters can be selected on-the-fly in order to explicitly maximise test power.
- Both kernel selection and testing run in O(n) time and O(1) storage (if B = const).
More informationSTAT5044: Regression and Anova. Inyoung Kim
STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:
More informationMACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA
1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR
More informationDistribution Regression
Zoltán Szabó (École Polytechnique) Joint work with Bharath K. Sriperumbudur (Department of Statistics, PSU), Barnabás Póczos (ML Department, CMU), Arthur Gretton (Gatsby Unit, UCL) Dagstuhl Seminar 16481
More informationUnsupervised dimensionality reduction
Unsupervised dimensionality reduction Guillaume Obozinski Ecole des Ponts - ParisTech SOCN course 2014 Guillaume Obozinski Unsupervised dimensionality reduction 1/30 Outline 1 PCA 2 Kernel PCA 3 Multidimensional
More informationUnderstanding the relationship between Functional and Structural Connectivity of Brain Networks
Understanding the relationship between Functional and Structural Connectivity of Brain Networks Sashank J. Reddi Machine Learning Department, Carnegie Mellon University SJAKKAMR@CS.CMU.EDU Abstract Background.
More informationPREDICTING SOLAR GENERATION FROM WEATHER FORECASTS. Chenlin Wu Yuhan Lou
PREDICTING SOLAR GENERATION FROM WEATHER FORECASTS Chenlin Wu Yuhan Lou Background Smart grid: increasing the contribution of renewable in grid energy Solar generation: intermittent and nondispatchable
More informationThe geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan
The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan Background: Global Optimization and Gaussian Processes The Geometry of Gaussian Processes and the Chaining Trick Algorithm
More informationMax. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes
Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter
More informationStatistical Machine Learning Hilary Term 2018
Statistical Machine Learning Hilary Term 2018 Pier Francesco Palamara Department of Statistics University of Oxford Slide credits and other course material can be found at: http://www.stats.ox.ac.uk/~palamara/sml18.html
More informationDeep Convolutional Neural Networks for Pairwise Causality
Deep Convolutional Neural Networks for Pairwise Causality Karamjit Singh, Garima Gupta, Lovekesh Vig, Gautam Shroff, and Puneet Agarwal TCS Research, Delhi Tata Consultancy Services Ltd. {karamjit.singh,
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation
More informationEcon 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines
Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the
More informationDistinguishing Causes from Effects using Nonlinear Acyclic Causal Models
JMLR Workshop and Conference Proceedings 6:17 164 NIPS 28 workshop on causality Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models Kun Zhang Dept of Computer Science and HIIT University
More informationPrincipal Component Analysis
CSci 5525: Machine Learning Dec 3, 2008 The Main Idea Given a dataset X = {x 1,..., x N } The Main Idea Given a dataset X = {x 1,..., x N } Find a low-dimensional linear projection The Main Idea Given
More informationFacebook Friends! and Matrix Functions
Facebook Friends! and Matrix Functions! Graduate Research Day Joint with David F. Gleich, (Purdue), supported by" NSF CAREER 1149756-CCF Kyle Kloster! Purdue University! Network Analysis Use linear algebra
More informationSliced Inverse Regression
Sliced Inverse Regression Ge Zhao gzz13@psu.edu Department of Statistics The Pennsylvania State University Outline Background of Sliced Inverse Regression (SIR) Dimension Reduction Definition of SIR Inversed
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing
More informationTutorial on Gaussian Processes and the Gaussian Process Latent Variable Model
Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model (& discussion on the GPLVM tech. report by Prof. N. Lawrence, 06) Andreas Damianou Department of Neuro- and Computer Science,
More informationOutline. Motivation. Mapping the input space to the feature space Calculating the dot product in the feature space
to The The A s s in to Fabio A. González Ph.D. Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogotá April 2, 2009 to The The A s s in 1 Motivation Outline 2 The Mapping the
More informationCan we do statistical inference in a non-asymptotic way? 1
Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.
More informationGIST 4302/5302: Spatial Analysis and Modeling
GIST 4302/5302: Spatial Analysis and Modeling Basics of Statistics Guofeng Cao www.myweb.ttu.edu/gucao Department of Geosciences Texas Tech University guofeng.cao@ttu.edu Spring 2015 Outline of This Week
More informationA Kernel on Persistence Diagrams for Machine Learning
A Kernel on Persistence Diagrams for Machine Learning Jan Reininghaus 1 Stefan Huber 1 Roland Kwitt 2 Ulrich Bauer 1 1 Institute of Science and Technology Austria 2 FB Computer Science Universität Salzburg,
More informationNonparametric Bayesian Methods
Nonparametric Bayesian Methods Debdeep Pati Florida State University October 2, 2014 Large spatial datasets (Problem of big n) Large observational and computer-generated datasets: Often have spatial and
More informationComputation Of Asymptotic Distribution. For Semiparametric GMM Estimators. Hidehiko Ichimura. Graduate School of Public Policy
Computation Of Asymptotic Distribution For Semiparametric GMM Estimators Hidehiko Ichimura Graduate School of Public Policy and Graduate School of Economics University of Tokyo A Conference in honor of
More informationNonparametric Indepedence Tests: Space Partitioning and Kernel Approaches
Nonparametric Indepedence Tests: Space Partitioning and Kernel Approaches Arthur Gretton 1 and László Györfi 2 1. Gatsby Computational Neuroscience Unit London, UK 2. Budapest University of Technology
More informationLinear Regression (continued)
Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression
More informationSecond-Order Inference for Gaussian Random Curves
Second-Order Inference for Gaussian Random Curves With Application to DNA Minicircles Victor Panaretos David Kraus John Maddocks Ecole Polytechnique Fédérale de Lausanne Panaretos, Kraus, Maddocks (EPFL)
More informationCross-Validation with Confidence
Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University WHOA-PSI Workshop, St Louis, 2017 Quotes from Day 1 and Day 2 Good model or pure model? Occam s razor We really
More informationCorner. Corners are the intersections of two edges of sufficiently different orientations.
2D Image Features Two dimensional image features are interesting local structures. They include junctions of different types like Y, T, X, and L. Much of the work on 2D features focuses on junction L,
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More information