Big Hypothesis Testing with Kernel Embeddings


1 Big Hypothesis Testing with Kernel Embeddings. Dino Sejdinovic, Department of Statistics, University of Oxford. 9 January 2015, UCL Workshop on the Theory of Big Data.

2 Collaborators: Heiko Strathmann, Soumyajit De, Wojciech Zaremba, Matthew Blaschko, Arthur Gretton.

5 Making Hard Inference Possible. Many dimensions, low signal-to-noise ratio, highly non-linear associations, higher-order interactions (e.g. between X1, X2 and Y): we need an expressive model and a very large number of observations, and we cannot use batch algorithms.

6 Overview: 1. Kernel Embeddings and MMD; 2. Scaling up Kernel Tests; 3. Experiments.

9 Kernel Embeddings and MMD: Kernel Embedding.
Feature map: $x \mapsto k(\cdot, x) \in \mathcal{H}_k$ instead of $x \mapsto (\varphi_1(x), \dots, \varphi_s(x)) \in \mathbb{R}^s$; since $\langle k(\cdot, x), k(\cdot, y) \rangle_{\mathcal{H}_k} = k(x, y)$, inner products are easily computed.
Embedding: $P \mapsto \mu_k(P) = \mathbb{E}_{X \sim P}\, k(\cdot, X) \in \mathcal{H}_k$ instead of $P \mapsto (\mathbb{E}\varphi_1(X), \dots, \mathbb{E}\varphi_s(X)) \in \mathbb{R}^s$; since $\langle \mu_k(P), \mu_k(Q) \rangle_{\mathcal{H}_k} = \mathbb{E}_{X,Y}\, k(X, Y)$, inner products are easily estimated.
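To make "easily estimated" concrete: given samples from P and Q, the inner product of the two embeddings is just the average of the kernel over all cross-pairs. A minimal numpy sketch, using a Gaussian kernel as a stand-in choice (function names are ours, not from the talk):

```python
import numpy as np

def gauss_kernel(X, Y, sigma=1.0):
    """Gram matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def embedding_inner_product(X, Y, sigma=1.0):
    """Estimate <mu_k(P), mu_k(Q)>_{H_k} = E k(X, Y) from X ~ P, Y ~ Q."""
    return gauss_kernel(X, Y, sigma).mean()
```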

10 Kernel Embeddings and MMD: Kernel MMD (1).
Definition. The kernel metric (MMD) between P and Q is
$$\mathrm{MMD}^2_k(P, Q) = \big\| \mathbb{E}\, k(\cdot, X) - \mathbb{E}\, k(\cdot, Y) \big\|^2_{\mathcal{H}_k} = \mathbb{E}_{X,X'} k(X, X') + \mathbb{E}_{Y,Y'} k(Y, Y') - 2\,\mathbb{E}_{X,Y} k(X, Y).$$

11 Kernel Embeddings and MMD: Kernel MMD (2).
A polynomial kernel $k(z, z') = (1 + z^\top z')^s$ captures the difference in the first $s$ moments only. For a certain family of kernels (characteristic kernels), $\mathrm{MMD}_k(P, Q) = 0$ if and only if $P = Q$: Gaussian $\exp(-\frac{1}{2\sigma^2}\|z - z'\|_2^2)$, Laplacian, inverse multiquadrics, $B_{2n+1}$-splines, ... Under mild assumptions, the kernel MMD metrizes the weak* topology on probability measures (Sriperumbudur, 2010): $\mathrm{MMD}_k(P_n, P) \to 0 \iff P_n \rightsquigarrow P$.

12 Kernel Embeddings and MMD: Nonparametric two-sample tests.
Testing $H_0: P = Q$ vs. $H_A: P \neq Q$ based on samples $\{x_i\}_{i=1}^{n_x} \sim P$, $\{y_i\}_{i=1}^{n_y} \sim Q$. The test statistic is an estimate of $\mathrm{MMD}^2_k(P, Q) = \mathbb{E}_{X,X'} k(X, X') + \mathbb{E}_{Y,Y'} k(Y, Y') - 2\,\mathbb{E}_{X,Y} k(X, Y)$:
$$\widehat{\mathrm{MMD}} = \frac{1}{n_x(n_x - 1)} \sum_{i \neq j} k(x_i, x_j) + \frac{1}{n_y(n_y - 1)} \sum_{i \neq j} k(y_i, y_j) - \frac{2}{n_x n_y} \sum_{i,j} k(x_i, y_j).$$
$O(n^2)$ to compute, where $n = n_x + n_y$. Degenerate U-statistic: $\frac{1}{\sqrt{n}}$-convergence to $\mathrm{MMD}^2$ under $H_A$, $\frac{1}{n}$-convergence to $0$ under $H_0$. Gretton et al. (2009, 2012).
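A direct numpy sketch of this unbiased quadratic-time estimator; `kernel` is any Gram-matrix function such as `gauss_kernel` above (names are ours):

```python
import numpy as np

def mmd2_unbiased(X, Y, kernel):
    """Unbiased quadratic-time estimate of MMD^2_k(P, Q); O((n_x + n_y)^2)."""
    nx, ny = len(X), len(Y)
    Kxx, Kyy, Kxy = kernel(X, X), kernel(Y, Y), kernel(X, Y)
    # within-sample averages exclude the diagonal terms i = j
    term_x = (Kxx.sum() - np.trace(Kxx)) / (nx * (nx - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (ny * (ny - 1))
    return term_x + term_y - 2.0 * Kxy.mean()
```

For instance, `mmd2_unbiased(X, Y, lambda A, B: gauss_kernel(A, B, sigma=1.0))`.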

15 Kernel Embeddings and MMD: Nonparametric independence tests.
$H_0: X \perp\!\!\!\perp Y \iff P_{XY} = P_X P_Y$ vs. $H_A: X \not\perp\!\!\!\perp Y \iff P_{XY} \neq P_X P_Y$. Test statistic:
$$\mathrm{HSIC}(X, Y) = \big\| \mu_\kappa(\hat P_{XY}) - \mu_\kappa(\hat P_X \hat P_Y) \big\|^2_{\mathcal{H}_\kappa}, \quad \text{with } \kappa = k \otimes l.$$
Gretton et al. (2005, 2008); Smola et al. (2007). Extensions: conditional independence testing (Fukumizu, Gretton, Sun and Schölkopf, 2008; Zhang, Peters, Janzing and Schölkopf, 2011), three-variable interaction (DS, Gretton and Bergsma, 2013).
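The empirical counterpart of this statistic has a well-known closed form via centred Gram matrices; here is a minimal sketch of the biased (V-statistic) HSIC estimate, assuming Gram matrices K on the X sample and L on the Y sample (our naming, not the talk's implementation):

```python
import numpy as np

def hsic_biased(K, L):
    """Biased (V-statistic) HSIC estimate from Gram matrices K and L:
    HSIC = trace(K H L H) / n^2, with centring matrix H = I - (1/n) 1 1^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n**2
```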

16 Outline: Scaling up Kernel Tests.

17 Scaling up Kernel Tests.
Test threshold under $H_0: P = Q$:
$$\frac{n_x n_y}{n_x + n_y}\, \widehat{\mathrm{MMD}} \rightsquigarrow \sum_{r=1}^{\infty} \lambda_r \left( Z_r^2 - 1 \right), \quad \{Z_r\} \text{ i.i.d. } \mathcal{N}(0, 1),$$
where $\{\lambda_r\}$ depend on both $k$ and $P$. Threshold computation is expensive: estimating the leading $\lambda_r$'s requires an eigendecomposition of the kernel matrix, $O(n^3)$; a permutation test costs $\#\text{shuffles} \times O(n^2)$.
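As a concrete version of the permutation approach: pool the two samples, repeatedly re-assign them at random to two groups, and take an empirical quantile of the recomputed statistic as the threshold. A sketch reusing `mmd2_unbiased` from above (names and defaults are our assumptions):

```python
import numpy as np

def permutation_threshold(X, Y, kernel, alpha=0.05, num_shuffles=500, rng=None):
    """Estimate the (1 - alpha) quantile of the null distribution of MMD^2
    by permuting the pooled sample; costs num_shuffles * O(n^2)."""
    rng = np.random.default_rng() if rng is None else rng
    Z, nx = np.concatenate([X, Y]), len(X)
    stats = []
    for _ in range(num_shuffles):
        perm = rng.permutation(len(Z))
        stats.append(mmd2_unbiased(Z[perm[:nx]], Z[perm[nx:]], kernel))
    return np.quantile(stats, 1.0 - alpha)
```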

18 Scaling up Kernel Tests: Limited data, unlimited time.
$$\mathrm{MMD}^2_k(P, Q) = \mathbb{E}_{X,X'} k(X, X') + \mathbb{E}_{Y,Y'} k(Y, Y') - 2\,\mathbb{E}_{X,Y} k(X, Y),$$
estimated with the quadratic-time statistic $\widehat{\mathrm{MMD}}$ above; complexity $O(n^2)$.

19 Scaling up Kernel Tests: Limited time, unlimited data.
Process mini-batches of size $B$ at a time, computing $\widehat{\mathrm{MMD}}_{k,1}, \widehat{\mathrm{MMD}}_{k,2}, \dots$ on successive blocks, and average:
$$\hat\eta_k = \frac{B}{n} \sum_{b=1}^{n/B} \widehat{\mathrm{MMD}}_{k,b}.$$
Complexity: $O(nB)$. Provided $B/n \to 0$: $\frac{1}{\sqrt{n}}$-convergence to $\mathrm{MMD}^2$ if $\mathrm{MMD}^2 \neq 0$, and $\frac{1}{\sqrt{nB}}$-convergence to $0$ under $H_0$.
A. Gretton, B. Sriperumbudur, DS, H. Strathmann, S. Balakrishnan, M. Pontil and K. Fukumizu, Optimal kernel choice for large-scale two-sample tests, NIPS 2012.
W. Zaremba, A. Gretton, M. Blaschko, B-test: A Non-Parametric, Low Variance Kernel Two-Sample Test, NIPS 2013.
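A minimal sketch of the mini-batch statistic, reusing `mmd2_unbiased` from above; the disjoint-block layout and names are our assumptions:

```python
import numpy as np

def mmd2_blocks(X, Y, kernel, B=100):
    """Mini-batch (B-test style) statistic: average the unbiased MMD^2
    estimate over disjoint blocks of size B; O(nB) time, O(B^2) storage."""
    n = min(len(X), len(Y))
    block_stats = np.array([mmd2_unbiased(X[b:b + B], Y[b:b + B], kernel)
                            for b in range(0, n - B + 1, B)])
    return block_stats.mean(), block_stats
```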

20 Scaling up Kernel Tests: Full statistic vs. mini-batch statistic.

                       U-statistic                       mini-batch
  time                 O(n^2)                            O(nB)
  storage              O(n^2)                            O(B^2)
  null distribution    infinite sum of chi-squares       normal
  computing p-value    O(n^3) or #shuffles x O(n^2)      O(nB)
  convergence rate     1/n                               1/sqrt(nB)

Under $H_0$, $\frac{n_x n_y}{(n_x + n_y)^{3/2}} \sqrt{B}\, \hat\eta_k \rightsquigarrow \mathcal{N}(0, \sigma_k^2)$, and $\sigma_k^2$ (which depends on $k$ and $P$) can be unbiasedly estimated on each block in $O(B^2)$ time.
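Since the averaged statistic is asymptotically normal with mean zero under $H_0$, a p-value follows from studentising by an estimate of its standard error. The sketch below uses the empirical variance of the per-block estimates, one simple way to exploit the per-block variance estimate mentioned above (an assumption of this sketch, not necessarily the papers' exact construction):

```python
import numpy as np
from scipy.stats import norm

def btest_pvalue(block_stats):
    """Gaussian p-value for the mini-batch statistic: studentise the mean of
    the per-block MMD^2 estimates by its empirical standard error."""
    block_stats = np.asarray(block_stats)
    se = block_stats.std(ddof=1) / np.sqrt(len(block_stats))
    return norm.sf(block_stats.mean() / se)
```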

23 Scaling up Kernel Tests: Asymptotic efficiency criterion.
A. Gretton, B. Sriperumbudur, DS, H. Strathmann, S. Balakrishnan, M. Pontil and K. Fukumizu, Optimal kernel choice for large-scale two-sample tests, NIPS 2012.
Proposition. For given $P$ and $Q$, let $\eta_k = \mathrm{MMD}^2_k(P, Q)$ and let $\sigma_k^2$ be the asymptotic variance of the linear-time statistic $\hat\eta_k$. Then $k^* = \arg\max_{k \in \mathcal{K}} \eta_k / \sigma_k$ minimizes the asymptotic Type II error probability over $\mathcal{K}$.
We only have estimates of $\eta_k$ and $\sigma_k$! Will kernel optimization using plug-in estimates be consistent? Yes. Over what families of kernels can we perform such optimization efficiently? Linear combinations (MKL).
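As a rough illustration of the criterion, the sketch below picks, from a finite candidate set, the kernel with the largest estimated ratio of mean to standard deviation of the per-block statistics on held-out training data, reusing `mmd2_blocks` from above; the paper optimises over non-negative linear combinations of kernels (MKL), which this simplified version does not attempt:

```python
import numpy as np

def select_kernel(X_train, Y_train, kernels, B=100):
    """Pick the kernel maximising the plug-in ratio eta_k / sigma_k, with
    eta_k and sigma_k estimated from per-block MMD^2 values on training data."""
    best_kernel, best_ratio = None, -np.inf
    for kernel in kernels:
        eta, blocks = mmd2_blocks(X_train, Y_train, kernel, B)
        ratio = eta / (blocks.std(ddof=1) + 1e-12)  # guard: zero variance
        if ratio > best_ratio:
            best_kernel, best_ratio = kernel, ratio
    return best_kernel
```

For example, a candidate family of Gaussian kernels over a grid of bandwidths: `kernels = [lambda A, B, s=s: gauss_kernel(A, B, s) for s in 2.0**np.arange(-5, 16)]`.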

24 Outline: Experiments.

26 Experiments: Hard-to-detect differences: Gaussian blobs.
Difficult problems: the lengthscale of the difference between the distributions is not the same as the lengthscale of the distributions themselves. We distinguish grids of Gaussian blobs with different covariances.
Figure: blob data for $p$ and $q$; $3 \times 3$ grid of blobs, with ratio $\varepsilon = 3.2$ of largest-to-smallest eigenvalues of the blobs in $Q$.
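For intuition, a hedged generator for such blob grids; the grid spacing and the way the eigenvalue ratio $\varepsilon$ is introduced (stretching one principal axis) are our assumptions, not the talk's exact setup:

```python
import numpy as np

def sample_blob_grid(n, grid=3, spacing=10.0, eps=1.0, rng=None):
    """Sample n points from a grid x grid mixture of 2-D Gaussians whose
    covariance has largest-to-smallest eigenvalue ratio eps
    (eps = 1 recovers P; eps > 1 gives a Q differing at small lengthscale)."""
    rng = np.random.default_rng() if rng is None else rng
    centers = spacing * np.stack(
        np.meshgrid(np.arange(grid), np.arange(grid)), axis=-1).reshape(-1, 2)
    A = np.diag([np.sqrt(eps), 1.0])          # stretch one principal axis
    noise = rng.standard_normal((n, 2)) @ A.T
    return centers[rng.integers(grid**2, size=n)] + noise
```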

29 Experiments: Gaussian blobs (2).
Blobs with $\varepsilon = 1.4$: linear-time statistic vs. quadratic-time statistic, fixed kernel.

  statistic    m per trial      Type II error     trials
  quadratic    5,000            [0.7996, ...]     ...
  quadratic    10,000           [0.5161, ...]     367
  quadratic    > 10,000         Buy more RAM!
  linear       100,000,000      [0.2250, ...]     ...
  linear       ...,000,000      [0.1873, ...]     ...

30 Experiments: Gaussian blobs (3).
Figure: Type II error as a function of the ratio $\varepsilon$ between eigenvalues, comparing the kernel-selection strategies Opt, L2, MaxMMD and Median; $m = 10{,}000$; family generated by Gaussian kernels with bandwidths $\{2^{-5}, \dots, 2^{15}\}$.

33 Experiments: Hard-to-detect differences: UCI HIGGS.
P. Baldi, P. Sadowski, and D. Whiteson. Searching for Exotic Particles in High-energy Physics with Deep Learning. Nature Communications 5, 2014.
A benchmark dataset for distinguishing a signature of the Higgs boson vs. background, via the joint distributions of the azimuthal angular momenta $\varphi$ of four particle jets: low-signal, low-level features. Do the joint angular momenta carry any discriminating information?
Table: p-values for the median-heuristic Gaussian kernel (gauss-med) at sample sizes 1e4, 5e4, 1e5, 5e5 and 1e6, and for the optimized kernel (gauss-opt) at train/test sizes 2e3/8e3, 1e4/4e4, 2e4/8e4, 1e5/4e5 and 2e5/8e5; with kernel optimization, the p-value at the largest size reaches the order of 1e-18.

34 Experiments: Independence Test (sign).
$X \sim \mathcal{N}(0, I_d)$, $Y = \sqrt{\tfrac{2}{d}} \sum_{j=1}^{d/2} \operatorname{sign}(X_{2j-1} X_{2j})\, Z_j + Z_{d/2+1}$, where $Z \sim \mathcal{N}(0, I_{d/2+1})$.
Figure: power (1 - Type II error) vs. number of samples for d = 50 and d = 100 (sign dataset, gauss-med kernel).
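A small generator for this dataset under the reconstruction above (the $\sqrt{2/d}$ scaling and the pairing of coordinates are assumptions of this sketch):

```python
import numpy as np

def sample_sign_dataset(n, d=50, rng=None):
    """Draw (X, Y) with X ~ N(0, I_d) and
    Y = sqrt(2/d) * sum_j sign(X_{2j-1} X_{2j}) Z_j + Z_{d/2+1}."""
    rng = np.random.default_rng() if rng is None else rng
    X = rng.standard_normal((n, d))
    Z = rng.standard_normal((n, d // 2 + 1))
    signs = np.sign(X[:, 0::2] * X[:, 1::2])      # sign(X_{2j-1} X_{2j})
    Y = np.sqrt(2.0 / d) * (signs * Z[:, :-1]).sum(axis=1) + Z[:, -1]
    return X, Y
```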

35 Experiments: Independence Test (sine).
$X_1, X_2$ i.i.d. $\mathrm{Unif}[0, 2\pi]$, $Y = \sin(X_1 + X_2) + 10Z$, with $Z \sim \mathcal{N}(0, 1)$.
Table: power (1 - Type II error) and Type I error on the sine dataset with block size B = 100, at N = 5e5 and N = 1e6, comparing the brown (q = 1), brown (opt), gauss (med) and gauss (opt) kernels; for brown (q = 1), power rises from .277 at N = 5e5 to .460 at N = 1e6, with Type I error .035 and .055 respectively.

36 Experiments: Shogun.
Written in C++ with interfaces to Python, Matlab, Java and R. Google Summer of Code (2012, 2014).

40 Experiments: Summary.
Hypothesis testing based on kernel embeddings reveals hard-to-detect differences between distributions and non-linear, low-signal associations. A simple mini-batch procedure allows us to run the tests on large-scale problems and on streaming data. We can select kernel parameters on the fly in order to explicitly maximise test power. Both kernel selection and testing run in O(n) time and O(1) storage (if B = const).
