High-dimensional test for normality


1 High-dimensional test for normality. Jérémie Kellner, Ph.D. student, Université Lille 1, MODAL project-team, Inria. Joint work with Alain Celisse. Rennes, June 5th, 2014.

2 Framework. Input space $\mathcal X$ of any kind: scalars or vectors, structured objects (strings, graphs, trees, ...), functional spaces, ...

3 Working in kernel space. $X_1, \ldots, X_n \in \mathcal X$ i.i.d. Positive semi-definite kernel $k : \mathcal X \times \mathcal X \to \mathbb R$. Mappings $Y_i = k(X_i, \cdot) \in \mathcal H(k)$.

Definition (RKHS): $\mathcal H(k) = \overline{\mathrm{Span}}\{ k(x, \cdot) \mid x \in \mathcal X \}$.

Reproducing property: $\forall x, y \in \mathcal X$, $\langle k(x, \cdot), k(y, \cdot) \rangle_{\mathcal H(k)} = k(x, y)$.
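To make the mapping concrete, here is a minimal numerical sketch (assuming a Gaussian kernel, one common positive semi-definite choice; the sample sizes are illustrative): the Gram matrix $K$ with $K_{i,j} = k(X_i, X_j)$ collects exactly the inner products $\langle Y_i, Y_j \rangle_{\mathcal H(k)}$ of the mapped points.

import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), a positive semi-definite kernel
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # n = 5 points in R^3

# Gram matrix: K[i, j] = k(X_i, X_j) = <k(X_i, .), k(X_j, .)>_H(k)
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

# Positive semi-definiteness: eigenvalues are nonnegative up to round-off
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # True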

4-6 Gaussian process in RKHS. Gaussian assumption in high-dimensional/kernel spaces:
Mean equality test in a high-dimensional space (Srivastava et al., 2013),
Supervised/unsupervised classification using Gaussian mixtures in kernel space (Bouveyron et al., 2012).

Gaussian process: $Z \sim \mathrm{GP}(\mu, \Sigma)$ iff $\forall h \in \mathcal H(k)$, $\langle Z, h \rangle \sim \mathcal N\big( \langle \mu, h \rangle, \langle \Sigma h, h \rangle \big)$.

Goal: test $H_0 : P = P_0$ vs $H_A : P \neq P_0$, where $P_0 = \mathrm{GP}(\mu, \Sigma)$.
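As a sanity check (a standard observation, not spelled out on the slides): when $\mathcal X = \mathbb R^d$ with the linear kernel, $\mathcal H(k) \cong \mathbb R^d$ and the definition reduces to the usual multivariate Gaussian, since every linear functional of $Z$ is then univariate Gaussian:
$$Z \sim \mathcal N(\mu, \Sigma) \iff \forall h \in \mathbb R^d, \quad \langle Z, h \rangle = h^\top Z \sim \mathcal N\big( h^\top \mu,\; h^\top \Sigma h \big).$$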

7 Outline.
1 Introduction
2 Laplace-MMD: distinguishing between distributions with MMD; removing the characteristic kernel assumption; L-MMD test
3 Assessment: theoretical assessment; empirical assessment
4 Conclusion

8 Distinguishing between distributions with MMD.

MMD (Gretton et al., 2007): for $Y, Z$ two random variables in any set $\mathcal X$,
$$\mathrm{MMD}(Y, Z) = \sup_{f \in \mathcal H(k),\, \|f\| \le 1} \big| \mathbb E_Y f(Y) - \mathbb E_Z f(Z) \big|.$$

Advantage: MMD can be computed as a distance between two elements of $\mathcal H(k)$ (easy calculation).
Problem: MMD is a metric on distributions only for some $k$ (characteristic kernels).
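For reference, a minimal sketch of the standard unbiased estimator of $\mathrm{MMD}^2$ between two samples, in the spirit of Gretton et al. (the Gaussian kernel, bandwidth, and sample sizes below are illustrative assumptions):

import numpy as np

def gram(A, B, sigma=1.0):
    # Gaussian-kernel Gram matrix between the rows of A and the rows of B
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = gram(X, X, sigma), gram(Y, Y, sigma), gram(X, Y, sigma)
    # Diagonal terms are dropped so that each within-sample average is unbiased
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(0.5, 1.0, size=(200, 2))  # shifted mean, so MMD^2 should be clearly positive
print(mmd2_unbiased(X, Y))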

9-11 Removing the characteristic kernel assumption.

Consider the Laplace transforms of $P$ and $P_0$ on $\mathcal H(k)$:
$$L_P(f) = \mathbb E_{Y \sim P}\, e^{\langle Y, f \rangle_{\mathcal H(k)}}, \qquad L_{P_0}(f) = \mathbb E_{Z \sim P_0}\, e^{\langle Z, f \rangle_{\mathcal H(k)}}.$$

Compare $L_P$ with $L_{P_0}$:
$$\Delta(P, P_0) = \sup_{\|f\| \le 1} \big| L_P(f) - L_{P_0}(f) \big|.$$

We get the desired property, $\Delta(P, P_0) = 0 \implies P = P_0$, without requiring that $k$ is characteristic.
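Why $\Delta(P, P_0) = 0$ forces $P = P_0$, in one line (a sketch of the standard argument, which the slides leave implicit): by definition the two Laplace transforms then agree on the whole unit ball of $\mathcal H(k)$, and a Laplace transform that is finite on a neighborhood of $0$ determines the distribution:
$$\Delta(P, P_0) = 0 \;\Longrightarrow\; L_P(f) = L_{P_0}(f) \quad \forall\, \|f\|_{\mathcal H(k)} \le 1 \;\Longrightarrow\; P = P_0.$$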

12-17 Removing the characteristic kernel assumption (continued).

Introducing a second RKHS: get a computable expression for $\Delta(P, P_0)$ via the kernel $\tilde k = \exp(\langle \cdot, \cdot \rangle_{\mathcal H(k)})$:

\begin{align*}
\Delta(P, P_0) &= \sup_{\|f\| \le 1} \big| \mathbb E_Y\, \tilde k(Y, f) - \mathbb E_Z\, \tilde k(Z, f) \big| \\
&= \sup_{\|f\| \le 1} \big| \mathbb E_P \langle \tilde k(Y, \cdot), \tilde k(f, \cdot) \rangle - \mathbb E_{P_0} \langle \tilde k(Z, \cdot), \tilde k(f, \cdot) \rangle \big| \quad \text{(reproducing property)} \\
&= \sup_{\|f\| \le 1} \big| \langle \mu_P - \mu_{P_0}, \tilde k(f, \cdot) \rangle_{\mathcal H(\tilde k)} \big| \\
&\le e^{1/2}\, \| \mu_P - \mu_{P_0} \|_{\mathcal H(\tilde k)} \quad \text{(Cauchy-Schwarz)}.
\end{align*}

Definition (Laplace-MMD): assume $\max\big(\mathbb E_P\, e^{\|Y\|^2/2}, \mathbb E_{P_0}\, e^{\|Z\|^2/2}\big) < +\infty$, and set $L = \| \mu_P - \mu_{P_0} \|_{\mathcal H(\tilde k)}$.

$L$ is an easy-to-handle quantity:
$\mu_P$ is estimated by $\hat\mu_P$ (the sample mean),
the (squared) norm can be expanded (see below),
$L = \| \mu_P - \mu_{P_0} \| = 0 \iff P = P_0$.
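The "expand the (squared) norm" step written out (a standard kernel-mean expansion, with $Y, Y'$ i.i.d. $\sim P$ and $Z, Z'$ i.i.d. $\sim P_0$, all independent):
\begin{align*}
L^2 = \| \mu_P - \mu_{P_0} \|^2_{\mathcal H(\tilde k)}
&= \mathbb E\, \tilde k(Y, Y') - 2\, \mathbb E\, \tilde k(Y, Z) + \mathbb E\, \tilde k(Z, Z') \\
&= \mathbb E\, e^{\langle Y, Y' \rangle} - 2\, \mathbb E\, e^{\langle Y, Z \rangle} + \mathbb E\, e^{\langle Z, Z' \rangle}.
\end{align*}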

18 L-MMD test.

Gram matrix: $K = [k(X_i, X_j)]_{i,j}$.

Proposition (K., 2013): assume $P_0 = \mathrm{GP}(0, \Sigma)$ and $\rho(\Sigma) < 1$. Then
$$n \hat L^2 = \frac{1}{n-1} \sum_{i \neq j} e^{K_{i,j}} \;-\; 2 \sum_{i=1}^{n} e^{[K^2]_{i,i}/(2n)} \;+\; n \det\big(I - n^{-2} K^2\big)^{-1/2}$$
is an unbiased estimator of $n L^2$.
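A direct transcription of the statistic into code. This is a sketch only: the display above is reconstructed from a garbled extraction, so the coefficients should be checked against the paper before any serious use.

import numpy as np

def n_L2_hat(K):
    # n * Lhat^2 from the Gram matrix K, following the proposition above (reconstructed)
    n = K.shape[0]
    E = np.exp(K)
    term1 = (E.sum() - np.trace(E)) / (n - 1)           # (1/(n-1)) sum_{i != j} e^{K_ij}
    K2 = K @ K
    term2 = 2.0 * np.exp(np.diag(K2) / (2 * n)).sum()   # 2 sum_i e^{[K^2]_ii / (2n)}
    term3 = n / np.sqrt(np.linalg.det(np.eye(n) - K2 / n**2))  # n det(I - K^2/n^2)^{-1/2}
    return term1 - term2 + term3

Informally, the assumption $\rho(\Sigma) < 1$ in the proposition is what keeps the determinant term well behaved, so the square root is defined.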

19 L-MMD test: rejection region.

Generate $n\hat L^2_{(1)} \le \ldots \le n\hat L^2_{(B)}$ under $H_0$.
Set $\hat q_{\alpha,n} := n\hat L^2_{(t)}$, where $t = t(\alpha)$ is the rank of the empirical $(1-\alpha)$-quantile.
Reject $H_0$ if $n\hat L^2 \ge \hat q_{\alpha,n}$, accept otherwise.
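The calibration loop as a sketch, reusing n_L2_hat from the previous sketch. Assumptions not fixed by the slide: samples under $H_0$ come from a user-supplied sampler gp_sampler for the null Gaussian model, and the Gram matrix uses the linear kernel.

import numpy as np

def lmmd_test(X, gp_sampler, alpha=0.05, B=200):
    # Monte Carlo calibration of the L-MMD test (sketch).
    # X          : (n, d) observed sample
    # gp_sampler : callable taking n and returning an (n, d) sample under H_0
    n = X.shape[0]
    stat = n_L2_hat(X @ X.T)  # linear-kernel Gram matrix (illustrative choice)
    null_stats = []
    for _ in range(B):
        S = gp_sampler(n)
        null_stats.append(n_L2_hat(S @ S.T))
    q_hat = np.quantile(null_stats, 1 - alpha)  # empirical (1 - alpha)-quantile
    return stat >= q_hat, stat, q_hat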

20 Outline.
1 Introduction
2 Laplace-MMD: distinguishing between distributions with MMD; removing the characteristic kernel assumption; L-MMD test
3 Assessment: theoretical assessment; empirical assessment
4 Conclusion

21 Theoretical assessment: Type-II error, theoretical bound.

Theorem (K., 2014): if $\|Y\| \le M$ $P$-a.s., then for $n > (q_{\alpha,n} + m_P^{(2)})/L^2$,
$$\mathbb P_{H_A}\big( n\hat L^2 \le \hat q_{\alpha,n} \big) \;\le\; \big[ 1 + o_B(1/\sqrt{B}) \big] \exp\left\{ -\, n\, \frac{\big( L^2 - \tfrac{q_{\alpha,n} + m_P^{(2)}}{n} \big)^2}{2\, m_P^{(2)} \big( m_P^{(2)} + L^2 e^{M^2/2} \big) + o_n(1)} \right\},$$
where
$$m_P^{(2)} = \mathbb E_{Y \sim P}\, \| \tilde k(Y, \cdot) - \mu_P \|^2_{\mathcal H(\tilde k)} = \mathbb E\, \big\| \tilde k(Y, \cdot) - \mathbb E[\tilde k(Y, \cdot)] \big\|^2_{\mathcal H(\tilde k)}.$$

22 Empirical assessment: synthetic data (finite d).

$\mathcal X = \mathbb R^d$, $k = \langle \cdot, \cdot \rangle_{\mathbb R^d}$: L-MMD used as a multivariate normality test.

Common multivariate normality tests lose power when d is large:
1 Henze-Zirkler (characteristic functions, $L^2$ distance)
2 Energy distance (pairwise distances)

Alternative: mixture of two Gaussians $\mathcal N(\mu_1, \Sigma)$ and $\mathcal N(\mu_2, \Sigma)$ (see the sketch below).
Two cases: low dimension (d = 2), larger dimension (d = 50).
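A sketch of the alternative used in the power study (the means, mixing weight, and $\Sigma = I_d$ below are illustrative assumptions; the slides only specify a two-component Gaussian mixture with common covariance):

import numpy as np

def gaussian_mixture_sample(n, d, delta=1.0, w=0.5, seed=0):
    # Sample from w * N(mu1, I_d) + (1 - w) * N(mu2, I_d) with mu2 - mu1 = delta * e_1
    rng = np.random.default_rng(seed)
    mu1 = np.zeros(d)
    mu2 = np.zeros(d)
    mu2[0] = delta
    comp1 = rng.random(n) < w                # component indicator, P(comp1) = w
    X = rng.normal(size=(n, d))              # N(0, I_d) noise
    X += np.where(comp1[:, None], mu1, mu2)  # shift by the component mean
    return X

X_alt = gaussian_mixture_sample(n=500, d=50)  # non-Gaussian alternative in dimension 50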

23 Empirical assessment: real data (d = +∞).

USPS236 dataset, input space $\mathcal X = \mathbb R^{64}$.
Gaussian kernel $k(x, y) = \exp(-(2\sigma^2)^{-1} \|x - y\|^2)$.
Compare L-MMD with the Random Projection method, i.e. a Kolmogorov-Smirnov (univariate) test on p random projections (see the sketch below).
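A sketch of the Random Projection baseline (assumptions: directions drawn uniformly on the sphere, each projection standardized with estimated mean and variance before the KS test, and Bonferroni aggregation over the p projections; the slides do not fix these details):

import numpy as np
from scipy import stats

def random_projection_normality(X, p=10, alpha=0.05, seed=0):
    # Kolmogorov-Smirnov normality test on p random 1-D projections, Bonferroni-combined
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pvals = []
    for _ in range(p):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)               # direction uniform on the unit sphere
        z = X @ u                            # 1-D projection of the sample
        z = (z - z.mean()) / z.std(ddof=1)   # standardize with estimated moments
        pvals.append(stats.kstest(z, "norm").pvalue)
    return min(pvals) < alpha / p            # reject H_0 if any projection fails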

24-25 Conclusion.

Summary: a high-dimensional test for normality; bypassed the characteristic-kernel assumption; mild sensitivity to high dimensionality.

Further work: in practice, μ and Σ are unknown; how does parameter estimation affect Type-I/II errors? A Type-I adjustment method within this framework? Extension to a two-sample homogeneity test.

Thank you for your attention.
