High-dimensional test for normality
1 High-dimensional test for normality. Jérémie Kellner, Ph.D. student, Université Lille 1, MODAL project-team, Inria. Joint work with Alain Celisse. Rennes, June 5th, 2014.
2 Framework. Input space $\mathcal{X}$ of any kind: scalars or vectors, structured objects (strings, graphs, trees, ...), functional spaces, ...
3 Working in kernel space. $X_1, \dots, X_n \in \mathcal{X}$ i.i.d. Positive semi-definite kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. Mappings $Y_i = k(X_i, \cdot) \in H(k)$. Definition (RKHS): $H(k) = \overline{\mathrm{Span}}\{k(x, \cdot) \mid x \in \mathcal{X}\}$. Reproducing property: $\forall x, y \in \mathcal{X}$, $\langle k(x, \cdot), k(y, \cdot) \rangle_{H(k)} = k(x, y)$.
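A minimal Python sketch of this setup, assuming a Gaussian kernel on $\mathbb{R}^d$ (one standard positive semi-definite kernel); the data size, dimension, and bandwidth are placeholder choices:

```python
import numpy as np

def gaussian_kernel_gram(X, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||X_i - X_j||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-sq_dists / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))            # toy sample: n = 20 points in R^5
K = gaussian_kernel_gram(X, sigma=1.0)  # K[i, j] = k(X_i, X_j) = <Y_i, Y_j>_H(k)

# A positive semi-definite kernel yields a PSD Gram matrix:
print("smallest eigenvalue:", np.linalg.eigvalsh(K).min())  # >= 0 up to round-off
```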
4-6 Gaussian process in RKHS. Gaussian assumption in high-dimensional/kernel spaces: mean equality test in a high-dimensional space (Srivastava et al., 2013); supervised/unsupervised classification using Gaussian mixtures in kernel space (Bouveyron et al., 2012). Gaussian process: $Z \sim GP(\mu, \Sigma)$ iff $\forall h \in H(k)$, $\langle Z, h \rangle \sim \mathcal{N}(\langle \mu, h \rangle, \langle \Sigma h, h \rangle)$. Goal: test $H_0 : P = P_0$ vs $H_A : P \neq P_0$, where $P_0 = GP(\mu, \Sigma)$.
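In finite dimension the defining property can be checked empirically: if $Z \sim \mathcal{N}(\mu, \Sigma)$ then $\langle Z, h \rangle$ is univariate Gaussian with mean $\langle \mu, h \rangle$ and variance $\langle \Sigma h, h \rangle$. A small sketch, with placeholder dimension and parameters:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
d = 10
mu = rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T / d                                   # a valid covariance matrix

Z = rng.multivariate_normal(mu, Sigma, size=5000)     # samples of Z ~ N(mu, Sigma)
h = rng.normal(size=d)                                # an arbitrary test direction
proj = Z @ h                                          # <Z, h> for each sample

print("empirical mean vs <mu, h>:   ", proj.mean(), mu @ h)
print("empirical var  vs <Sigma h,h>:", proj.var(), h @ Sigma @ h)
W, p = stats.shapiro(proj[:500])                      # normality check on a subsample
print("Shapiro-Wilk p-value:", p)                     # should typically be large
```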
7 Outline. 1 Introduction. 2 Laplace-MMD: distinguishing between distributions with MMD; removing the characteristic kernel assumption; L-MMD test. 3 Assessment: theoretical assessment; empirical assessment. 4 Conclusion.
8 Distinguishing distributions with MMD. MMD (Gretton et al., 2007): for $Y, Z$ two random variables in any set $\mathcal{X}$,
$$\mathrm{MMD}(Y, Z) = \sup_{f \in H(k),\ \|f\| \leq 1} \left| \mathbb{E}_Y f(Y) - \mathbb{E}_Z f(Z) \right|.$$
Advantage: MMD can be computed as a distance between two elements of $H(k)$ (easy calculation). Problem: MMD is a metric on distributions only for some $k$ (characteristic kernels).
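For two observed samples, the squared MMD reduces to Gram-matrix averages. A minimal sketch of the standard unbiased estimator of Gretton et al., assuming a Gaussian kernel; sample sizes and bandwidth are placeholders:

```python
import numpy as np

def gaussian_gram(X, Y, sigma=1.0):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of MMD^2(P, Q) from samples X ~ P and Y ~ Q."""
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = (gaussian_gram(X, X, sigma), gaussian_gram(Y, Y, sigma),
                     gaussian_gram(X, Y, sigma))
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))   # within-sample, i != j
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()             # cross-sample term

rng = np.random.default_rng(2)
X = rng.normal(0.0, 1.0, size=(200, 3))
Y = rng.normal(0.5, 1.0, size=(200, 3))    # shifted distribution
print(mmd2_unbiased(X, Y))                 # clearly positive when P != Q
```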
9-11 Removing the characteristic kernel assumption. Consider the Laplace transforms of $P$ and $P_0$ on $H(k)$:
$$L_P(f) = \mathbb{E}_{Y \sim P}\, e^{\langle Y, f \rangle_{H(k)}}, \qquad L_{P_0}(f) = \mathbb{E}_{Z \sim P_0}\, e^{\langle Z, f \rangle_{H(k)}}.$$
Compare $L_P$ with $L_{P_0}$:
$$\Delta(P, P_0) = \sup_{\|f\| \leq 1} \left| L_P(f) - L_{P_0}(f) \right|.$$
We get the desired property $\Delta(P, P_0) = 0 \implies P = P_0$ without requiring that $k$ be characteristic.
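The Laplace transform at a fixed direction $f$ is an exponential moment, so its empirical version is a plain sample mean. A small sketch comparing the empirical transform of a Gaussian sample with the closed-form value $L_{P_0}(f) = \exp(\|f\|^2/2)$ for a standard Gaussian; dimension, sample size, and direction are placeholders:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 5, 100_000
f = rng.normal(size=d)
f = 0.5 * f / np.linalg.norm(f)        # a fixed direction with ||f|| <= 1

Y = rng.normal(size=(n, d))            # sample from P = N(0, I_d)
L_P_hat = np.mean(np.exp(Y @ f))       # empirical Laplace transform L_P(f)
L_P0 = np.exp(0.5 * f @ f)             # closed form for P_0 = N(0, I_d)
print(abs(L_P_hat - L_P0))             # small: the two transforms agree under H_0
```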
12-17 Removing the characteristic kernel assumption. Introducing a second RKHS: get a computable expression for
$$\Delta(P, P_0) = \sup_{\|f\| \leq 1} \left| \mathbb{E}_Y \tilde{k}(Y, f) - \mathbb{E}_Z \tilde{k}(Z, f) \right|$$
via the kernel $\tilde{k} = \exp(\langle \cdot, \cdot \rangle_{H(k)})$. Then
$$\Delta(P, P_0) = \sup_{\|f\| \leq 1} \left| \mathbb{E}_P \langle \tilde{k}(Y, \cdot), \tilde{k}(f, \cdot) \rangle - \mathbb{E}_{P_0} \langle \tilde{k}(Z, \cdot), \tilde{k}(f, \cdot) \rangle \right| \quad \text{(reproducing property)}$$
$$= \sup_{\|f\| \leq 1} \left| \langle \tilde{\mu}_P - \tilde{\mu}_{P_0}, \tilde{k}(f, \cdot) \rangle_{H(\tilde{k})} \right| \;\leq\; e^{1/2}\, \| \tilde{\mu}_P - \tilde{\mu}_{P_0} \|_{H(\tilde{k})} \quad \text{(Cauchy-Schwarz)}.$$
Definition (Laplace-MMD): assume $\max(\mathbb{E}_P\, e^{\|Y\|^2/2}, \mathbb{E}_{P_0}\, e^{\|Z\|^2/2}) < +\infty$ and set $L = \| \tilde{\mu}_P - \tilde{\mu}_{P_0} \|_{H(\tilde{k})}$. $L$ is an easy-to-handle quantity: $\tilde{\mu}_P$ is estimated by the sample mean $\hat{\mu}_P$ and the (squared) norm can be expanded. Moreover, $L = 0 \iff P = P_0$.
18-19 L-MMD test. Gram matrix: $K = [k(X_i, X_j)]_{i,j}$. Proposition (K., 2013): assume $P_0 = GP(0, \Sigma)$ and $\rho(\Sigma) < 1$. Then
$$n\hat{L}^2 = \frac{1}{n-1} \sum_{i \neq j} e^{K_{i,j}} \;-\; 2 \sum_{i=1}^{n} e^{[K^2]_{i,i}/(2n)} \;+\; n\, \det\!\left(I_n - n^{-2} K^2\right)^{-1/2}$$
is an unbiased estimator of $nL^2$. Rejection region: generate $n\hat{L}^2_{(1)} \leq \dots \leq n\hat{L}^2_{(B)}$ under $H_0$; set $\hat{q}_{\alpha,n} := n\hat{L}^2_{(t)}$ where $t = t(\alpha)$; reject $H_0$ if $n\hat{L}^2 \geq \hat{q}_{\alpha,n}$, accept otherwise.
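A minimal sketch of this test in the finite-dimensional setting with the linear kernel, so that $K = [\langle X_i, X_j \rangle]_{i,j}$ and $P_0 = \mathcal{N}(0, \Sigma)$ with $\rho(\Sigma) < 1$. The statistic follows the displayed formula for $n\hat{L}^2$, and the threshold is calibrated by simulating $B$ datasets under $H_0$; sample size, $B$, $\alpha$, and $\Sigma$ are placeholder choices, and this is an illustration rather than the authors' implementation:

```python
import numpy as np

def n_L2_hat(X):
    """n * L_hat^2 from the Gram matrix K = [<X_i, X_j>] (linear kernel),
    following the estimator displayed above."""
    n = len(X)
    K = X @ X.T
    K2 = K @ K
    term1 = (np.exp(K).sum() - np.exp(np.diag(K)).sum()) / (n - 1)   # sum over i != j
    term2 = 2.0 * np.exp(np.diag(K2) / (2.0 * n)).sum()
    term3 = n / np.sqrt(np.linalg.det(np.eye(n) - K2 / n**2))        # needs rho(Sigma) < 1
    return term1 - term2 + term3

def l_mmd_test(X, Sigma, alpha=0.05, B=200, rng=None):
    """Monte Carlo calibration: simulate B samples under H_0 = N(0, Sigma) and
    use the empirical (1 - alpha) quantile of the statistic as the threshold."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    null_stats = np.array([
        n_L2_hat(rng.multivariate_normal(np.zeros(d), Sigma, size=n))
        for _ in range(B)
    ])
    q_hat = np.quantile(null_stats, 1 - alpha)
    stat = n_L2_hat(X)
    return stat, q_hat, stat >= q_hat        # reject H_0 when the statistic exceeds q_hat

rng = np.random.default_rng(4)
d, n = 3, 50
Sigma = 0.5 * np.eye(d)                      # rho(Sigma) < 1 as required
X_null = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
print(l_mmd_test(X_null, Sigma, rng=rng))    # should typically not reject under H_0
```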
20 Outline. 1 Introduction. 2 Laplace-MMD: distinguishing between distributions with MMD; removing the characteristic kernel assumption; L-MMD test. 3 Assessment: theoretical assessment; empirical assessment. 4 Conclusion.
21 Theoretical assessment: Type-II error bound. Theorem (K., 2014): if $\|Y\| \leq M$ $P$-a.s., then for $n > (q_{\alpha,n} + m^{(2)}_P)/L^2$,
$$P_{H_A}\!\left( n\hat{L}^2 \leq \hat{q}_{\alpha,n} \right) \;\leq\; \left[ 1 + o_B\!\left(1/\sqrt{B}\right) \right] \exp\left\{ - \frac{n\left( L^2 - \varepsilon_n \right)^2}{2\, m^{(2)}_P} \right\},$$
where $\varepsilon_n = (q_{\alpha,n} + m^{(2)}_P)/n$, $m^{(2)}_P \leq L^2 \exp(M^2/2) + o_n(1)$, and
$$m^{(2)}_P = \mathbb{E}_{Y \sim P} \left\| \tilde{k}(Y, \cdot) - \tilde{\mu}_P \right\|^2_{H(\tilde{k})} = \mathbb{E} \left\| \tilde{k}(Y, \cdot) - \mathbb{E}[\tilde{k}(Y, \cdot)] \right\|^2_{H(\tilde{k})}.$$
22 Empirical assessment: synthetic data (finite $d$). $\mathcal{X} = \mathbb{R}^d$, $k = \langle \cdot, \cdot \rangle_{\mathbb{R}^d}$: L-MMD used as a multivariate normality test. Common multivariate normality tests lose power when $d$ is large: (1) Henze-Zirkler (characteristic functions, $L^2$ distance); (2) energy distance (pairwise distances). Alternative: a mixture of two Gaussians $\mathcal{N}(\mu_1, \Sigma)$ and $\mathcal{N}(\mu_2, \Sigma)$ (see the sampling sketch below). Two cases: low dimension ($d = 2$) and larger dimension ($d = 50$).
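A sketch of such a Gaussian-mixture alternative; the means, mixing weight, and covariance below are placeholder values:

```python
import numpy as np

def sample_gaussian_mixture(n, mu1, mu2, Sigma, weight=0.5, rng=None):
    """Draw n points from weight * N(mu1, Sigma) + (1 - weight) * N(mu2, Sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    labels = rng.random(n) < weight
    return np.where(labels[:, None],
                    rng.multivariate_normal(mu1, Sigma, size=n),
                    rng.multivariate_normal(mu2, Sigma, size=n))

d = 50                                                   # "larger dimension" case
Sigma = 0.5 * np.eye(d)
mu1, mu2 = np.zeros(d), np.full(d, 1.0 / np.sqrt(d))     # separated component means
X_alt = sample_gaussian_mixture(200, mu1, mu2, Sigma, rng=np.random.default_rng(5))
```

A power study would repeatedly draw such samples, run the normality test on each draw, and record the rejection frequency.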
23 Empirical assessment: real data ($d = +\infty$). USPS236 dataset, input space $\mathcal{X} = \mathbb{R}^{64}$. Gaussian kernel $k(x, y) = \exp\!\left(-(2\sigma^2)^{-1} \|x - y\|^2\right)$. Compare L-MMD with the random projection method, i.e. a univariate Kolmogorov-Smirnov test on $p$ random projections.
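A sketch of the random-projection baseline: project the data onto $p$ random unit directions and apply a univariate Kolmogorov-Smirnov test to each projection against a normal with moments fitted on that projection. The choice of $p$ and the Bonferroni aggregation below are placeholder choices, not necessarily those of the original comparison:

```python
import numpy as np
from scipy import stats

def random_projection_normality_test(X, p=10, alpha=0.05, rng=None):
    """KS test of normality on p random 1-D projections, Bonferroni-aggregated."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    pvals = []
    for _ in range(p):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                      # random unit direction
        proj = X @ u
        # KS test against N(mean, std) with moments estimated on the projection
        res = stats.kstest(proj, 'norm', args=(proj.mean(), proj.std()))
        pvals.append(res.pvalue)
    return min(pvals) < alpha / p                   # reject if any projection fails

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 64))                      # placeholder for features in R^64
print(random_projection_normality_test(X, rng=rng)) # typically False under normality
```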
24-25 Conclusion. Summary: a high-dimensional test for normality; bypassed the characteristic kernel assumption; mild sensitivity to high dimensionality. Future work: in practice $\mu$ and $\Sigma$ are unknown; how does parameter estimation affect Type-I/II errors? A Type-I adjustment method within this framework? Extension to a two-sample homogeneity test. Thank you for your attention.