High-dimensional test for normality


1 High-dimensional test for normality. Jérémie Kellner, Ph.D. student, Université Lille 1, MODAL project-team, Inria. Joint work with Alain Celisse. Rennes, June 5th, 2014.

2 Framework. Input space $\mathcal X$ of any kind: scalars or vectors, structured objects (strings, graphs, trees, ...), functional spaces, ...

3 Working in kernel space. $X_1, \ldots, X_n \in \mathcal X$ i.i.d. Positive semi-definite kernel $k : \mathcal X \times \mathcal X \to \mathbb R$. Mappings $Y_i = k(X_i, \cdot) \in \mathcal H(k)$.

Definition (RKHS): $\mathcal H(k) = \overline{\mathrm{Span}}\{ k(x, \cdot) \mid x \in \mathcal X \}$.

Reproducing property: $\forall x, y \in \mathcal X$, $\langle k(x, \cdot), k(y, \cdot) \rangle_{\mathcal H(k)} = k(x, y)$.
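To make the mapping concrete, here is a minimal numerical sketch (assuming a Gaussian kernel, one common positive semi-definite choice; the sample sizes are illustrative): the Gram matrix $K$ with $K_{i,j} = k(X_i, X_j)$ collects exactly the inner products $\langle Y_i, Y_j \rangle_{\mathcal H(k)}$ of the mapped points.

import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), a positive semi-definite kernel
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # n = 5 points in R^3

# Gram matrix: K[i, j] = k(X_i, X_j) = <k(X_i, .), k(X_j, .)>_H(k)
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

# Positive semi-definiteness: eigenvalues are nonnegative up to round-off
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # True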

4-6 Gaussian process in RKHS. Gaussian assumption in high-dimensional/kernel spaces:
Mean equality test in a high-dimensional space (Srivastava et al., 2013),
Supervised/unsupervised classification using Gaussian mixtures in kernel space (Bouveyron et al., 2012).

Gaussian process: $Z \sim \mathrm{GP}(\mu, \Sigma)$ iff $\forall h \in \mathcal H(k)$, $\langle Z, h \rangle \sim \mathcal N\big( \langle \mu, h \rangle, \langle \Sigma h, h \rangle \big)$.

Goal: test $H_0 : P = P_0$ vs $H_A : P \neq P_0$, where $P_0 = \mathrm{GP}(\mu, \Sigma)$.
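As a sanity check (a standard observation, not spelled out on the slides): when $\mathcal X = \mathbb R^d$ with the linear kernel, $\mathcal H(k) \cong \mathbb R^d$ and the definition reduces to the usual multivariate Gaussian, since every linear functional of $Z$ is then univariate Gaussian:
$$Z \sim \mathcal N(\mu, \Sigma) \iff \forall h \in \mathbb R^d, \quad \langle Z, h \rangle = h^\top Z \sim \mathcal N\big( h^\top \mu,\; h^\top \Sigma h \big).$$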

7 Outline.
1 Introduction
2 Laplace-MMD: distinguishing between distributions with MMD; removing the characteristic kernel assumption; L-MMD test
3 Assessment: theoretical assessment; empirical assessment
4 Conclusion

8 Distinguishing between distributions with MMD.

MMD (Gretton et al., 2007): for $Y, Z$ two random variables in any set $\mathcal X$,
$$\mathrm{MMD}(Y, Z) = \sup_{f \in \mathcal H(k),\, \|f\| \le 1} \big| \mathbb E_Y f(Y) - \mathbb E_Z f(Z) \big|.$$

Advantage: MMD can be computed as a distance between two elements of $\mathcal H(k)$ (easy calculation).
Problem: MMD is a metric on distributions only for some $k$ (characteristic kernels).
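For reference, a minimal sketch of the standard unbiased estimator of $\mathrm{MMD}^2$ between two samples, in the spirit of Gretton et al. (the Gaussian kernel, bandwidth, and sample sizes below are illustrative assumptions):

import numpy as np

def gram(A, B, sigma=1.0):
    # Gaussian-kernel Gram matrix between the rows of A and the rows of B
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = gram(X, X, sigma), gram(Y, Y, sigma), gram(X, Y, sigma)
    # Diagonal terms are dropped so that each within-sample average is unbiased
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(0.5, 1.0, size=(200, 2))  # shifted mean, so MMD^2 should be clearly positive
print(mmd2_unbiased(X, Y))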

9-11 Removing the characteristic kernel assumption.

Consider the Laplace transforms of $P$ and $P_0$ on $\mathcal H(k)$:
$$L_P(f) = \mathbb E_{Y \sim P}\, e^{\langle Y, f \rangle_{\mathcal H(k)}}, \qquad L_{P_0}(f) = \mathbb E_{Z \sim P_0}\, e^{\langle Z, f \rangle_{\mathcal H(k)}}.$$

Compare $L_P$ with $L_{P_0}$:
$$\Delta(P, P_0) = \sup_{\|f\| \le 1} \big| L_P(f) - L_{P_0}(f) \big|.$$

We get the desired property, $\Delta(P, P_0) = 0 \implies P = P_0$, without requiring that $k$ is characteristic.
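Why $\Delta(P, P_0) = 0$ forces $P = P_0$, in one line (a sketch of the standard argument, which the slides leave implicit): by definition the two Laplace transforms then agree on the whole unit ball of $\mathcal H(k)$, and a Laplace transform that is finite on a neighborhood of $0$ determines the distribution:
$$\Delta(P, P_0) = 0 \;\Longrightarrow\; L_P(f) = L_{P_0}(f) \quad \forall\, \|f\|_{\mathcal H(k)} \le 1 \;\Longrightarrow\; P = P_0.$$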

12-17 Removing the characteristic kernel assumption (continued).

Introducing a second RKHS: get a computable expression for $\Delta(P, P_0)$ via the kernel $\tilde k = \exp(\langle \cdot, \cdot \rangle_{\mathcal H(k)})$:

\begin{align*}
\Delta(P, P_0) &= \sup_{\|f\| \le 1} \big| \mathbb E_Y\, \tilde k(Y, f) - \mathbb E_Z\, \tilde k(Z, f) \big| \\
&= \sup_{\|f\| \le 1} \big| \mathbb E_P \langle \tilde k(Y, \cdot), \tilde k(f, \cdot) \rangle - \mathbb E_{P_0} \langle \tilde k(Z, \cdot), \tilde k(f, \cdot) \rangle \big| \quad \text{(reproducing property)} \\
&= \sup_{\|f\| \le 1} \big| \langle \mu_P - \mu_{P_0}, \tilde k(f, \cdot) \rangle_{\mathcal H(\tilde k)} \big| \\
&\le e^{1/2}\, \| \mu_P - \mu_{P_0} \|_{\mathcal H(\tilde k)} \quad \text{(Cauchy-Schwarz)}.
\end{align*}

Definition (Laplace-MMD): assume $\max\big(\mathbb E_P\, e^{\|Y\|^2/2}, \mathbb E_{P_0}\, e^{\|Z\|^2/2}\big) < +\infty$, and set $L = \| \mu_P - \mu_{P_0} \|_{\mathcal H(\tilde k)}$.

$L$ is an easy-to-handle quantity:
$\mu_P$ is estimated by $\hat\mu_P$ (the sample mean),
the (squared) norm can be expanded (see below),
$L = \| \mu_P - \mu_{P_0} \| = 0 \iff P = P_0$.
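The "expand the (squared) norm" step written out (a standard kernel-mean expansion, with $Y, Y'$ i.i.d. $\sim P$ and $Z, Z'$ i.i.d. $\sim P_0$, all independent):
\begin{align*}
L^2 = \| \mu_P - \mu_{P_0} \|^2_{\mathcal H(\tilde k)}
&= \mathbb E\, \tilde k(Y, Y') - 2\, \mathbb E\, \tilde k(Y, Z) + \mathbb E\, \tilde k(Z, Z') \\
&= \mathbb E\, e^{\langle Y, Y' \rangle} - 2\, \mathbb E\, e^{\langle Y, Z \rangle} + \mathbb E\, e^{\langle Z, Z' \rangle}.
\end{align*}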

18 L-MMD test.

Gram matrix: $K = [k(X_i, X_j)]_{i,j}$.

Proposition (K., 2013): assume $P_0 = \mathrm{GP}(0, \Sigma)$ and $\rho(\Sigma) < 1$. Then
$$n \hat L^2 = \frac{1}{n-1} \sum_{i \neq j} e^{K_{i,j}} \;-\; 2 \sum_{i=1}^{n} e^{[K^2]_{i,i}/(2n)} \;+\; n \det\big(I - n^{-2} K^2\big)^{-1/2}$$
is an unbiased estimator of $n L^2$.
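A direct transcription of the statistic into code. This is a sketch only: the display above is reconstructed from a garbled extraction, so the coefficients should be checked against the paper before any serious use.

import numpy as np

def n_L2_hat(K):
    # n * Lhat^2 from the Gram matrix K, following the proposition above (reconstructed)
    n = K.shape[0]
    E = np.exp(K)
    term1 = (E.sum() - np.trace(E)) / (n - 1)           # (1/(n-1)) sum_{i != j} e^{K_ij}
    K2 = K @ K
    term2 = 2.0 * np.exp(np.diag(K2) / (2 * n)).sum()   # 2 sum_i e^{[K^2]_ii / (2n)}
    term3 = n / np.sqrt(np.linalg.det(np.eye(n) - K2 / n**2))  # n det(I - K^2/n^2)^{-1/2}
    return term1 - term2 + term3

Informally, the assumption $\rho(\Sigma) < 1$ in the proposition is what keeps the determinant term well behaved, so the square root is defined.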

19 L-MMD test: rejection region.

Generate $n\hat L^2_{(1)} \le \ldots \le n\hat L^2_{(B)}$ under $H_0$.
Set $\hat q_{\alpha,n} := n\hat L^2_{(t)}$, where $t = t(\alpha)$ is the rank of the empirical $(1-\alpha)$-quantile.
Reject $H_0$ if $n\hat L^2 \ge \hat q_{\alpha,n}$, accept otherwise.
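The calibration loop as a sketch, reusing n_L2_hat from the previous sketch. Assumptions not fixed by the slide: samples under $H_0$ come from a user-supplied sampler gp_sampler for the null Gaussian model, and the Gram matrix uses the linear kernel.

import numpy as np

def lmmd_test(X, gp_sampler, alpha=0.05, B=200):
    # Monte Carlo calibration of the L-MMD test (sketch).
    # X          : (n, d) observed sample
    # gp_sampler : callable taking n and returning an (n, d) sample under H_0
    n = X.shape[0]
    stat = n_L2_hat(X @ X.T)  # linear-kernel Gram matrix (illustrative choice)
    null_stats = []
    for _ in range(B):
        S = gp_sampler(n)
        null_stats.append(n_L2_hat(S @ S.T))
    q_hat = np.quantile(null_stats, 1 - alpha)  # empirical (1 - alpha)-quantile
    return stat >= q_hat, stat, q_hat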

20 Outline.
1 Introduction
2 Laplace-MMD: distinguishing between distributions with MMD; removing the characteristic kernel assumption; L-MMD test
3 Assessment: theoretical assessment; empirical assessment
4 Conclusion

21 Theoretical assessment: Type-II error, theoretical bound.

Theorem (K., 2014): if $\|Y\| \le M$ $P$-a.s., then for $n > (q_{\alpha,n} + m_P^{(2)})/L^2$,
$$\mathbb P_{H_A}\big( n\hat L^2 \le \hat q_{\alpha,n} \big) \;\le\; \big[ 1 + o_B(1/\sqrt{B}) \big] \exp\left\{ -\, n\, \frac{\big( L^2 - \tfrac{q_{\alpha,n} + m_P^{(2)}}{n} \big)^2}{2\, m_P^{(2)} \big( m_P^{(2)} + L^2 e^{M^2/2} \big) + o_n(1)} \right\},$$
where
$$m_P^{(2)} = \mathbb E_{Y \sim P}\, \| \tilde k(Y, \cdot) - \mu_P \|^2_{\mathcal H(\tilde k)} = \mathbb E\, \big\| \tilde k(Y, \cdot) - \mathbb E[\tilde k(Y, \cdot)] \big\|^2_{\mathcal H(\tilde k)}.$$

22 Empirical assessment: synthetic data (finite d).

$\mathcal X = \mathbb R^d$, $k = \langle \cdot, \cdot \rangle_{\mathbb R^d}$: L-MMD used as a multivariate normality test.

Common multivariate normality tests lose power when d is large:
1 Henze-Zirkler (characteristic functions, $L^2$ distance)
2 Energy distance (pairwise distances)

Alternative: mixture of two Gaussians $\mathcal N(\mu_1, \Sigma)$ and $\mathcal N(\mu_2, \Sigma)$ (see the sketch below).
Two cases: low dimension (d = 2), larger dimension (d = 50).
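A sketch of the alternative used in the power study (the means, mixing weight, and $\Sigma = I_d$ below are illustrative assumptions; the slides only specify a two-component Gaussian mixture with common covariance):

import numpy as np

def gaussian_mixture_sample(n, d, delta=1.0, w=0.5, seed=0):
    # Sample from w * N(mu1, I_d) + (1 - w) * N(mu2, I_d) with mu2 - mu1 = delta * e_1
    rng = np.random.default_rng(seed)
    mu1 = np.zeros(d)
    mu2 = np.zeros(d)
    mu2[0] = delta
    comp1 = rng.random(n) < w                # component indicator, P(comp1) = w
    X = rng.normal(size=(n, d))              # N(0, I_d) noise
    X += np.where(comp1[:, None], mu1, mu2)  # shift by the component mean
    return X

X_alt = gaussian_mixture_sample(n=500, d=50)  # non-Gaussian alternative in dimension 50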

23 Empirical assessment: real data (d = +∞).

USPS236 dataset, input space $\mathcal X = \mathbb R^{64}$.
Gaussian kernel $k(x, y) = \exp(-(2\sigma^2)^{-1} \|x - y\|^2)$.
Compare L-MMD with the Random Projection method, i.e. a Kolmogorov-Smirnov (univariate) test on p random projections (see the sketch below).
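A sketch of the Random Projection baseline (assumptions: directions drawn uniformly on the sphere, each projection standardized with estimated mean and variance before the KS test, and Bonferroni aggregation over the p projections; the slides do not fix these details):

import numpy as np
from scipy import stats

def random_projection_normality(X, p=10, alpha=0.05, seed=0):
    # Kolmogorov-Smirnov normality test on p random 1-D projections, Bonferroni-combined
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pvals = []
    for _ in range(p):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)               # direction uniform on the unit sphere
        z = X @ u                            # 1-D projection of the sample
        z = (z - z.mean()) / z.std(ddof=1)   # standardize with estimated moments
        pvals.append(stats.kstest(z, "norm").pvalue)
    return min(pvals) < alpha / p            # reject H_0 if any projection fails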

24-25 Conclusion.

Summary: a high-dimensional test for normality; bypassed the characteristic-kernel assumption; mild sensitivity to high dimensionality.

Further work: in practice, μ and Σ are unknown; how does parameter estimation affect Type-I/II errors? A Type-I adjustment method within this framework? Extension to a two-sample homogeneity test.

Thank you for your attention.
