Nonparametric Independence Tests: Space Partitioning and Kernel Approaches

Nonparametric Independence Tests: Space Partitioning and Kernel Approaches
Arthur Gretton (1) and László Györfi (2)
1. Gatsby Computational Neuroscience Unit, London, UK
2. Budapest University of Technology and Economics, Budapest, Hungary
September 2010

Introduction
Statistical tests of independence: multivariate, nonparametric.
Two kinds of tests:
- Strongly consistent, distribution-free
- Asymptotically α-level
Three test statistics:
- L1 distance
- Log-likelihood
- Kernel dependence measure (HSIC)

Problem overview
Given a distribution Pr, test H_0: Pr = Pr_x × Pr_y.
Continuous valued, multivariate: X := R^d and Y := R^{d'}.
[Figure: scatter plots of a dependent sample from P_XY and an independent sample with P_XY = P_X P_Y.]

Problem overview
Finite i.i.d. sample (X_1, Y_1), ..., (X_n, Y_n) from Pr.
Partition the space X into m_n bins and the space Y into m'_n bins.
[Figure: discretized empirical P_XY and discretized empirical P_X P_Y.]

Problem overview
Finite i.i.d. sample (X_1, Y_1), ..., (X_n, Y_n) from Pr.
Refine the partition (m_n, m'_n bins) as n increases.
[Figure: discretized empirical P_XY and discretized empirical P_X P_Y on the refined partition.]

L_1 statistic
Space partitions:
- $P_n = \{A_{n,1}, \dots, A_{n,m_n}\}$ of $X = R^d$
- $Q_n = \{B_{n,1}, \dots, B_{n,m'_n}\}$ of $Y = R^{d'}$
Empirical measures:
- $\nu_n(A \times B) = n^{-1} \#\{i : (X_i, Y_i) \in A \times B,\ i = 1, \dots, n\}$
- $\mu_{n,1}(A) = n^{-1} \#\{i : X_i \in A,\ i = 1, \dots, n\}$
- $\mu_{n,2}(B) = n^{-1} \#\{i : Y_i \in B,\ i = 1, \dots, n\}$
Test statistic:
$L_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) = \sum_{A \in P_n} \sum_{B \in Q_n} |\nu_n(A \times B) - \mu_{n,1}(A)\,\mu_{n,2}(B)|.$
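
A minimal sketch of this statistic for the case d = d' = 1, using equal-width product bins. The partition choice, the bin counts, and the helper name l1_statistic are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def l1_statistic(x, y, m, m_prime):
    """L1 distance between the discretized empirical joint measure nu_n and the
    product of the discretized empirical marginals mu_{n,1}, mu_{n,2}
    (d = d' = 1, equal-width bins: an illustrative choice of partition)."""
    n = len(x)
    nu, _, _ = np.histogram2d(
        x, y,
        bins=[np.linspace(x.min(), x.max(), m + 1),
              np.linspace(y.min(), y.max(), m_prime + 1)])
    nu /= n                                   # nu_n(A x B)
    mu1 = nu.sum(axis=1)                      # mu_{n,1}(A)
    mu2 = nu.sum(axis=0)                      # mu_{n,2}(B)
    return np.abs(nu - np.outer(mu1, mu2)).sum()

rng = np.random.default_rng(0)
x, y = rng.normal(size=1000), rng.normal(size=1000)   # independent sample under H_0
print(l1_statistic(x, y, m=10, m_prime=10))
```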

L_1: distribution-free strongly consistent test
Test: reject H_0 when
$L_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) > c_1 \sqrt{m_n m'_n / n}$, where $c_1 > 2\sqrt{\ln 2}$.
Distribution-free.
Strongly consistent: beyond a random sample size, the test a.s. makes no error.
Conditions:
- $\lim_n m_n m'_n / n = 0$
- $\lim_n m_n / \ln n = \infty$, $\lim_n m'_n / \ln n = \infty$
- the cells of P_n and Q_n shrink: for any sphere S centred at the origin, $\lim_n \max_{A \in P_n,\ A \cap S \neq \emptyset} \mathrm{diam}(A) = 0$ (and likewise for Q_n).
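
The rejection rule above is straightforward to evaluate once L_n has been computed; a sketch, with c_1 = 1.7 as an arbitrary illustrative choice just above 2√(ln 2) ≈ 1.665:

```python
import numpy as np

def l1_strongly_consistent_reject(L_n, n, m, m_prime, c1=1.7):
    """Distribution-free strongly consistent rule: reject H0 when
    L_n > c1 * sqrt(m * m' / n), with c1 > 2*sqrt(ln 2)."""
    return L_n > c1 * np.sqrt(m * m_prime / n)

# e.g. with the l1_statistic sketch above:
# l1_strongly_consistent_reject(l1_statistic(x, y, 10, 10), n=1000, m=10, m_prime=10)
```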

L_1: distribution-free strongly consistent test
Type I error: start with the goodness-of-fit statistic
$L_n(\mu_{n,1}, \mu_1) = \sum_{A \in P_n} |\mu_{n,1}(A) - \mu_1(A)|.$
Associated bound (Beirlant et al., 2001; Biau and Györfi, 2005):
$\Pr\{L_n(\mu_{n,1}, \mu_1) > \varepsilon\} \le 2^{m_n} e^{-n\varepsilon^2/2}$ for all $\varepsilon > 0$.
Then decompose:
$L_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) \le \sum_{A \in P_n} \sum_{B \in Q_n} |\nu_n(A \times B) - \nu(A \times B)|$
$\quad + \sum_{A \in P_n} \sum_{B \in Q_n} |\nu(A \times B) - \mu_1(A)\mu_2(B)|$  (= 0 under H_0)
$\quad + \sum_{A \in P_n} \sum_{B \in Q_n} |\mu_1(A)\mu_2(B) - \mu_{n,1}(A)\mu_{n,2}(B)|$  (≤ $L_n(\mu_{n,1}, \mu_1) + L_n(\mu_{n,2}, \mu_2)$).
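
The bound on the third term, stated without derivation on the slide, follows from the triangle inequality:

$|\mu_1(A)\mu_2(B) - \mu_{n,1}(A)\mu_{n,2}(B)| \le \mu_2(B)\,|\mu_1(A) - \mu_{n,1}(A)| + \mu_{n,1}(A)\,|\mu_2(B) - \mu_{n,2}(B)|,$

and summing over $A \in P_n$, $B \in Q_n$, using $\sum_B \mu_2(B) = \sum_A \mu_{n,1}(A) = 1$, gives $L_n(\mu_{n,1}, \mu_1) + L_n(\mu_{n,2}, \mu_2)$.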

L_1: distribution-free strongly consistent test
Type II error: decompose
$L_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) \ge \sum_{A \in P_n} \sum_{B \in Q_n} |\nu(A \times B) - \mu_1(A)\mu_2(B)|$
$\quad - \sum_{A \in P_n} \sum_{B \in Q_n} |\nu_n(A \times B) - \nu(A \times B)| - \sum_{B \in Q_n} |\mu_2(B) - \mu_{n,2}(B)| - \sum_{A \in P_n} |\mu_1(A) - \mu_{n,1}(A)|,$
where the subtracted terms tend to 0 a.s.
Given that the cells of P_n and Q_n shrink,
$\sum_{A \in P_n} \sum_{B \in Q_n} |\nu(A \times B) - \mu_1(A)\mu_2(B)| \to 2 \sup_C |\nu(C) - \mu_1 \times \mu_2(C)| > 0,$
the supremum being over Borel subsets C of $R^d \times R^{d'}$; see Barron et al. (1992).

L_1: asymptotic α-level test
Additional conditions:
$\lim_n \max_{A \in P_n} \mu_1(A) = 0$, $\lim_n \max_{B \in Q_n} \mu_2(B) = 0$.
Asymptotic distribution: under H_0,
$\sqrt{n}\,\big(L_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) - C_n\big)/\sigma \xrightarrow{D} N(0, 1)$, with $\sigma^2 = 1 - 2/\pi$.
Upper bound: $C_n \le \sqrt{2 m_n m'_n/(\pi n)}$.
Test: reject H_0 when
$L_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) > \sqrt{2 m_n m'_n/(\pi n)} + (\sigma/\sqrt{n})\,\Phi^{-1}(1 - \alpha),$
where Φ denotes the standard normal distribution function.
Properties of the test:
- Not distribution-free (nonatomic distributions)
- α-level: the probability of a Type I error is asymptotically α.
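
The critical value is explicit; a sketch using scipy's norm.ppf for Φ^{-1} (the sample size and bin counts in the example call are arbitrary):

```python
import numpy as np
from scipy.stats import norm

def l1_alpha_level_threshold(n, m, m_prime, alpha=0.05):
    """Critical value of the asymptotically alpha-level L1 test:
    sqrt(2 m m' / (pi n)) + (sigma / sqrt(n)) * Phi^{-1}(1 - alpha),
    with sigma^2 = 1 - 2/pi, transcribed from the rejection rule above."""
    sigma = np.sqrt(1.0 - 2.0 / np.pi)
    return (np.sqrt(2.0 * m * m_prime / (np.pi * n))
            + sigma / np.sqrt(n) * norm.ppf(1.0 - alpha))

print(l1_alpha_level_threshold(n=1000, m=10, m_prime=10))
```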

L_1: asymptotic α-level test
Proof: sketch only.
Problem: the sample counts in the bins are not independent.
Define the Poisson random variable N_n ~ Poisson(n), and the Poissonized empirical measures
$n\,\nu_{N_n}(A \times B) = \#\{i : (X_i, Y_i) \in A \times B,\ i = 1, \dots, N_n\},$
$n\,\mu_{N_n,1}(A) = \#\{i : X_i \in A,\ i = 1, \dots, N_n\},$
$n\,\mu_{N_n,2}(B) = \#\{i : Y_i \in B,\ i = 1, \dots, N_n\}.$
New Poissonized statistic:
$\tilde L_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) = \sum_{A \in P_n} \sum_{B \in Q_n} |\nu_{N_n}(A \times B) - \mu_{N_n,1}(A)\,\mu_{N_n,2}(B)|.$
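
A minimal illustration of the Poissonization device under H_0. The Gaussian stand-in data, the equal-width bins, and the specific parameters are my choices, not the slide's:

```python
import numpy as np

# Draw N_n ~ Poisson(n), bin the first N_n points, and normalize by n (not N_n):
# the resulting cell counts are independent Poisson random variables.
rng = np.random.default_rng(1)
n, m, m_prime = 1000, 10, 10
N_n = rng.poisson(n)
x, y = rng.normal(size=N_n), rng.normal(size=N_n)        # i.i.d. stream under H_0
nu_N, _, _ = np.histogram2d(
    x, y, bins=[np.linspace(x.min(), x.max(), m + 1),
                np.linspace(y.min(), y.max(), m_prime + 1)])
nu_N /= n                                                # nu_{N_n}(A x B)
mu1_N, mu2_N = nu_N.sum(axis=1), nu_N.sum(axis=0)        # mu_{N_n,1}, mu_{N_n,2}
L_tilde = np.abs(nu_N - np.outer(mu1_N, mu2_N)).sum()    # Poissonized statistic
print(L_tilde)
```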

L_1: asymptotic α-level test
Proof sketch (continued, part 1).
If the Poissonized statistic converges jointly,
$\Big( \sqrt{n}\,[\tilde L_n - E(\tilde L_n)],\ \frac{N_n - n}{\sqrt{n}} \Big) \xrightarrow{D} N\Big( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma^2 & 0 \\ 0 & 1 \end{bmatrix} \Big),$
then the test statistic converges:
$\frac{\sqrt{n}}{\sigma}\,\big(L_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) - E(\tilde L_n)\big) \xrightarrow{D} N(0, 1)$
(proof by comparing characteristic functions); Beirlant et al. (1994).

L_1: asymptotic α-level test
Proof sketch (continued, part 2).
Convergence for the Poisson distribution, e.g. Jacod and Protter (2004, p. 170): with $Z_n \sim$ Poisson(n),
$\frac{1}{\sqrt{n}}(Z_n - n) \xrightarrow{D} Z$ as $n \to \infty$, where $Z \sim N(0, 1)$.
Likewise,
$\sqrt{n}\,\big(\nu_{N_n}(A \times B) - \mu_{N_n,1}(A)\,\mu_{N_n,2}(B)\big) \xrightarrow{D} Z\,\sqrt{\mu_1(A)\,\mu_2(B)}.$

Log-likelihood: test statistic
Test statistic:
$I_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) = 2 \sum_{A \in P_n} \sum_{B \in Q_n} \nu_n(A \times B) \log \frac{\nu_n(A \times B)}{\mu_{n,1}(A)\,\mu_{n,2}(B)}.$
Require the earlier conditions on m_n, m'_n, and n.
Test: reject H_0 when
$I_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) > \frac{m_n m'_n}{n} \big(2 \log(n + m_n m'_n) + 1\big).$
Distribution-free, strongly consistent.
Asymptotically α-level test: reject H_0 when
$n\,I_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) > m_n m'_n + \sqrt{2 m_n m'_n}\,\Phi^{-1}(1 - \alpha).$
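
A sketch of the statistic and both rejection rules, again for d = d' = 1 with equal-width bins; the partition and the convention 0 log 0 = 0 for empty cells are illustrative choices:

```python
import numpy as np
from scipy.stats import norm

def log_likelihood_statistic(x, y, m, m_prime):
    """I_n = 2 * sum_{A,B} nu_n * log( nu_n / (mu_{n,1} * mu_{n,2}) ),
    with empty cells (nu_n = 0) contributing 0."""
    n = len(x)
    nu, _, _ = np.histogram2d(
        x, y, bins=[np.linspace(x.min(), x.max(), m + 1),
                    np.linspace(y.min(), y.max(), m_prime + 1)])
    nu /= n
    prod = np.outer(nu.sum(axis=1), nu.sum(axis=0))
    pos = nu > 0
    return 2.0 * np.sum(nu[pos] * np.log(nu[pos] / prod[pos]))

def reject_strongly_consistent(I_n, n, m, m_prime):
    return I_n > m * m_prime / n * (2.0 * np.log(n + m * m_prime) + 1.0)

def reject_alpha_level(I_n, n, m, m_prime, alpha=0.05):
    return n * I_n > m * m_prime + np.sqrt(2.0 * m * m_prime) * norm.ppf(1.0 - alpha)
```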

Log-likelihood: tests
Proof sketch (strong consistency).
Under H_0,
$I_n(\nu_n, \nu) - I_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) = I_n(\mu_{n,1}, \mu_1) + I_n(\mu_{n,2}, \mu_2) \ge 0.$
Relevant large deviation bound (Kallenberg, 1985; Quine and Robinson, 1985):
$\Pr\{I_n(\nu_n, \nu)/2 > \epsilon\} = e^{-n(\epsilon + o(1))}.$
Under H_1, Pinsker:
$L_n^2(\nu_n, \mu_{n,1} \times \mu_{n,2}) \le I_n(\nu_n, \mu_{n,1} \times \mu_{n,2}).$
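
The Pinsker step, spelled out: viewing $\nu_n$ and $\mu_{n,1} \times \mu_{n,2}$ as probability distributions on the cells $A \times B$, Pinsker's inequality $\|p - q\|_1^2 \le 2\,\mathrm{KL}(p \,\|\, q)$ and the identity $I_n = 2\,\mathrm{KL}(\nu_n \,\|\, \mu_{n,1} \times \mu_{n,2})$ give

$L_n^2(\nu_n, \mu_{n,1} \times \mu_{n,2}) \le 2 \cdot \tfrac{1}{2}\, I_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) = I_n(\nu_n, \mu_{n,1} \times \mu_{n,2}),$

so a positive lower bound on L_n under H_1 yields one for I_n as well.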

Kernel statistic
Assume a translation-invariant kernel, $k(x, x') = k(x - x')$.
Kernel test statistic:
$T_n := \frac{1}{n^2} \mathrm{tr}(K H K' H),$
with $K_{i,j} = k(x_i, x_j)$, $K'_{i,j} = k(y_i, y_j)$, and centering matrix $H = I - \frac{1}{n} 1_n 1_n^\top$.
Decreasing bandwidth: L_2 distance between densities.
Fixed bandwidth: smoothed distance between characteristic functions, Feuerverger (1993), OR difference between distribution embeddings in an RKHS, Berlinet and Thomas-Agnan (2003), Gretton et al. (2008).
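
A sketch of the biased HSIC estimate $T_n = \mathrm{tr}(K H K' H)/n^2$; the Gaussian kernel and the unit bandwidths are illustrative choices rather than anything prescribed by the slide:

```python
import numpy as np

def hsic_statistic(x, y, sigma_x=1.0, sigma_y=1.0):
    """T_n = tr(K H K' H) / n^2 with Gaussian kernels on x and y."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)

    def gram(z, sigma):
        sq = np.sum(z ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * z @ z.T    # pairwise squared distances
        return np.exp(-d2 / (2.0 * sigma ** 2))

    K, Kp = gram(x, sigma_x), gram(y, sigma_y)            # K_ij = k(x_i, x_j), K'_ij = k(y_i, y_j)
    H = np.eye(n) - np.ones((n, n)) / n                   # centering matrix
    return np.trace(K @ H @ Kp @ H) / n ** 2
```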

Asymptotic α-level tests
Asymptotic distribution under H_0:
- Gaussian, using the first two moments (decreasing bandwidth): Hall (1984), Cotterill and Csörgő (1985).
- Two-parameter Gamma, using the first two moments (fixed bandwidth): Feuerverger (1993), Gretton et al. (2008):
$n T_n \sim \frac{x^{\alpha - 1} e^{-x/\beta}}{\beta^\alpha \Gamma(\alpha)}$, with $\alpha = \frac{(E(T_n))^2}{\mathrm{var}(T_n)}$, $\beta = \frac{n\,\mathrm{var}(T_n)}{E(T_n)}$.
[Figure: CDF of n T_n for n = 200 at bandwidths h = 0.1 and h = 1, comparing the Gamma and Normal approximations with the empirical distribution.]
Threshold is NOT distribution-free: the moments under H_0 are computed using the sample.
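
A hedged sketch of the Gamma threshold. The slide's closed-form null moments are not reproduced here; instead the mean and variance of T_n under H_0 are estimated by permuting y (a substitute of my own) before matching the Gamma parameters α = E²/var and β = n·var/E. Pass any statistic, e.g. the hsic_statistic sketch above:

```python
import numpy as np
from scipy.stats import gamma

def gamma_threshold(x, y, statistic, alpha=0.05, n_perm=200, rng=None):
    """Approximate (1 - alpha) quantile of n * T_n under H0 via a Gamma fit,
    with the null moments estimated by permutation rather than in closed form."""
    rng = rng or np.random.default_rng(0)
    y = np.asarray(y)
    n = len(x)
    null_stats = np.array([statistic(x, y[rng.permutation(n)])
                           for _ in range(n_perm)])
    mean, var = null_stats.mean(), null_stats.var()
    shape, scale = mean ** 2 / var, n * var / mean        # alpha and beta above
    return gamma.ppf(1.0 - alpha, a=shape, scale=scale)   # reject if n * T_n exceeds this

# e.g. gamma_threshold(x, y, hsic_statistic)
```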

Experimental results
Experimental setup:
- Independent sample (X_1, Y_1), ..., (X_n, Y_n)
- Rotate the sample to create dependence
- Embed randomly in R^d, with Gaussian noise in the remaining subspace
[Figure: scatter plots of the sample before rotation (independent) and after rotation (dependent).]
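
A sketch of the rotation construction. The uniform marginals, the angle, and the sample size are illustrative choices; the random embedding into R^d with Gaussian noise in the remaining coordinates is omitted:

```python
import numpy as np

# Rotate an independent, non-Gaussian pair by an angle theta in [0, pi/4]:
# theta = 0 keeps the pair independent, larger angles create dependence.
rng = np.random.default_rng(2)
n, theta = 512, np.pi / 8
x = rng.uniform(-1.0, 1.0, size=n)
y = rng.uniform(-1.0, 1.0, size=n)
c, s = np.cos(theta), np.sin(theta)
x_rot, y_rot = c * x - s * y, s * x + c * y   # dependent sample after rotation
```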

Experimental results
Comparison of asymptotically α-level tests.
Given a distribution Pr, test H_0: Pr = Pr_x × Pr_y; continuous valued, multivariate: X := R^d and Y := R^{d'}.
[Figure: % acceptance of H_0 versus rotation angle (× π/4) for n = 512, with d = d' = 1 and d = d' = 2, comparing Ker(g), Ker(n), L1, Pears, and Like.]

Experimental results
Higher dimensions: d = d' = 3.

Conclusions
Multi-dimensional nonparametric independence tests:
- Strongly consistent, distribution-free
- Asymptotically α-level
Test statistics:
- L1 distance
- Log-likelihood
- Kernel dependence measure (HSIC)
In experiments, kernel tests have better performance, but the threshold is estimated from the sample.
Further work:
- Distribution-free threshold for the kernel α-level test
- Consistent null distribution estimate for the kernel α-level test
- Proof of the χ² test conjecture