Nonparametric Indepedence Tests: Space Partitioning and Kernel Approaches

Nonparametric Indepedence Tests: Space Partitioning and Kernel Approaches Arthur Gretton 1 and László Györfi 2 1. Gatsby Computational Neuroscience Unit London, UK 2. Budapest University of Technology and Economics Budapest, Hungary September, 2010 Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 1 / 21

Introduction Statistical tests of independence Multivariate Nonparametric Two kinds of tests: Strongly consistent, distribution-free Asymptotically α-level Three test statistics: L1 distance Log-likelihood Kernel dependence measure (HSIC) Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 2 / 21

Problem overview Given distribution Pr, test H 0 : Pr = Pr x Pr y Continuous valued, multivariate: X := R d and Y := R d 5 Dependent P XY 5 Independent P XY =P X P Y Y 0 Y 0 5 5 0 5 X 5 5 0 5 X Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 3 / 21

Problem overview Finite i.i.d. sample (X 1, Y 1 ),..., (X n, Y n ) from Pr Partition space X into m n bins, space Y into m n bins Discretized empirical P XY Discretized empirical P X P Y Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 4 / 21

Problem overview Finite i.i.d. sample (X 1, Y 1 ),..., (X n, Y n ) from Pr Refine partition m n, m n for increasing n Discretized empirical P XY Discretized empirical P X P Y Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 5 / 21

L 1 statistic Space partition Pn = {A n,1,..., A n,mn } of X = R d Qn = {B n,1,..., B n,m n } of Y = R d, Empirical measures νn (A B) = n 1 #{i : (X i, Y i ) A B, i = 1,..., n} µn,1 (A) = n 1 #{i : X i A, i = 1,..., n} µn,2 (B) = n 1 #{i : Y i B, i = 1,..., n} Test statistic: L n (ν n, µ n,1 µ n,2 ) = ν n (A B) µ n,1 (A) µ n,2 (B). A P n B Q n Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 6 / 21

L 1 : Distribution-free strong consistent test Test: reject H 0 when mn m n L n (ν n, µ n,1 µ n,2 ) > c 1, where c 1 > 2 ln 2. n Distribution-free Strongly consistent: after random sample size, test makes a.s. no error Conditions limn m n m n/n = 0, limn m n /ln n =, lim n m n/ln n =, P n and Q n cells shrink: lim n S any sphere centred at origin max diam(a) = 0 A P n, A S 0 Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 7 / 21

L 1 : Distribution-free strong consistent test Type I error: Start with goodness of fit L n (µ n,1, µ 1 ) = A P n µ n,1 (A) µ 1 (A). Associated bound Beirlant et al. (2001), Biau and Györfi (2005) Pr{L n (µ n,1, µ 1 ) > ε} 2 mn e nε2 /2 for all 0 < ε Then decompose: L n(ν n, µ n,1 µ n,2 ) X X ν n(a B) ν(a B) A P n B Q n + X X ν(a B) µ 1 (A) µ 2 (B) + X X µ 1 (A) µ 2 (B) µ n,1 (A) µ n,2 (B). A P n B Q n A P n B Q n {z } {z } =0 under H 0 L n(µ n,1,µ 1 )+L n(µ n,2,µ 2 ) Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 8 / 21

L 1 : Distribution-free strong consistent test Type II error: Decompose L n(ν n, µ n,1 µ n,2 ) X X ν(a B) µ 1 (A) µ 2 (B) A P n B Q n X X X X ν n(a B) ν(a B) µ 2 (B) µ n,2 (B) µ 1 (A) µ n,1 (A). A P n B Q n B Q n A P n {z } 0 a.s. Given cells shrink in P n and Q n, ν(a B) µ 1 (A) µ 2 (B) 2 sup ν(c) µ 1 µ 2 (C) > 0 C B Q n A P n C Borel subset of R d R d ; see Barron et al. (1992) Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 9 / 21

L 1 : Asymptotic α-level test Additional conditions: lim max µ 1 (A) = 0, n A P n lim max µ 2 (B) = 0, n B Q n Asymptotic distribution: Under H 0, n (Ln (ν n, µ n,1 µ n,2 ) C n ) /σ D N (0, 1), σ 2 = 1 2/π. Upper bound: Cn 2m n m n/(πn) Test: reject H 0 when L n (ν n, µ n,1 µ n,2 ) > 2m n m n/(πn) + σ/ n Φ 1 (1 α) Φ denotes standard normal distribution. Properties of test: Not distribution-free (nonatomic) α level: probability of Type I error asymptotically α Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 10 / 21

L 1 : Asymptotic α-level test Proof: sketch only Problem: samples in each bin not independent Define the poisson random variable N n Poisson(n). Define the poisson variables: nν Nn (A B) = #{i : (X i, Y i ) A B, i = 1,..., N n }, and nµ Nn,1(A) = #{i : X i A, i = 1,..., N n }, nµ Nn,2(B) = #{i : Y i B, i = 1,..., N n } New Poissonized statistic: L n (ν n, µ n,1 µ n,2 ) = ν Nn (A B) µ Nn,1(A) µ Nn,2(B). A Pn B Q n Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 11 / 21

L 1 : Asymptotic α-level test Proof sketch (continued, part 1) If poissonized statistic converges... ( n [ Ln E( L n )], N ) n n n D N ([ 0 0 ] [ σ 2 0, 0 1 ])...then test statistic converges: n σ (L n(ν n, µ n,1 µ n,2 ) E( L n )) D N (0, 1) (proof by comparing characteristic functions) Beirlant et al. (1994) Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 12 / 21

L 1 : Asymptotic α-level test Proof sketch (continued, part 2) Convergence for Poisson e.g. Jacod and Protter(2004, p. 170) 1 n (Z n n) D Z as n where Z N (0, 1). Likewise ν Nn (A B) µ Nn,1(A)µ Nn,2(B) D Z µ1 (A)µ 2 (B) n Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 13 / 21

Log likelihood: test statistic Test statistic: I n (ν n, µ n,1 µ n,2 ) = 2 ν n (A B) ν n (A B) log µ n,1 (A) µ n,2 (B). A P n B Q n Require earlier conditions on m n, m n, and n Test: reject H 0 when I n (ν n, µ n,1 µ n,2 ) m n m nn 1 (2 log(n + m n m n) + 1). Distribution-free, strongly consistent Asymptotically α-level test: reject H 0 when ni n (ν n, µ n,1 µ n,2 ) > m n m n + 2m n m nφ 1 (1 α) Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 14 / 21

Log likelihood: tests Proof sketch (strong consistent) Under H 0 I n (ν n, ν) I n (ν n, µ n,1 µ n,2 ) = I n (µ n,1, µ 1 ) + I n (µ n,2, µ 2 ) 0. Relevant large deviation bound Kallenberg(1985), Quine and Robinson (1985) Under H 1, Pinsker: Pr{I n (ν n, ν)/2 > ɛ} = e n(ɛ+o(1)). L 2 n(ν n, µ n,1 µ n,2 ) I n (ν n, µ n,1 µ n,2 ). Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 15 / 21

Kernel Statistic Assume k(x, ) = k( x ) Kernel test statistic: T n := 1 n 2 tr(khk H) Ki,j = k(x i, x j ), centering H = I 1 n 1 n1 n K i,j = k(y i, y j ) Decreasing bandwidth: L 2 distance between densities Fixed bandwidth: Smoothed distance between characteristic functions Feuerverger (1993) OR difference between distribution embeddings to RKHS Berlinet and Thomas-Agnan (2003), Gretton et al. (2008) Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 16 / 21

Asymptotic α-level tests Asymptotic distribution under H 0 : Gaussian using first two moments (decreasing bandwidth) Hall (1984), Cotterill and Csörgő (1985) Two parameter Gamma using first two moments (fixed bandwidth) Feuerverger (1993), Gretton (2008) nt n x α 1 e x/β β α Γ(α) α = (E(T n)) 2 var(t n ), β = nvar(t n) E(T n ). P(nT n < T) P(nT n < T) 1 0.8 0.6 n=200, h=0.1 0.4 Gamma 0.2 Normal Emp 0 0.4 0.6 0.8 1 1.2 1.4 T 1 0.8 0.6 0.4 0.2 n=200, h=1 0 0 0.5 1 1.5 2 T Threshold NOT distribution-free: moments under H 0 computed using sample Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 17 / 21

Experimental results Experimental setup: Independent sample (X 1, Y 1 ),..., (X n, Y n ) Rotate sample to create dependence Embed randomly in R d, remaining subspace Gaussian noise 2 Independent 2 Dependent after rotation 1.5 1.5 1 1 Sample Y 0.5 0 0.5 Sample Y 0.5 0 0.5 1 1 1.5 1.5 2 2 2 1 0 1 2 Sample X 2.5 2 1 0 1 2 Sample X Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 18 / 21

Experimental results Comparison of asymptotically α-level tests Given distribution Pr, test H 0 : Pr = Pr x Pr y Continuous valued, multivariate: X := R d and Y := R d % acceptance of H 0 1 0.8 0.6 0.4 0.2 n=512, d=d =1 Ker(g) Ker(n) L1 Pears Like % acceptance of H 0 1 0.8 0.6 0.4 0.2 n=512, d=d =2 0 0 0.2 0.4 0.6 0.8 1 Angle ( π/4) 0 0 0.2 0.4 0.6 0.8 1 Angle ( π/4) Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 19 / 21

Experimental results Higher dimensions: d = d = 3 Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 20 / 21

Conclusions Multi-dimensional nonparametric independence tests Strongly consistent, distribution-free Asymptotically α-level Test statistics L1 distance Log likelihood Kernel dependence measure (HSIC) In experiments: Kernel tests have better performance, but......threshold estimated from sample Further work: Distribution-free threshold for kernel α-level test Consistent null distribution estimate for kernel α-level test Proof of χ 2 test conjecture Gretton and Györfi (Gatsby) Nonparametric Indepedence Tests September, 2010 21 / 21