Nonparametric Independence Tests: Space Partitioning and Kernel Approaches
Arthur Gretton (1) and László Györfi (2)
(1) Gatsby Computational Neuroscience Unit, London, UK
(2) Budapest University of Technology and Economics, Budapest, Hungary

September 2010
Introduction

Statistical tests of independence:
- Multivariate
- Nonparametric

Two kinds of tests:
- Strongly consistent, distribution-free
- Asymptotically α-level

Three test statistics:
- $L_1$ distance
- Log-likelihood
- Kernel dependence measure (HSIC)
Problem overview

- Given distribution $\Pr$, test $H_0 : \Pr = \Pr_x \times \Pr_y$
- Continuous valued, multivariate: $\mathcal{X} := \mathbb{R}^d$ and $\mathcal{Y} := \mathbb{R}^{d'}$

[Figure: scatter plots of Y against X. Panels: Dependent ($P_{XY}$) and Independent ($P_{XY} = P_X P_Y$).]
Problem overview (continued)

- Finite i.i.d. sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ from $\Pr$
- Partition space $\mathcal{X}$ into $m_n$ bins, space $\mathcal{Y}$ into $m'_n$ bins
- Refine partition $m_n$, $m'_n$ for increasing $n$

[Figure: discretized empirical $P_{XY}$ and discretized empirical $P_X P_Y$.]
$L_1$ statistic

Space partition:
- $\mathcal{P}_n = \{A_{n,1}, \ldots, A_{n,m_n}\}$ of $\mathcal{X} = \mathbb{R}^d$
- $\mathcal{Q}_n = \{B_{n,1}, \ldots, B_{n,m'_n}\}$ of $\mathcal{Y} = \mathbb{R}^{d'}$

Empirical measures:
- $\nu_n(A \times B) = n^{-1} \#\{i : (X_i, Y_i) \in A \times B,\ i = 1, \ldots, n\}$
- $\mu_{n,1}(A) = n^{-1} \#\{i : X_i \in A,\ i = 1, \ldots, n\}$
- $\mu_{n,2}(B) = n^{-1} \#\{i : Y_i \in B,\ i = 1, \ldots, n\}$

Test statistic:

$$L_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) = \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \left| \nu_n(A \times B) - \mu_{n,1}(A)\,\mu_{n,2}(B) \right|.$$
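The $L_1$ statistic can be illustrated with a minimal sketch. It assumes scalar data ($d = d' = 1$) on a bounded range and equal-width bins; the slides allow general partitions of $\mathbb{R}^d$, and the function names `bin_index` and `l1_statistic` are hypothetical.

```python
from collections import Counter


def bin_index(value, low, high, num_bins):
    """Index of the equal-width bin on [low, high) containing value (clamped to the range)."""
    width = (high - low) / num_bins
    idx = int((value - low) // width)
    return min(max(idx, 0), num_bins - 1)


def l1_statistic(xs, ys, m_x, m_y, x_range=(0.0, 1.0), y_range=(0.0, 1.0)):
    """L_n = sum over cells A x B of |nu_n(A x B) - mu_{n,1}(A) * mu_{n,2}(B)|.

    Assumption: equal-width bins on a bounded range, chosen only for illustration.
    """
    n = len(xs)
    a = [bin_index(x, *x_range, m_x) for x in xs]
    b = [bin_index(y, *y_range, m_y) for y in ys]
    joint = Counter(zip(a, b))   # n * nu_n(A x B)
    marg_x = Counter(a)          # n * mu_{n,1}(A)
    marg_y = Counter(b)          # n * mu_{n,2}(B)
    total = 0.0
    for i in range(m_x):
        for j in range(m_y):
            total += abs(joint[(i, j)] / n - (marg_x[i] / n) * (marg_y[j] / n))
    return total
```

For a sample whose joint cell frequencies equal the product of the marginal frequencies the statistic is zero; for `ys` identical to `xs` with two equally filled bins it equals 1.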
$L_1$: Distribution-free strongly consistent test

Test: reject $H_0$ when

$$L_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) > c_1 \sqrt{\frac{m_n m'_n}{n}}, \quad \text{where } c_1 > \sqrt{2 \ln 2}.$$

Properties:
- Distribution-free
- Strongly consistent: after a random sample size, the test makes a.s. no error

Conditions:
- $\lim_{n \to \infty} m_n m'_n / n = 0$
- $\lim_{n \to \infty} m_n / \ln n = \infty$, $\lim_{n \to \infty} m'_n / \ln n = \infty$
- Cells of $\mathcal{P}_n$ and $\mathcal{Q}_n$ shrink: for any sphere $S$ centred at the origin,
  $\lim_{n \to \infty} \max_{A \in \mathcal{P}_n,\ A \cap S \neq \emptyset} \operatorname{diam}(A) = 0$
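The rejection rule above depends only on $n$, $m_n$, $m'_n$ and the constant $c_1$. A minimal sketch follows; the function name and the default `c1=1.2` are assumptions made for illustration, and any value above $\sqrt{2 \ln 2} \approx 1.177$ satisfies the stated condition.

```python
import math


def strong_consistency_threshold(n, m_x, m_y, c1=1.2):
    """Threshold c1 * sqrt(m_n * m'_n / n) for the distribution-free L1 test.

    c1 must exceed sqrt(2 ln 2); the default 1.2 is an illustrative choice.
    """
    if c1 <= math.sqrt(2.0 * math.log(2.0)):
        raise ValueError("c1 must be greater than sqrt(2 ln 2)")
    return c1 * math.sqrt(m_x * m_y / n)


def reject_independence(l1_value, n, m_x, m_y, c1=1.2):
    """Reject H0 when the L1 statistic exceeds the threshold."""
    return l1_value > strong_consistency_threshold(n, m_x, m_y, c1)
```

Because the threshold does not use the data, no null distribution has to be estimated, which is what "distribution-free" refers to here.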
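$L_1$: Distribution-free strongly consistent test

Type I error: start with goodness of fit,

$$L_n(\mu_{n,1}, \mu_1) = \sum_{A \in \mathcal{P}_n} \left| \mu_{n,1}(A) - \mu_1(A) \right|.$$

Associated bound, Beirlant et al. (2001), Biau and Györfi (2005):

$$\Pr\{L_n(\mu_{n,1}, \mu_1) > \varepsilon\} \le 2^{m_n} e^{-n \varepsilon^2 / 2} \quad \text{for all } 0 < \varepsilon.$$

Then decompose:

$$
\begin{aligned}
L_n(\nu_n, \mu_{n,1} \times \mu_{n,2})
\le{}& \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \left| \nu_n(A \times B) - \nu(A \times B) \right| \\
&+ \underbrace{\sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \left| \nu(A \times B) - \mu_1(A)\,\mu_2(B) \right|}_{=\,0 \text{ under } H_0} \\
&+ \underbrace{\sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \left| \mu_1(A)\,\mu_2(B) - \mu_{n,1}(A)\,\mu_{n,2}(B) \right|}_{\le\, L_n(\mu_{n,1}, \mu_1) + L_n(\mu_{n,2}, \mu_2)}.
\end{aligned}
$$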
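The form of the goodness-of-fit bound, and the reason the constant in the test must exceed $\sqrt{2 \ln 2}$, can be traced with a short derivation. This is a sketch of the standard argument rather than text from the slides; the notation $\sigma(\mathcal{P}_n)$ for the collection of all unions of cells of $\mathcal{P}_n$ is an assumption introduced here.

```latex
\begin{align*}
L_n(\mu_{n,1}, \mu_1)
  &= 2 \max_{C \in \sigma(\mathcal{P}_n)} \bigl( \mu_{n,1}(C) - \mu_1(C) \bigr),
  \qquad |\sigma(\mathcal{P}_n)| = 2^{m_n}, \\
\Pr\{ \mu_{n,1}(C) - \mu_1(C) > \varepsilon / 2 \}
  &\le e^{-2 n (\varepsilon/2)^2} = e^{-n \varepsilon^2 / 2}
  && \text{(Hoeffding)}, \\
\Pr\{ L_n(\mu_{n,1}, \mu_1) > \varepsilon \}
  &\le 2^{m_n} e^{-n \varepsilon^2 / 2}
  && \text{(union bound)}, \\
\varepsilon = c \sqrt{m_n / n}: \quad
2^{m_n} e^{-c^2 m_n / 2}
  &= e^{-m_n (c^2/2 - \ln 2)}.
\end{align*}
```

The last expression is summable in $n$ when $c^2 > 2 \ln 2$ and $m_n / \ln n \to \infty$, so the Borel-Cantelli lemma gives an almost sure statement. The same reasoning applied to the joint partition, which has $m_n m'_n$ cells, suggests where the rate $\sqrt{m_n m'_n / n}$ in the test comes from.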
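$L_1$: Distribution-free strongly consistent test

Type II error: decompose

$$
\begin{aligned}
L_n(\nu_n, \mu_{n,1} \times \mu_{n,2})
\ge{}& \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \left| \nu(A \times B) - \mu_1(A)\,\mu_2(B) \right| \\
&- \underbrace{\sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \left| \nu_n(A \times B) - \nu(A \times B) \right|
 - \sum_{B \in \mathcal{Q}_n} \left| \mu_2(B) - \mu_{n,2}(B) \right|
 - \sum_{A \in \mathcal{P}_n} \left| \mu_1(A) - \mu_{n,1}(A) \right|}_{\to\, 0 \text{ a.s.}}
\end{aligned}
$$

Given that the cells shrink in $\mathcal{P}_n$ and $\mathcal{Q}_n$,

$$\sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \left| \nu(A \times B) - \mu_1(A)\,\mu_2(B) \right| \to 2 \sup_C \left| \nu(C) - \mu_1 \times \mu_2(C) \right| > 0,$$

where $C$ ranges over Borel subsets of $\mathbb{R}^d \times \mathbb{R}^{d'}$; see Barron et al. (1992).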
$L_1$: Asymptotic α-level test

Additional conditions:

$$\lim_{n \to \infty} \max_{A \in \mathcal{P}_n} \mu_1(A) = 0, \qquad \lim_{n \to \infty} \max_{B \in \mathcal{Q}_n} \mu_2(B) = 0.$$

Asymptotic distribution: under $H_0$,

$$\sqrt{n}\,\bigl( L_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) - C_n \bigr) / \sigma \xrightarrow{D} \mathcal{N}(0, 1), \qquad \sigma^2 = 1 - 2/\pi.$$

Upper bound: $C_n \le \sqrt{2 m_n m'_n / (\pi n)}$

Test: reject $H_0$ when

$$L_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) > \sqrt{2 m_n m'_n / (\pi n)} + \sigma / \sqrt{n}\; \Phi^{-1}(1 - \alpha),$$

where $\Phi$ denotes the standard normal distribution function.

Properties of test:
- Not distribution-free (nonatomic)
- α-level: probability of Type I error asymptotically α
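The α-level threshold can be evaluated directly from the formula on this slide. The sketch below uses only the Python standard library; the function name `l1_alpha_threshold` is hypothetical.

```python
import math
from statistics import NormalDist


def l1_alpha_threshold(n, m_x, m_y, alpha=0.05):
    """sqrt(2 m_n m'_n / (pi n)) + sigma / sqrt(n) * Phi^{-1}(1 - alpha), sigma^2 = 1 - 2/pi."""
    sigma = math.sqrt(1.0 - 2.0 / math.pi)
    centering_bound = math.sqrt(2.0 * m_x * m_y / (math.pi * n))  # upper bound on C_n
    normal_quantile = NormalDist().inv_cdf(1.0 - alpha)           # Phi^{-1}(1 - alpha)
    return centering_bound + sigma / math.sqrt(n) * normal_quantile
```

A smaller α gives a larger threshold, and at α = 0.5 the quantile term vanishes so the threshold reduces to the bound on $C_n$.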
$L_1$: Asymptotic α-level test

Proof: sketch only

Problem: samples in each bin are not independent.

Define the Poisson random variable $N_n \sim \text{Poisson}(n)$.

Define the Poisson variables:
- $n \nu_{N_n}(A \times B) = \#\{i : (X_i, Y_i) \in A \times B,\ i = 1, \ldots, N_n\}$
- $n \mu_{N_n,1}(A) = \#\{i : X_i \in A,\ i = 1, \ldots, N_n\}$
- $n \mu_{N_n,2}(B) = \#\{i : Y_i \in B,\ i = 1, \ldots, N_n\}$

New Poissonized statistic:

$$\tilde{L}_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) = \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \left| \nu_{N_n}(A \times B) - \mu_{N_n,1}(A)\,\mu_{N_n,2}(B) \right|.$$
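Poissonization replaces the fixed sample size $n$ by a random size $N_n \sim \text{Poisson}(n)$, which makes the counts in disjoint cells independent Poisson variables. The sketch below only illustrates how such counts are generated. The helper names and the vector `cell_probs` of cell probabilities are assumptions, and the Poisson draw counts rate-1 exponential arrivals in $[0, n]$ because the standard library has no Poisson sampler.

```python
import random


def poisson_sample_size(n, rng):
    """Draw N_n ~ Poisson(n) by counting rate-1 exponential arrivals in [0, n]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= n:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def poissonized_cell_counts(n, cell_probs, rng):
    """Return (N_n, counts), where counts[k] is the number of the N_n points in cell k.

    Each count plays the role of n * nu_{N_n}(cell) on the slide.
    """
    big_n = poisson_sample_size(n, rng)
    cells = rng.choices(range(len(cell_probs)), weights=cell_probs, k=big_n)
    counts = [0] * len(cell_probs)
    for cell in cells:
        counts[cell] += 1
    return big_n, counts
```

With a fixed $n$ the counts are multinomial and must sum to $n$, which is the dependence between bins that the slide refers to; with a Poisson sample size that constraint disappears.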
$L_1$: Asymptotic α-level test

Proof sketch (continued, part 1)

If the Poissonized statistic converges,

$$\left( \sqrt{n}\,\bigl[ \tilde{L}_n - E(\tilde{L}_n) \bigr],\ \frac{N_n - n}{\sqrt{n}} \right) \xrightarrow{D} \mathcal{N}\!\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma^2 & 0 \\ 0 & 1 \end{bmatrix} \right),$$

then the test statistic converges:

$$\frac{\sqrt{n}}{\sigma} \bigl( L_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) - E(\tilde{L}_n) \bigr) \xrightarrow{D} \mathcal{N}(0, 1)$$

(proof by comparing characteristic functions), Beirlant et al. (1994).
$L_1$: Asymptotic α-level test

Proof sketch (continued, part 2)

Convergence for Poisson variables, e.g. Jacod and Protter (2004, p. 170): for $Z_n$ Poisson with mean $n$,

$$\frac{1}{\sqrt{n}} (Z_n - n) \xrightarrow{D} Z \quad \text{as } n \to \infty, \text{ where } Z \sim \mathcal{N}(0, 1).$$

Likewise,

$$\nu_{N_n}(A \times B) - \mu_{N_n,1}(A)\,\mu_{N_n,2}(B) \overset{D}{\approx} Z \sqrt{\frac{\mu_1(A)\,\mu_2(B)}{n}}.$$
Log likelihood: test statistic

Test statistic:

$$I_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) = 2 \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \nu_n(A \times B) \log \frac{\nu_n(A \times B)}{\mu_{n,1}(A)\,\mu_{n,2}(B)}.$$

Require the earlier conditions on $m_n$, $m'_n$, and $n$.

Test: reject $H_0$ when

$$I_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) \ge m_n m'_n n^{-1} \bigl( 2 \log(n + m_n m'_n) + 1 \bigr).$$

This test is distribution-free and strongly consistent.

Asymptotically α-level test: reject $H_0$ when

$$n I_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) > m_n m'_n + \sqrt{2 m_n m'_n}\; \Phi^{-1}(1 - \alpha).$$
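The log-likelihood statistic and its two thresholds can be sketched as follows. The inputs `a` and `b` are assumed to be precomputed bin labels for each sample point, empty cells are given the conventional contribution of zero, and all function names are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist


def log_likelihood_statistic(a, b):
    """I_n = 2 * sum nu_n(A x B) * log(nu_n(A x B) / (mu_{n,1}(A) mu_{n,2}(B))).

    a[i], b[i] are the bin labels of X_i and Y_i; cells with no points contribute 0.
    """
    n = len(a)
    joint = Counter(zip(a, b))
    marg_a, marg_b = Counter(a), Counter(b)
    total = 0.0
    for (i, j), count in joint.items():
        nu = count / n
        total += nu * math.log(nu / ((marg_a[i] / n) * (marg_b[j] / n)))
    return 2.0 * total


def strong_threshold(n, m_x, m_y):
    """Distribution-free threshold for I_n: m m' / n * (2 log(n + m m') + 1)."""
    cells = m_x * m_y
    return cells / n * (2.0 * math.log(n + cells) + 1.0)


def alpha_threshold(m_x, m_y, alpha=0.05):
    """Asymptotic alpha-level threshold for n * I_n: m m' + sqrt(2 m m') * Phi^{-1}(1 - alpha)."""
    cells = m_x * m_y
    return cells + math.sqrt(2.0 * cells) * NormalDist().inv_cdf(1.0 - alpha)
```

Note that the first threshold is compared with $I_n$ and the second with $n I_n$, as on the slide.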
Log likelihood: tests

Proof sketch (strongly consistent)

Under $H_0$:

$$I_n(\nu_n, \nu) - I_n(\nu_n, \mu_{n,1} \times \mu_{n,2}) = I_n(\mu_{n,1}, \mu_1) + I_n(\mu_{n,2}, \mu_2) \ge 0.$$

Relevant large deviation bound, Kallenberg (1985), Quine and Robinson (1985):

$$\Pr\{ I_n(\nu_n, \nu) / 2 > \epsilon \} = e^{-n(\epsilon + o(1))}.$$

Under $H_1$, Pinsker:

$$L_n^2(\nu_n, \mu_{n,1} \times \mu_{n,2}) \le I_n(\nu_n, \mu_{n,1} \times \mu_{n,2}).$$
Kernel statistic

Assume $k(x, \cdot) = k(\|x - \cdot\|)$.

Kernel test statistic:

$$T_n := \frac{1}{n^2} \operatorname{tr}(K H K' H),$$

with $K_{i,j} = k(x_i, x_j)$, $K'_{i,j} = k(y_i, y_j)$, and centering matrix $H = I - \frac{1}{n} 1_n 1_n^\top$.

Interpretation:
- Decreasing bandwidth: $L_2$ distance between densities
- Fixed bandwidth: smoothed distance between characteristic functions, Feuerverger (1993), OR difference between distribution embeddings to RKHS, Berlinet and Thomas-Agnan (2003), Gretton et al. (2008)
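The kernel statistic $T_n = n^{-2} \operatorname{tr}(K H K' H)$ can be computed with a few lines of code. The sketch below assumes scalar observations and a Gaussian kernel with a fixed bandwidth; the slide does not name a specific kernel, so that choice and the function names are illustrative. It uses the identity $\operatorname{tr}(K H K' H) = \operatorname{tr}\bigl((H K H) K'\bigr)$, so only one Gram matrix is centered.

```python
import math


def gaussian_gram(points, bandwidth):
    """Gram matrix with entries exp(-(p - q)^2 / (2 h^2)) for scalar points (assumed kernel)."""
    return [
        [math.exp(-((p - q) ** 2) / (2.0 * bandwidth ** 2)) for q in points]
        for p in points
    ]


def center(gram):
    """Double centering: returns H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def kernel_statistic(xs, ys, bandwidth_x=1.0, bandwidth_y=1.0):
    """T_n = tr(K H K' H) / n^2, computed as tr((H K H) K') / n^2."""
    n = len(xs)
    k_centered = center(gaussian_gram(xs, bandwidth_x))
    k_prime = gaussian_gram(ys, bandwidth_y)
    trace = sum(k_centered[i][j] * k_prime[j][i] for i in range(n) for j in range(n))
    return trace / n ** 2
```

When `ys` is constant the Gram matrix $K'$ is all ones and the statistic is zero up to rounding, since centering removes constant components.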
Asymptotic α-level tests

Asymptotic distribution under $H_0$:
- Gaussian using first two moments (decreasing bandwidth), Hall (1984), Cotterill and Csörgő (1985)
- Two-parameter Gamma using first two moments (fixed bandwidth), Feuerverger (1993), Gretton (2008):

$$n T_n \sim \frac{x^{\alpha - 1} e^{-x/\beta}}{\beta^{\alpha}\, \Gamma(\alpha)}, \qquad \alpha = \frac{(E(T_n))^2}{\operatorname{var}(T_n)}, \qquad \beta = \frac{n \operatorname{var}(T_n)}{E(T_n)}.$$

[Figure: two panels of the CDF $P(n T_n < T)$ against $T$ for $n = 200$ at two bandwidths $h$, comparing the Gamma and Normal approximations with the empirical distribution (Emp).]

Threshold NOT distribution-free: moments under $H_0$ computed using the sample.
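The Gamma approximation matches the first two moments of $n T_n$: with the parameters on this slide, $\alpha \beta = n E(T_n)$ and $\alpha \beta^2 = n^2 \operatorname{var}(T_n)$. The sketch below takes the null mean and variance of $T_n$ as given inputs, since the slides do not show how they are estimated, and approximates the upper quantile by Monte Carlo with `random.gammavariate` (shape, scale). The function names are hypothetical, and a library quantile function would normally replace the sampling step.

```python
import random


def gamma_parameters(mean_t, var_t, n):
    """Shape alpha = E(T_n)^2 / var(T_n) and scale beta = n var(T_n) / E(T_n)."""
    alpha = mean_t ** 2 / var_t
    beta = n * var_t / mean_t
    return alpha, beta


def gamma_threshold(alpha, beta, level=0.05, draws=100_000, seed=0):
    """Monte Carlo estimate of the (1 - level) quantile of Gamma(shape=alpha, scale=beta).

    Reject H0 when n * T_n exceeds the returned value.
    """
    rng = random.Random(seed)
    samples = sorted(rng.gammavariate(alpha, beta) for _ in range(draws))
    index = min(int((1.0 - level) * draws), draws - 1)
    return samples[index]
```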
Experimental results

Experimental setup:
- Independent sample $(X_1, Y_1), \ldots, (X_n, Y_n)$
- Rotate sample to create dependence
- Embed randomly in $\mathbb{R}^d$, remaining subspace Gaussian noise

[Figure: scatter plots of Sample Y against Sample X. Panels: Independent and Dependent after rotation.]
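The rotation step of the setup can be sketched as below. The slide does not state the source distribution, so independent uniform coordinates on $[-1, 1]$ are assumed purely for illustration, and the embedding into higher dimensions with Gaussian noise is omitted. The source must be non-Gaussian for this to work, because a rotated pair of independent standard Gaussians remains independent.

```python
import math
import random


def rotated_sample(n, angle, rng):
    """Draw n pairs with independent coordinates, then rotate each pair by `angle` radians.

    Assumption: coordinates are uniform on [-1, 1]. At angle 0 the pairs are independent;
    at intermediate angles such as pi/4 the rotated coordinates are dependent.
    """
    pairs = []
    for _ in range(n):
        s, t = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        x = math.cos(angle) * s - math.sin(angle) * t
        y = math.sin(angle) * s + math.cos(angle) * t
        pairs.append((x, y))
    return pairs
```

Rotated samples of this kind can be passed to any of the test statistics in the talk to trace the acceptance rate of $H_0$ as a function of the angle.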
Experimental results

Comparison of asymptotically α-level tests

- Given distribution $\Pr$, test $H_0 : \Pr = \Pr_x \times \Pr_y$
- Continuous valued, multivariate: $\mathcal{X} := \mathbb{R}^d$ and $\mathcal{Y} := \mathbb{R}^{d'}$

[Figure: two panels of % acceptance of $H_0$ against rotation angle ($\times \pi/4$) for $n = 512$; the first panel has $d = d' = 1$. Curves: Ker(g), Ker(n), L1, Pears, Like.]
Experimental results

Higher dimensions: $d = d' = 3$
Conclusions

Multi-dimensional nonparametric independence tests:
- Strongly consistent, distribution-free
- Asymptotically α-level

Test statistics:
- $L_1$ distance
- Log likelihood
- Kernel dependence measure (HSIC)

In experiments:
- Kernel tests have better performance, but the threshold is estimated from the sample

Further work:
- Distribution-free threshold for kernel α-level test
- Consistent null distribution estimate for kernel α-level test
- Proof of $\chi^2$ test conjecture
STAT 135 Lab 7 Distributions derived from the normal distribution, and comparing independent samples. Rebecca Barter March 16, 2015 The χ 2 distribution The χ 2 distribution We have seen several instances
More informationMultivariate Statistics
Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical
More informationPenalized Barycenters in the Wasserstein space
Penalized Barycenters in the Wasserstein space Elsa Cazelles, joint work with Jérémie Bigot & Nicolas Papadakis Université de Bordeaux & CNRS Journées IOP - Du 5 au 8 Juillet 2017 Bordeaux Elsa Cazelles
More informationLarge sample covariance matrices and the T 2 statistic
Large sample covariance matrices and the T 2 statistic EURANDOM, the Netherlands Joint work with W. Zhou Outline 1 2 Basic setting Let {X ij }, i, j =, be i.i.d. r.v. Write n s j = (X 1j,, X pj ) T and
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models Generalized Linear Models - part II Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.
More information(a) (3 points) Construct a 95% confidence interval for β 2 in Equation 1.
Problem 1 (21 points) An economist runs the regression y i = β 0 + x 1i β 1 + x 2i β 2 + x 3i β 3 + ε i (1) The results are summarized in the following table: Equation 1. Variable Coefficient Std. Error
More informationStatistical Estimation: Data & Non-data Information
Statistical Estimation: Data & Non-data Information Roger J-B Wets University of California, Davis & M.Casey @ Raytheon G.Pflug @ U. Vienna, X. Dong @ EpiRisk, G-M You @ EpiRisk. a little background Decision
More informationASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE. By Michael Nussbaum Weierstrass Institute, Berlin
The Annals of Statistics 1996, Vol. 4, No. 6, 399 430 ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE By Michael Nussbaum Weierstrass Institute, Berlin Signal recovery in Gaussian
More informationGoodness-of-Fit Pitfalls and Power Example: Testing Consistency of Two Histograms
Goodness-of-Fit Pitfalls and Power Example: Testing Consistency of Two Histograms Sometimes we have two histograms and are faced with the question: Are they consistent? That is, are our two histograms
More informationLIST OF FORMULAS FOR STK1100 AND STK1110
LIST OF FORMULAS FOR STK1100 AND STK1110 (Version of 11. November 2015) 1. Probability Let A, B, A 1, A 2,..., B 1, B 2,... be events, that is, subsets of a sample space Ω. a) Axioms: A probability function
More informationCS8803: Statistical Techniques in Robotics Byron Boots. Hilbert Space Embeddings
CS8803: Statistical Techniques in Robotics Byron Boots Hilbert Space Embeddings 1 Motivation CS8803: STR Hilbert Space Embeddings 2 Overview Multinomial Distributions Marginal, Joint, Conditional Sum,
More informationEconomics 883 Spring 2016 Tauchen. Jump Regression
Economics 883 Spring 2016 Tauchen Jump Regression 1 Main Model In the jump regression setting we have X = ( Z Y where Z is the log of the market index and Y is the log of an asset price. The dynamics are
More informationDistribution Regression with Minimax-Optimal Guarantee
Distribution Regression with Minimax-Optimal Guarantee (Gatsby Unit, UCL) Joint work with Bharath K. Sriperumbudur (Department of Statistics, PSU), Barnabás Póczos (ML Department, CMU), Arthur Gretton
More informationCh 3: Multiple Linear Regression
Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery
More informationMaximum Mean Discrepancy
Maximum Mean Discrepancy Thanks to Karsten Borgwardt, Malte Rasch, Bernhard Schölkopf, Jiayuan Huang, Arthur Gretton Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia
More informationKernel change-point detection
1,2 (joint work with Alain Celisse 3 & Zaïd Harchaoui 4 ) 1 Cnrs 2 École Normale Supérieure (Paris), DIENS, Équipe Sierra 3 Université Lille 1 4 INRIA Grenoble Workshop Kernel methods for big data, Lille,
More informationOn a Nonparametric Notion of Residual and its Applications
On a Nonparametric Notion of Residual and its Applications Bodhisattva Sen and Gábor Székely arxiv:1409.3886v1 [stat.me] 12 Sep 2014 Columbia University and National Science Foundation September 16, 2014
More informationMultivariate Statistics Random Projections and Johnson-Lindenstrauss Lemma
Multivariate Statistics Random Projections and Johnson-Lindenstrauss Lemma Suppose again we have n sample points x,..., x n R p. The data-point x i R p can be thought of as the i-th row X i of an n p-dimensional
More informationSTAT 135 Lab 3 Asymptotic MLE and the Method of Moments
STAT 135 Lab 3 Asymptotic MLE and the Method of Moments Rebecca Barter February 9, 2015 Maximum likelihood estimation (a reminder) Maximum likelihood estimation Suppose that we have a sample, X 1, X 2,...,
More informationUnderstanding Regressions with Observations Collected at High Frequency over Long Span
Understanding Regressions with Observations Collected at High Frequency over Long Span Yoosoon Chang Department of Economics, Indiana University Joon Y. Park Department of Economics, Indiana University
More informationA central limit theorem for an omnibus embedding of random dot product graphs
A central limit theorem for an omnibus embedding of random dot product graphs Keith Levin 1 with Avanti Athreya 2, Minh Tang 2, Vince Lyzinski 3 and Carey E. Priebe 2 1 University of Michigan, 2 Johns
More informationGoodness-of-Fit Considerations and Comparisons Example: Testing Consistency of Two Histograms
Goodness-of-Fit Considerations and Comparisons Example: Testing Consistency of Two Histograms Sometimes we have two histograms and are faced with the question: Are they consistent? That is, are our two
More informationP Values and Nuisance Parameters
P Values and Nuisance Parameters Luc Demortier The Rockefeller University PHYSTAT-LHC Workshop on Statistical Issues for LHC Physics CERN, Geneva, June 27 29, 2007 Definition and interpretation of p values;
More informationDual Space of L 1. C = {E P(I) : one of E or I \ E is countable}.
Dual Space of L 1 Note. The main theorem of this note is on page 5. The secondary theorem, describing the dual of L (µ) is on page 8. Let (X, M, µ) be a measure space. We consider the subsets of X which
More informationEstimation of the functional Weibull-tail coefficient
1/ 29 Estimation of the functional Weibull-tail coefficient Stéphane Girard Inria Grenoble Rhône-Alpes & LJK, France http://mistis.inrialpes.fr/people/girard/ June 2016 joint work with Laurent Gardes,
More informationEmpirical Likelihood Tests for High-dimensional Data
Empirical Likelihood Tests for High-dimensional Data Department of Statistics and Actuarial Science University of Waterloo, Canada ICSA - Canada Chapter 2013 Symposium Toronto, August 2-3, 2013 Based on
More informationThe high order moments method in endpoint estimation: an overview
1/ 33 The high order moments method in endpoint estimation: an overview Gilles STUPFLER (Aix Marseille Université) Joint work with Stéphane GIRARD (INRIA Rhône-Alpes) and Armelle GUILLOU (Université de
More informationTime Series and Forecasting Lecture 4 NonLinear Time Series
Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations
More informationSéminaire INRA de Toulouse
Séminaire INRA de Toulouse Mélisande ALBERT Travaux de thèse, Université Nice Sophia Antipolis, LJAD Vendredi 17 mars Mélisande ALBERT Séminaire INRA Toulouse Vendredi 17 mars 1 / 28 Table of contents
More informationSTAT 4385 Topic 01: Introduction & Review
STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics
More informationSusceptible-Infective-Removed Epidemics and Erdős-Rényi random
Susceptible-Infective-Removed Epidemics and Erdős-Rényi random graphs MSR-Inria Joint Centre October 13, 2015 SIR epidemics: the Reed-Frost model Individuals i [n] when infected, attempt to infect all
More informationInference on distributions and quantiles using a finite-sample Dirichlet process
Dirichlet IDEAL Theory/methods Simulations Inference on distributions and quantiles using a finite-sample Dirichlet process David M. Kaplan University of Missouri Matt Goldman UC San Diego Midwest Econometrics
More informationA PRACTICAL WAY FOR ESTIMATING TAIL DEPENDENCE FUNCTIONS
Statistica Sinica 20 2010, 365-378 A PRACTICAL WAY FOR ESTIMATING TAIL DEPENDENCE FUNCTIONS Liang Peng Georgia Institute of Technology Abstract: Estimating tail dependence functions is important for applications
More informationEstimation of information-theoretic quantities
Estimation of information-theoretic quantities Liam Paninski Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/ liam liam@gatsby.ucl.ac.uk November 16, 2004 Some
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More information