Package IndepTest. August 23, 2018

Similar documents
Package depth.plot. December 20, 2015

Package eel. September 1, 2015

Package gtheory. October 30, 2016

Package covsep. May 6, 2018

Package NonpModelCheck

Package ShrinkCovMat

Package msu. September 30, 2017

Package EnergyOnlineCPM

Package NlinTS. September 10, 2018

Package BNN. February 2, 2018

Package monmlp. R topics documented: December 5, Type Package

Package scio. February 20, 2015

Package unbalhaar. February 20, 2015

Package BayesNI. February 19, 2015

Package polypoly. R topics documented: May 27, 2017

Package rbvs. R topics documented: August 29, Type Package Title Ranking-Based Variable Selection Version 1.0.

Package ebal. R topics documented: February 19, Type Package

Package esaddle. R topics documented: January 9, 2017

Package goftest. R topics documented: April 3, Type Package

Package basad. November 20, 2017

Package QuantifQuantile

Package sscor. January 28, 2016

Package AID. R topics documented: November 10, Type Package Title Box-Cox Power Transformation Version 2.3 Date Depends R (>= 2.15.

Package esreg. August 22, 2017

Package Grace. R topics documented: April 9, Type Package

Package bmem. February 15, 2013

Package RTransferEntropy

Package IGG. R topics documented: April 9, 2018

Package EMVS. April 24, 2018

Package varband. November 7, 2016

Package ltsbase. R topics documented: February 20, 2015

Package MultisiteMediation

Package bigtime. November 9, 2017

Package horseshoe. November 8, 2016

Package gma. September 19, 2017

Package RI2by2. October 21, 2016

Package SpatialNP. June 5, 2018

Package threeboost. February 20, 2015

Package milineage. October 20, 2017

Package RATest. November 30, Type Package Title Randomization Tests

Package darts. February 19, 2015

Package abundant. January 1, 2017

Package spatial.gev.bma

Package robustrao. August 14, 2017

Package invgamma. May 7, 2017

Package sbgcop. May 29, 2018

Package mcmcse. July 4, 2017

Package OGI. December 20, 2017

Package MicroMacroMultilevel

Package netdep. July 10, 2018

Package msir. R topics documented: April 7, Type Package Version Date Title Model-Based Sliced Inverse Regression

Package dhsic. R topics documented: July 27, 2017

Package leiv. R topics documented: February 20, Version Type Package

Package TSF. July 15, 2017

Package sodavis. May 13, 2018

Package blme. August 29, 2016

Package coop. November 14, 2017

Package stability. R topics documented: August 23, Type Package

Package symmoments. R topics documented: February 20, Type Package

Package thief. R topics documented: January 24, Version 0.3 Title Temporal Hierarchical Forecasting

Package LIStest. February 19, 2015

Package LPTime. March 3, 2015

Package face. January 19, 2018

Package CovSel. September 25, 2013

Package pfa. July 4, 2016

Package rgabriel. February 20, 2015

Package jaccard. R topics documented: June 14, Type Package

Package cvequality. March 31, 2018

Package FinCovRegularization

Package bivariate. December 10, 2018

Package asus. October 25, 2017

Package SimCorMultRes

Package RootsExtremaInflections

Package reinforcedpred

Package SK. May 16, 2018

Package decon. February 19, 2015

Package breakaway. R topics documented: March 30, 2016

Package bayeslm. R topics documented: June 18, Type Package

Package multivariance

Package flexcwm. R topics documented: July 11, Type Package. Title Flexible Cluster-Weighted Modeling. Version 1.0.

Package SimSCRPiecewise

Package comparec. R topics documented: February 19, Type Package

Package RcppSMC. March 18, 2018

Package jmcm. November 25, 2017

Package FDRSeg. September 20, 2017

Package spcov. R topics documented: February 20, 2015

Package mpmcorrelogram

Package Bhat. R topics documented: February 19, Title General likelihood exploration. Version

Package factorqr. R topics documented: February 19, 2015

Package IsingFit. September 7, 2016

Package GD. January 12, 2018

Package flexcwm. R topics documented: July 2, Type Package. Title Flexible Cluster-Weighted Modeling. Version 1.1.

Package SHIP. February 19, 2015

Package denseflmm. R topics documented: March 17, 2017

Package mmm. R topics documented: February 20, Type Package

Package Delaporte. August 13, 2017

Package WaveletArima

Package Blendstat. February 21, 2018

Package RZigZag. April 30, 2018

Package CoxRidge. February 27, 2015

Transcription:

Type Package Package IndepTest August 23, 2018 Title Nonparametric Independence Tests Based on Entropy Estimation Version 0.2.0 Date 2018-08-23 Author Thomas B. Berrett <t.berrett@statslab.cam.ac.uk> [aut], Daniel J. Grose <dan.grose@lancaster.ac.uk> [cre,ctb], R worth <r.samworth@statslab.cam.ac.uk> [aut] Maintainer Daniel Grose <dan.grose@lancaster.ac.uk> Implementations of the weighted Kozachenko-Leonenko entropy estimator and independence tests based on this estimator, (Kozachenko and Leonenko (1987) <http://mi.mathnet.ru/eng/ppi797>). Also includes a goodness-of-fit test for a linear model which is an independence test between covariates and errors. Depends FNN,mvtnorm Imports Rdpack RdMacros Rdpack License GPL RoygenNote 6.0.1 NeedsCompilation no Repository CRAN Date/Publication 2018-08-23 14:04:28 UTC R topics documented: KLentropy.......................................... 2 L2OptW........................................... 3 MINTauto.......................................... 4 MINTav........................................... 5 MINTknown........................................ 6 MINTperm......................................... 7 MINTregression....................................... 8 Inde 10 1

2 KLentropy KLentropy KLentropy Calculates the (weighted) Kozachenko Leonenko entropy estimator studied in Berrett, Samworth and Yuan (2018), which is based on the k-nearest neighbour distances of the sample. KLentropy(, k, weights = FALSE, stderror = FALSE) k weights stderror The n d data matri. The tuning parameter that gives the maimum number of neighbours that will be considered by the estimator. Specifies whether a weighted or unweighted estimator is used. If a weighted estimator is to be used then the default (weights=true) results in the weights being calculated by L2OptW, otherwise the user may specify their own weights. Specifies whether an estimate of the standard error of the weighted estimate is calculated. The calculation is done using an unweighted version of the variance estimator described on page 7 of Berrett, Samworth and Yuan (2018). The first element of the list is the unweighted estimator for the value of 1 up to the user-specified k. The second element of the list is the weighted estimator, obtained by taking the inner product between the first element of the list and the weight vector. If stderror=true the third element of the list is an estimate of the standard error of the weighted estimate. Berrett, T. B., Samworth, R. J. and Yuan, M. (2018). Efficient multivariate entropy estimation via k-nearest neighbour distances. Annals of Statistics, to appear. n=1000; =rnorm(n); KLentropy(,30,stderror=TRUE) # The true value is 0.5*log(2*pi*ep(1)) = 1.42. n=5000; =matri(rnorm(4*n),ncol=4) # The true value is 2*log(2*pi*ep(1)) = 5.68 KLentropy(,30,weights=FALSE) # Unweighted estimator KLentropy(,30,weights=TRUE) # Weights chosen by L2OptW w=runif(30); w=w/sum(w); KLentropy(,30,weights=w) # User-specified weights

L2OptW 3 L2OptW L2OptW Calculates a weight vector to be used for the weighted Kozachenko Leonenko estimator. The weight vector has minimum L 2 norm subject to the linear and sum-to-one constraints of (2) in Berrett, Samworth and Yuan (2018). L2OptW(k, d) k d The tuning parameter that gives the number of neighbours that will be considered by the weighted Kozachenko Leonenko estimator. The dimension of the data. The weight vector that is the solution of the optimisation problem. Berrett, T. B., Samworth, R. J. and Yuan, M. (2018). Efficient multivariate entropy estimation via k-nearest neighbour distances. Annals of Statistics, to appear. # When d < 4 there are no linear constraints and the returned vector is (0,0,...,0,1). L2OptW(100,3) w=l2optw(100,4) plot(w,type="l") w=l2optw(100,8); # For each multiple of 4 that d increases an etra constraint is added. plot(w,type="l") w=l2optw(100,12) plot(w, type="l") # This can be seen in the shape of the plot

4 MINTauto MINTauto MINTauto Performs an independence test without knowledge of either marginal distribution using permutations and using a data-driven choice of k. MINTauto(, y, kma, B1 = 1000, B2 = 1000) y kma B1 B2 The n d X data matri of the X values. The response vector of length n d Y data matri of the Y values. The maimum value of k to be considered for estimation of the joint entropy H(X, Y ). The number of repetitions used when choosing k, set to 1000 by default. The number of permutations to use for the final test, set at 1000 by default. The p-value corresponding the independence test carried out and the value of k used. Berrett, T. B. and Samworth R. J. (2017). Nonparametric independence testing via mutual information. ArXiv e-prints. 1711.06642. # Independent univariate normal data =rnorm(1000); y=rnorm(1000); MINTauto(,y,kma=200,B1=100,B2=100) # Dependent univariate normal data library(mvtnorm) data=rmvnorm(1000,sigma=matri(c(1,0.5,0.5,1),ncol=2)) MINTauto(data[,1],data[,2],kma=200,B1=100,B2=100) # Dependent multivariate normal data Sigma=matri(c(1,0,0,0,0,1,0,0,0,0,1,0.5,0,0,0.5,1),ncol=4) data=rmvnorm(1000,sigma=sigma) MINTauto(data[,1:3],data[,4],kma=50,B1=100,B2=100)

MINTav 5 MINTav MINTav Performs an independence test without knowledge of either marginal distribution using permutations and averaging over a range of values of k. MINTav(, y, K, B = 1000) y K B The n d X data matri of the X values. The n d Y data matri of the Y values. The vector of values of k to be considered for estimation of the joint entropy H(X, Y ). The number of permutations to use for the test, set at 1000 by default. The p-value corresponding the independence test carried out. Berrett, T. B. and Samworth R. J. (2017). Nonparametric independence testing via mutual information. ArXiv e-prints. 1711.06642. # Independent univariate normal data =rnorm(1000); y=rnorm(1000); MINTav(,y,K=1:200,B=100) # Dependent univariate normal data library(mvtnorm); data=rmvnorm(1000,sigma=matri(c(1,0.5,0.5,1),ncol=2)) MINTav(data[,1],data[,2],K=1:200,B=100) # Dependent multivariate normal data Sigma=matri(c(1,0,0,0,0,1,0,0,0,0,1,0.5,0,0,0.5,1),ncol=4); data=rmvnorm(1000,sigma=sigma) MINTav(data[,1:3],data[,4],K=1:50,B=100)

6 MINTknown MINTknown MINTknown Performs an independence test when it is assumed that the marginal distribution of Y is known and can be simulated from. MINTknown(, y, k, ky, w = FALSE, wy = FALSE, y0) y The n d X data matri of X values. The n d Y data matri of Y values. k The value of k to be used for estimation of the joint entropy H(X, Y ). ky The value of k to be used for estimation of the marginal entropy H(Y ). w wy y0 The weight vector to used for estimation of the joint entropy H(X, Y ), with the same options as for the KLentropy function. The weight vector to used for estimation of the marginal entropy H(Y ), with the same options as for the KLentropy function. The data matri of simulated Y values. The p-value corresponding the independence test carried out. Berrett, T. B. and Samworth R. J. (2017). Nonparametric independence testing via mutual information. ArXiv e-prints. 1711.06642. library(mvtnorm) =rnorm(1000); y=rnorm(1000); # Independent univariate normal data MINTknown(,y,k=20,ky=30,y0=rnorm(100000)) library(mvtnorm) # Dependent univariate normal data data=rmvnorm(1000,sigma=matri(c(1,0.5,0.5,1),ncol=2)) # Dependent multivariate normal data MINTknown(data[,1],data[,2],k=20,ky=30,y0=rnorm(100000)) Sigma=matri(c(1,0,0,0,0,1,0,0,0,0,1,0.5,0,0,0.5,1),ncol=4) data=rmvnorm(1000,sigma=sigma) MINTknown(data[,1:3],data[,4],k=20,ky=30,w=TRUE,wy=FALSE,y0=rnorm(100000))

MINTperm 7 MINTperm MINTknown Performs an independence test without knowledge of either marginal distribution using permutations. MINTperm(, y, k, w = FALSE, B = 1000) y The n d X data matri of X values. The n d Y data matri of Y values. k The value of k to be used for estimation of the joint entropy H(X, Y ). w B The weight vector to used for estimation of the joint entropy H(X, Y ), with the same options as for the KLentropy function. The number of permutations to use, set at 1000 by default. The p-value corresponding the independence test carried out. Berrett, T. B. and Samworth R. J. (2017). Nonparametric independence testing via mutual information. ArXiv e-prints. 1711.06642. # Independent univariate normal data =rnorm(1000); y=rnorm(1000) MINTperm(,y,k=20,B=100) # Dependent univariate normal data library(mvtnorm) data=rmvnorm(1000,sigma=matri(c(1,0.5,0.5,1),ncol=2)) MINTperm(data[,1],data[,2],k=20,B=100) # Dependent multivariate normal data Sigma=matri(c(1,0,0,0,0,1,0,0,0,0,1,0.5,0,0,0.5,1),ncol=4) data=rmvnorm(1000,sigma=sigma) MINTperm(data[,1:3],data[,4],k=20,w=TRUE,B=100)

8 MINTregression MINTregression MINTregression Performs a goodness-of-fit test of a linear model by testing whether the errors are independent of the covariates. MINTregression(, y, k, keps, w = FALSE, eps) The n p design matri. y The response vector of length n. k keps w eps The value of k to be used for estimation of the joint entropy H(X, ɛ). The value of k to be used for estimation of the marginal entropy H(ɛ). The weight vector to be used for estimation of the joint entropy H(X, ɛ), with the same options as for the KLentropy function. A vector of null errors which should have the same distribution as the errors are assumed to have in the linear model. The p-value corresponding the independence test carried out. Berrett, T. B. and Samworth R. J. (2017). Nonparametric independence testing via mutual information. ArXiv e-prints. 1711.06642. # Correctly specified linear model =runif(100,min=-1.5,ma=1.5); y=+rnorm(100) plot(lm(y~),which=1) MINTregression(,y,5,10,w=FALSE,rnorm(10000)) # Misspecified mean linear model =runif(100,min=-1.5,ma=1.5); y=^3+rnorm(100) plot(lm(y~),which=1) MINTregression(,y,5,10,w=FALSE,rnorm(10000)) # Heteroscedastic linear model =runif(100,min=-1.5,ma=1.5); y=+*rnorm(100); plot(lm(y~),which=1) MINTregression(,y,5,10,w=FALSE,rnorm(10000))

MINTregression 9 # Multivariate misspecified mean linear model =matri(runif(1500,min=-1.5,ma=1.5),ncol=3) y=[,1]^3+0.3*[,2]-0.3*[,3]+rnorm(500) plot(lm(y~),which=1) MINTregression(,y,30,50,w=TRUE,rnorm(50000))

Inde KLentropy, 2, 6 8 L2OptW, 2, 3 MINTauto, 4 MINTav, 5 MINTknown, 6 MINTperm, 7 MINTregression, 8 10