Package dhsic. R topics documented: July 27, 2017

Similar documents
The cramer Package. June 19, cramer.test Index 5

Package RI2by2. October 21, 2016

Package sscor. January 28, 2016

Package multivariance

Package covsep. May 6, 2018

Package severity. February 20, 2015

Package EnergyOnlineCPM

Package BayesNI. February 19, 2015

Package RTransferEntropy

Package jaccard. R topics documented: June 14, Type Package

Package SpatialNP. June 5, 2018

Package NlinTS. September 10, 2018

Package RZigZag. April 30, 2018

Package jmcm. November 25, 2017

Package NonpModelCheck

Package goftest. R topics documented: April 3, Type Package

Package CCP. R topics documented: December 17, Type Package. Title Significance Tests for Canonical Correlation Analysis (CCA) Version 0.

Package scio. February 20, 2015

Package EL. February 19, 2015

Package ShrinkCovMat

Package basad. November 20, 2017

Package mcmcse. July 4, 2017

Package RcppSMC. March 18, 2018

Package kpcalg. January 22, 2017

Package varband. November 7, 2016

Package EMVS. April 24, 2018

Package Grace. R topics documented: April 9, Type Package

Package esreg. August 22, 2017

Package gtheory. October 30, 2016

Package netcoh. May 2, 2016

Package rnmf. February 20, 2015

Package elhmc. R topics documented: July 4, Type Package

Package codadiags. February 19, 2015

Package MiRKATS. May 30, 2016

Package gma. September 19, 2017

Package bayeslm. R topics documented: June 18, Type Package

Package ppcc. June 28, 2017

Package metansue. July 11, 2018

Package unbalhaar. February 20, 2015

Package fwsim. October 5, 2018

Package invgamma. May 7, 2017

Package LIStest. February 19, 2015

Package generalhoslem

Package mpmcorrelogram

Package depth.plot. December 20, 2015

Package noncompliance

Package scpdsi. November 18, 2018

Package FDRSeg. September 20, 2017

Package BNN. February 2, 2018

Package rbvs. R topics documented: August 29, Type Package Title Ranking-Based Variable Selection Version 1.0.

Package SEMModComp. R topics documented: February 19, Type Package Title Model Comparisons for SEM Version 1.0 Date Author Roy Levy

Package darts. February 19, 2015

Package weightquant. R topics documented: December 30, Type Package

Hilbert Space Representations of Probability Distributions

Package RootsExtremaInflections

Package MultisiteMediation

Package KRLS. July 10, 2017

Package drgee. November 8, 2016

Package CovSel. September 25, 2013

Package HIMA. November 8, 2017

Package idmtpreg. February 27, 2018

Package monmlp. R topics documented: December 5, Type Package

Package RATest. November 30, Type Package Title Randomization Tests

Package polypoly. R topics documented: May 27, 2017

Package clogitboost. R topics documented: December 21, 2015

Package Delaporte. August 13, 2017

Package pssm. March 1, 2017

Package causalweight

Package HDtest. R topics documented: September 13, Type Package

Package CPE. R topics documented: February 19, 2015

Package gwqs. R topics documented: May 7, Type Package. Title Generalized Weighted Quantile Sum Regression. Version 1.1.0

Package bigtime. November 9, 2017

Package msu. September 30, 2017

Package milineage. October 20, 2017

Package pfa. July 4, 2016

Package esaddle. R topics documented: January 9, 2017

Package PVR. February 15, 2013

Package SpatPCA. R topics documented: February 20, Type Package

Package covtest. R topics documented:

Package penmsm. R topics documented: February 20, Type Package

Package FDRreg. August 29, 2016

Package CompQuadForm

Package sbgcop. May 29, 2018

Package CopulaRegression

Package sklarsomega. May 24, 2018

Package BSSasymp. R topics documented: September 12, Type Package

Package rgabriel. February 20, 2015

Package exuber. June 17, 2018

Package SimSCRPiecewise

Package msbp. December 13, 2018

Package TSPred. April 5, 2017

Package OrdNor. February 28, 2019

The SpatialNP Package

Package abundant. January 1, 2017

Package HarmonicRegression

Package locfdr. July 15, Index 5

Package pearson7. June 22, 2016

Package ForwardSearch

Package peacots. November 13, 2016

Package comparec. R topics documented: February 19, Type Package

Transcription:

Package dhsic July 27, 2017 Type Package Title Independence Testing via Hilbert Schmidt Independence Criterion Version 2.0 Date 2017-07-24 Author Niklas Pfister and Jonas Peters Maintainer Niklas Pfister <pfister@stat.math.ethz.ch> Description Contains an implementation of the d-variable Hilbert Schmidt independence criterion and several hypothesis tests based on it, as described in Pfister et al. (2017) <doi:10.1111/rssb.12235>. License GPL-3 LinkingTo Rcpp Imports stats, Rcpp NeedsCompilation yes Repository CRAN Date/Publication 2017-07-27 21:12:00 UTC R topics documented: dhsic-package....................................... 2 dhsic............................................. 3 dhsic.test.......................................... 5 Index 8 1

2 dhsic-package dhsic-package Independence Testing via Hilbert Schmidt Independence Criterion Description Details Contains an implementation of the d-variable Hilbert Schmidt independence criterion and several hypothesis tests based on it, as described in Pfister et al. (2017) <doi:10.1111/rssb.12235>. The DESCRIPTION file: Package: dhsic Type: Package Title: Independence Testing via Hilbert Schmidt Independence Criterion Version: 2.0 Date: 2017-07-24 Author: Niklas Pfister and Jonas Peters Maintainer: Niklas Pfister <pfister@stat.math.ethz.ch> Description: Contains an implementation of the d-variable Hilbert Schmidt independence criterion and several hypothesis te License: GPL-3 LinkingTo: Rcpp Imports: stats, Rcpp Index of help topics: dhsic-package dhsic dhsic.test Independence Testing via Hilbert Schmidt Independence Criterion d-variable Hilbert Schmidt independence criterion - dhsic Independence test based on dhsic Author(s) Niklas Pfister and Jonas Peters Maintainer: Niklas Pfister <pfister@stat.math.ethz.ch> References Gretton, A., K. Fukumizu, C. H. Teo, L. Song, B. Sch\"olkopf and A. J. Smola (2007). A kernel statistical test of independence. In Advances in Neural Information Processing Systems (pp. 585-592). Pfister, N., P. B\"uhlmann, B. Sch\"olkopf and J. Peters (2017). Kernel-based Tests for Joint Independence. To appear in the Journal of the Royal Statistical Society, Series B.

dhsic 3 dhsic d-variable Hilbert Schmidt independence criterion - dhsic Description Usage The d-variable Hilbert Schmidt independence criterion (dhsic) is a non-parametric measure of dependence between an arbitrary number of variables. In the large sample limit the value of dhsic is 0 if the variables are jointly independent and positive if there is a dependence. It is therefore able to detect any type of dependence given a sufficient amount of data. dhsic(x, Y, K, kernel = "gaussian", bandwidth = 1, matrix.input = FALSE) Arguments X Y K kernel bandwidth matrix.input either a list of at least two numeric matrices or a single numeric matrix. The rows of a matrix correspond to the observations of a variable. It is always required that there are an equal number of observations for all variables (i.e. all matrices have to have the same number of rows). If X is a single numeric matrix than one has to specify the second variable as Y or set matrix.input to "TRUE". See below for more details. a numeric matrix if X is also a numeric matrix and omitted if X is a list. a list of the gram matrices corresponding to each variable. If K specified the other inputs will have no effect on the computations. a vector of character strings specifying the kernels for each variable. There exist two pre-defined kernels: "gaussian" (Gaussian kernel with median heuristic as bandwidth) and "discrete" (discrete kernel). User defined kernels can also be used by passing the function name as a string, which will then be matched using match.fun. If the length of kernel is smaller than the number of variables the kernel specified in kernel[1] will be used for all variables. a numeric value specifying the size of the bandwidth used for the Gaussian kernel. Only used if kernel="gaussian.fixed". a boolean. If matrix.input is "TRUE" the input X is assumed to be a matrix in which the columns correspond to the variables. Details The d-variable Hilbert Schmidt independence criterion is a direct extension of the standard Hilbert Schmidt independence criterion (HSIC) from two variables to an arbitrary number of variables. It is 0 if and only if all the variables are jointly independent. This function computes an estimator of dhsic, which converges to the actual dhsic in the large sample limit. It is therefore possible to detect any type of dependence in the large sample limit. If X is a list with d matrices, the function computes dhsic for the corresponding d random vectors. If X is a matrix and matrix.input is "TRUE" the functions dhsic for the columns of X. If X is a

4 dhsic matrix and matrix.input is "FALSE" then Y needs to be a matrix, too; in this case, the function computes the dhsic (HSIC) for the corresponding two random vectors. For more details see the references. Value A list containing the following components: dhsic the value of the empirical estimator of dhsic time numeric vector containing computation times. time[1] is time to compute Gram matrix and time[2] is time to compute dhsic. bandwidth bandwidth used during computations. Only relevant if Gaussian kernel was used. Author(s) Niklas Pfister and Jonas Peters References Gretton, A., K. Fukumizu, C. H. Teo, L. Song, B. Sch\"olkopf and A. J. Smola (2007). A kernel statistical test of independence. In Advances in Neural Information Processing Systems (pp. 585-592). Pfister, N., P. B\"uhlmann, B. Sch\"olkopf and J. Peters (2017). Kernel-based Tests for Joint Independence. To appear in the Journal of the Royal Statistical Society, Series B. See Also In order to perform hypothesis tests based on dhsic use the function dhsic.test. Examples ### Three different input methods set.seed(0) x <- matrix(rnorm(200),ncol=2) y <- matrix(rbinom(100,30,0.1),ncol=1) # compute dhsic of x and y (x is taken as a single variable) dhsic(list(x,y),kernel=c("gaussian","discrete"))$dhsic dhsic(x,y,kernel=c("gaussian","discrete"))$dhsic # compute dhsic of x[,1], x[,2] and y dhsic(cbind(x,y),kernel=c("gaussian","discrete"), matrix.input=true)$dhsic ### Using a user-defined kernel (here: sigmoid kernel) set.seed(0) x <- matrix(rnorm(500),ncol=1) y <- x^2+0.02*matrix(rnorm(500),ncol=1) sigmoid <- function(x_1,x_2){ return(tanh(sum(x_1*x_2))) } dhsic(x,y,kernel="sigmoid")$dhsic

dhsic.test 5 dhsic.test Independence test based on dhsic Description Usage Hypothesis test for finding statistically significant evidence of dependence between several variables. Uses the d-variable Hilbert Schmidt independence criterion (dhsic) as measure of dependence. Several types of hypothesis tests are included. The null hypothesis (H_0) is that all variables are jointly independent. dhsic.test(x, Y, K, alpha = 0.05, method = "permutation", kernel = "gaussian", B = 1000, pairwise = FALSE, bandwidth = 1, matrix.input = FALSE) Arguments X Y K alpha method kernel B pairwise bandwidth either a list of at least two numeric matrices or a single numeric matrix. The rows of a matrix correspond to the observations of a variable. It is always required that there are an equal number of observations for all variables (i.e. all matrices have to have the same number of rows). If X is a single numeric matrix than one has to specify the second variable as Y or set matrix.input to "TRUE". See below for more details. a numeric matrix if X is also a numeric matrix and omitted if X is a list. a list of the gram matrices corresponding to each variable. If K the following inputs X, Y, kernel, pairwise, bandwidth and matrix.input will be ignored. a numeric value in (0,1) specifying the confidence level of the hypothesis test. a character string specifying the type of hypothesis test used. The available options are: "gamma" (gamma approximation based test), "permutation" (permutation test (slow)), "bootstrap" (bootstrap test (slow)) and "eigenvalue" (eigenvalue based test). a vector of character strings specifying the kernels for each variable. There exist two pre-defined kernels: "gaussian" (Gaussian kernel with median heuristic as bandwidth) and "discrete" (discrete kernel). User defined kernels can also be used by passing the function name as a string, which will then be matched using match.fun. If the length of kernel is smaller than the number of variables the kernel specified in kernel[1] will be used for all variables. an integer value specifying the number of Monte-Carlo iterations made in the permutation and bootstrap test. Only relevant if method is set to "permutation" or to "bootstrap". a logical value indicating whether one should use HSIC with pairwise comparisons instead of dhsic. Can only be true if there are more than two variables. a numeric value specifying the size of the bandwidth used for the Gaussian kernel. Only used if kernel="gaussian.fixed".

6 dhsic.test matrix.input a boolean. If matrix.input is "TRUE" the input X is assumed to be a matrix in which the columns correspond to the variables. Details Value The d-variable Hilbert Schmidt independence criterion is a direct extension of the standard Hilbert Schmidt independence criterion (HSIC) from two variables to an arbitrary number of variables. It is 0 if and only if the variables are jointly independent. 4 different statistical hypothesis tests are implemented all with null hypothesis (H_0: X[[1]],...,X[[d]] are jointly independent) and alternative hypothesis (H_A: X[[1]],...,X[[d]] are not jointly independent): 1. Permutation test for dhsic: exact level, slow 2. Bootstrap test for dhsic: pointwise asymptotic level and pointwise consistent, slow 3. Gamma approximation based test for dhsic: only approximate, fast 4. Eigenvalue based test for dhsic: pointwise asymptotic level and pointwise consistent, medium The null hypothesis is rejected if statistic is strictly greater than crit.value. If X is a list with d matrices, the function tests for joint independence of the corresponding d random vectors. If X is a matrix and matrix.input is "TRUE" the functions tests the independence between the columns of X. If X is a matrix and matrix.input is "FALSE" then Y needs to be a matrix, too; in this case, the function tests the (pairwise) independence between the corresponding two random vectors. For more details see the references. A list containing the following components: statistic crit.value the value of the test statistic critical value of the hypothesis test. The null hypothesis (H_0: joint independence) is rejected if statistic is greater than crit.value. p.value p-value of the hypothesis test, i.e. the probability that a random version of the test statistic is greater than statistic under the calculated null hypothesis (H_0: joint independence) based on the data. time numeric vector containing computation times. time[1] is time to compute Gram matrix, time[2] is time to compute dhsic and time[3] is the time to compute crit.value and p.value. bandwidth Author(s) Niklas Pfister and Jonas Peters References bandwidth used during the computation. Only relevant if Gaussian kernel was used. Gretton, A., K. Fukumizu, C. H. Teo, L. Song, B. Sch\"olkopf and A. J. Smola (2007). A kernel statistical test of independence. In Advances in Neural Information Processing Systems (pp. 585-592).

dhsic.test 7 Pfister, N., P. B\"uhlmann, B. Sch\"olkopf and J. Peters (2017). Kernel-based Tests for Joint Independence. To appear in the Journal of the Royal Statistical Society, Series B. See Also In order to only compute the test statistic without p-values, use the function dhsic. Examples ### pairwise independent but not jointly independent (pairwise HSIC vs dhsic) set.seed(0) x <- matrix(rbinom(100,1,0.5),ncol=1) y <- matrix(rbinom(100,1,0.5),ncol=1) z <- matrix(as.numeric((x+y)==1)+rnorm(100),ncol=1) X <- list(x,y,z) dhsic.test(x, method="permutation", kernel=c("discrete", "discrete", "gaussian"), pairwise=true, B=1000)$p.value dhsic.test(x, method="permutation", kernel=c("discrete", "discrete", "gaussian"), pairwise=false, B=1000)$p.value

Index Topic htest dhsic.test, 5 Topic nonparametric dhsic, 3 dhsic.test, 5 Topic package dhsic-package, 2 dhsic (dhsic-package), 2 dhsic, 3, 7 dhsic-package, 2 dhsic.test, 4, 5 match.fun, 3, 5 8