Package msu. September 30, 2017

Similar documents
Package polypoly. R topics documented: May 27, 2017

Package gtheory. October 30, 2016

Package netdep. July 10, 2018

Package depth.plot. December 20, 2015

Package robustrao. August 14, 2017

Package Tcomp. June 6, 2018

Package covsep. May 6, 2018

Package RTransferEntropy

Package kgc. December 21, Version Date Title Koeppen-Geiger Climatic Zones

Package IndepTest. August 23, 2018

Package invgamma. May 7, 2017

Package OGI. December 20, 2017

Package MultisiteMediation

Package PeriodicTable

Package reinforcedpred

Package sscor. January 28, 2016

Package SK. May 16, 2018

Package ADPF. September 13, 2017

Package sbgcop. May 29, 2018

Package EnergyOnlineCPM

Package noncompliance

Package HIMA. November 8, 2017

Package Select. May 11, 2018

Package bigtime. November 9, 2017

Package dhsic. R topics documented: July 27, 2017

Package paramgui. March 15, 2018

Package esreg. August 22, 2017

Package SCEPtERbinary

Package multivariance

Package jmcm. November 25, 2017

Package FinCovRegularization

Package matrixnormal

Package cvequality. March 31, 2018

Package gma. September 19, 2017

Package stability. R topics documented: August 23, Type Package

Package NlinTS. September 10, 2018

Package xdcclarge. R topics documented: July 12, Type Package

Package asus. October 25, 2017

Package effectfusion

Package MPkn. May 7, 2018

Package RI2by2. October 21, 2016

Package cartogram. June 21, 2018

Package solrad. November 5, 2018

Package RSwissMaps. October 3, 2017

Package noise. July 29, 2016

Package nardl. R topics documented: May 7, Type Package

Package ShrinkCovMat

Package SimSCRPiecewise

Package platetools. R topics documented: June 25, Title Tools and Plots for Multi-Well Plates Version 0.1.1

Package horseshoe. November 8, 2016

Package RZigZag. April 30, 2018

Package BNN. February 2, 2018

Package supernova. February 23, 2018

Package varband. November 7, 2016

Package RATest. November 30, Type Package Title Randomization Tests

Package rnmf. February 20, 2015

Package thief. R topics documented: January 24, Version 0.3 Title Temporal Hierarchical Forecasting

Package My.stepwise. June 29, 2017

Package CPE. R topics documented: February 19, 2015

Package rgabriel. February 20, 2015

Package camsrad. November 30, 2016

Package severity. February 20, 2015

Package scpdsi. November 18, 2018

Package rdefra. March 20, 2017

Package OrdNor. February 28, 2019

Package SpatialVS. R topics documented: November 10, Title Spatial Variable Selection Version 1.1

Package r2glmm. August 5, 2017

Package SMFilter. December 12, 2018

Package chemcal. July 17, 2018

Package Delaporte. August 13, 2017

Package ltsbase. R topics documented: February 20, 2015

Package baystability

Package tspi. August 7, 2017

Package LIStest. February 19, 2015

Package goftest. R topics documented: April 3, Type Package

Package HMMcopula. December 1, 2018

Package BGGE. August 10, 2018

Package TimeSeries.OBeu

Package aspi. R topics documented: September 20, 2016

Package weathercan. July 5, 2018

Package AID. R topics documented: November 10, Type Package Title Box-Cox Power Transformation Version 2.3 Date Depends R (>= 2.15.

Package SPADAR. April 30, 2017

Package generalhoslem

Package owmr. January 14, 2017

Package ensemblepp. R topics documented: September 20, Title Ensemble Postprocessing Data Sets. Version

Package icmm. October 12, 2017

Package DNMF. June 9, 2015

Package GD. January 12, 2018

Package SEMModComp. R topics documented: February 19, Type Package Title Model Comparisons for SEM Version 1.0 Date Author Roy Levy

Package mpmcorrelogram

Package elhmc. R topics documented: July 4, Type Package

Package EMVS. April 24, 2018

Package MAPA. R topics documented: January 5, Type Package Title Multiple Aggregation Prediction Algorithm Version 2.0.4

Package SimCorMultRes

Package RcppSMC. March 18, 2018

Package KMgene. November 22, 2017

Package R1magic. August 29, 2016

Package robis. July 2, 2018

Package CEC. R topics documented: August 29, Title Cross-Entropy Clustering Version Date

Package MonoPoly. December 20, 2017

Transcription:

Package msu September 30, 2017 Title Multivariate Symmetric Uncertainty and Other Measurements Version 0.0.1 Estimators for multivariate symmetrical uncertainty based on the work of Gustavo Sosa et al. (2016) <arxiv:1709.08730>, total correlation, information gain and symmetrical uncertainty of categorical variables. Depends R (>= 3.4.1) Imports entropy (>= 1.2.1) License GPL-3 file LICENSE Encoding UTF-8 LazyData true RoxygenNote 6.0.1 Suggests testthat NeedsCompilation no Author Gustavo Sosa [aut], Elias Maciel [cre] Maintainer Elias Maciel <eliasmacielr@gmail.com> Repository CRAN Date/Publication 2017-09-30 16:26:00 UTC R topics documented: categorical_sample_size.................................. 2 information_gain...................................... 2 joint_shannon_entropy................................... 3 msu............................................. 4 multivar_joint_shannon_entropy.............................. 5 new_informative_variable................................. 6 new_variable........................................ 6 new_xor_variables..................................... 7 rel_freq........................................... 7 1

2 information_gain sample_size......................................... 8 shannon_entropy...................................... 9 symmetrical_uncertainty.................................. 9 total_correlation....................................... 10 Index 12 categorical_sample_size Estimate the sample size for a variable in function of its categories. Estimate the sample size for a variable in function of its categories. categorical_sample_size(categories, increment = 10) categories increment A vector containing the number of categories of each variable. A number as a constant to which the sample size is incremented as a product. The sample size for a categorical variable based on a ordered permutation heuristic approximation of its categories. information_gain Estimating information gain between two categorical variables. Information gain (also called mutual information) is a measure of the mutual dependence between two variables (see https://en.wikipedia.org/wiki/mutual_information). information_gain(x, y) IG(x, y) x y A factor representing a categorical variable. A factor representing a categorical variable.

joint_shannon_entropy 3 Information gain estimation based on Sannon entropy for variables x and y. information_gain(factor(c(0,1)), factor(c(1,0))) information_gain(factor(c(0,0,1,1)), factor(c(0,1,1,1))) information_gain(factor(c(0,0,1,1)), factor(c(0,1,0,1))) information_gain(c(0,1), c(1,0)) joint_shannon_entropy Estimation of the Joint Shannon entropy for two categorical variables. The joint Shannon entropy provides an estimation of the measure of uncertainty between two random variables (see https://en.wikipedia.org/wiki/joint_entropy). joint_shannon_entropy(x, y) joint_h(x, y) x y A factor as the represented categorical variable. A factor as the represented categorical variable. Joint Shannon entropy estimation for variables x and y. See Also shannon_entropy for the entropy for a single variable and multivar_joint_shannon_entropy for the entropy associated with more than two random variables. joint_shannon_entropy(factor(c(0,0,1,1)), factor(c(0,1,0,1))) joint_shannon_entropy(factor(c('a','b','c')), factor(c('c','b','a'))) joint_shannon_entropy(1) joint_shannon_entropy(c('a','b'), c('d','e'))

4 msu msu Estimating Multivariate Symmetrical Uncertainty. MSU is a generalization of symmetrical uncertainty (SU) where it is considered the interaction between two or more variables, whereas SU can only consider the interaction between two variables. For instance, consider a table with two variables X1 and X2 and a third variable, Y (the class of the case), that results from the logical XOR operator applied to X1 and X2 X1 X2 Y 0 0 0 0 1 1 1 0 1 1 1 0 For this case MSU(X1, X2, Y ) = 0.5. This, in contrast to the measurements obtained by SU of the variables X1 and X2 against Y, SU(X1, Y ) = 0 and SU(X2, Y ) = 0. msu(table_variables, table_class) table_variables A list of factors as categorical variables. table_class A factor representing the class of the case. Multivariate symmetrical uncertainty estimation for the variable set {table_variables, table_class}. The result is rounded to 7 decimal places. See Also symmetrical_uncertainty

multivar_joint_shannon_entropy 5 # completely predictable msu(list(factor(c(0,0,1,1))), factor(c(0,0,1,1))) # XOR msu(list(factor(c(0,0,1,1)), factor(c(0,1,0,1))), factor(c(0,1,1,0))) msu(c(factor(c(0,0,1,1)), factor(c(0,1,0,1))), factor(c(0,1,1,0))) msu(list(factor(c(0,0,1,1)), factor(c(0,1,0,1))), c(0,1,1,0)) multivar_joint_shannon_entropy Estimation of joint Shannon entropy for a set of categorical variables. The multivariate joint Shannon entropy provides an estimation of the measure of the uncertainty associated with a set of variables (see https://en.wikipedia.org/wiki/joint_entropy). multivar_joint_shannon_entropy(table_variables, table_class) multivar_joint_h(table_variables, table_class) table_variables A list of factors as categorical variables. table_class A factor representing the class of the case. Joint Shannon entropy estimation for the variable set table.variables, table.class. See Also shannon_entropy for the entropy for a single variable and joint_shannon_entropy for the entropy associated with two random variables. multivar_joint_shannon_entropy(list(factor(c(0,1)), factor(c(1,0))), factor(c(1,1)))

6 new_variable new_informative_variable Create an informative uniform categorical random variable. The sampling for the items of the created variable is done with replacement. new_informative_variable(variable_labels, variable_class, information_level = 1) variable_labels A factor as the labels for the new informative variable. variable_class A factor as the class of the variable. information_level A integer as the information level of the new variable. A factor that represents an informative uniform categorical random variable created using the Kononenko method. new_variable Create a uniform categorical random variable. The sampling for the items of the created variable is done with replacement. new_variable(elements, n) elements n A vector with the elements from which to choose to create the variable. An integer indicating the number of items to be contained in the variable. A factor that represents a uniform categorical variable.

new_xor_variables 7 new_variable(c(0,1), 4) new_variable(c('a','b','c'), 10) new_xor_variables Create a set of categorical variables using the logical XOR operator. Create a set of categorical variables using the logical XOR operator. new_xor_variables(n_variables = 2, n_instances = 1000, noise = 0) n_variables n_instances noise An integer as the number of variables to be created. It is the number of column variables of the table, an additional column is added as a result of the XOR operator over the instances. An integer as the number of instances to be created. It is the number of rows of the table. A float number as the noise level for the variables. A set of random variables constructed using the logical XOR operator. new_xor_variables(2, 4, 0) new_xor_variables(5, 10, 0.5) rel_freq Relative frequency of values of a categorical variable. Relative frequency of values of a categorical variable. rel_freq(variable)

8 sample_size variable A factor as a categorical variable Relative frecuency distribution table for the values in variable. rel_freq(factor(c(0,1))) rel_freq(factor(c('a','a','b'))) rel_freq(c(0,1)) sample_size Estimate the sample size for a categorical variable. Estimate the sample size for a categorical variable. sample_size(max, min = 1, z = 1.96, error = 0.05) max min z error A number as the maximum value of the possible categories. A number as the minimum value of the possible categories. A number as the confidence coefficient. Admissible sampling error. The sample size for a categorical variable based on a variance heuristic approximation.

shannon_entropy 9 shannon_entropy Estimation of Shannon entropy for a categorical variable. The Shannon entropy estimates the average minimum number of bits needed to encode a string of symbols, based on the frequency of the symbols (see http://www.bearcave.com/misl/misl_ tech/wavelets/compression/shannon.html). shannon_entropy(x) H(x) x A factor as the represented categorical variable. Shannon entropy estimation of the categorical variable. shannon_entropy(factor(c(1,0))) shannon_entropy(factor(c('a','b','c'))) shannon_entropy(1) shannon_entropy(c('a','b','c')) symmetrical_uncertainty Estimating Symmetrical Uncertainty of two categorical variables. Symmetrical uncertainty (SU) is the product of a normalization of the information gain (IG) with respect to entropy. SU(X,Y) is a value in the range [0,1], where SU(X, Y ) = 0 if X and Y are totally independent and SU(X, Y ) = 1 if X and Y are totally dependent.

10 total_correlation symmetrical_uncertainty(x, y) SU(x, y) x y A factor as the represented categorical variable. A factor as the represented categorical variable. Symmetrical uncertainty estimation based on Sannon entropy. The result is rounded to 7 decimal places. See Also msu # completely predictable symmetrical_uncertainty(factor(c(0,1,0,1)), factor(c(0,1,0,1))) # XOR factor variables symmetrical_uncertainty(factor(c(0,0,1,1)), factor(c(0,1,1,0))) symmetrical_uncertainty(factor(c(0,1,0,1)), factor(c(0,1,1,0))) symmetrical_uncertainty(c(0,1,0,1), c(0,1,1,0)) total_correlation Estimation of total correlation for a set of categorical random variables. Total Correlation is a generalization of information gain (IG) to measure the dependency of a set of categorical random variables (see https://en.wikipedia.org/wiki/total_correlation). total_correlation(table_variables, table_class) C(table_variables, table_class)

total_correlation 11 table_variables A list of factors as categorical variables. table_class A factor representing the class of the case. Total correlation estimation for the variable set table.variables, table.class. total_correlation(list(factor(c(0,1)), factor(c(1,0))), factor(c(0,0))) total_correlation(list(factor(c('a','b')), factor(c('a','b'))), factor(c('a','b'))) total_correlation(list(factor(c(0,1)), factor(c(1,0))), c(0,0)) total_correlation(c(factor(c(0,1)), factor(c(1,0))), c(0,0))

Index C (total_correlation), 10 categorical_sample_size, 2 H (shannon_entropy), 9 IG, 9, 10 IG (information_gain), 2 information_gain, 2 joint_h (joint_shannon_entropy), 3 joint_shannon_entropy, 3, 5 msu, 4, 10 multivar_joint_h (multivar_joint_shannon_entropy), 5 multivar_joint_shannon_entropy, 3, 5 new_informative_variable, 6 new_variable, 6 new_xor_variables, 7 rel_freq, 7 sample_size, 8 shannon_entropy, 3, 5, 9 SU, 4 SU (symmetrical_uncertainty), 9 symmetrical_uncertainty, 4, 9 total_correlation, 10 12