Package orclus. R topics documented: March 17, Version Date

Similar documents
Package depth.plot. December 20, 2015

Package gtheory. October 30, 2016

Package rnmf. February 20, 2015

Package Grace. R topics documented: April 9, Type Package

Package mpmcorrelogram

Package ShrinkCovMat

Package leiv. R topics documented: February 20, Version Type Package

Package monmlp. R topics documented: December 5, Type Package

Package clogitboost. R topics documented: December 21, 2015

Package face. January 19, 2018

Package funrar. March 5, 2018

Package basad. November 20, 2017

Package gma. September 19, 2017

Package sbgcop. May 29, 2018

Package clustergeneration

Package SimCorMultRes

Package SpatialNP. June 5, 2018

Package denseflmm. R topics documented: March 17, 2017

Package BayesNI. February 19, 2015

Package multivariance

Package abundant. January 1, 2017

Package sscor. January 28, 2016

Package bayeslm. R topics documented: June 18, Type Package

Package noncompliance

Package CEC. R topics documented: August 29, Title Cross-Entropy Clustering Version Date

Package EMVS. April 24, 2018

Package mixture. R topics documented: February 13, 2018

Package testassay. November 29, 2016

Package NonpModelCheck

Package darts. February 19, 2015

Package msbp. December 13, 2018

Package polypoly. R topics documented: May 27, 2017

Package fwsim. October 5, 2018

Package BNN. February 2, 2018

Package msir. R topics documented: April 7, Type Package Version Date Title Model-Based Sliced Inverse Regression

Package CPE. R topics documented: February 19, 2015

The ICS Package. May 4, 2007

Package SK. May 16, 2018

Package RootsExtremaInflections

Package ltsbase. R topics documented: February 20, 2015

Package EnergyOnlineCPM

Package AID. R topics documented: November 10, Type Package Title Box-Cox Power Transformation Version 2.3 Date Depends R (>= 2.15.

Package covsep. May 6, 2018

Package drgee. November 8, 2016

Package OGI. December 20, 2017

Package magma. February 15, 2013

Package SPreFuGED. July 29, 2016

Package misctools. November 25, 2016

Package factorqr. R topics documented: February 19, 2015

Package goftest. R topics documented: April 3, Type Package

Package neuralnet. August 16, 2016

Package OrdNor. February 28, 2019

Package dhsic. R topics documented: July 27, 2017

Package RI2by2. October 21, 2016

Package survidinri. February 20, 2015

Package covtest. R topics documented:

Package HIMA. November 8, 2017

Package pearson7. June 22, 2016

Package invgamma. May 7, 2017

Package flexcwm. R topics documented: July 11, Type Package. Title Flexible Cluster-Weighted Modeling. Version 1.0.

Package rgabriel. February 20, 2015

Package ensemblepp. R topics documented: September 20, Title Ensemble Postprocessing Data Sets. Version

Package rbvs. R topics documented: August 29, Type Package Title Ranking-Based Variable Selection Version 1.0.

Package hierdiversity

Package MultisiteMediation

Package KRLS. July 10, 2017

Package Tcomp. June 6, 2018

Package unbalhaar. February 20, 2015

Package flexcwm. R topics documented: July 2, Type Package. Title Flexible Cluster-Weighted Modeling. Version 1.1.

Package emg. R topics documented: May 17, 2018

Package pssm. March 1, 2017

Package beam. May 3, 2018

Package epr. November 16, 2017

Package ForwardSearch

Package ICGOR. January 13, 2017

Package EMMIXmcfa. October 18, 2017

Package PVR. February 15, 2013

Package hot.deck. January 4, 2016

Package varband. November 7, 2016

Package robustrao. August 14, 2017

Package flora. R topics documented: August 29, Type Package. Title flora: taxonomical information on flowering species that occur in Brazil

Package idmtpreg. February 27, 2018

Package MonoPoly. December 20, 2017

Package threeboost. February 20, 2015

Package EDR. September 14, 2016

Package milineage. October 20, 2017

Package DirichletMultinomial

Package elhmc. R topics documented: July 4, Type Package

Package DNMF. June 9, 2015

Package SEMModComp. R topics documented: February 19, Type Package Title Model Comparisons for SEM Version 1.0 Date Author Roy Levy

Package GORCure. January 13, 2017

Package riv. R topics documented: May 24, Title Robust Instrumental Variables Estimator. Version Date

Package metafuse. October 17, 2016

Package IndepTest. August 23, 2018

Package scio. February 20, 2015

Package CovSel. September 25, 2013

Package SpatPCA. R topics documented: February 20, Type Package

Package esaddle. R topics documented: January 9, 2017

Package ProDenICA. February 19, 2015

Package horseshoe. November 8, 2016

Transcription:

Version 0.2-6 Date 2018-03-17 Package orclus March 17, 2018 Title Subspace Clustering Based on Arbitrarily Oriented Projected Cluster Generation Author Gero Szepannek Maintainer Gero Szepannek <gero.szepannek@web.de> Description Functions to perform subspace clustering and classification. License GPL (>= 2) NeedsCompilation no Repository CRAN Date/Publication 2018-03-17 22:55:34 UTC R topics documented: orclass............................................ 1 orclus............................................ 4 predict.orclass........................................ 7 predict.orclus........................................ 9 Index 11 orclass Subspace clustering based local classification using ORCLUS. Description Function to perform local classification where the subclasses are concentrated in different subspaces of the data. 1

2 orclass Usage orclass(x,...) ## Default S3 method: orclass(x, grouping, k, l, k0, a = 0.5, prior = NULL, inner.loops = 1, predict.train = "nearest", verbose = TRUE,...) ## S3 method for class 'formula' orclass(formula, data = NULL,...) Arguments x grouping formula data k l k0 a Details Value prior inner.loops predict.train verbose A matrix or data frame containing the explanatory variables. The method is restricted to numerical data. A factor specifying the class for each observation. A formula of the form grouping ~ x1 + x2 +... That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators. Data frame from which variables specified in formula are to be taken. Prespecifies the final number of clusters. Prespecifies the dimension of the final cluster-specific subspaces (equal for all clusters). Initial number of clusters (that are computed in the entire data space). Must be greater than k. The number of clusters is iteratively decreased by factor a until the final number of k clusters is reached. Prespecified factor for the cluster number reduction in each iteration step of the algorithm. Argument for optional specification of class prior probabilities if different from the relative class frequencies. Number of repetitive iterations (i.e. recomputation of clustering and clusterspecific subspaces) while the number of clusters and the subspace dimension are kept constant. Character pecifying whether prediction of training data should be pursued. If "nearest" the class distribution in orclus cluster assignment is used for classification.... Currently not used. Logical indicating whether the iteration process sould be displayed. For each cluster the class distribution is computed. Returns an object of class orclass. orclus.res Object of class orclus containing the resulting clusters.

orclass 3 cluster.posteriors Matrix of clusterwise class posterior probabilities where clusters are rows and classes are coloumns. cluster.priors Vector of relative cluster frequencies weighted by class priors. purity prior predict.train orclass.call Author(s) Gero Szepannek References Statistics indicating the discriminability of the identified clusters. Vector of class prior probabilities. Prediction of training data if specified. (Matched) function call. Aggarwal, C. and Yu, P. (2000): Finding generalized projected clusters in high dimensional spaces, Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 70-81. See Also predict.orclass, orclus, predict.orclus Examples # definition of a function for parameterized data simulation sim.orclus <- function(k = 3, nk = 100, d = 10, l = 4, sd.cl = 0.05, sd.rest = 1, locshift = 1){ ### input parameters for data generation # k number of clusters # nk observations per cluster # d original dimension of the data # l subspace dimension where the clusters are concentrated # sd.cl (within cluster subspace) standard deviations for data generation # sd.rest standard deviations in the remaining space # locshift parameter of a uniform distribution to sample different cluster means x <- NULL for(i in 1:k){ # cluster centers apts <- locshift*matrix(runif(l*k), ncol = l) # sample points in original space xi.original <- cbind(matrix(rnorm(nk * l, sd = sd.cl), ncol=l) + matrix(rep(apts[i,], nk), ncol = l, byrow = TRUE), matrix(rnorm(nk * (d-l), sd = sd.rest), ncol = (d-l))) # subspace generation sym.mat <- matrix(nrow=d, ncol=d) for(m in 1:d){ for(n in 1:m){ sym.mat[m,n] <- sym.mat[n,m] <- runif(1)

4 orclus subspace <- eigen(sym.mat)$vectors # transformation xi.transformed <- xi.original %*% subspace x <- rbind(x, xi.transformed) clids <- rep(1:k, each = nk) result <- list(x = x, cluster = clids) return(result) # simulate data of 2 classes where class 1 consists of 2 subclasses simdata <- sim.orclus(k = 3, nk = 200, d = 15, l = 4, sd.cl = 0.05, sd.rest = 1, locshift = 1) x <- simdata$x y <- c(rep(1,400), rep(2,200)) res <- orclass(x, y, k = 3, l = 4, k0 = 15, a = 0.75) res # compare results table(res$predict.train$class, y) orclus Arbitrarily ORiented projected CLUSter generation Description Usage Function to perform subspace clustering where the clusters are concentrated in different cluster specific subspaces of the data. orclus(x,...) ## Default S3 method: orclus(x, k, l, k0, a = 0.5, inner.loops = 1, verbose = TRUE,...) Arguments x k l k0 A matrix or data frame containing the explanatory variables. The method is restricted to numerical data. Prespecifies the final number of clusters. Prespecifies the dimension of the final cluster-specific subspaces (equal for all clusters). Initial number of clusters (that are computed in the entire data space). Must be greater than k. The number of clusters is iteratively decreased by factor a until the final number of k clusters is reached.

orclus 5 a Details Value inner.loops verbose Prespecified factor for the cluster number reduction in each iteration step of the algorithm. Number of repetitive iterations (i.e. recomputation of clustering and clusterspecific subspaces) while the number of clusters and the subspace dimension are kept constant.... Currently not used. Logical indicating whether the iteration process sould be displayed. The function performs ORCLUS subspace clustering (Aggarwal and Yu, 2000). Simultaneously both cluster assignments as well as cluster specific subspaces are computed. Cluster assignments have minimal euclidean distance from the cluster centers in the corresponding subspaces. As an extension to the originally proposed algorithm initialization in the full data space is done by calling kmeans for k0 clusters. Further, by inner.loops a number of repetitions during the iteration process for each number of clusters and subspace dimension can be specified. An outlier option has not been implemented. Even though increasing the initialzation parameter k0 most strongly effects the computation time it should be chosen as large as possible (at least several times greater then k). Returns an object of class orclus. Its structure is similar to objects resulting from calling kmeans. cluster centers size Returns the final cluster labels. A matrix where each row corresponds to a cluster center (in the original space). The final number of observations in each cluster. subspaces List of matrices for projection of the data onto the cluster-specific supspaces by post-multiplication. subspace.dimension Dimension of the final subspaces. within.projenss Corresponds to withinss of kmeans objects: projected within cluster energies for each cluster. sparsity.coefficient Sparsity coefficient of the clustering result. If its value is close to 1 the subspace dimension may have been chosen too large. A small value close to 0 can be interpreted as a hint that a strong cluster structure has been found. orclus.call Author(s) Gero Szepannek References (Matched) function call. Aggarwal, C. and Yu, P. (2000): Finding generalized projected clusters in high dimensional spaces, Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 70-81.

6 orclus See Also predict.orclus Examples # generate simple artificial example of two clusters clus1.v1 <- runif(100) clus2.v1 <- runif(100) xample <- rbind(cbind(clus1.v1, 0.5 - clus1.v1), cbind(clus2.v1, -0.5 + clus2.v1)) plot(xample, col=rep(1:2, each=100)) # try standard kmeans clustering kmeans.res <- kmeans(xample, 2) plot(xample, col = kmeans.res$cluster) # use orclus instead orclus.res <- orclus(x = xample, k = 2, l = 1, k0 = 8, a = 0.5) plot(xample, col = orclus.res$cluster) # show data in cluster-specific subspaces par(mfrow=c(1,2)) for(i in 1:length(orclus.res$size)) plot(xample %*% orclus.res$subspaces[[i]], col = orclus.res$cluster, ylab = paste("identified subspace for cluster",i)) ### second 'more multivariate' example to play with... # definition of a function for parameterized data simulation sim.orclus <- function(k = 3, nk = 100, d = 10, l = 4, sd.cl = 0.05, sd.rest = 1, locshift = 1){ ### input parameters for data generation # k number of clusters # nk observations per cluster # d original dimension of the data # l subspace dimension where the clusters are concentrated # sd.cl (within cluster subspace) standard deviations for data generation # sd.rest standard deviations in the remaining space # locshift parameter of a uniform distribution to sample different cluster means x <- NULL for(i in 1:k){ # cluster centers apts <- locshift*matrix(runif(l*k), ncol = l) # sample points in original space xi.original <- cbind(matrix(rnorm(nk * l, sd = sd.cl), ncol=l) + matrix(rep(apts[i,], nk), ncol = l, byrow = TRUE), matrix(rnorm(nk * (d-l), sd = sd.rest), ncol = (d-l))) # subspace generation sym.mat <- matrix(nrow=d, ncol=d) for(m in 1:d){ for(n in 1:m){ sym.mat[m,n] <- sym.mat[n,m] <- runif(1)

predict.orclass 7 subspace <- eigen(sym.mat)$vectors # transformation xi.transformed <- xi.original %*% subspace x <- rbind(x, xi.transformed) clids <- rep(1:k, each = nk) result <- list(x = x, cluster = clids) return(result) # simulate data, you can play with different parameterizations... simdata <- sim.orclus(k = 3, nk = 200, d = 15, l = 4, sd.cl = 0.05, sd.rest = 1, locshift = 1) # apply kmeans and orclus kmeans.res2 <- kmeans(simdata$x, 3) orclus.res2 <- orclus(x = simdata$x, k = 3, l = 4, k0 = 15, a = 0.75) cat("sc: ", orclus.res2$sparsity.coefficient, "\n") # compare results table(kmeans.res2$cluster, simdata$cluster) table(orclus.res2$cluster, simdata$cluster) predict.orclass Subspace clustering based local classification using ORCLUS. Description Assigns clusters and distances and classes for new data according to the intrinsic subspace clusters of an orclass classification model. Usage ## S3 method for class 'orclass' predict(object, newdata, type = "nearest",...) Arguments object newdata type Model resulting from a call of orclass. A matrix or data frame to be clustered by the given model. Default "nearest" computes relative class frequencies of nearest cluster as class posterior probabilities.... Currently not used.

8 predict.orclass Details For prediction the class distribution of the "nearest"" cluster is used. If type = "fuzzywts" cluster memberships (see e.g. Bezdek, 1981) are computed based on the cluster distances of cluster assignment by predict.orclus. For orclass prediction the class distributions of the clusters are weigthed using the cluster memberships of an observation. Value class posterior distances cluster Vector of predicted class levels. Matrix where coloumns contain class posterior probabilities. A matrix where coloumns are the distances to all cluster centers in the corresponding subspaces for the new data. The resulting cluster labels for the new data. Author(s) Gero Szepannek References Aggarwal, C. and Yu, P. (2000): Finding generalized projected clusters in high dimensional spaces, Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 70-81. Bezdek, J. (1981): Pattern recognition with fuzzy objective function algorithms, Kluwer Academic, Norwell, MA. See Also orclass, orclus, predict.orclus Examples # definition of a function for parameterized data simulation sim.orclus <- function(k = 3, nk = 100, d = 10, l = 4, sd.cl = 0.05, sd.rest = 1, locshift = 1){ ### input parameters for data generation # k number of clusters # nk observations per cluster # d original dimension of the data # l subspace dimension where the clusters are concentrated # sd.cl (within cluster subspace) standard deviations for data generation # sd.rest standard deviations in the remaining space # locshift parameter of a uniform distribution to sample different cluster means x <- NULL for(i in 1:k){ # cluster centers apts <- locshift*matrix(runif(l*k), ncol = l) # sample points in original space xi.original <- cbind(matrix(rnorm(nk * l, sd = sd.cl), ncol=l) + matrix(rep(apts[i,], nk),

predict.orclus 9 ncol = l, byrow = TRUE), matrix(rnorm(nk * (d-l), sd = sd.rest), ncol = (d-l))) # subspace generation sym.mat <- matrix(nrow=d, ncol=d) for(m in 1:d){ for(n in 1:m){ sym.mat[m,n] <- sym.mat[n,m] <- runif(1) subspace <- eigen(sym.mat)$vectors # transformation xi.transformed <- xi.original %*% subspace x <- rbind(x, xi.transformed) clids <- rep(1:k, each = nk) result <- list(x = x, cluster = clids) return(result) # simulate data of 2 classes where class 1 consists of 2 subclasses simdata <- sim.orclus(k = 3, nk = 200, d = 15, l = 4, sd.cl = 0.05, sd.rest = 1, locshift = 1) x <- simdata$x y <- c(rep(1,400), rep(2,200)) res <- orclass(x, y, k = 3, l = 4, k0 = 15, a = 0.75) prediction <- predict(res, x) # compare results table(prediction$class, y) predict.orclus Arbitrarily ORiented projected CLUSter generation Description Assigns clusters and distances to cluster centers in the corresponding subspaces for new data according to a subspace clustering model of class orclus. Usage ## S3 method for class 'orclus' predict(object, newdata,...) Arguments object Model resulting from a call of orclus. newdata A matrix or data frame to be clustered by the given model.... Currently not used.

10 predict.orclus Value distances cluster A matrix where coloumns are the distances to all cluster centers in the corresponding subspaces for the new data. The resulting cluster labels for the new data. Author(s) Gero Szepannek References Aggarwal, C. and Yu, P. (2000): Finding generalized projected clusters in high dimensional spaces, Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 70-81. See Also orclus Examples # generate simple artificial example of two clusters clus1.v1 <- runif(100) clus2.v1 <- runif(100) xample <- rbind(cbind(clus1.v1, 0.5 - clus1.v1), cbind(clus2.v1, -0.5 + clus2.v1)) orclus.res <- orclus(x = xample, k = 2, l = 1, k0 = 8, a = 0.5) # generate new data and predict it using the newclus1.v1 <- runif(100) newclus2.v1 <- runif(100) true.clusterids <- rep(1:2, each = 100) xample2 <- rbind(cbind(newclus1.v1, 0.5 - newclus1.v1), cbind(newclus2.v1, -0.5 + newclus2.v1)) orclus.prediction <- predict(orclus.res, xample2) table(orclus.prediction$cluster, true.clusterids)

Index Topic classif orclass, 1 orclus, 4 predict.orclass, 7 predict.orclus, 9 Topic cluster orclass, 1 orclus, 4 predict.orclass, 7 predict.orclus, 9 Topic multivariate orclass, 1 orclus, 4 predict.orclass, 7 predict.orclus, 9 kmeans, 5 orclass, 1, 7, 8 orclus, 3, 4, 8 10 predict.orclass, 3, 7 predict.orclus, 3, 6, 8, 9 print.orclass (orclass), 1 print.orclus (orclus), 4 11