COPS: Cluster Optimized Proximity Scaling
Slide 1: COPS - Cluster Optimized Proximity Scaling
Psychoco 2015
Slide 2: Outline
1. Objectives of Multidimensional Scaling
2. COPS: Cluster Optimized Proximity Scaling
   - C-clusteredness and an index
   - The COPS procedure
   - Optimization
   - Package
3. Conclusion and Outlook

This is joint work with Patrick Mair and Kurt Hornik.
Slide 3: Multidimensional Scaling (MDS) - I

A popular method for representing multivariate, high-dimensional proximities in some lower-dimensional space. MDS utilizes a loss function, e.g., a least squares one,

  σ_MDS(X) = Σ_{i<j} w_{ij} [f(δ_{ij}) - g(d_{ij}(X))]^2,

and minimizes it to find the configuration

  arg min_X σ_MDS(X)

where
  d_{ij}(X) ... fitted distances
  δ_{ij} ... proximities
  w_{ij} ... finite weights
  g(·), f(·) ... transformation functions, usually the identity function I(·)
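To make the loss concrete, here is a minimal R sketch of this least-squares stress with f and g the identity and unit weights; the helper name mds_stress is ours, not from any package.

# Raw least-squares MDS stress for an n x M configuration X and a
# dissimilarity object delta, with f = g = identity and w_ij = 1.
mds_stress <- function(X, delta) {
  D <- as.matrix(dist(X))     # fitted Euclidean distances d_ij(X)
  Delta <- as.matrix(delta)   # observed proximities delta_ij
  ut <- upper.tri(Delta)      # the sum runs over i < j only
  sum((Delta[ut] - D[ut])^2)
}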
Slide 4: Multidimensional Scaling (MDS) - II

MDS provides an optimal map into continuous space R^M and looks for directions of spread in the low-dimensional space (objective 1). But often one is also interested in discrete structures of similarity between objects ("clusters"; objective 2).

MDS solves objective 1 but not objective 2; the latter is often inferred from the former by how the configuration looks. It can happen that what is optimal for objective 1 is not very useful for objective 2.
Slide 5: Illustration - "I'm a Republican, because ..." (from Mair et al., 2014)

Supporters of the Republican Party were asked why they are Republican (254 statements). This natural-language data was scraped and processed into a sparse data matrix (a document-term matrix). The objects are the words (we use only words that appeared at least 10 times). We look for themes in the statements: mantras (words that often occur together). We use a cosine distance on the word co-occurrences and apply standard least squares MDS (SMACOF) for the representation.
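A minimal sketch of this pipeline in R, assuming the smacof package; the document-term matrix dtm below is a toy stand-in for the scraped data, not the actual corpus.

library(smacof)

# Cosine dissimilarity between words, computed from their co-occurrence
# profiles (the columns of a document-term matrix).
cosine_dist <- function(M) {
  S <- crossprod(M)                  # word-by-word inner products
  norms <- sqrt(diag(S))
  as.dist(1 - S / tcrossprod(norms)) # 1 - cosine similarity
}

# Toy stand-in: 254 "statements" x 10 "words" of random counts.
set.seed(1)
dtm <- matrix(rpois(254 * 10, lambda = 1), nrow = 254,
              dimnames = list(NULL, paste0("word", 1:10)))

dt.dist <- cosine_dist(dtm)
fit <- smacofSym(dt.dist, ndim = 2)  # standard least squares MDS (SMACOF)
plot(fit)                            # configuration plot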
Slide 6: Illustration

[Figure: "Republican Mantras?" - 2-D SMACOF configuration of the 37 words (axes Configuration D1 vs. Configuration D2), with labels such as government, freedom, liberty, taxes, family, values, god, country.]
Slide 7: Illustration (continued)

The optimal configuration does not have an all too obvious clustering structure. One way out: fit metric MDS with a power transformation, e.g., by setting f(δ_{ij}) = δ_{ij}^20. The clustering is clearer, but the fit is now worse (0.946 versus 0.947).

[Figure: "Republican Mantras?!" - the power-transformed configuration (axes Configuration D1 vs. Configuration D2), with the words now falling into visibly tighter groups.]
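With g the identity, this transformation can be mimicked by powering the dissimilarities before the fit; a sketch continuing the hypothetical dt.dist from above.

# Metric MDS on power-transformed proximities: f(delta_ij) = delta_ij^20.
fit_pow <- smacofSym(dt.dist^20, ndim = 2)
plot(fit_pow)  # clusters should now separate more clearly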
Slide 8: COPS to the Rescue

We propose a general solution to this problem that consists of the following steps:
- Use an MDS loss with θ-parametrized, strictly monotonic nonlinear transformations of either the proximities or the fitted distances or both, e.g., power transformations (powerstress: g(d_{ij}(X)) = d_{ij}(X)^κ and f(δ_{ij}) = δ_{ij}^λ, so θ = (κ, λ); a sketch follows below).
- Use an index of the obtained degree of clusteredness in the configuration (c-clusteredness) to quantify how clustered the result is.
- Combine the stress function, the transformations, and the clusteredness index into a single target function and optimize over the parameters.

We call this COPS (Cluster Optimized Proximity Scaling; Rusch et al., 2015a).
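For concreteness, a minimal R sketch of the powerstress loss with unit weights; power_stress is our own helper, not the package's internal implementation.

# Powerstress: least-squares loss with power transformations
# g(d) = d^kappa on the fitted distances and f(delta) = delta^lambda
# on the proximities; theta = c(kappa, lambda), w_ij = 1.
power_stress <- function(X, delta, kappa = 1, lambda = 1) {
  D <- as.matrix(dist(X))
  Delta <- as.matrix(delta)
  ut <- upper.tri(Delta)
  sum((Delta[ut]^lambda - D[ut]^kappa)^2)
}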
Slide 9: C-Clusteredness

C-clusteredness: the amount of clusteredness of a configuration.

[Figure: six example configurations ranging from unclustered to maximally clustered, with c-clusteredness values 0, 0.03, 0.23, 0.36, 0.61, and 1.]
Slide 10: OPTICS Cordillera - I

Our index for clusteredness is the OPTICS cordillera. It employs OPTICS (Ankerst et al., 1999) with metaparameters k, ε on the configuration distances. For the row vectors x_j of X it returns an ordering R of these points, R = {x_(i)}_{i=1,...,N}; so x_(1) is the x_j that is at position 1 in the ordering. OPTICS also returns a reachability plot (a dendrogram-like display of the minimum reachabilities r_(i) of the points x_(i)).

Ordering and reachabilities represent the clustering structure. We aggregate them into an index OC(X) by defining (for metaparameter q > 0)

  OC(X) = ( Σ_{i=2}^N |r_(i) - r_(i-1)|^q )^{1/q} / C

with C an (optional) normalizing constant.
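Given the reachabilities in OPTICS order, the aggregation is a one-liner; a sketch with the normalizer left at 1. The commented usage assumes the dbscan package for the OPTICS run, and capping the undefined reachabilities is our simplification.

# Aggregate a reachability vector r (in OPTICS order) into the
# OPTICS cordillera OC(X), for metaparameter q > 0 and normalizer C.
optics_cordillera <- function(r, q = 1, C = 1) {
  (sum(abs(diff(r))^q))^(1 / q) / C
}

# Example usage on an MDS configuration:
# res <- dbscan::optics(as.matrix(fit$conf), minPts = 6)
# r <- res$reachdist[res$order]              # reachabilities in OPTICS order
# r[!is.finite(r)] <- max(r[is.finite(r)])   # cap undefined reachabilities
# optics_cordillera(r, q = 2)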
Slide 11: OPTICS Cordillera - II

[Figure: four example configurations with their cordillera values, c-clusteredness 0, 0.03, 0.36, and 1.]
Slide 12: Properties of the OPTICS Cordillera

For given metaparameters ε, k, q the following applies (Rusch et al., 2015a):
- Upper bound for OC(X) in the maximal c-clusteredness case:
    C(X, d_max, ε, k, q) = d_max^q ( ⌊(N-1)/k⌋ + ⌈(N-1)/k⌉ )
- No cluster assignment and no a priori defined number or shape of clusters is needed.
- OC(X) typically increases when
  - distances between clusters increase (Emphasis Property),
  - points are more densely clustered (Density Property),
  - the number of clusters increases (Tally Property).
- It does not pick up unbalancedness in the number of points per cluster as a sign of c-clusteredness (Balance Property).
Slide 13: The Full COPS Procedure

Combine the θ-parametrized MDS loss, σ_MDS(X(θ), θ), and the OPTICS cordillera OC(X) into the cluster optimized loss (coploss):

  coploss(θ) = v_1 σ_MDS(X(θ), θ) - v_2 OC(X(θ))    (1)

with X(θ) := arg min_X σ_MDS(X, θ) and v_1, v_2 ∈ R controlling how much weight is given to the individual parts of coploss, e.g.,

  v_1 = 1,  v_2 = σ_MDS(X(θ_0), θ_0) / OC(X(θ_0)),

with θ_0 some reference solution, e.g., θ_0 = (1, 1) for powerstress.
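Putting the pieces together, a sketch of coploss built from the hypothetical helpers above (power_stress, optics_cordillera); the inner minimization over X is only approximated here by running SMACOF on the lambda-transformed proximities.

# coploss(theta) = v1 * sigma_MDS(X(theta), theta) - v2 * OC(X(theta)),
# where X(theta) minimizes the MDS loss for fixed theta.
coploss <- function(theta, delta, v1 = 1, v2 = 1, minPts = 6, q = 2) {
  kappa <- theta[1]; lambda <- theta[2]
  fit <- smacof::smacofSym(delta^lambda, ndim = 2)  # inner step (approximate)
  X <- fit$conf
  res <- dbscan::optics(X, minPts = minPts)
  r <- res$reachdist[res$order]
  r[!is.finite(r)] <- max(r[is.finite(r)])
  v1 * power_stress(X, delta, kappa, lambda) - v2 * optics_cordillera(r, q)
}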
Slide 14: Optimization - I

We need to solve (θ is t-dimensional)

  coploss(θ) → min_θ!

We use a nested algorithm that first solves for X(θ) and then minimizes (1) over θ. For the inner part, i.e., finding X(θ), standard MDS optimization is used (e.g., majorization). The outer part of this optimization problem is complicated, so we employ metaheuristics.

The inner minimization is costly, so a useful metaheuristic makes few evaluations of the outer function (which is okay if t is small). Simulated annealing or population-based algorithms are not that well suited. We have had good experiences with a customized Luus-Jaakola algorithm (it usually converges to a good solution in fewer than 200 iterations for a small minimal search-space width, accd).
Slide 15: Optimization - II

Adaptive Luus-Jaakola algorithm (ALJ): an adaptation of Luus-Jaakola search (Luus & Jaakola, 1973). An implementation sketch follows below.
- Sample θ^(0) from within the t-orthotope [l, u]^t, where l, u are the lower and upper boundaries.
- Set d to the length of the search space.
- Repeat until termination (accd, maxiter, acc):
  - Pick a^(i) ~ U^t(-d, d).
  - Set θ^(i+1) ← θ^(i) + a^(i).
  - If coploss(θ^(i+1)) < coploss(θ^(i)), set θ^(opt) = θ^(i+1); else set d = d · s.
- Here (this is the customized part):
  s = o^((m+1-i)/m), with m = min( (log(accd) - log(max(u - l))) / log(o), maxiter ) and 0 < o < 1.
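A self-contained R sketch of this adaptive search as we read it off the slide (the shrink schedule s and the cap m follow the formulas above; the function name alj and the box-clipping of proposals are our additions). The coploss function from the earlier sketch could be plugged in as f.

# Adaptive Luus-Jaakola search: minimize f over the box [l, u]^t.
# The search width d shrinks by s = o^((m + 1 - i) / m) whenever a
# proposal fails to improve the objective.
alj <- function(f, l, u, o = 0.9, accd = 1e-4, maxiter = 200) {
  m <- min((log(accd) - log(max(u - l))) / log(o), maxiter)
  t_dim <- length(l)
  theta <- runif(t_dim, l, u)      # initial candidate in the orthotope
  best <- f(theta)
  d <- max(u - l)                  # current search-space width
  it <- 0
  for (i in seq_len(maxiter)) {
    it <- i
    a <- runif(t_dim, -d, d)       # uniform step in [-d, d]^t
    prop <- pmin(pmax(theta + a, l), u)  # clip proposal to the box
    val <- f(prop)
    if (val < best) {              # accept improvements
      theta <- prop; best <- val
    } else {                       # otherwise shrink the search width
      d <- d * o^((m + 1 - i) / m)
    }
    if (d < accd) break            # width below resolution: stop
  }
  list(par = theta, value = best, iterations = it)
}

# Example: minimize a simple quadratic over [0, 3] x [0, 20].
alj(function(th) sum((th - c(1.5, 10))^2), l = c(0, 0), u = c(3, 20))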
Slide 16: R Package stops

All of this is implemented in the R package stops.
- High-level function: cops(proximitymatrix, loss, ...)
- Prespecified MDS models: strain, symmetric SMACOF (smacofsym), Sammon mapping, elastic scaling, SMACOF on a sphere (smacofsphere), sstress, rstress, powerstress, and Sammon mapping and elastic scaling with powers (powersammon, powerelastic)
- Optimization with ALJ, simulated annealing (SANN), or a particle swarm algorithm (pso)
- Features the cordillera and an interface to OPTICS in ELKI (optics)
- S3 methods: plot, summary, print, coef, residuals, plot3d, plot3dstatic
Slide 17: Example: Republicans

We now use COPS with powerstress on the "I'm a Republican, because ..." data set:

R> resc <- cops(dt.dist, loss="powerstress",
+              lower=c(1,1), minpts=6, upper=c(3,20))
R> resc

Call: cops(dis = dt.dist, loss = "powerstress", theta = c(1, 1),
    minpts = 6, lower = c(1, 1), upper = c(3, 20))

Model: COPS with powerstress loss function and parameters kappa= lambda=
Number of objects: 37
MDS loss value:
OPTICS cordillera: Raw= Normed=
Cluster optimized loss (coploss):
MDS loss weight: 1, OPTICS cordillera weight:
Number of iterations of ALJ optimization: 117
Slide 18: Example: Republicans

R> plot(resc)

[Figure: "Republican Mantras!" - COPS configuration (axes Configuration D1 vs. Configuration D2) with labeled clusters: Paleocon+Populist Right, Neocon+Liberalism, Traditionalist+Compassionate, Fiscalcon+Libertarian, and Unclustered (cut at eps=0.6).]
Slide 19: Summary

COPS:
- works well when the objective is to obtain both a scaling and a clustering,
- is easily adaptable to many other loss functions,
- is particularly useful when there is little variability in the proximities.

C-clusteredness and the OPTICS cordillera:
- a concept and a measure of goodness-of-clustering for dimension reduction results that has appealing properties,
- interesting beyond COPS.
Slide 20: Outlook

Beyond COPS:
- C-clusteredness is an aspect of a more general idea that we have coined c-structuredness (Rusch et al., 2015b).
- The idea of COPS can be generalized to augmented nonlinear dimension reduction and STOPS (Structure Optimized Proximity Scaling; Rusch et al., 2015b). Nearly there, only a few kinks to even out.

Future research:
- Issues with finding the global optimum.
- Speeding up the optimization (the inner minimization).
- Inference is still unsolved (but we're working on that too).
Slide 21: References

Ankerst, M., Breunig, M., Kriegel, H.-P., & Sander, J. (1999). OPTICS: Ordering points to identify the clustering structure. ACM SIGMOD Record, 28.
Luus, R., & Jaakola, T. (1973). Optimization by direct search and systematic reduction of the size of search region. AIChE Journal, 19.
Mair, P., Rusch, T., & Hornik, K. (2014). The grand old party - a party of values? SpringerPlus, 3:697.
Rusch, T., Mair, P., & Hornik, K. (2015a). COPS: Cluster optimized proximity scaling. Report 2015/1, Discussion Paper Series, Center for Empirical Research Methods, WU Vienna University of Economics and Business.
Rusch, T., Mair, P., & Hornik, K. (2015b). Structuredness indices and augmented nonlinear dimension reduction. Report 2015/X, Discussion Paper Series, Center for Empirical Research Methods, WU Vienna University of Economics and Business. Forthcoming.
Slide 22: Thank You for Your Attention

Thomas Rusch
Competence Center for Empirical Research Methods
WU Vienna University of Economics and Business
Welthandelsplatz 1, 1020 Vienna, Austria