Multivariate Statistics: Hierarchical and k-means cluster analysis


1 Multivariate Statistics: Hierarchical and k-means cluster analysis. Steffen Unkel, Department of Medical Statistics, University Medical Center Goettingen, Germany. Summer term 2017.

2 What is a cluster? Cluster analysis is about discovering groups in data. Clustering methods should not be confused with discrimination and other supervised learning methods, where the groups are known a priori. Some authors suggest that the ultimate criterion for evaluating the meaning of the term cluster is the value judgement of the user. Others attempt to define just what a cluster is in terms of internal cohesion (homogeneity) and external isolation (separation).

3 Clusters with internal cohesion and/or external separation. It is not entirely clear how a 'cluster' is recognized when displayed in the plane, but one feature of the recognition process would appear to involve assessment of the relative distances between points; how human observers draw perceptually coherent clusters out of fields of 'dots' will be considered briefly in Chapter 2. Figure 1.2: Clusters with internal cohesion and/or external isolation. (Reproduced with permission of CRC Press from Gordon, 1980.)

4 Data containing no natural clusters and dissection. A further set of two-dimensional data is plotted in Figure 1.3. Here most observers would conclude that there is no 'natural' cluster structure, simply a single homogeneous collection of points. Ideally, then, one might expect a method of cluster analysis applied to such data to come to a similar conclusion. As will be seen later, this may not be the case, and many (most) methods of cluster analysis will divide this type of data into 'groups'. Often the process of dividing a homogeneous data set into different parts is referred to as dissection, and such a procedure may be useful in specific circumstances. If, for example, the data in Figure 1.3 represented the geographical locations of houses in a town, dissection might be a useful way of dividing the town up into compact postal districts which contain comparable numbers of houses - see Figure 1.4. (This example was suggested by Gordon, 1980.) Figure 1.3: Data containing no 'natural' clusters. Figure 1.4: Dissection of the data in Figure 1.3. (Both reproduced with permission of CRC Press from Gordon, 1980.)

5 The measurement of proximity. To identify clusters which may be present in data, knowledge of how close observations are to each other is needed. We need a quantitative measure of closeness, more commonly referred to as dissimilarity, distance or similarity, with a general term being proximity. Two observations are close when their dissimilarity is small or their similarity is large. Proximities can be determined either directly (e.g. from tasting experiments) or indirectly (usually derived from the n × p data matrix X).

6 Similarity measures for categorical data. With data in which all the variables are categorical, measures of similarity are most commonly used. The measures are generally scaled to be in the interval [0, 1], or they are expressed as percentages in the range 0-100%. Two individuals i and j have a similarity coefficient s_ij of unity (zero) if both have identical values for all variables (differ maximally for all variables). A similarity s_ij can be converted into a dissimilarity δ_ij by taking, for example, δ_ij = 1 − s_ij.

7 Binary data. In some cases zero-zero matches are completely equivalent to one-one matches, and therefore should be included in the calculated similarity measure. An example is gender, where there is no preference as to which of the two categories should be coded zero or one. But in other cases the inclusion or otherwise of d is more problematic; for example, when the zero category corresponds to the genuine absence of some property, such as wings in a study of insects. The question that then needs to be asked is: do the co-absences contain useful information about the similarity of the two individuals?

Table: Counts of binary outcomes for two individuals.

                          Individual j
                       1          0          Total
Individual i   1       a          b          a + b
               0       c          d          c + d
Total                a + c      b + d      p = a + b + c + d

8 Similarity measures for binary data. Two out of many examples: (1) Matching coefficient: s_ij = (a + d)/(a + b + c + d) = (a + d)/p. (2) Jaccard coefficient: s_ij = a/(a + b + c). There is an apparent uncertainty as to how to deal with the count of zero-zero matches, d. In some cases, zero-zero matches are completely equivalent to one-one matches, but in other cases the inclusion of d is more problematic.
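A minimal numerical sketch of these two coefficients (using NumPy; the function name and example vectors are illustrative, not from the slides):

```python
import numpy as np

def binary_similarities(x, y):
    """Matching and Jaccard coefficients for two binary vectors x and y."""
    x, y = np.asarray(x, dtype=bool), np.asarray(y, dtype=bool)
    a = np.sum(x & y)      # one-one matches
    b = np.sum(x & ~y)     # one-zero mismatches
    c = np.sum(~x & y)     # zero-one mismatches
    d = np.sum(~x & ~y)    # zero-zero matches
    p = a + b + c + d
    matching = (a + d) / p
    jaccard = a / (a + b + c) if (a + b + c) > 0 else np.nan
    return matching, jaccard

# Example: two individuals measured on six binary variables
print(binary_similarities([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```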

9 Similarity measures for categorical data with more than two levels. One approach is to allocate a score s_ijk of zero or one to each variable k, depending on whether the two individuals i and j take the same category on that variable. The similarity coefficient is then computed as s_ij = (1/p) Σ_{k=1}^{p} s_ijk. An alternative approach is to (1) divide all possible outcomes of the kth variable into mutually exclusive subsets of categories, (2) set s_ijk to zero or one depending on whether the two categories for individuals i and j are members of the same subset, and then (3) determine the proportion of shared subsets across variables.
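A short sketch of the first approach (simple matching over p categorical variables); the variable values are made up for illustration:

```python
import numpy as np

def simple_matching(xi, xj):
    """Proportion of variables on which individuals i and j take the same category."""
    xi, xj = np.asarray(xi), np.asarray(xj)
    return np.mean(xi == xj)   # (1/p) * sum over k of s_ijk

# Example: five categorical variables
print(simple_matching(["red", "small", "A", "yes", "B"],
                      ["red", "large", "A", "no",  "B"]))   # 0.6
```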

10 Dissimilarity and distance measures for continuous data. A dissimilarity measure δ_ij > 0 with δ_ii = 0 is termed a distance measure if it fulfils the metric (triangular) inequality δ_ij + δ_im ≥ δ_jm for pairs of individuals (i, j), (i, m) and (j, m). An n × n matrix of dissimilarities, Δ, with elements δ_ij, where δ_ii = 0 for all i, is said to be metric if the triangular inequality holds for all triples (i, j, m). We refer to metric dissimilarities as distances and denote the n × n matrix of distances by D, with elements d_ij.

11 Distance measures. General Minkowski distance or l_r norm: d_ij = ( Σ_{k=1}^{p} w_k^r |x_ik − x_jk|^r )^{1/r}, with r ≥ 1, where w_k, k = 1, ..., p, denote the non-negative weights of the p variables (often all w_k = 1). Both the Euclidean distance (l_2 norm) and the Manhattan (city-block or taxicab) distance (l_1 norm) are special cases of the Minkowski distance.
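A small sketch computing the weighted Minkowski distance, with the Euclidean and Manhattan special cases; SciPy's built-in is used only as a cross-check, and the example vectors and weights are invented:

```python
import numpy as np
from scipy.spatial.distance import minkowski

def weighted_minkowski(x, y, w, r):
    """d_ij = ( sum_k w_k^r * |x_k - y_k|^r )^(1/r), r >= 1."""
    x, y, w = map(np.asarray, (x, y, w))
    return np.sum((w * np.abs(x - y)) ** r) ** (1.0 / r)

x, y = [1.0, 2.0, 3.0], [2.0, 0.0, 3.5]
w = [1.0, 1.0, 1.0]                      # unweighted case
print(weighted_minkowski(x, y, w, r=2))  # Euclidean (l_2)
print(weighted_minkowski(x, y, w, r=1))  # Manhattan (l_1)
print(minkowski(x, y, p=2))              # SciPy cross-check (unweighted)
```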

12 Euclidean property. An n × n dissimilarity matrix with elements δ_ij is said to be Euclidean if the n individuals can be represented as points in space such that the Euclidean distance between points i and j is δ_ij. This Euclidean property allows the interpretation of dissimilarities as physical distances. If a dissimilarity matrix is Euclidean then it is also metric, but the converse does not follow.

13 Illustration. Figure: An example of a set of distances that satisfy the metric inequality but which have no Euclidean representation. (Reproduced with permission from Gower and Legendre, 1986.)

14 Similarity measures for mixed data. There are a number of approaches to constructing proximities for data in which some variables are continuous and some categorical, three examples of which are: (1) dichotomize all variables and use a similarity measure for binary data; (2) rescale all the variables so that they are on the same scale by replacing variable values by their ranks among the objects and then use a measure for continuous data; (3) construct a dissimilarity measure for each type of variable and combine these, either with or without differential weighting, into a single coefficient.

15 Inter-group proximity measures. In clustering applications it also becomes necessary to consider how to measure the proximity between groups of individuals. There are two basic approaches: (1) Define a suitable summary of the proximities between individuals from the two groups; examples are the nearest-neighbour distance, the furthest-neighbour distance and the average dissimilarity between individuals from both groups. (2) Describe each group by a representative observation and define the inter-group proximity as the proximity between the representative observations. Example: if group A has mean vector x̄_A = (x̄_A1, ..., x̄_Ap)' and group B has mean vector x̄_B = (x̄_B1, ..., x̄_Bp)', then the generalized distance, D², is given by D² = (x̄_A − x̄_B)' W⁻¹ (x̄_A − x̄_B), where W is the pooled within-group covariance matrix for the two groups.
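A brief numerical sketch of the generalized distance D² between two group mean vectors, assuming two small simulated data matrices (the data and helper name are illustrative):

```python
import numpy as np

def generalized_distance(A, B):
    """D^2 = (xbar_A - xbar_B)' W^{-1} (xbar_A - xbar_B), W = pooled within-group covariance."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    nA, nB = len(A), len(B)
    diff = A.mean(axis=0) - B.mean(axis=0)
    W = ((nA - 1) * np.cov(A, rowvar=False) +
         (nB - 1) * np.cov(B, rowvar=False)) / (nA + nB - 2)
    return diff @ np.linalg.solve(W, diff)

rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0, size=(20, 2))   # group A
B = rng.normal(2.0, 1.0, size=(25, 2))   # group B
print(generalized_distance(A, B))
```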

16 Standardization. In many clustering applications the variables describing the objects to be clustered will not be measured in the same units. The solution most often suggested is to simply standardize each variable to unit variance. Standardization of variables to unit variance can be viewed as a special case of weighting. Sometimes it is preferable to standardize variables using a measure of within-group variability rather than one of total variability.

17 Illustration of the standardization problem. Figure: (a) Data on the original scale. (b) Undesirable standardization: weights based on total standard deviations. (c) Desirable standardization: weights based on within-group standard deviations.

18 Introduction. In hierarchical clustering the data are not partitioned into a particular number of clusters at a single step. Instead, the clustering consists of a series of partitions, and the aim is to find the optimal step. Hierarchical clustering techniques may be subdivided into (1) agglomerative methods, which proceed by a series of successive fusions of the n objects into groups, and (2) divisive methods (less commonly used), which separate the n objects successively into finer groupings. With hierarchical methods, divisions or fusions, once made, are irrevocable.

19 Dendrogram. Figure: Example of a hierarchical tree structure, with the agglomerative and divisive directions indicated.

20 Agglomerative hierarchical clustering procedure. An agglomerative procedure produces a series of partitions of the data, P_n, P_{n−1}, ..., P_1, where P_n consists of n single-member clusters and P_1 consists of a single group containing all n objects. Algorithm. Start: clusters C_1, C_2, ..., C_n, each containing a single individual. (1) Find the nearest pair of distinct clusters, say C_i and C_j, merge C_i and C_j, delete C_j and decrease the number of clusters by one. (2) If the number of clusters equals one then stop, else return to (1). Differences between agglomerative methods arise because of the different ways of defining similarity between an individual and a group or between two groups of individuals.

21 Measuring inter-cluster dissimilarity. Three inter-cluster distance measures:

(1) d_AB = min_{i ∈ A, j ∈ B} d_ij,
(2) d_AB = max_{i ∈ A, j ∈ B} d_ij,
(3) d_AB = (1 / (n_A n_B)) Σ_{i ∈ A} Σ_{j ∈ B} d_ij,

where d_AB is the distance between two clusters A and B, d_ij is the distance (e.g. Euclidean distance) between objects i and j, and n_A (n_B) is the number of objects in cluster A (B). The measure in (1) is the basis of single linkage clustering, that in (2) of complete linkage clustering, and that in (3) of group average clustering.
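A short sketch of how these three linkage rules could be applied with SciPy (the small data matrix is invented for illustration):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [4.0, 4.0], [4.1, 3.8], [3.9, 4.2]])
d = pdist(X, metric="euclidean")          # condensed matrix of pairwise d_ij

for method in ("single", "complete", "average"):
    Z = linkage(d, method=method)         # agglomerative clustering
    labels = fcluster(Z, t=2, criterion="maxclust")   # two-cluster partition
    print(method, labels)
```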

22 Examples of the three inter-cluster distance measures (figure).

23 Illustrative example (single linkage). Consider the following distance matrix:

D =
        1     2     3     4     5
  1   0.0
  2   2.0   0.0
  3   6.0   5.0   0.0
  4  10.0   9.0   4.0   0.0
  5   9.0   8.0   5.0   3.0   0.0

At stage one, individuals 1 and 2 are merged to form a cluster, since d_12 = 2.0 is the smallest non-zero entry in D. Distances between this cluster and the other three individuals are obtained as d_(12)3 = min(d_13, d_23) = d_23 = 5.0, d_(12)4 = min(d_14, d_24) = d_24 = 9.0, d_(12)5 = min(d_15, d_25) = d_25 = 8.0.

24 Illustrative example (single linkage). A new matrix may now be constructed whose entries are inter-individual and cluster-individual distance values:

D_1 =
        (12)    3     4     5
 (12)   0.0
   3    5.0   0.0
   4    9.0   4.0   0.0
   5    8.0   5.0   3.0   0.0

The smallest non-zero entry in D_1 is d_45 = 3.0, and so individuals 4 and 5 are now merged to form a new cluster, and a new set of distances is found: d_(12)3 = 5.0 as before, d_(12)(45) = min(d_14, d_15, d_24, d_25) = d_25 = 8.0, d_(45)3 = min(d_34, d_35) = d_34 = 4.0.

25 Illustrative example (single linkage). The new distances may be arranged to give the matrix

D_2 =
        (12)   (45)    3
 (12)   0.0
 (45)   8.0   0.0
   3    5.0   4.0   0.0

The smallest non-zero entry is now d_(45)3 = 4.0, and so individual 3 is merged with the cluster containing individuals 4 and 5. Finally, fusion of the two remaining groups takes place to form a single group containing all five individuals.
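A minimal check of this worked example with SciPy, assuming the distance matrix reconstructed above:

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage

# Full distance matrix of the worked example (individuals 1-5)
D = np.array([[0, 2, 6, 10, 9],
              [2, 0, 5, 9, 8],
              [6, 5, 0, 4, 5],
              [10, 9, 4, 0, 3],
              [9, 8, 5, 3, 0]], dtype=float)

Z = linkage(squareform(D), method="single")
print(Z)   # each row: merged clusters, fusion height, cluster size
# Fusion heights 2.0, 3.0, 4.0 and 5.0 match the steps on these slides
```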

26 Dendrogram for the worked example of single linkage (figure).

27 Problems of agglomerative methods. To illustrate some of the problems of agglomerative methods, a set of simulated data will be clustered using single, complete and average linkage. The data consist of 50 points simulated from two bivariate normal distributions with mean vectors (0, 0) and (4, 4) and a common covariance matrix Σ. Two intermediate points have been added for the first analysis, in order to illustrate the problem known as chaining, often found when using single linkage.

28 Dendrogram of single linkage clustering of the simulated data (figure).

29 Problems of agglomerative methods. Figure: Clusters obtained from simulated data: (a) single linkage, with intermediate points, two-cluster solution; (b) single linkage, no intermediate points, five-cluster solution.

30 Problems of agglomerative methods. Figure: Clusters obtained from simulated data: (c) complete linkage, no intermediate points, five-cluster solution; (d) average linkage, no intermediate points, five-cluster solution.

31 Global fit of a hierarchical clustering solution. The method most commonly used for comparing a dendrogram with a proximity matrix, or with a second dendrogram, is the cophenetic correlation. It is the product-moment correlation of the n(n − 1)/2 entries in the lower half of the proximity matrix and the corresponding entries in the cophenetic matrix, C. The elements c_ij of C are the heights at which individuals i and j first appear together in the same cluster in the dendrogram. For the worked example, the cophenetic correlation between D and C is 0.82.
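A brief sketch computing the cophenetic correlation for the worked single-linkage example, continuing with the distance matrix reconstructed above:

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, cophenet

D = np.array([[0, 2, 6, 10, 9],
              [2, 0, 5, 9, 8],
              [6, 5, 0, 4, 5],
              [10, 9, 4, 0, 3],
              [9, 8, 5, 3, 0]], dtype=float)

d = squareform(D)                 # the n(n-1)/2 pairwise distances
Z = linkage(d, method="single")
c, coph_dists = cophenet(Z, d)    # correlation and cophenetic distances
print(round(c, 2))                # approximately 0.82 for these data
```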

32 Choice of partition. It is often the case that the investigator is not interested in the complete hierarchy but only in one or two partitions obtained from it. Partitions are obtained by cutting a dendrogram at a particular height or by selecting one of the solutions in the nested sequence of clusterings. One informal approach for determining the number of groups is to examine the sizes of the differences between fusion levels in the resulting diagram: large changes in fusion levels might be taken to indicate a particular number of clusters.

33 Introduction to optimization clustering techniques. We now consider a clustering technique that produces a partition of the individuals into a specified number of groups. This is done by optimizing some numerical criterion. Associated with each partition of the n objects into the required number of groups, k, is an index c(n, k), the value of which measures some quality of this particular partition. Differences between optimization techniques arise both because of the variety of clustering criteria and because of the various optimization algorithms that might be used.

34 Deriving clustering criteria. Consider the following three p × p matrices derived from X:

T = Σ_{m=1}^{k} Σ_{l=1}^{n_m} (x_ml − x̄)(x_ml − x̄)',
W = Σ_{m=1}^{k} Σ_{l=1}^{n_m} (x_ml − x̄_m)(x_ml − x̄_m)',
B = Σ_{m=1}^{k} n_m (x̄_m − x̄)(x̄_m − x̄)',

where x_ml is the vector of observations of the lth object in group m, x̄ is the mean vector of all n observations, x̄_m is the vector of sample means within group m, and n_m is the number of observations in group m.

35 Clustering criteria. The three matrices represent, respectively, the total dispersion, the within-group dispersion and the between-group dispersion. They satisfy the equation T = W + B. In the multivariate case (p > 1), a number of criteria based on the decomposition above have been suggested: (1) minimization of tr(W), that is, minimization of the sum of the within-group sums of squares over all variables; (2) minimization of det(W); (3) maximization of tr(BW⁻¹).
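A small sketch that computes T, W and B for a labelled data set and checks the decomposition T = W + B as well as tr(W); the helper name and toy data are invented:

```python
import numpy as np

def scatter_matrices(X, labels):
    """Return total (T), within-group (W) and between-group (B) dispersion matrices."""
    X = np.asarray(X, float)
    xbar = X.mean(axis=0)
    T = (X - xbar).T @ (X - xbar)
    W = np.zeros_like(T)
    B = np.zeros_like(T)
    for g in np.unique(labels):
        Xg = X[labels == g]
        xbar_g = Xg.mean(axis=0)
        W += (Xg - xbar_g).T @ (Xg - xbar_g)
        B += len(Xg) * np.outer(xbar_g - xbar, xbar_g - xbar)
    return T, W, B

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (25, 2)), rng.normal(4, 1, (25, 2))])
labels = np.repeat([0, 1], 25)
T, W, B = scatter_matrices(X, labels)
print(np.allclose(T, W + B))   # True: T = W + B
print(np.trace(W))             # tr(W), the k-means criterion
```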

36 Number of possible partitions. Having decided on a suitable criterion, consideration needs to be given to how to find a partition into k groups that optimizes the criterion. One might want to calculate the criterion value for each possible partition and choose a partition that gives an optimum. Unfortunately, there is a large number of possible partitions, N(n, k), even for moderate n and k; for example, N(5, 2) = 15, N(10, 3) = 9330, and N(100, 5) is of the order 10^68. This problem has led to the development of hill-climbing algorithms designed to search for the optimum by rearranging existing partitions and keeping the new one only if it provides an improvement.
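These counts are Stirling numbers of the second kind; a tiny sketch to reproduce them (the recurrence is standard, not taken from the slides):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def n_partitions(n, k):
    """Number of ways to partition n objects into k non-empty groups (Stirling, 2nd kind)."""
    if k == 0:
        return 1 if n == 0 else 0
    if n == 0 or k > n:
        return 0
    return k * n_partitions(n - 1, k) + n_partitions(n - 1, k - 1)

print(n_partitions(5, 2))              # 15
print(n_partitions(10, 3))             # 9330
print(f"{n_partitions(100, 5):.2e}")   # about 6.6e+67
```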

37 Optimization algorithm. Hill-climbing clustering algorithm: (1) Find some initial partition of the n objects into k groups. (2) Calculate the change in the clustering criterion produced by moving each object from its own group to another group. (3) Make the change which leads to the greatest improvement in the value of the clustering criterion. (4) Repeat steps 2 and 3 until no move of a single object causes the clustering criterion to improve. The k-means algorithm consists of iteratively updating a partition by simultaneously relocating each object to the group to whose mean (centroid) it is closest and then recalculating the group means. This can be shown to be equivalent to minimizing tr(W) when Euclidean distances are used to define closeness.
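A compact sketch of the k-means iteration (assign each object to the nearest centroid, then recompute the means); scikit-learn's KMeans would do the same job, and the toy data are invented:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means; empty clusters are not handled in this minimal sketch."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                                        # assignment step
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers                                            # update step
    return labels, centers

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (25, 2)), rng.normal(4, 1, (25, 2))])
labels, centers = kmeans(X, k=2)
print(centers)
```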

38 Illustrative example of k-means clustering. Figure: Pick k random points as cluster means (shown here for k = 2).

39 Illustrative example of k-means clustering, iterative step 1. Figure: Assign each data point to the closest cluster center.

40 Illustrative example of k-means clustering, iterative step 2. Figure: Move each cluster center to the average of its assigned points.

41 Illustrative example of k-means clustering. Figure: Repeat the two steps until convergence.

42 Choosing the number of clusters. The k-means approach partitions the data into a prespecified number of clusters set by the investigator. In practice, solutions for a range of values of the number of groups are found, but the question remains as to the optimal number of clusters for the data. A number of suggestions have been made as to how to tackle this question. For example, one involves plotting the value of the clustering criterion against the number of groups: as k increases the value of tr(W) will necessarily decrease, but a sharp change may be indicative of the best solution.
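A short sketch of this diagnostic using scikit-learn's KMeans, whose inertia_ attribute equals tr(W) under Euclidean distance; the simulated data are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (25, 2)), rng.normal(4, 1, (25, 2))])

# tr(W), i.e. the total within-cluster sum of squares, for k = 1, ..., 6
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))
# Look for a sharp drop followed by a flattening (here at k = 2)
```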

43 Properties of k-means and alternatives. The k-means algorithm is guaranteed to converge, at least to a local optimum. However, it suffers from the problems of (1) not being scale-invariant, that is, different solutions may be obtained from the raw data and from the data standardized in some way, and (2) imposing a spherical structure on the observed clusters even when the natural clusters in the data have other shapes. Alternatives to k-means are the k-median and partitioning around medoids (PAM) algorithms.
