Multivariate Analysis

Chapter 5: Cluster analysis
Pedro Galeano
Departamento de Estadística, Universidad Carlos III de Madrid
Course 2015/2016
Master in Business Administration and Quantitative Methods
Master in Mathematical Engineering

Chapter outline
1. Introduction.
2. Proximity measures.
3. Hierarchical clustering.
4. Partition clustering.
5. Model-based clustering.

Introduction
The purpose of cluster analysis is to group the objects in a multivariate data set into homogeneous groups. This is done by grouping individuals that are similar according to some appropriate criterion.
Once the clusters are obtained, it is generally useful to describe each group with descriptive tools, to better understand the differences that exist among the groups.
Cluster methods are also known as unsupervised classification methods. These are different from the supervised classification methods, or Classification Analysis, that will be presented in Chapter 7.

Introduction
Clustering techniques are applicable whenever a data set needs to be grouped into meaningful blocks.
In some applications we know that the data naturally fall into a certain number of groups, but in many cases the number of clusters is not known.
For some clustering methods the user has to specify the number of clusters before applying the method. This is not always easy, and unless additional information exists about the number of clusters, one typically explores different values and looks at the potential interpretations of the clustering results.

Introduction
We tend to think of multivariate measurements as quantitative random variables, but attributes of objects such as color, shape or species are relevant and should be integrated into an analysis as much as possible.
For some data, an additional variable which assigns color or species type a numerical value might be appropriate.
If some extra knowledge is available, it should inform our analysis and could guide the choice of the number of clusters.

Introduction
Central to some clustering approaches is the notion of proximity of two random vectors.
We measure the degree of proximity of two multivariate observations by a distance measure.
Intuitively, one might think of the Euclidean distance between two vectors, and this is typically the first and also the most common distance one applies in Cluster Analysis.
We will also consider a number of other distance measures, and we will explore their effect on the resulting cluster structure.

Introduction
Some cluster procedures are based on mixtures of distributions.
The underlying assumption of these models, namely, that the data in the different parts come from a certain distribution, is not easy to verify and may not hold.
However, these methods have been shown to be powerful under general circumstances.

Introduction
The strength of Cluster Analysis is its exploratory nature.
As one varies the number of clusters, the distance measure or the mixture distributions, different cluster patterns appear. These patterns might provide new insight into the structure of the data.
Different cluster patterns can indicate the existence of unexpected substructures, which, in turn, can lead to further or more in-depth investigations of the data.
For this reason, where possible, the interpretation of a cluster analysis should involve a subject expert.

Introduction
There is a vast number of clustering procedures. Here, we will focus on:
- Hierarchical clustering: starts with singleton clusters (individual observations) and merges clusters, or starts with a single cluster (the whole data set) and splits clusters.
- Partition clustering: starts from a given group definition and proceeds by exchanging elements between groups until a certain criterion is optimized.
- Model-based clustering: the random vectors are modeled by mixtures of distributions and the parameters of the mixture distributions are estimated.

Proximity measures
Two of the clustering methods that we are going to present depend on the notion of proximity.
Proximities also play an important role in other multivariate techniques, such as multidimensional scaling, which will be presented in Chapter 6, and some of the methods for classification in Chapter 7.
We already know some distances between multivariate observations: the Euclidean distance and the Mahalanobis distance.
Next, we present alternative distances.

Proximity measures
We begin with the definition of a distance and then consider common distances.
Definition: A distance, $d$, between two multivariate random variables $x_i$ and $x_{i'}$, for $i, i' = 1, \ldots, n$, denoted by $d(x_i, x_{i'})$, is a non-negative random variable which satisfies:
1. $d(x_i, x_{i'}) \geq 0$, for all $i, i' = 1, \ldots, n$,
2. $d(x_i, x_{i'}) = 0$ if and only if $i = i'$, and
3. $d(x_i, x_{i'}) \leq d(x_i, x_{i''}) + d(x_{i''}, x_{i'})$, for all $i, i', i'' = 1, \ldots, n$.

Proximity measures
The two most common distances in Statistics are the Euclidean distance and the Mahalanobis distance.
The Euclidean distance, $d_E$, between $x_i$ and $x_{i'}$, for $i, i' = 1, \ldots, n$, is given by:
$$d_E(x_i, x_{i'}) = \left[ (x_i - x_{i'})' (x_i - x_{i'}) \right]^{1/2}$$
The Mahalanobis distance, $d_M$, between $x_i$ and $x_{i'}$, for $i, i' = 1, \ldots, n$, is given by:
$$d_M(x_i, x_{i'}) = \left[ (x_i - x_{i'})' \Sigma_x^{-1} (x_i - x_{i'}) \right]^{1/2}$$
where $\Sigma_x$ is the common covariance matrix of $x_i$ and $x_{i'}$.
Note that the Euclidean distance coincides with the Mahalanobis distance if $\Sigma_x = I_p$.
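
The two formulas can be checked numerically with a minimal sketch in plain Python. The function names, the restriction to p = 2 (so that the covariance matrix can be inverted by hand) and the toy vectors are assumptions made for illustration only.

```python
import math

def euclidean(x, y):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance for p = 2, inverting the 2x2 covariance by hand."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - y[0], x[1] - y[1]]
    # Quadratic form diff' * inv * diff
    q = sum(diff[r] * inv[r][s] * diff[s] for r in range(2) for s in range(2))
    return math.sqrt(q)

x, y = [1.0, 2.0], [4.0, 6.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(euclidean(x, y))                 # 5.0
print(mahalanobis_2d(x, y, identity))  # 5.0, same as Euclidean when cov = I
```

With a non-identity covariance, such as `[[4.0, 0.0], [0.0, 1.0]]`, the first coordinate is down-weighted by its larger variance and the two distances no longer agree.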

Proximity measures
The weighted p-distance or Minkowski distance, $d_p$, between $x_i$ and $x_{i'}$, for $i, i' = 1, \ldots, n$, is given by:
$$d_p(x_i, x_{i'}) = \left( \sum_{j=1}^{p} \omega_j \, |x_{ij} - x_{i'j}|^p \right)^{1/p}$$
where $\omega_1, \ldots, \omega_p$ are positive weights.
If $p = 1$, $d_p$ is called the Manhattan distance. If, in addition, all weights are one, then $d_p$ is called the city block distance.
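
A minimal sketch of the weighted Minkowski distance follows. To keep the exponent separate from the number of variables, the code calls the exponent `r`; that name, the default unit weights and the toy vectors are illustrative assumptions.

```python
def minkowski(x, y, r=2.0, weights=None):
    """Weighted Minkowski distance with exponent r (unit weights by default)."""
    if weights is None:
        weights = [1.0] * len(x)
    total = sum(w * abs(a - b) ** r for w, a, b in zip(weights, x, y))
    return total ** (1.0 / r)

x, y = [1.0, 2.0, 3.0], [4.0, 0.0, 3.0]
print(minkowski(x, y, r=1))  # 5.0 (city block: unit weights, exponent 1)
print(minkowski(x, y, r=2))  # sqrt(13), the Euclidean distance
```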

Proximity measures
The maximum distance or Chebychev distance, $d_{max}$, between $x_i$ and $x_{i'}$, for $i, i' = 1, \ldots, n$, is given by:
$$d_{max}(x_i, x_{i'}) = \max_{j=1,\ldots,p} |x_{ij} - x_{i'j}|$$
The Canberra distance, $d_{Canb}$, between $x_i$ and $x_{i'}$, for $i, i' = 1, \ldots, n$, is given by:
$$d_{Canb}(x_i, x_{i'}) = \sum_{j=1}^{p} \frac{|x_{ij} - x_{i'j}|}{x_{ij} + x_{i'j}}$$
The Bhattacharyya distance, $d_{Bhat}$, between $x_i$ and $x_{i'}$, for $i, i' = 1, \ldots, n$, is given by:
$$d_{Bhat}(x_i, x_{i'}) = \sum_{j=1}^{p} \left( x_{ij}^{1/2} - x_{i'j}^{1/2} \right)^2$$
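
The three distances can be sketched in a few lines of plain Python. The sketch assumes strictly positive entries, so that the Canberra denominator and the square roots are well defined; the function names and toy vectors are illustrative.

```python
import math

def chebyshev(x, y):
    """Largest coordinate-wise absolute difference."""
    return max(abs(a - b) for a, b in zip(x, y))

def canberra(x, y):
    """Sum of absolute differences, each scaled by the sum of the two entries."""
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y))

def bhattacharyya(x, y):
    """Sum of squared differences between square roots of the entries."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(x, y))

x, y = [1.0, 4.0, 9.0], [4.0, 4.0, 1.0]
print(chebyshev(x, y))      # 8.0
print(canberra(x, y))       # 3/5 + 0 + 8/10, about 1.4
print(bhattacharyya(x, y))  # (1-2)^2 + 0 + (3-1)^2 = 5.0
```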

Proximity measures
The cosine distance, $d_{cos}$, between $x_i$ and $x_{i'}$, for $i, i' = 1, \ldots, n$, is given by:
$$d_{cos}(x_i, x_{i'}) = 1 - \cos(x_i, x_{i'})$$
where $\cos(x_i, x_{i'})$ is the cosine of the included angle of the two random vectors, given by:
$$\cos(x_i, x_{i'}) = \frac{x_i' x_{i'}}{\|x_i\| \, \|x_{i'}\|}$$
and $\|\cdot\|$ denotes the Euclidean norm of a vector.
The correlation distance, $d_{cor}$, between $x_i$ and $x_{i'}$, for $i, i' = 1, \ldots, n$, is given by:
$$d_{cor}(x_i, x_{i'}) = 1 - \rho_{ii'}$$
where $\rho_{ii'}$ is the correlation coefficient between $x_i$ and $x_{i'}$.
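
A short sketch of both distances is given below. It assumes that the correlation coefficient is the Pearson correlation computed across the p coordinates of the two vectors, in which case the correlation distance equals the cosine distance of the mean-centred vectors. Function names and data are illustrative.

```python
import math

def cosine_distance(x, y):
    """One minus the cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

def correlation_distance(x, y):
    """One minus the Pearson correlation across coordinates (assumed reading)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    centred_x = [a - mean_x for a in x]
    centred_y = [b - mean_y for b in y]
    return cosine_distance(centred_x, centred_y)

print(cosine_distance([1.0, 0.0], [0.0, 1.0]))                 # orthogonal vectors
print(correlation_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # perfectly anti-correlated
```

Both distances ignore the overall scale of the vectors, which is why they are often preferred when only the profile of an observation matters.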

Proximity measures
For binary random variables with entries 0 and 1, the Hamming distance, $d_{Hamm}$, between $x_i$ and $x_{i'}$, for $i, i' = 1, \ldots, n$, is given by:
$$d_{Hamm}(x_i, x_{i'}) = \# \{ x_{ij} \neq x_{i'j} : 1 \leq j \leq p \}$$
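
As a small illustration, the count of disagreeing coordinates can be computed as follows. This is a sketch of the unnormalized count; some software divides it by p. The function name and the binary vectors are illustrative.

```python
def hamming(x, y):
    """Number of coordinates in which two binary vectors differ."""
    return sum(1 for a, b in zip(x, y) if a != b)

x = [1, 0, 1, 1, 0]
y = [1, 1, 0, 1, 0]
print(hamming(x, y))  # 2 (the vectors differ in the second and third positions)
```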

Hierarchical clustering
There are two types of hierarchical clustering methods:
1. In agglomerative clustering, one starts with n singleton clusters and merges clusters into larger groupings.
2. In divisive clustering, one starts with a single cluster and divides it into a number of smaller clusters.
Most attention has been paid to agglomerative methods; however, arguments have been made that divisive methods can provide more sophisticated and robust clusterings.
The end result of all hierarchical clustering methods is a dendrogram, where the k-cluster solution is obtained by merging some of the clusters from the (k + 1)-cluster solution.
The results of hierarchical algorithms depend on the distance considered. In particular, when the variables are in different units of measurement and the distance used does not take this fact into account, it is better to standardize the variables.
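
A minimal sketch of the standardization step, assuming each variable is centred at its mean and scaled by its sample standard deviation (the slides do not state which scaling is used). The function name and the toy rows are illustrative.

```python
import statistics

def standardize(rows):
    """Centre each column at its mean and scale it by its sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

# Two variables on very different scales
rows = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
z = standardize(rows)
print(z)  # both columns become -1, 0, 1
```

After this step every variable contributes on a comparable scale to distances such as the Euclidean or the Manhattan distance.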

Hierarchical clustering
The algorithm for agglomerative hierarchical clustering (agglomerative nesting or agnes) is given next:
1. Let $x_i$, for $i = 1, \ldots, n$, be the observations. Then, each observation is a cluster.
2. Compute $D = \{d_{ii'}, \; i, i' = 1, \ldots, n\}$, the matrix that contains the distances between the n observations (clusters).
3. Find the smallest distance in D, say, $d_{II'}$. Merge clusters $I$ and $I'$ to form a new cluster $II'$.
4. Compute distances, $d_{II',I''}$, between the new cluster $II'$ and all other clusters $I'' \neq II'$. These distances depend upon which linkage method is used. These are detailed in the next slide.
5. Form a new distance matrix, D, by deleting rows and columns $I$ and $I'$ and adding a new row and column $II'$ with the distances computed in step 4.
6. Repeat steps 3, 4 and 5 a total of $n - 1$ times. At the last step, all observations are merged together into a single cluster.

Hierarchical clustering
The linkage methods to compute the distances $d_{II',I''}$ between the new cluster $II'$ and all other clusters $I'' \neq II'$ are:
- Single linkage: $d_{II',I''} = \min \{ d_{I,I''}, d_{I',I''} \}$.
- Complete linkage: $d_{II',I''} = \max \{ d_{I,I''}, d_{I',I''} \}$.
- Average linkage: $d_{II',I''} = \sum_{i \in II'} \sum_{i'' \in I''} d_{i,i''} / (n_{II'} \, n_{I''})$, where $n_{II'}$ and $n_{I''}$ are the number of items in clusters $II'$ and $I''$, respectively.
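
The agglomerative steps and the three linkage rules can be sketched in plain Python as follows. This is an illustrative implementation, not the code behind the agnes results in the slides; the function name, the dictionary representation of the distance matrix and the four toy points are assumptions.

```python
def agnes(D, linkage="single"):
    """Agglomerative clustering on a symmetric distance matrix D (list of lists).

    Returns the merge history as (members_a, members_b, height) tuples.
    """
    clusters = {i: [i] for i in range(len(D))}  # cluster id -> member indices
    # Distances between current clusters, keyed by an ordered pair of ids
    dist = {(i, j): D[i][j] for i in clusters for j in clusters if i < j}
    merges, next_id = [], len(D)
    while len(clusters) > 1:
        # Find the closest pair of clusters and record the merge
        (a, b), height = min(dist.items(), key=lambda kv: kv[1])
        merges.append((sorted(clusters[a]), sorted(clusters[b]), height))
        merged = clusters[a] + clusters[b]
        # Distance from the merged cluster to every other cluster
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            if linkage == "single":
                d_new = min(d_ac, d_bc)
            elif linkage == "complete":
                d_new = max(d_ac, d_bc)
            else:  # average linkage: mean of all pairwise observation distances
                pairs = [D[i][j] for i in merged for j in clusters[c]]
                d_new = sum(pairs) / len(pairs)
            dist[(c, next_id)] = d_new
        # Drop the two merged clusters and add the new one
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        del clusters[a], clusters[b]
        clusters[next_id] = merged
        next_id += 1
    return merges

points = [0, 1, 3, 7]  # four observations on a line
D = [[abs(p - q) for q in points] for p in points]
for method in ("single", "complete", "average"):
    print(method, [h for _, _, h in agnes(D, method)])
```

On this toy example the merge heights are 1, 2, 4 for single linkage and 1, 3, 7 for complete linkage, which shows how the linkage rule alone changes the heights at which clusters join.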

Hierarchical clustering
The dendrogram allows the user to read off the distance at which clusters are combined to form a new cluster.
Clusters that are similar to each other are combined at low distances, whereas clusters that are more dissimilar are combined at high distances.
The difference in distances defines how close clusters are to each other.
A partition of the data into a specified number of groups can be obtained by cutting the dendrogram at an appropriate distance.
If we draw a horizontal line on the dendrogram at a given distance, then the number, K, of vertical lines cut by that horizontal line identifies a K-cluster solution.
The intersection of the horizontal line and one of those K vertical lines then represents a cluster, and the items located at the end of all branches below that intersection constitute the members of the cluster.

Illustrative example (I)
We are going to apply the agnes algorithm to the states data set.
We compare the results using the Euclidean and the Manhattan distances. Since neither distance takes the units of measurement into account, we use standardized variables.
The next slides show dendrograms for the solutions with these two distances and the three linkage methods (single, complete and average).
Once the solutions are obtained, scatterplot matrices with the assignments are also given. For that we consider the case of 4 groups.

Illustrative example (I)
[Figure: dendrogram of the states (vertical axis: Height), Euclidean distance and single linkage. Agglomerative Coefficient = 0.77]

Illustrative example (I)
[Figure: scatterplot matrix with the cluster assignments, Euclidean distance and single linkage. Variables: Population, Income, Illiteracy, Life.Exp, Murder, HS.Grad, Frost, Area]

Illustrative example (I)
[Figure: dendrogram of the states (vertical axis: Height), Euclidean distance and complete linkage. Agglomerative Coefficient = 0.82]

Illustrative example (I)
[Figure: scatterplot matrix with the cluster assignments, Euclidean distance and complete linkage. Variables: Population, Income, Illiteracy, Life.Exp, Murder, HS.Grad, Frost, Area]

Illustrative example (I)
[Figure: dendrogram of the states (vertical axis: Height), Euclidean distance and average linkage. Agglomerative Coefficient = 0.8]

Illustrative example (I)
[Figure: scatterplot matrix with the cluster assignments, Euclidean distance and average linkage. Variables: Population, Income, Illiteracy, Life.Exp, Murder, HS.Grad, Frost, Area]

Illustrative example (I)
[Figure: dendrogram of the states (vertical axis: Height), Manhattan distance and single linkage. Agglomerative Coefficient = 0.71]

Illustrative example (I)
[Figure: scatterplot matrix with the cluster assignments, Manhattan distance and single linkage. Variables: Population, Income, Illiteracy, Life.Exp, Murder, HS.Grad, Frost, Area]

Illustrative example (I)
[Figure: dendrogram of the states (vertical axis: Height), Manhattan distance and complete linkage. Agglomerative Coefficient = 0.82]

Illustrative example (I)
[Figure: scatterplot matrix with the cluster assignments, Manhattan distance and complete linkage. Variables: Population, Income, Illiteracy, Life.Exp, Murder, HS.Grad, Frost, Area]

Illustrative example (I)
[Figure: dendrogram of the states (vertical axis: Height), Manhattan distance and average linkage. Agglomerative Coefficient = 0.78]

Illustrative example (I)
[Figure: scatterplot matrix with the cluster assignments, Manhattan distance and average linkage. Variables: Population, Income, Illiteracy, Life.Exp, Murder, HS.Grad, Frost, Area]

Hierarchical clustering
None of the distance/linkage procedures is uniformly best for all clustering problems.
Single linkage often leads to long clusters, joined by singleton observations near each other, a result that does not have much appeal in practice.
Complete linkage tends to produce many small, compact clusters.
Average linkage depends upon the size of the clusters, while single and complete linkage do not.

Hierarchical clustering
In divisive clustering (divisive analysis or diana), the idea is that at each step, the observations are divided into a splinter group (say cluster A) and the remainder group (say cluster B).
The splinter group is initiated by extracting the observation that has the largest average distance from all other observations in the data set. That observation is set up as cluster A.
Given the separation of the data into A and B, we next compute, for each observation in cluster B, the following quantities:
1. the average distance between that observation and all other observations in cluster B, and
2. the average distance between that observation and all observations in cluster A.

Hierarchical clustering
Then, we compute the difference between (1) and (2) above for each observation in B. There are two possibilities:
1. If all the differences are negative, we stop the algorithm.
2. If any of these differences are positive, we take the observation in B with the largest positive difference, move it to A, and repeat the procedure.
This algorithm provides a binary split of the data into two clusters A and B.
This same procedure can then be used to obtain binary splits of each of the clusters A and B separately.
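
One binary split of the divisive procedure can be sketched as follows. This is an illustrative version of the splinter-group step only, not the full diana algorithm; the function name and the five toy points are assumptions, and a difference of exactly zero is treated as a stopping case.

```python
def diana_split(D, members):
    """Split `members` (indices into distance matrix D) into splinter group A and remainder B."""
    def avg_dist(i, group):
        others = [j for j in group if j != i]
        return sum(D[i][j] for j in others) / len(others) if others else 0.0

    B = list(members)
    # The splinter group starts with the observation farthest, on average, from the rest
    seed = max(B, key=lambda i: avg_dist(i, B))
    A = [seed]
    B.remove(seed)
    while len(B) > 1:
        # Difference between average distance to the rest of B and average distance to A
        diffs = {i: avg_dist(i, B) - avg_dist(i, A) for i in B}
        best = max(diffs, key=diffs.get)
        if diffs[best] <= 0:
            break  # no observation is closer to A than to the rest of B
        B.remove(best)
        A.append(best)
    return A, B

points = [0, 1, 2, 10, 11]
D = [[abs(p - q) for q in points] for p in points]
A, B = diana_split(D, range(len(points)))
print(sorted(A), sorted(B))  # [3, 4] [0, 1, 2]
```

Applying the same function to A and to B separately would give the next level of the divisive hierarchy.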

Illustrative example (I)
We are going to apply the diana algorithm to the states data set.
We compare the results using the Euclidean and the Manhattan distances. Since neither distance takes the units of measurement into account, we use standardized variables.
The next slides show dendrograms for the solutions with these two distances.
Once the solutions are obtained, scatterplot matrices with the assignments are also given. For that we consider the case of 4 groups.
It is not difficult to see that this algorithm points out the presence of special states.

Illustrative example (I)
[Figure: dendrogram of the states (vertical axis: Height), diana with Euclidean distance. Divisive Coefficient = 0.81]

Illustrative example (I)
[Figure: scatterplot matrix with the cluster assignments, diana with Euclidean distance. Variables: Population, Income, Illiteracy, Life.Exp, Murder, HS.Grad, Frost, Area]

Illustrative example (I)
[Figure: dendrogram of the states (vertical axis: Height), diana with Manhattan distance. Divisive Coefficient = 0.8]

Illustrative example (I)
[Figure: scatterplot matrix with the cluster assignments, diana with Manhattan distance. Variables: Population, Income, Illiteracy, Life.Exp, Murder, HS.Grad, Frost, Area]

Partition clustering
Partition methods simply split the observations into a predetermined number K of groups or clusters, where there is no hierarchical relationship between the K-cluster solution and the (K + 1)-cluster solution.
Given K, we seek to partition the data into K clusters so that the observations within each cluster are similar to each other, whereas observations from different clusters are dissimilar.
Ideally, one would obtain all the possible partitions of the data into K clusters and select the best partition using some optimizing criterion.
Clearly, for medium or large data sets such a method rapidly becomes infeasible, requiring an enormous amount of computer time and storage.
As a result, all available partition methods are iterative and work on only a few possible partitions.

Partition clustering
The k-means algorithm is the most popular partition method.
Because it is extremely efficient, it is often used for large-scale clustering projects.
The algorithm depends on the concept of the centroid of a cluster, which is a representative point of the group.
Usually, the centroid is taken as the mean of the observations in the cluster, although this is not always the choice.

Partition clustering
The algorithm is given next:
1. Let $x_i$, for $i = 1, \ldots, n$, be the set of observations.
2. Do one of the following:
   1. Form an initial random assignment of the observations into K clusters and, for cluster k, compute its current centroid, $\bar{x}_k$.
   2. Pre-specify K cluster centroids, $\bar{x}_k$, for $k = 1, \ldots, K$.
3. Compute the squared Euclidean distance of each observation to its current cluster centroid and sum all of them:
$$SSE = \sum_{k=1}^{K} \sum_{c(i)=k} (x_i - \bar{x}_k)' (x_i - \bar{x}_k)$$
where $\bar{x}_k$ is the k-th cluster centroid and $c(i)$ is the cluster containing $x_i$.
4. Reassign each observation to its nearest cluster centroid so that SSE is reduced in magnitude. Update the cluster centroids after each reassignment.
5. Repeat steps 3 and 4 until no further reassignment of observations takes place.
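
The steps above can be sketched in plain Python. This sketch uses pre-specified initial centroids and the common batch variant, in which all observations are reassigned before the centroids are updated, rather than updating after each single reassignment. The function name and the toy data are illustrative assumptions.

```python
def kmeans(X, centroids, max_iter=100):
    """Batch k-means from pre-specified initial centroids.

    Returns (labels, centroids, sse).
    """
    def sqdist(x, c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    centroids = [list(c) for c in centroids]  # do not modify the caller's list
    labels = None
    for _ in range(max_iter):
        # Assign each observation to its nearest centroid
        new_labels = [min(range(len(centroids)), key=lambda k: sqdist(x, centroids[k]))
                      for x in X]
        if new_labels == labels:
            break  # no reassignment took place
        labels = new_labels
        # Recompute each centroid as the mean of its current members
        for k in range(len(centroids)):
            members = [x for x, lab in zip(X, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sqdist(x, centroids[lab]) for x, lab in zip(X, labels))
    return labels, centroids, sse

X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids, sse = kmeans(X, centroids=[[0, 0], [10, 10]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(sse)     # about 2.667 (4/3 from each cluster)
```

Because the result depends on the starting centroids, the function would in practice be called from several starting points, keeping the run with the smallest SSE.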

Partition clustering
The solution (a configuration of observations into K clusters) will typically not be unique; the algorithm will only find a local minimum of SSE.
It is recommended that the algorithm be run using different initial random assignments of the observations to the K clusters (or by randomly selecting K initial centroids) in order to find the lowest minimum of SSE and, hence, the best clustering solution based upon K clusters.

Illustrative example (I)
We are going to apply the k-means algorithm to the states data set.
As with the hierarchical algorithms, we use standardized variables, as the algorithm uses Euclidean distances.
The next slide shows scatterplot matrices with the assignments made by the algorithm. For that we consider the case of 4 groups, as previously done.
We run the algorithm 25 times. In other words, we form 25 initial random assignments of the observations into 4 clusters and run the algorithm.
The value of SSE attained by the algorithm is

Illustrative example (I)
[Figure: scatterplot matrix with the cluster assignments; the slide is titled "Manhattan distance". Variables: Population, Income, Illiteracy, Life.Exp, Murder, HS.Grad, Frost, Area]

Partition clustering
Partition around medoids (pam) is another partition algorithm.
Essentially, pam is a modification of the k-means algorithm.
This algorithm searches for K representative objects (medoids) among the observations in the data set, rather than centroids.
As a result, the method is expected to be more robust to data anomalies such as outliers.
A disadvantage of the pam algorithm is that, although it runs well on small data sets, it is not efficient enough for clustering large data sets.

Partition clustering
The algorithm is given next:
1. Let $x_i$, for $i = 1, \ldots, n$, be the set of observations.
2. Compute $D = \{d_{ii'}, \; i, i' = 1, \ldots, n\}$, the matrix that contains the distances between the n observations.
3. Choose K observations as the medoids of K initial clusters.
4. Assign every observation to its closest medoid using the matrix D.
5. For each cluster, search for the observation, $x_{i_k}$, of the cluster (if any) that gives the largest reduction in:
$$SSE_{med} = \sum_{k=1}^{K} \sum_{c(i)=k} d_{i i_k}$$
and select this observation as the medoid for this cluster (note that $SSE_{med}$ only considers distances from every observation in the cluster to the medoid).
6. Repeat steps 4 and 5 until no further reduction in $SSE_{med}$ takes place.
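
A minimal sketch of the alternating scheme described in steps 3 to 6 follows. It is an illustration of those steps and not the build/swap implementation found in statistical software; the function name, the initial medoids and the toy points are assumptions.

```python
def pam(D, medoids, max_iter=100):
    """Alternate between assigning points to medoids and updating the medoids.

    D is a symmetric distance matrix; `medoids` holds the initial medoid indices.
    Returns (medoids, labels, cost).
    """
    n = len(D)
    medoids = list(medoids)

    def assign(current):
        # Label of each observation: position of its closest medoid
        return [min(range(len(current)), key=lambda k: D[i][current[k]])
                for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for k in range(len(medoids)):
            members = [i for i in range(n) if labels[i] == k]
            if not members:  # keep the old medoid if its cluster is empty
                new_medoids.append(medoids[k])
                continue
            # Member with the smallest total distance to the rest of its cluster
            new_medoids.append(min(members, key=lambda m: sum(D[i][m] for i in members)))
        if new_medoids == medoids:
            break  # no further reduction is possible
        medoids = new_medoids
    labels = assign(medoids)
    cost = sum(D[i][medoids[labels[i]]] for i in range(n))
    return medoids, labels, cost

points = [0, 1, 2, 10, 11, 12]
D = [[abs(p - q) for q in points] for p in points]
medoids, labels, cost = pam(D, medoids=[0, 1])
print(medoids, labels, cost)  # [1, 4] [0, 0, 0, 1, 1, 1] 4
```

Only the distance matrix is needed, so the same function works with any of the distances introduced earlier, not just the Euclidean one.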

Illustrative example (I)
We are going to apply the pam algorithm to the states data set.
As with the previous algorithms, we use standardized variables, as we are going to use the Euclidean distance.
The next slide shows scatterplot matrices with the assignments made by the algorithm. For that we consider the case of 4 groups, as previously done.

Illustrative example (I)
[Figure: scatterplot matrix with the cluster assignments, Euclidean distance. Variables: Population, Income, Illiteracy, Life.Exp, Murder, HS.Grad, Frost, Area]

Model-based clustering
In model-based clustering, it is assumed that the data have been generated by a mixture of K unknown distributions.
Maximum likelihood estimation can be carried out to estimate the parameters of the mixture model.
This is usually undertaken using the Expectation-Maximization (EM) algorithm.
Then, once the model parameters have been estimated, each observation is assigned to the mixture component (cluster) with the largest probability of having generated the observation.

Model-based clustering
Then, we assume that the data set has been generated from a mixture of distributions with pdf given by:
$$f_x(x \mid \theta) = \sum_{k=1}^{K} \pi_k f_{x,k}(x \mid \theta_k)$$
where $\theta$ is a vector with all the parameters of the model, including the weights $\pi_k$ and the parameters of the distributions $f_{x,k}(\cdot \mid \theta_k)$, denoted by $\theta_k$.

Model-based clustering
Then, for a data matrix, $X$, with observations $x_i = (x_{i1}, \ldots, x_{ip})'$, the likelihood function is given by:
$$l(\theta \mid X) = \prod_{i=1}^{n} f_x(x_i \mid \theta) = \prod_{i=1}^{n} \left( \sum_{k=1}^{K} \pi_k f_{x,k}(x_i \mid \theta_k) \right)$$
while the log-likelihood is given by:
$$L(\theta \mid X) = \sum_{i=1}^{n} \log \left( \sum_{k=1}^{K} \pi_k f_{x,k}(x_i \mid \theta_k) \right)$$
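
The log-likelihood can be evaluated directly once the component densities are fixed. The sketch below assumes univariate Gaussian components for simplicity (the slides cover the general multivariate case); the function names and the toy data are illustrative.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_loglik(data, weights, mus, sigmas):
    """Sum over observations of the log of the weighted sum of component densities."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)))
        for x in data
    )

data = [-1.2, -0.8, 0.1, 3.9, 4.3]
one_group = mixture_loglik(data, [1.0], [1.26], [2.4])
two_groups = mixture_loglik(data, [0.6, 0.4], [-0.6, 4.1], [0.6, 0.3])
print(one_group, two_groups)  # the two-component fit has the higher log-likelihood
```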

Model-based clustering
Closed-form expressions for the MLE of the mixture parameters cannot be derived, even in the case of the multivariate Gaussian distribution.
Moreover, although it is possible to apply a Newton-Raphson type algorithm to solve the equations provided by the MLE method, the usual approach is to use the EM algorithm to obtain the MLEs (see the references).
Then, let $\hat{\pi}_1, \ldots, \hat{\pi}_K$ and $\hat{\theta}_1, \ldots, \hat{\theta}_K$ be the MLEs of the weights and the parameters of the group distributions, respectively, obtained with the EM algorithm.
The estimated posterior probabilities that observation $x_i$ belongs to population k are obtained by applying Bayes' theorem:
$$\widehat{\Pr}(k \mid x_i) = \frac{\hat{\pi}_k f_{x,k}(x_i \mid \hat{\theta}_k)}{\sum_{g=1}^{K} \hat{\pi}_g f_{x,g}(x_i \mid \hat{\theta}_g)}$$
The observations are assigned to the density (cluster) k with maximum $\widehat{\Pr}(k \mid x_i)$.
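
The assignment rule can be illustrated with a small sketch. It assumes univariate Gaussian components whose parameters are taken as already estimated; the EM step itself is not shown. The function names and parameter values are illustrative.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior(x, weights, mus, sigmas):
    """Posterior probability of each component given x, by Bayes' theorem."""
    numerators = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(numerators)
    return [v / total for v in numerators]

# Parameters treated as if they had been estimated beforehand (illustrative values)
weights, mus, sigmas = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
probs = posterior(3.0, weights, mus, sigmas)
cluster = max(range(len(probs)), key=lambda k: probs[k])
print(probs, cluster)  # the observation 3.0 is assigned to the second component
```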

Model-based clustering
In model-based clustering, it is possible to select the number of groups, K, from the data set.
The idea is to compare solutions with different values of K = 1, 2, ... and choose the best result.
For that, we can rely on model selection criteria such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC).
For instance, the BIC selects the number of clusters that minimizes:
$$BIC(k) = -2 L_k(\hat{\theta} \mid X) + \log(n) \, q$$
where $L_k(\hat{\theta} \mid X)$ denotes the maximized log-likelihood assuming k groups and q is the number of parameters of the model.
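
The selection rule can be sketched as follows. The maximized log-likelihood values below are hypothetical numbers chosen only to illustrate the computation, and the parameter count q = 3k - 1 assumes a univariate Gaussian mixture (k means, k variances and k - 1 free weights).

```python
import math

def bic(loglik, n_params, n_obs):
    """BIC as defined above: -2 * log-likelihood + log(n) * number of parameters."""
    return -2.0 * loglik + math.log(n_obs) * n_params

n_obs = 100
# Hypothetical maximized log-likelihoods for k = 1, 2, 3 groups (made-up values)
logliks = {1: -250.0, 2: -210.0, 3: -208.0}
scores = {k: bic(ll, 3 * k - 1, n_obs) for k, ll in logliks.items()}
best_k = min(scores, key=scores.get)
print(scores)
print(best_k)  # 2
```

Going from two to three groups raises the log-likelihood only slightly here, so the penalty term outweighs the gain and the two-group model is retained.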

Model-based clustering
M-clust is a popular method to perform model-based clustering.
M-clust assumes Gaussian densities and selects the optimal model according to BIC.
To reduce the number of parameters to fit, M-clust works with the spectral decomposition of the covariance matrices $\Sigma_k$, given by:
$$\Sigma_k = \lambda_{1,k} V_k \Lambda_k V_k'$$
where $\lambda_{1,k}$ is the largest eigenvalue of $\Sigma_k$, $V_k$ is the matrix that contains the eigenvectors of $\Sigma_k$ and $\Lambda_k$ is the diagonal matrix of eigenvalues divided by $\lambda_{1,k}$.

Model-based clustering
The decomposition allows for different configurations:
1. spherical and equal volume,
2. spherical and unequal volume,
3. diagonal and equal volume and shape,
4. diagonal, varying volume and equal shape,
5. diagonal, equal volume and varying shape,
6. diagonal, varying volume and shape,
7. ellipsoidal, equal volume, shape, and orientation,
8. ellipsoidal, equal volume and equal shape,
9. ellipsoidal and equal shape, and
10. ellipsoidal, varying volume, shape, and orientation.
Here (i) spherical, diagonal and ellipsoidal are relative to the covariance matrices; (ii) equal volume means that $\lambda_{1,1} = \cdots = \lambda_{1,K}$; (iii) equal shape means $\Lambda_1 = \cdots = \Lambda_K$; and (iv) equal orientation means $V_1 = \cdots = V_K$.

59 Illustrative example (I) For the states data set, Mclust selects a diagonal and equal shape model with 4 components. After estimating the model using the EM algorithm, the procedure computes the posterior probabilities for each state and component. The results are shown in the next two slides. The first one shows scatterplot matrices with the assignments made by the algorithm. The second one shows the first two principal components with the assignments made by the algorithm.
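The posterior probabilities mentioned here can be illustrated with a small univariate sketch. It assumes a Gaussian mixture whose weights, means and standard deviations have already been estimated; the parameter values and function names are hypothetical and are not taken from the states example. Each observation is assigned to the component with the largest posterior probability.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_probs(x, weights, means, sds):
    """P(component k | x) is proportional to weight_k * density_k(x)."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical fitted mixture with two components (illustrative values).
weights, means, sds = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]

probs = posterior_probs(1.0, weights, means, sds)
label = max(range(len(probs)), key=probs.__getitem__)  # most probable component
print(probs, label)
```

The same rule applies in the multivariate case, with the univariate density replaced by the multivariate Gaussian density of each component.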

60 Illustrative example (I) [Figure: Mclust solution. Scatterplot matrix of Population, Income, Illiteracy, Life.Exp, Murder, HS.Grad, Frost and Area, with the assignments made by the algorithm.]

61 Illustrative example (I) [Figure: the states, labeled by name, plotted on the first principal component (horizontal axis) and the second principal component (vertical axis), with the assignments made by the algorithm.]

62 Model-based clustering There are alternative procedures for model-based clustering. For instance, very appealing methodologies for estimating mixtures have been proposed from the Bayesian point of view. These procedures include the number of groups as an additional parameter, and posterior probabilities are also provided for this number. Procedures based on projections (projection pursuit methods) are also very popular. The idea is to project the data onto different directions that separate the groups as much as possible and to look for clusters in the univariate projected data.

63 Chapter outline 1 Introduction. 2 Proximity measures. 3 Hierarchical clustering. 4 Partition clustering. 5 Model-based clustering. We are ready now for: Chapter 6: Multidimensional scaling

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 6: Cluster Analysis Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2017/2018 Master in Mathematical Engineering

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 4: Factor analysis Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2017/2018 Master in Mathematical Engineering Pedro

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 3: Principal Component Analysis Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2017/2018 Master in Mathematical

More information

New Educators Campaign Weekly Report

New Educators Campaign Weekly Report Campaign Weekly Report Conversations and 9/24/2017 Leader Forms Emails Collected Text Opt-ins Digital Journey 14,661 5,289 4,458 7,124 317 13,699 1,871 2,124 Pro 13,924 5,175 4,345 6,726 294 13,086 1,767

More information

Standard Indicator That s the Latitude! Students will use latitude and longitude to locate places in Indiana and other parts of the world.

Standard Indicator That s the Latitude! Students will use latitude and longitude to locate places in Indiana and other parts of the world. Standard Indicator 4.3.1 That s the Latitude! Purpose Students will use latitude and longitude to locate places in Indiana and other parts of the world. Materials For the teacher: graph paper, globe showing

More information

Jakarta International School 6 th Grade Formative Assessment Graphing and Statistics -Black

Jakarta International School 6 th Grade Formative Assessment Graphing and Statistics -Black Jakarta International School 6 th Grade Formative Assessment Graphing and Statistics -Black Name: Date: Score : 42 Data collection, presentation and application Frequency tables. (Answer question 1 on

More information

Correction to Spatial and temporal distributions of U.S. winds and wind power at 80 m derived from measurements

Correction to Spatial and temporal distributions of U.S. winds and wind power at 80 m derived from measurements JOURNAL OF GEOPHYSICAL RESEARCH, VOL. 109,, doi:10.1029/2004jd005099, 2004 Correction to Spatial and temporal distributions of U.S. winds and wind power at 80 m derived from measurements Cristina L. Archer

More information

Challenge 1: Learning About the Physical Geography of Canada and the United States

Challenge 1: Learning About the Physical Geography of Canada and the United States 60ºN S T U D E N T H A N D O U T Challenge 1: Learning About the Physical Geography of Canada and the United States 170ºE 10ºW 180º 20ºW 60ºN 30ºW 1 40ºW 160ºW 50ºW 150ºW 60ºW 140ºW N W S E 0 500 1,000

More information

Abortion Facilities Target College Students

Abortion Facilities Target College Students Target College Students By Kristan Hawkins Executive Director, Students for Life America Ashleigh Weaver Researcher Abstract In the Fall 2011, Life Dynamics released a study entitled, Racial Targeting

More information

, District of Columbia

, District of Columbia State Capitals These are the State Seals of each state. Fill in the blank with the name of each states capital city. (Hint: You may find it helpful to do the word search first to refresh your memory.),

More information

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis Lecture 5: Ecological distance metrics; Principal Coordinates Analysis Univariate testing vs. community analysis Univariate testing deals with hypotheses concerning individual taxa Is this taxon differentially

More information

Preview: Making a Mental Map of the Region

Preview: Making a Mental Map of the Region Preview: Making a Mental Map of the Region Draw an outline map of Canada and the United States on the next page or on a separate sheet of paper. Add a compass rose to your map, showing where north, south,

More information

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis Lecture 5: Ecological distance metrics; Principal Coordinates Analysis Univariate testing vs. community analysis Univariate testing deals with hypotheses concerning individual taxa Is this taxon differentially

More information

Cooperative Program Allocation Budget Receipts Southern Baptist Convention Executive Committee May 2018

Cooperative Program Allocation Budget Receipts Southern Baptist Convention Executive Committee May 2018 Cooperative Program Allocation Budget Receipts May 2018 Cooperative Program Allocation Budget Current Current $ Change % Change Month Month from from Contribution Sources 2017-2018 2016-2017 Prior Year

More information

Cooperative Program Allocation Budget Receipts Southern Baptist Convention Executive Committee October 2017

Cooperative Program Allocation Budget Receipts Southern Baptist Convention Executive Committee October 2017 Cooperative Program Allocation Budget Receipts October 2017 Cooperative Program Allocation Budget Current Current $ Change % Change Month Month from from Contribution Sources 2017-2018 2016-2017 Prior

More information

Cooperative Program Allocation Budget Receipts Southern Baptist Convention Executive Committee October 2018

Cooperative Program Allocation Budget Receipts Southern Baptist Convention Executive Committee October 2018 Cooperative Program Allocation Budget Receipts October 2018 Cooperative Program Allocation Budget Current Current $ Change % Change Month Month from from Contribution Sources 2018-2019 2017-2018 Prior

More information

Intercity Bus Stop Analysis

Intercity Bus Stop Analysis by Karalyn Clouser, Research Associate and David Kack, Director of the Small Urban and Rural Livability Center Western Transportation Institute College of Engineering Montana State University Report prepared

More information

Hourly Precipitation Data Documentation (text and csv version) February 2016

Hourly Precipitation Data Documentation (text and csv version) February 2016 I. Description Hourly Precipitation Data Documentation (text and csv version) February 2016 Hourly Precipitation Data (labeled Precipitation Hourly in Climate Data Online system) is a database that gives

More information

Chapter. Organizing and Summarizing Data. Copyright 2013, 2010 and 2007 Pearson Education, Inc.

Chapter. Organizing and Summarizing Data. Copyright 2013, 2010 and 2007 Pearson Education, Inc. Chapter 2 Organizing and Summarizing Data Section 2.1 Organizing Qualitative Data Objectives 1. Organize Qualitative Data in Tables 2. Construct Bar Graphs 3. Construct Pie Charts When data is collected

More information

Printable Activity book

Printable Activity book Printable Activity book 16 Pages of Activities Printable Activity Book Print it Take it Keep them busy Print them out Laminate them or Put them in page protectors Put them in a binder Bring along a dry

More information

Additional VEX Worlds 2019 Spot Allocations

Additional VEX Worlds 2019 Spot Allocations Overview VEX Worlds 2019 Spot s Qualifying spots for the VEX Robotics World Championship are calculated twice per year. On the following table, the number in the column is based on the number of teams

More information

QF (Build 1010) Widget Publishing, Inc Page: 1 Batch: 98 Test Mode VAC Publisher's Statement 03/15/16, 10:20:02 Circulation by Issue

QF (Build 1010) Widget Publishing, Inc Page: 1 Batch: 98 Test Mode VAC Publisher's Statement 03/15/16, 10:20:02 Circulation by Issue QF 1.100 (Build 1010) Widget Publishing, Inc Page: 1 Circulation by Issue Qualified Non-Paid Circulation Qualified Paid Circulation Individual Assoc. Total Assoc. Total Total Requester Group Qualified

More information

Online Appendix: Can Easing Concealed Carry Deter Crime?

Online Appendix: Can Easing Concealed Carry Deter Crime? Online Appendix: Can Easing Concealed Carry Deter Crime? David Fortunato University of California, Merced dfortunato@ucmerced.edu Regulations included in institutional context measure As noted in the main

More information

Outline. Administrivia and Introduction Course Structure Syllabus Introduction to Data Mining

Outline. Administrivia and Introduction Course Structure Syllabus Introduction to Data Mining Outline Administrivia and Introduction Course Structure Syllabus Introduction to Data Mining Dimensionality Reduction Introduction Principal Components Analysis Singular Value Decomposition Multidimensional

More information

Club Convergence and Clustering of U.S. State-Level CO 2 Emissions

Club Convergence and Clustering of U.S. State-Level CO 2 Emissions Methodological Club Convergence and Clustering of U.S. State-Level CO 2 Emissions J. Wesley Burnett Division of Resource Management West Virginia University Wednesday, August 31, 2013 Outline Motivation

More information

A. Geography Students know the location of places, geographic features, and patterns of the environment.

A. Geography Students know the location of places, geographic features, and patterns of the environment. Learning Targets Elementary Social Studies Grade 5 2014-2015 A. Geography Students know the location of places, geographic features, and patterns of the environment. A.5.1. A.5.2. A.5.3. A.5.4. Label North

More information

2005 Mortgage Broker Regulation Matrix

2005 Mortgage Broker Regulation Matrix 2005 Mortgage Broker Regulation Matrix Notes on individual states follow the table REG EXEMPTIONS LIC-EDU LIC-EXP LIC-EXAM LIC-CONT-EDU NET WORTH BOND MAN-LIC MAN-EDU MAN-EXP MAN-EXAM Alabama 1 0 2 0 0

More information

Multivariate Classification Methods: The Prevalence of Sexually Transmitted Diseases

Multivariate Classification Methods: The Prevalence of Sexually Transmitted Diseases Multivariate Classification Methods: The Prevalence of Sexually Transmitted Diseases Summer Undergraduate Mathematical Sciences Research Institute (SUMSRI) Lindsay Kellam, Queens College kellaml@queens.edu

More information

North American Geography. Lesson 2: My Country tis of Thee

North American Geography. Lesson 2: My Country tis of Thee North American Geography Lesson 2: My Country tis of Thee Unit Overview: As students work through the activities in this unit they will be introduced to the United States in general, different regions

More information

Last time: PCA. Statistical Data Mining and Machine Learning Hilary Term Singular Value Decomposition (SVD) Eigendecomposition and PCA

Last time: PCA. Statistical Data Mining and Machine Learning Hilary Term Singular Value Decomposition (SVD) Eigendecomposition and PCA Last time: PCA Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml

More information

RELATIONSHIPS BETWEEN THE AMERICAN BROWN BEAR POPULATION AND THE BIGFOOT PHENOMENON

RELATIONSHIPS BETWEEN THE AMERICAN BROWN BEAR POPULATION AND THE BIGFOOT PHENOMENON RELATIONSHIPS BETWEEN THE AMERICAN BROWN BEAR POPULATION AND THE BIGFOOT PHENOMENON ETHAN A. BLIGHT Blight Investigations, Gainesville, FL ABSTRACT Misidentification of the American brown bear (Ursus arctos,

More information

What Lies Beneath: A Sub- National Look at Okun s Law for the United States.

What Lies Beneath: A Sub- National Look at Okun s Law for the United States. What Lies Beneath: A Sub- National Look at Okun s Law for the United States. Nathalie Gonzalez Prieto International Monetary Fund Global Labor Markets Workshop Paris, September 1-2, 2016 What the paper

More information

Office of Special Education Projects State Contacts List - Part B and Part C

Office of Special Education Projects State Contacts List - Part B and Part C Office of Special Education Projects State Contacts List - Part B and Part C Source: http://www.ed.gov/policy/speced/guid/idea/monitor/state-contactlist.html Alabama Customer Specialist: Jill Harris 202-245-7372

More information

SUPPLEMENTAL NUTRITION ASSISTANCE PROGRAM QUALITY CONTROL ANNUAL REPORT FISCAL YEAR 2008

SUPPLEMENTAL NUTRITION ASSISTANCE PROGRAM QUALITY CONTROL ANNUAL REPORT FISCAL YEAR 2008 SUPPLEMENTAL NUTRITION ASSISTANCE PROGRAM QUALITY CONTROL ANNUAL REPORT FISCAL YEAR 2008 U.S. DEPARTMENT OF AGRICULTURE FOOD AND NUTRITION SERVICE PROGRAM ACCOUNTABILITY AND ADMINISTRATION DIVISION QUALITY

More information

Meteorology 110. Lab 1. Geography and Map Skills

Meteorology 110. Lab 1. Geography and Map Skills Meteorology 110 Name Lab 1 Geography and Map Skills 1. Geography Weather involves maps. There s no getting around it. You must know where places are so when they are mentioned in the course it won t be

More information

JAN/FEB MAR/APR MAY/JUN

JAN/FEB MAR/APR MAY/JUN QF 1.100 (Build 1010) Widget Publishing, Inc Page: 1 Circulation Breakdown by Issue Qualified Non-Paid Qualified Paid Previous This Previous This Total Total issue Removals Additions issue issue Removals

More information

Alpine Funds 2016 Tax Guide

Alpine Funds 2016 Tax Guide Alpine s 2016 Guide Alpine Dynamic Dividend ADVDX 01/28/2016 01/29/2016 01/29/2016 0.020000000 0.017621842 0.000000000 0.00000000 0.017621842 0.013359130 0.000000000 0.000000000 0.002378158 0.000000000

More information

Alpine Funds 2017 Tax Guide

Alpine Funds 2017 Tax Guide Alpine s 2017 Guide Alpine Dynamic Dividend ADVDX 1/30/17 1/31/17 1/31/17 0.020000000 0.019248130 0.000000000 0.00000000 0.019248130 0.013842273 0.000000000 0.000000000 0.000751870 0.000000000 0.00 0.00

More information

Summary of Natural Hazard Statistics for 2008 in the United States

Summary of Natural Hazard Statistics for 2008 in the United States Summary of Natural Hazard Statistics for 2008 in the United States This National Weather Service (NWS) report summarizes fatalities, injuries and damages caused by severe weather in 2008. The NWS Office

More information

Crop Progress. Corn Mature Selected States [These 18 States planted 92% of the 2017 corn acreage]

Crop Progress. Corn Mature Selected States [These 18 States planted 92% of the 2017 corn acreage] Crop Progress ISSN: 00 Released October, 0, by the National Agricultural Statistics Service (NASS), Agricultural Statistics Board, United s Department of Agriculture (USDA). Corn Mature Selected s [These

More information

Osteopathic Medical Colleges

Osteopathic Medical Colleges Osteopathic Medical Colleges Matriculants by U.S. States and Territories Entering Class 0 Prepared by the Research Department American Association of Colleges of Osteopathic Medicine Copyright 0, AAM All

More information

Parametric Test. Multiple Linear Regression Spatial Application I: State Homicide Rates Equations taken from Zar, 1984.

Parametric Test. Multiple Linear Regression Spatial Application I: State Homicide Rates Equations taken from Zar, 1984. Multiple Linear Regression Spatial Application I: State Homicide Rates Equations taken from Zar, 984. y ˆ = a + b x + b 2 x 2K + b n x n where n is the number of variables Example: In an earlier bivariate

More information

OUT-OF-STATE 965 SUBTOTAL OUT-OF-STATE U.S. TERRITORIES FOREIGN COUNTRIES UNKNOWN GRAND TOTAL

OUT-OF-STATE 965 SUBTOTAL OUT-OF-STATE U.S. TERRITORIES FOREIGN COUNTRIES UNKNOWN GRAND TOTAL Report ID: USSR8072-V3 Page No. 1 Jurisdiction: ON-CAMPUS IL Southern Illinois University - Carb 1 0 0 0 Black Hawk College Quad-Cities 0 0 1 0 John A Logan College 1 0 0 0 Rend Lake College 1 0 0 0 Aurora

More information

LABORATORY REPORT. If you have any questions concerning this report, please do not hesitate to call us at (800) or (574)

LABORATORY REPORT. If you have any questions concerning this report, please do not hesitate to call us at (800) or (574) LABORATORY REPORT If you have any questions concerning this report, please do not hesitate to call us at (800) 332-4345 or (574) 233-4777. This report may not be reproduced, except in full, without written

More information

LABORATORY REPORT. If you have any questions concerning this report, please do not hesitate to call us at (800) or (574)

LABORATORY REPORT. If you have any questions concerning this report, please do not hesitate to call us at (800) or (574) LABORATORY REPORT If you have any questions concerning this report, please do not hesitate to call us at (800) 332-4345 or (574) 233-4777. This report may not be reproduced, except in full, without written

More information

BlackRock Core Bond Trust (BHK) BlackRock Enhanced International Dividend Trust (BGY) 2 BlackRock Defined Opportunity Credit Trust (BHL) 3

BlackRock Core Bond Trust (BHK) BlackRock Enhanced International Dividend Trust (BGY) 2 BlackRock Defined Opportunity Credit Trust (BHL) 3 MUNICIPAL FUNDS Arizona (MZA) California Municipal Income Trust (BFZ) California Municipal 08 Term Trust (BJZ) California Quality (MCA) California Quality (MUC) California (MYC) Florida Municipal 00 Term

More information

Grand Total Baccalaureate Post-Baccalaureate Masters Doctorate Professional Post-Professional

Grand Total Baccalaureate Post-Baccalaureate Masters Doctorate Professional Post-Professional s by Location of Permanent Home Address and Degree Level Louisiana Acadia 19 13 0 3 0 3 0 0 0 Allen 5 5 0 0 0 0 0 0 0 Ascension 307 269 2 28 1 6 0 1 0 Assumption 14 12 0 1 0 1 0 0 0 Avoyelles 6 4 0 1 0

More information

An Analysis of Regional Income Variation in the United States:

An Analysis of Regional Income Variation in the United States: Modern Economy, 2017, 8, 232-248 http://www.scirp.org/journal/me ISSN Online: 2152-7261 ISSN Print: 2152-7245 An Analysis of Regional Income Variation in the United States: 1969-2013 Orley M. Amos Jr.

More information

High School World History Cycle 2 Week 2 Lifework

High School World History Cycle 2 Week 2 Lifework Name: Advisory: Period: High School World History Cycle 2 Week 2 Lifework This packet is due Monday, November 7 Complete and turn in on Friday for 10 points of EXTRA CREDIT! Lifework Assignment Complete

More information

Non-iterative, regression-based estimation of haplotype associations

Non-iterative, regression-based estimation of haplotype associations Non-iterative, regression-based estimation of haplotype associations Benjamin French, PhD Department of Biostatistics and Epidemiology University of Pennsylvania bcfrench@upenn.edu National Cancer Center

More information

MINERALS THROUGH GEOGRAPHY

MINERALS THROUGH GEOGRAPHY MINERALS THROUGH GEOGRAPHY INTRODUCTION Minerals are related to rock type, not political definition of place. So, the minerals are to be found in a variety of locations that doesn t depend on population

More information

Grand Total Baccalaureate Post-Baccalaureate Masters Doctorate Professional Post-Professional

Grand Total Baccalaureate Post-Baccalaureate Masters Doctorate Professional Post-Professional s by Location of Permanent Home Address and Degree Level Louisiana Acadia 26 19 0 6 1 0 0 0 0 Allen 7 7 0 0 0 0 0 0 0 Ascension 275 241 3 23 1 6 0 1 0 Assumption 13 12 0 1 0 0 0 0 0 Avoyelles 15 11 0 3

More information

Data Preprocessing. Cluster Similarity

Data Preprocessing. Cluster Similarity 1 Cluster Similarity Similarity is most often measured with the help of a distance function. The smaller the distance, the more similar the data objects (points). A function d: M M R is a distance on M

More information

extreme weather, climate & preparedness in the american mind

extreme weather, climate & preparedness in the american mind extreme weather, climate & preparedness in the american mind Extreme Weather, Climate & Preparedness In the American Mind Interview dates: March 12, 2012 March 30, 2012. Interviews: 1,008 Adults (18+)

More information

Office of Budget & Planning 311 Thomas Boyd Hall Baton Rouge, LA Telephone 225/ Fax 225/

Office of Budget & Planning 311 Thomas Boyd Hall Baton Rouge, LA Telephone 225/ Fax 225/ Louisiana Acadia 20 17 3 0 0 0 Allen 2 2 0 0 0 0 Ascension 226 185 37 2 1 1 Assumption 16 15 1 0 0 0 Avoyelles 20 19 1 0 0 0 Beauregard 16 11 4 0 0 1 Bienville 2 2 0 0 0 0 Bossier 22 18 4 0 0 0 Caddo 91

More information

Insurance Department Resources Report Volume 1

Insurance Department Resources Report Volume 1 2014 Insurance Department Resources Report Volume 1 201 Insurance Department Resources Report Volume One 201 The NAIC is the authoritative source for insurance industry information. Our expert solutions

More information

Statistical Methods for Data Mining

Statistical Methods for Data Mining Statistical Methods for Data Mining Kuangnan Fang Xiamen University Email: xmufkn@xmu.edu.cn Linear Model Selection and Regularization Recall the linear model Y = 0 + 1 X 1 + + p X p +. In the lectures

More information

Chapter 11 : State SAT scores for 1982 Data Listing

Chapter 11 : State SAT scores for 1982 Data Listing EXST3201 Chapter 12a Geaghan Fall 2005: Page 1 Chapter 12 : Variable selection An example: State SAT scores In 1982 there was concern for scores of the Scholastic Aptitude Test (SAT) scores that varied

More information

MINERALS THROUGH GEOGRAPHY. General Standard. Grade level K , resources, and environmen t

MINERALS THROUGH GEOGRAPHY. General Standard. Grade level K , resources, and environmen t Minerals through Geography 1 STANDARDS MINERALS THROUGH GEOGRAPHY See summary of National Science Education s. Original: http://books.nap.edu/readingroom/books/nses/ Concept General Specific General Specific

More information

National Organization of Life and Health Insurance Guaranty Associations

National Organization of Life and Health Insurance Guaranty Associations National Organization of and Health Insurance Guaranty Associations November 21, 2005 Dear Chief Executive Officer: Consistent with prior years, NOLHGA is providing the enclosed data regarding insolvency

More information

JAN/FEB MAR/APR MAY/JUN

JAN/FEB MAR/APR MAY/JUN QF 1.100 (Build 1010) Widget Publishing, Inc Page: 1 Circulation Breakdown by Issue Analyzed Nonpaid and Verified Paid Previous This Previous This Total Total issue Removals Additions issue issue Removals

More information

United States Geography Unit 1

United States Geography Unit 1 United States Geography Unit 1 I WANT YOU TO STUDY YOUR GEORGAPHY Name: Period: Due Date: Geography Key Terms Absolute Location: Relative Location: Demographic Map: Population Density: Sun-Belt: Archipelago:

More information

KS PUBL 4YR Kansas State University Pittsburg State University SUBTOTAL-KS

KS PUBL 4YR Kansas State University Pittsburg State University SUBTOTAL-KS Report ID: USSR8072-V3 Page No. 1 Jurisdiction: ON-CAMPUS IL PUBL TCH DeVry University Addison 1 0 0 0 Eastern Illinois University 1 0 0 0 Illinois State University 0 0 2 0 Northern Illinois University

More information

MO PUBL 4YR 2090 Missouri State University SUBTOTAL-MO

MO PUBL 4YR 2090 Missouri State University SUBTOTAL-MO Report ID: USSR8072-V3 Page No. 1 Jurisdiction: ON-CAMPUS IL American Intercontinental Universit 0 0 1 0 Northern Illinois University 0 0 4 0 Southern Illinois Univ - Edwardsvil 2 0 2 0 Southern Illinois

More information

GIS use in Public Health 1

GIS use in Public Health 1 Geographic Information Systems (GIS) use in Public Health Douglas Morales, MPH Epidemiologist/GIS Coordinator Office of Health Assessment and Epidemiology Epidemiology Unit Objectives Define GIS and justify

More information

DOWNLOAD OR READ : USA PLANNING MAP PDF EBOOK EPUB MOBI

DOWNLOAD OR READ : USA PLANNING MAP PDF EBOOK EPUB MOBI DOWNLOAD OR READ : USA PLANNING MAP PDF EBOOK EPUB MOBI Page 1 Page 2 usa planning map usa planning map pdf usa planning map Printable USA Blank Map, USA Blank Map PDF, Blank US State Map. Thursday, 19

More information

Rank University AMJ AMR ASQ JAP OBHDP OS PPSYCH SMJ SUM 1 University of Pennsylvania (T) Michigan State University

Rank University AMJ AMR ASQ JAP OBHDP OS PPSYCH SMJ SUM 1 University of Pennsylvania (T) Michigan State University Rank University AMJ AMR ASQ JAP OBHDP OS PPSYCH SMJ SUM 1 University of Pennsylvania 4 1 2 0 2 4 0 9 22 2(T) Michigan State University 2 0 0 9 1 0 0 4 16 University of Michigan 3 0 2 5 2 0 0 4 16 4 Harvard

More information

Pima Community College Students who Enrolled at Top 200 Ranked Universities

Pima Community College Students who Enrolled at Top 200 Ranked Universities Pima Community College Students who Enrolled at Top 200 Ranked Universities Institutional Research, Planning and Effectiveness Project #20170814-MH-60-CIR August 2017 Students who Attended Pima Community

More information

Clustering. CSL465/603 - Fall 2016 Narayanan C Krishnan

Clustering. CSL465/603 - Fall 2016 Narayanan C Krishnan Clustering CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Supervised vs Unsupervised Learning Supervised learning Given x ", y " "%& ', learn a function f: X Y Categorical output classification

More information

A Summary of State DOT GIS Activities

A Summary of State DOT GIS Activities A Summary of State DOT GIS Activities Prepared for the 2006 AASHTO GIS-T Symposium Columbus, OH Introduction This is the 11 th year that the GIS-T Symposium has conducted a survey of GIS activities at

More information

Infant Mortality: Cross Section study of the United State, with Emphasis on Education

Infant Mortality: Cross Section study of the United State, with Emphasis on Education Illinois State University ISU ReD: Research and edata Stevenson Center for Community and Economic Development Arts and Sciences Fall 12-15-2014 Infant Mortality: Cross Section study of the United State,

More information

FLOOD/FLASH FLOOD. Lightning. Tornado

FLOOD/FLASH FLOOD. Lightning. Tornado 2004 Annual Summaries National Oceanic and Atmospheric Administration National Environmental Satellite Data Information Service National Climatic Data Center FLOOD/FLASH FLOOD Lightning Tornado Hurricane

More information

October 2016 v1 12/10/2015 Page 1 of 10

October 2016 v1 12/10/2015 Page 1 of 10 State Section S s Effective October 1, 2016 Overview The tables list the Section S items that will be active on records with a target date on or after October 1, 2016. The active on each item subset code

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical

More information

Stem-and-Leaf Displays

Stem-and-Leaf Displays 3.2 Displaying Numerical Data: Stem-and-Leaf Displays 107 casts in your area? (San Luis Obispo Tribune, June 15, 2005). The responses are summarized in the table below. Extremely 4% Very 27% Somewhat 53%

More information

112th U.S. Senate ACEC Scorecard

112th U.S. Senate ACEC Scorecard 112th U.S. Senate ACEC Scorecard HR 658 FAA Funding Alaska Alabama Arkansas Arizona California Colorado Connecticut S1 Lisa Murkowski R 100% Y Y Y Y Y Y Y Y Y S2 Mark Begich D 89% Y Y Y Y N Y Y Y Y S1

More information

Package ZIM. R topics documented: August 29, Type Package. Title Statistical Models for Count Time Series with Excess Zeros. Version 1.
