Multivariate Analysis, Clustering, and Classification

1 Multivariate Analysis, Clustering, and Classification Jessi Cisewski Yale University Astrostatistics Summer School

2 Multivariate Analysis Statistical analysis of data containing observations each with > 1 variable measured. Examples: 1 Measurements on a star: luminosity, color, environment, metallicity, number of exoplanets 2 Functions such as light curves and spectra 3 Images 2

3 Common goals 1 Describe the p-dimensional distribution Multivariate means, variances, and covariances Multivariate probability distributions 2 Reduce the number of variables without losing significant information Linear functions of variables (principal components) 3 Investigate dependence between variables 4 Statistical inference Confidence regions, multivariate regression, hypothesis testing 5 Clustering and Classification 3

4 Organizing the data
p = number of variables, n = number of observations, $x_{ij}$ = $i$th observation of the $j$th variable.
The data are arranged with the n observations as rows and the p variables as columns:

            Variables
Obs.      1        2      ...    p
1       x_11     x_12     ...   x_1p
2       x_21     x_22     ...   x_2p
...
n       x_n1     x_n2     ...   x_np

5 Data matrix:
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$
This can also be written as n rows or as p columns:
$$X = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{pmatrix} = \left[\, x_1,\ x_2,\ \ldots,\ x_p \,\right]$$

6 Descriptive statistics
Sample mean of the $j$th variable: $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$
Sample mean vector: $\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p)^T$

7 Sample covariance of variables $i$ and $j$:
$$s_{ij} = s_{ji} = \frac{1}{n-1}\sum_{k=1}^{n} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)$$
The sample variance of the $j$th variable is $s_{jj}$.
Sample covariance matrix:
$$S = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}$$

8 Sample correlation coefficient of variables $i$ and $j$:
$$r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}\, s_{jj}}}$$
with $r_{ii} = 1$ and $r_{ij} = r_{ji}$.
Sample correlation matrix:
$$R = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}$$
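These summaries are one-liners in R. A minimal sketch, not from the slides; the matrix X below is a made-up placeholder for any n x p data matrix:
X <- matrix(rnorm(100 * 3), nrow = 100, ncol = 3)   # placeholder data: n = 100, p = 3
xbar <- colMeans(X)    # sample mean vector
S <- cov(X)            # sample covariance matrix (divides by n - 1)
R <- cor(X)            # sample correlation matrix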

9 Caution Correlations measure the strength of the linear relationships between variables if such relationships are valid. 9

10 Multivariate Probability Distributions
Goal: make probability statements about random vectors.
A p-dimensional random vector is $X = (X_1, X_2, \ldots, X_p)^T$ where $X_1, \ldots, X_p$ are random variables. $X$ is a continuous random vector if $X_1, \ldots, X_p$ are all continuous random variables. We'll focus on continuous random vectors.

11 Three important properties of X's probability density function, f:
1 $f(x) \ge 0$ for all $x \in \mathbb{R}^p$ (or wherever the x's take values)
2 The total area below the function f is 1: $\int_{\mathbb{R}^p} f(x)\,dx = 1$
3 For all $t_1, t_2, \ldots, t_p$, $P(X_1 \le t_1, X_2 \le t_2, \ldots, X_p \le t_p) = \int_{-\infty}^{t_1}\int_{-\infty}^{t_2}\cdots\int_{-\infty}^{t_p} f(x)\,dx$

12 Recall: an expected value is an average over the entire population.
Mean vector: $\mu = (\mu_1, \mu_2, \ldots, \mu_p)^T$ where
$$\mu_i = E(X_i) = \int_{\mathbb{R}^p} x_i\, f(x)\,dx$$
is the mean of the $i$th component of $X$.

13 Covariance between $X_i$ and $X_j$:
$$\sigma_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)] = E(X_i X_j) - \mu_i \mu_j$$
And the variance of each $X_i$ is:
$$\sigma_{ii} = E[(X_i - \mu_i)^2] = E(X_i^2) - \mu_i^2$$
Covariance matrix of X:
$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix} = E[(X - \mu)(X - \mu)^T] = E(XX^T) - \mu\mu^T$$
Going forward, let's assume $\Sigma$ is nonsingular.

14 Multivariate Normal Distribution
Consider the following random vector whose possible values range over all of $\mathbb{R}^p$: $X = (X_1, X_2, \ldots, X_p)^T$.
X has a multivariate normal distribution if it has a pdf of the form
$$f(X) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (X - \mu)^T \Sigma^{-1} (X - \mu) \right]$$
Notation: $X \sim N_p(\mu, \Sigma)$

16 Special case: diagonal $\Sigma$
$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_p^2 \end{pmatrix}$$
In this case, $X_1, \ldots, X_p$ are mutually independent and normally distributed with density
$$f(x) = \prod_{j=1}^{p} \frac{1}{(2\pi\sigma_j^2)^{1/2}} \, e^{-\frac{1}{2\sigma_j^2}(x_j - \mu_j)^2}$$

17 $\chi^2$ Distribution
If $X \sim N_p(\mu, \Sigma)$ with pdf
$$f(X) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (X - \mu)^T \Sigma^{-1} (X - \mu) \right]$$
then
$$(X - \mu)^T \Sigma^{-1} (X - \mu) \sim \chi^2_p$$
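This fact can be checked by simulation. A minimal sketch, not from the slides, assuming the mvtnorm package is available for drawing multivariate normal samples:
library(mvtnorm)                     # rmvnorm() for multivariate normal draws
set.seed(1)
p <- 3
mu <- rep(0, p)
Sigma <- diag(p)                     # identity covariance for illustration
x <- rmvnorm(10000, mean = mu, sigma = Sigma)
d2 <- mahalanobis(x, center = mu, cov = Sigma)   # (x - mu)^T Sigma^{-1} (x - mu)
qqplot(qchisq(ppoints(10000), df = p), d2,
       xlab = "Chi-squared quantiles", ylab = "Mahalanobis distances")
abline(0, 1)                         # points should fall near this line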

18 Tests for multivariate normality If the data contain a substantial number of outliers then it goes against the hypothesis of multivariate normality If one variable is not normally distributed, then the full set of variables does not have a multivariate normal distribution A possible resolution is to transform the original variables to produce new variables which are normally distributed Example: Box-Cox transformations When datasets arise from a multivariate normal distribution, we can perform accurate inference on its mean vector and covariance matrix 18

19 Variables (random vector): $X \sim N_p(\mu, \Sigma)$; the parameters $\mu$ and $\Sigma$ are unknown.
Data (measurements): $x_1, x_2, \ldots, x_n$. Goal: estimate $\mu$ and $\Sigma$.
$\bar{x}$ is an unbiased estimator of $\mu$ (and the MLE of $\mu$).
The MLE of $\Sigma$ is $\frac{n-1}{n} S$; this is not unbiased (i.e. it is biased).
The sample covariance matrix, $S$, is an unbiased estimator of $\Sigma$.

20 Principal Components Analysis (PCA)
Data: X = p-dimensional random vector with covariance matrix $\Sigma$.
PCA is an unsupervised approach to learning about X: principal components find directions of variability in X. Can be used for visualization, dimension reduction, regression, etc.
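In R, principal components can be computed with prcomp(). A minimal sketch, not from the slides; X below is a made-up placeholder for any numeric data matrix:
X <- matrix(rnorm(100 * 4), nrow = 100)   # placeholder data matrix
pc <- prcomp(X, scale. = TRUE)            # PCA on standardized variables
summary(pc)                               # proportion of variance per component
pc$rotation                               # loadings (directions of variability)
head(pc$x)                                # scores: data projected onto the PCs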

21 Statistical Learning Learning from data 1 Unsupervised learning 2 Supervised learning

22 One schematic for addressing problems in machine learning... Image: Andy's Computer Vision and Machine Learning Blog 22

23 Clustering: find subtypes or groups that are not defined a priori, based on measurements. Unsupervised learning, or "learning without labels."
Classification: use a priori group labels in the analysis to assign new observations to a particular group or class. Supervised learning, or "learning with labels."
Some content and notation used throughout is derived from notes by Rebecca Nugent (CMU), Ryan Tibshirani (CMU), and the textbooks Hastie et al. (2009) and James et al. (2013).

24 Clustering and Classification 24

25 Sample Data: data1
ng <- 50
set.seed(321)  # ensures same dataset
g1 <- cbind(rnorm(ng, -3, .1), rnorm(ng, 1, 1))
g2 <- cbind(rnorm(ng, 3, .1), rnorm(ng, 1, 1))
gtemp <- cbind(rnorm(ng*2, 0, .75), rnorm(ng*2, 0, .75))
rad1 <- sqrt(gtemp[,1]^2 + gtemp[,2]^2)
g3 <- gtemp[order(rad1)[1:ng], ]
g4 <- gtemp[order(rad1)[(ng+1):(2*ng)], ]
g5.1 <- seq(-2.75, 2.75, length.out = ng)
g5.2 <- (g5.1/2)^2 - 4
g5 <- cbind(g5.1, g5.2 + rnorm(ng, 0, .5))
data1 <- rbind(g1, g2, g3, g4, g5)
labels1 <- c(rep(1,ng), rep(2,ng), rep(3,ng), rep(4,ng), rep(5,ng))
This dataset will be used to illustrate clustering and classification methodologies throughout the lecture.

26 Good references An Introduction to Statistical Learning (James et al. 2013) good introduction to get started using methods/not technical The Elements of Statistical Learning (Hastie et al. 2009) very thorough and technical coverage of statistical learning All of Statistics (Wasserman 2004) great overview of statistics 26

27 CLUSTERING Grouping of similar objects (unsupervised learning) members of the same cluster are close in some sense Pattern recognition Data segmentation 27

28 Clustering in Astronomy
Use properties of GRBs (e.g. location in the sky, arrival time, duration, fluence, spectral hardness) to find subtypes/classes of events. Image: Mukherjee et al. (1998)
Regions with an excess of gamma rays that correspond to locations of dwarf galaxies could be evidence of particle dark matter.

29 Clustering set-up
Notation: given vectors $X = \{X_1, X_2, \ldots, X_n\} \subset \mathbb{R}^p$: n observations in p-dimensional space.
Variables/features/attributes are indexed by $j = 1, \ldots, p$: the $j$th variable is $X_j$. Observations are indexed by $i = 1, \ldots, n$: the $i$th observation is $X_i$.
Want to learn properties of the joint distribution P(X) of these vectors: organize, summarize, categorize, explain.
No direct measure of success (e.g. no notion of a misclassification rate); successful if the true structure is captured...

30 General goals of clustering Partition observations such that 1 Observations within a cluster are similar (Compactness Property) 2 Observations in different clusters are dissimilar (Closeness Property) Typically want compact clusters that are well-separated 30

31 Dissimilarity Measure
Characterizes degree of closeness.
Dissimilarity matrix $D = \{D_{ii'}\}$ such that $D_{ii} = 0$ and $d^j_{ii'} = d(x_{ij}, x_{i'j})$.
Some examples of $d^j_{ii'}$ are $(x_{ij} - x_{i'j})^2$ or $|x_{ij} - x_{i'j}|$.
$$D_{ii'} = D(X_i, X_{i'}) = \sum_{j=1}^{p} w_j \, d^j_{ii'} \quad \text{where } \sum_{j=1}^{p} w_j = 1$$

32 Dissimilarity Measure: within-cluster variation
$$\text{Total cluster variability} = \frac{1}{2} \sum_{i=1}^{n} \sum_{i'=1}^{n} D_{ii'} = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \left( \sum_{C(i')=k} D_{ii'} + \sum_{C(i') \neq k} D_{ii'} \right)$$
where $C(i) = k$ is the assignment of observation $i$ to cluster $k$.
Total within-cluster variability: $\frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i')=k} D_{ii'}$
Total between-cluster variability: $\frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i') \neq k} D_{ii'}$

33 Clustering methods 1 Combinatorial algorithms 1 K - means clustering 2 Hierarchical clustering 2 Mixture modeling/statistical clustering (parametric) 3 Mode seeking/bump Hunting/Statistical clustering (nonparametric) 33

34 K-means Clustering

35 K-means clustering Main idea: partition the observations into K separate clusters that do not overlap 35

36 K-means clustering - procedure
Goal: minimize the total within-cluster scatter using $D_{ii'} = \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2 = \|X_i - X_{i'}\|^2$. Then the within-cluster scatter is written as
$$\frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i')=k} \|X_i - X_{i'}\|^2 = \sum_{k=1}^{K} |C_k| \sum_{C(i)=k} \|X_i - \bar{X}_k\|^2$$
$|C_k|$ = number of observations in cluster $C_k$, and $\bar{X}_k = (\bar{X}_{k1}, \ldots, \bar{X}_{kp})$ is the mean of the observations in cluster $k$.

37 K-means clustering - recipe Pick K (number of clusters) Select K centers Alternate between the following: 1 Assign each observation to closest center 2 Recalculate centers R: kmeans(data, K, nstart = 20) 37

38 Simple code: K-means
set.seed(321)
g1 <- cbind(rnorm(50, 0, 1), rnorm(50, -1, .1))
g2 <- cbind(rnorm(50, 0, 1), rnorm(50, 1, .1))
g3 <- cbind(rnorm(50, 0, 1), rnorm(50, 3, .1))
data.points <- rbind(g1, g2, g3)
km.out <- kmeans(data.points, 3, nstart = 20)
plot(data.points, pch = 16, xlab = "X1", ylab = "X2", col = km.out$cluster)
points(km.out$centers[,1], km.out$centers[,2], pch = 4, lwd = 4, col = "blue", cex = 2)
Try moving the clusters closer together (in terms of X2) and see what happens with the K-means clusters.

39 K-means clustering - determining K
Choose the K at which the last significant reduction in the within-groups sum of squares occurs (i.e. find the "elbow"):
set.seed(123)
wss <- c()
number.k <- c(1:15)
for(ii in number.k){
  km0 <- kmeans(data1, ii, nstart = 20)
  wss[ii] <- km0$tot.withinss
}
plot(number.k, wss, xlab = "k", ylab = "wss", type = "b", pch = 19)

40 K-means clustering - tips
Can be unstable; the solution depends on the starting set of centers. It finds local optima, but we want global optima, so start with different centers: kmeans(x, K, nstart = 20).
Cluster assignments are strict: no notion of degree or strength of cluster membership.
Possible lack of interpretability of centers, since centers are averages:
- OK for clustering things like apartment prices based on price, square footage, distance to nearest grocery store
- But what if observations are images of faces?

41 K-means clustering - tips, continued.
Influenced by outliers: use medoids instead. A medoid is the observation in the data set whose average distance to all other observations is minimal. Medoids are more computationally intensive, but the centers are actual observations, leading to better interpretability.
R: pam(data, K) ("partitioning around medoids") in the cluster package
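A minimal sketch of partitioning around medoids on the sample dataset, not from the slides; it assumes the cluster package is installed and data1/labels1 have been built as above:
library(cluster)
pam.out <- pam(data1, 5)                 # K = 5 medoid-based clusters
pam.out$medoids                          # medoids are actual observations from data1
table(pam.out$clustering, labels1)       # compare clusters to the true group labels
plot(data1, col = pam.out$clustering, pch = 16, xlab = "X1", ylab = "X2")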

42 Hierarchical Clustering

43 Hierarchical vs. Flat Partitions
1 Flat partitioning (e.g. K-means clustering): partitions the data into K clusters; K is determined by the user; no sense of the relationships among the clusters.
2 Hierarchical partitioning: generates a hierarchy of partitions $P_1$ = 1 cluster, ..., $P_n$ = n clusters (agglomerative clustering), and the user selects a partition. Partition $P_i$ is the union of one or more clusters from partition $P_{i+1}$.

44 Hierarchical clustering - recipe
Define a dissimilarity $d_{kk'} = d(C_k, C_{k'})$ between clusters $C_k$ and $C_{k'}$ as a function of a distance between points in the clusters.
1 Start with every observation in its own cluster
2 Find $\min d(C_k, C_{k'})$ and merge $C_k$ and $C_{k'}$ (the minimum is across all pairs of clusters)
3 Repeat until only one cluster remains
R: hclust(distances)

45 Hierarchical clustering - distances
1 Friends-of-friends = Single-linkage clustering: intergroup distance is the smallest possible distance
$$d(C_k, C_{k'}) = \min_{x \in C_k,\, y \in C_{k'}} d(x, y)$$
2 Complete-linkage clustering: intergroup distance is the largest possible distance
$$d(C_k, C_{k'}) = \max_{x \in C_k,\, y \in C_{k'}} d(x, y)$$
3 Average-linkage clustering: average intergroup distance
$$d(C_k, C_{k'}) = \text{Ave}_{x \in C_k,\, y \in C_{k'}}\, d(x, y)$$
4 Ward's clustering
$$d(C_k, C_{k'}) = \frac{2\, |C_k|\, |C_{k'}|}{|C_k| + |C_{k'}|}\, \|\bar{X}_{C_k} - \bar{X}_{C_{k'}}\|^2$$
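A minimal sketch of agglomerative clustering on the sample data, not from the slides; it assumes data1 and labels1 have been built as above:
d <- dist(data1)                          # pairwise Euclidean distances
hc <- hclust(d, method = "single")        # single linkage; try "complete", "average", "ward.D2"
plot(hc, labels = FALSE)                  # dendrogram: the hierarchy of partitions
hc.labels <- cutree(hc, k = 5)            # cut the tree to obtain a flat 5-cluster partition
table(hc.labels, labels1)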

46 Single-linkage clustering 46

47 data1 47

48 data1 (K = 4 ) 47

49 Clustering Recap 1 Algorithmic clustering (no statistical assumptions) 1 K-means 2 Hierarchical linkage 2 Statistical clustering 1 Parametric - associates a specific model with the density (e.g. Gaussian, Poisson) parameters associated with each cluster 2 Nonparametric - looks at contours of the density to find cluster information (e.g. kernel density estimate) 48

50 Mixture Modeling/ Parametric Statistical Clustering Image: Li and Henning (2011) 49

51 Model-based clustering (parametric)
Assumes that each subgroup/cluster/component in the population has its own density:
$$p(X) = \sum_{k=1}^{K} \pi_k\, p_k(X; \theta_k) \quad \text{where } \sum_{k=1}^{K} \pi_k = 1 \text{ and } 0 \le \pi_k \le 1.$$
The entire dataset is modeled by a mixture of these distributions. For example,
$p(x) = 0.5\,\phi(x; 4, 1) + 0.5\,\phi(x; 0, 1)$ (middle plot) and $p(x) = 0.75\,\phi(x; 4, 1) + 0.25\,\phi(x; 0, 1)$ (right plot).

52 Model-based clustering: Gaussian example
Suppose there are K clusters; each cluster is modeled by a particular distribution (e.g. a Gaussian distribution with parameters $\mu_k$, $\Sigma_k$). The density of each cluster $k$ is
$$p_k(X) = \phi(X \mid \mu_k, \Sigma_k) = \frac{1}{\sqrt{(2\pi)^p |\Sigma_k|}} \exp\left( -\frac{(X - \mu_k)^T \Sigma_k^{-1} (X - \mu_k)}{2} \right)$$
Letting $\pi_k$ be the weight of cluster $k$, the mixture density is
$$p(X) = \sum_{k=1}^{K} \pi_k\, p_k(X) = \sum_{k=1}^{K} \pi_k\, \phi(X \mid \mu_k, \Sigma_k)$$

53 Model-based clustering: advantages 1 Well-studied statistical inference techniques available 2 Flexibility in choosing the component distributions 3 Obtain a density estimate for each cluster 4 Soft classification is available 52

54 Model-based clustering: fitting the model Suppose there are K clusters - each cluster is modeled by a particular distribution (e.g. a Gaussian distribution with parameters µ k, Σ k ) Expectation - Maximization (EM) algorithm Finds maximum likelihood estimates in incomplete data (e.g. missing cluster labels). Alternates between expectation step and maximization steps: E-step: compute conditional expectation of the cluster labels M-step: maximize the likelihood and estimate parameters given the current labels; update parameter estimates 53

55 Model-based clustering: EM algorithm
Observations $\{X_1, X_2, \ldots, X_n\}$ are incomplete (i.e. no labels). Complete observations: $\{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$, where the $Y_i$ are the labels.
The collection of parameters is $\theta = \{(\pi_k, \mu_k, \Sigma_k),\ k = 1, \ldots, K\}$. The log-likelihood function is
$$\ell(X_1, \ldots, X_n \mid \theta) = \sum_{i=1}^{n} \log\left( \sum_{k=1}^{K} \pi_k\, \phi(X_i \mid \mu_k, \Sigma_k) \right)$$
$\ell(X_1, \ldots, X_n \mid \theta)$ is the objective function of the EM algorithm; the numerical difficulty comes from the sum inside the log.
Goal: get MLEs for $(\pi_k, \mu_k, \Sigma_k)$, $k = 1, \ldots, K$.
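In practice the EM iterations are handled by existing software. A minimal sketch using the mclust package, not covered in the slides; it assumes data1 and labels1 from the earlier code:
library(mclust)                       # Mclust() fits Gaussian mixtures by EM
mc <- Mclust(data1, G = 5)            # 5 Gaussian components; BIC chooses the covariance model
summary(mc)                           # chosen model and cluster sizes
head(mc$z)                            # soft assignments: P(cluster k | X_i)
table(mc$classification, labels1)     # hard assignments vs. the true labels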

56 Nonparametric Clustering Image: Feigelson et al. (2011) 55

59 Nonparametric clustering Associate groups with high frequency areas groups in the population correspond to modes of the density p(x) Goal: find the modes of the density p(x), or ˆp(x). Assign observations to the domain of attraction of a mode. NP clustering is very dependent on the density estimate ˆp(x) 56

60 Final comment on selecting K Slide from Ryan Tibshirani's lecture, which was from George Casella's CMU seminar on 1/16/

62 Concluding Remarks: clustering Clustering - unsupervised/no labels find structure 1 K - means 2 Agglomerative hierarchical clustering 3 Parametric/Nonparametric 58

63 CLASSIFICATION
Build a model, classifier, etc. to separate data into known groups/classes (supervised learning). The response variable is not continuous; we want to predict labels.
1 Bayes classifiers
2 K Nearest Neighbors (KNN) classifiers
3 Logistic regression
4 Linear Discriminant Analysis (LDA)
5 Support Vector Machines (SVM)
6 Tree classifiers

64 Classification in Astronomy
Stars can be classified into spectral types OBAFGKMLTY.
Classification of active galactic nuclei (e.g. Starburst, Seyfert (I or II), Quasars, Blazars, BL Lac, OVV, Radio).
Galaxies can be classified into Hubble morphological types (E/S0/S/Irr), clustering environments, and by star formation activity.

65 Classification set-up
Notation: given vectors $X = \{X_1, X_2, \ldots, X_n\} \subset \mathbb{R}^p$ and class labels $Y = \{y_1, y_2, \ldots, y_n\}$; the $y_i$'s are qualitative. Let $\hat{y}_i$ be the predicted class label for observation $i$; the main interest is $P(Y \mid X)$.
The classification training error rate is often estimated using a training dataset as
$$\frac{1}{n} \sum_{i=1}^{n} I(y_i \neq \hat{y}_i)$$
where $I(\cdot)$ is the indicator function. The classification test error rate is often estimated using a test dataset $(x_{\text{test}}, y_{\text{test}})$ as $E\left(I(y_{\text{test}} \neq \hat{y}_{\text{test}})\right)$; good classifiers have small test errors.

66 Bayes Classifiers
The test error is minimized by assigning observations with predictors $x$ to the class $j = 1, \ldots, J$ with the largest probability $P(Y = j \mid X = x)$.
If there are two classes ($J = 2$), the Bayes decision rule given predictors $x$ is
$$\hat{y}_i = \begin{cases} \text{class 1} & \text{if } P(Y = 1 \mid X = x) > 0.50 \\ \text{class 2} & \text{if } P(Y = 2 \mid X = x) > 0.50 \end{cases}$$
In general this is intractable because the distribution of $Y \mid X$ is unknown.

68 K Nearest Neighbors (KNN)
Main idea: an observation is classified based on the K observations in the training set that are nearest to it. A probability of each class can be estimated by
$$P(Y = j \mid X = x) = \frac{1}{K} \sum_{i \in N(x)} I(y_i = j)$$
where $j = 1, \ldots,$ #classes in the training set, $N(x)$ is the set of the K nearest neighbors of $x$, and $I$ = indicator function.
In the illustration, the K = 3 nearest neighbors to the point X are within the circle; the predicted class of X would be blue because there are more blue observations than green among the 3 NN.

69 KNN: data1 R: knn(training.set, test.set, training.set.labels, K) in class package 65
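A minimal sketch of KNN on data1, not from the slides; it assumes the class package and the data1/labels1 objects built earlier:
library(class)
set.seed(1)
train.id <- sample(1:nrow(data1), 150)        # hold out the remaining rows as a test set
knn.pred <- knn(train = data1[train.id, ],
                test  = data1[-train.id, ],
                cl    = factor(labels1[train.id]),
                k     = 3)
mean(knn.pred != labels1[-train.id])          # estimated test error rate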

70 KNN: data1 decision boundary Using all observations of data1, K = 1 66

71 Linear Classifiers
The decision boundary is linear.
Logistic regression
Linear Discriminant Analysis

72 Logistic Regression
Predicting two groups: binary labels
$$Y = \begin{cases} 1 & \text{with probability } \pi \\ 0 & \text{with probability } 1 - \pi \end{cases}$$
so $E[Y] = P(Y = 1) = \pi$. We assume the following model for logistic regression:
$$\text{logit}(\pi) = \log\left( \frac{\pi}{1 - \pi} \right) = \beta_0 + \beta^T X$$
where $\beta_0 \in \mathbb{R}$, $\beta \in \mathbb{R}^p$.

73 Logistic Regression, continued.
$$\text{logit}(\pi) = \log\left( \frac{\pi}{1 - \pi} \right) = \beta_0 + \beta^T X \;\Longrightarrow\; \pi = \frac{e^{\beta_0 + \beta^T X}}{1 + e^{\beta_0 + \beta^T X}}$$
Can fit $\beta_0, \ldots, \beta_p$ via MLE:
$$\ell(\beta_0, \beta) = \sum_{i=1}^{n} \log\left( P(Y = y_i \mid X = x_i) \right) = \sum_{i=1}^{n} \left[ y_i (\beta_0 + \beta^T x_i) - \log(1 + e^{\beta_0 + \beta^T x_i}) \right]$$
$$(\hat{\beta}_0, \hat{\beta}) = \underset{\beta_0 \in \mathbb{R},\, \beta \in \mathbb{R}^p}{\text{argmax}}\; \ell(\beta_0, \beta)$$

74 Logistic Regression for data1
logit.labels <- matrix(0, nrow = length(labels1))
logit.labels[c(which(labels1 == 2), which(labels1 == 5))] <- 1
logit.fit <- glm(logit.labels ~ data1[,1] + data1[,2], family = binomial)
summary(logit.fit)  # <--- provides details of the fit model
logit.fit.probs <- predict(logit.fit, type = "response")
logit.class <- matrix(0, nrow = length(logit.fit.probs))
logit.class[logit.fit.probs >= .5] <- 1
par(mfrow = c(1,2))
plot(data1, xlim = c(-4,4), ylim = c(-5,4), xlab = "X1", ylab = "X2",
     col = logit.labels + 1, pch = logit.labels + 1, lwd = 3, main = "True classes")
plot(data1, xlim = c(-4,4), ylim = c(-5,4), xlab = "X1", ylab = "X2",
     col = logit.class + 1, pch = logit.class + 1, lwd = 3, main = "Predicted classes")

75 Logistic Regression for data1: fit model
$$\text{logit}(\hat{\pi}) = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$$

76 Multiple classes What do we do if our response is not binary (i.e. J > 2)? There are extensions of logistic regression for multiple classes: Multiple - class logistic regression Linear Discriminant Analysis (LDA) is another option 72

77 Discriminant Analysis
Goal: estimate a decision boundary that gives a classification rule.
Basic idea: estimate the posterior probabilities of class membership. If an observation is in a particular location, what is the probability it belongs to a particular class?
Bayes Rule:
$$P(Y = j \mid x) = \frac{\pi_j\, p_j(x)}{\sum_{j'=1}^{J} \pi_{j'}\, p_{j'}(x)}$$
$\pi_j$ = prior probability of class $j$; $p_j(x) = P(X = x \mid Y = j)$.
Need a way to estimate $p_j(x)$ in order to do classification.

78 Example using Bayes Rule Consider X = 1.9, P(Y = 1) =

79 Linear Discriminant Analysis (LDA)
Multivariate Gaussian:
$$p_j(X) = \phi(X \mid \mu_j, \Sigma_j) = \frac{1}{\sqrt{(2\pi)^p |\Sigma_j|}} \exp\left( -\frac{(X - \mu_j)^T \Sigma_j^{-1} (X - \mu_j)}{2} \right)$$
LDA assumes all covariance matrices are equal ($\Sigma_1 = \cdots = \Sigma_J$).

80 LDA continued
A predictor $x$ is classified into the class $j = 1, \ldots, J$ according to the following:
$$\underset{j}{\text{argmax}}\, \{P(Y = j \mid X = x)\} = \underset{j}{\text{argmax}}\, \{\phi(x \mid \mu_j, \Sigma)\, \pi_j\} = \underset{j}{\text{argmax}}\, \{\log\left( \phi(x \mid \mu_j, \Sigma)\, \pi_j \right)\}$$
$$= \underset{j}{\text{argmax}}\, \left\{ -\frac{1}{2}(x - \mu_j)^T \Sigma^{-1} (x - \mu_j) + \log(\pi_j) \right\} = \underset{j}{\text{argmax}}\, \left\{ x^T \Sigma^{-1} \mu_j - \frac{1}{2} \mu_j^T \Sigma^{-1} \mu_j + \log(\pi_j) \right\}$$

81 LDA for data1 R: lda(y ~ x) in the MASS package 77

82 Quadratic Discriminant Analysis (QDA)
Assuming a common covariance matrix, $\Sigma$, is not always reasonable. Allowing for different covariance matrices, $\Sigma_j$, a predictor $x$ is classified into the class $j = 1, \ldots, J$ according to the following:
$$\underset{j}{\text{argmax}}\, \{P(Y = j \mid X = x)\} = \underset{j}{\text{argmax}}\, \{\phi(x \mid \mu_j, \Sigma_j)\, \pi_j\} = \underset{j}{\text{argmax}}\, \{\log\left( \phi(x \mid \mu_j, \Sigma_j)\, \pi_j \right)\}$$
$$= \underset{j}{\text{argmax}}\, \left\{ -\frac{1}{2}\log|\Sigma_j| - \frac{1}{2}(x - \mu_j)^T \Sigma_j^{-1} (x - \mu_j) + \log(\pi_j) \right\}$$
$$= \underset{j}{\text{argmax}}\, \left\{ -\frac{1}{2}\log|\Sigma_j| - \frac{1}{2} x^T \Sigma_j^{-1} x + x^T \Sigma_j^{-1} \mu_j - \frac{1}{2} \mu_j^T \Sigma_j^{-1} \mu_j + \log(\pi_j) \right\}$$

83 QDA for data1 R: qda(y ~ x) in the MASS package 79
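A minimal sketch of LDA and QDA on data1, not from the slides; it assumes the MASS package and the data1/labels1 objects built earlier:
library(MASS)
df <- data.frame(x1 = data1[,1], x2 = data1[,2], y = factor(labels1))
lda.fit <- lda(y ~ x1 + x2, data = df)        # common covariance matrix across classes
qda.fit <- qda(y ~ x1 + x2, data = df)        # class-specific covariance matrices
mean(predict(lda.fit)$class != df$y)          # LDA training error rate
mean(predict(qda.fit)$class != df$y)          # QDA training error rate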

84 Support Vector Machines 80

85 Support Vector Machines (SVM)
Goal: find the hyperplane that best separates the two classes (i.e. maximize the margin between the classes).
Data: $\{x_i, y_i\}$, $i = 1, \ldots, n$, with $y_i \in \{-1, 1\}$.
A separating hyperplane does not always exist, so incorporate a parameter to penalize misclassifications.

86 SVM optimization:
$$\min_{w, b, \xi} \left( \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \right) \quad \text{subject to}\quad (X_i \cdot w + b)\, y_i \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, n$$
$\xi_i$ captures the degree to which observation $i$ is misclassified; $C$ is a misclassification penalty.

87 SVM kernel trick

88 SVM for data1 (linear) - A R: svm(x, y, kernel = "linear", class.weights) in the e1071 package 84

89 SVM for data1: fit model (linear) - A 85

90 SVM for data1 (linear) - B R: svm(x, y, kernel = "linear", class.weights) in the e1071 package 86

91 SVM for data1: fit model (linear) - B 87

92 SVM for data1 (radial basis) R: svm(x, y, kernel = "radial", class.weights) in the e1071 package 88
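A minimal sketch of fitting an SVM with a radial basis kernel to data1, not from the slides; it assumes the e1071 package and the data1/labels1 objects built earlier:
library(e1071)
df <- data.frame(x1 = data1[,1], x2 = data1[,2], y = factor(labels1))
svm.fit <- svm(y ~ x1 + x2, data = df, kernel = "radial", cost = 1)   # cost = C, the misclassification penalty
plot(svm.fit, df)                          # decision regions and support vectors
mean(predict(svm.fit, df) != df$y)         # training error rate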

93 SVM for data1: fit model (radial basis) 89

94 Classification Trees

95 Classification trees Goal: determine which variables are best at separating observations into the labeled groups 1 Predictor space is partitioned into hyper-rectangles 2 Any observations in the hyper-rectangle would be predicted to have the same label 3 Next split is chosen to maximize purity of hyper-rectangles Tree-based methods are not typically the best classification methods based on prediction accuracy, but they are often more easily interpreted (James et al. 2013) CART = Classification and Regression Trees 91

96 Classification Tree for data1 92

97 Classification Trees - recipe
Start at the top of the tree and use recursive binary splitting to grow the tree. At each split, determine the classification error rate for the new region (i.e. the number of observations that are not of the majority class in their region). The Gini index or cross-entropy are better measures of node purity.
R: tree(y ~ x) in the tree package.
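A minimal sketch of growing a classification tree on data1, not from the slides; it assumes the tree package and the data1/labels1 objects built earlier:
library(tree)
df <- data.frame(x1 = data1[,1], x2 = data1[,2], y = factor(labels1))
tree.fit <- tree(y ~ x1 + x2, data = df)              # recursive binary splitting on X1 and X2
plot(tree.fit); text(tree.fit)                        # draw the tree with split rules at each node
mean(predict(tree.fit, df, type = "class") != df$y)   # training error rate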

98 Classification Trees - remarks
Tree pruning - the classification tree may be overfit, or too complex; pruning removes portions of the tree that are not useful for the classification goals of the tree.
Bootstrap aggregation (aka "bagging") - there is high variance in classification trees, and bagging (averaging over many trees) provides a means of variance reduction. (Boosting is another approach, but it grows trees sequentially rather than using a bootstrapped sample.)
Random forest - similar idea to bagging, except it incorporates a step that helps to decorrelate the trees.

99 Concluding Remarks Clustering - unsupervised/no labels find structure 1 K - means/ K - medoids 2 Agglomerative hierarchical clustering 3 Parametric/Non-parametric Classification - supervised/labels predict classes 1 KNN 2 Logistic regression 3 LDA/QDA 4 Support Vector Machines 5 Tree classifiers Clustering and classification are useful tools, but need to be familiar with assumptions that go into the methods 95

100 Bibliography
Feigelson, E. D., Getman, K. V., Townsley, L. K., Broos, P. S., Povich, M. S., Garmire, G. P., King, R. R., Montmerle, T., Preibisch, T., Smith, N., et al. (2011), "X-ray Star Clusters in the Carina Complex," The Astrophysical Journal Supplement Series, 194, 9.
Hastie, T., Tibshirani, R., and Friedman, J. (2009), The Elements of Statistical Learning, 2nd ed., Springer.
James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013), An Introduction to Statistical Learning with Applications in R, Springer Texts in Statistics, Springer.
Li, H.-b. and Henning, T. (2011), "The alignment of molecular cloud magnetic fields with the spiral arms in M33," Nature, 479.
Mukherjee, S., Feigelson, E. D., Babu, G. J., Murtagh, F., Fraley, C., and Raftery, A. (1998), "Three types of gamma-ray bursts," The Astrophysical Journal, 508, 314.
Wasserman, L. (2004), All of Statistics: A Concise Course in Statistical Inference, Springer.
