High-dimensional data: Exploratory data analysis


1 High-dimensional data: Exploratory data analysis Mark van de Wiel Department of Epidemiology and Biostatistics, VUmc & Department of Mathematics, VU University Contributions by Wessel van Wieringen

2 Intro High-Dimensional Data

3 High-dimensional data: Definition High-dimensional data: Data for which the number of variables, p, exceeds the number of observations, n. Examples Genomics data. E.g. measurements on all human genes, p=25,000, for say n=100 individuals. Imaging data (fMRI). Thousands or millions of pixels (or voxels) for hundreds of individuals. Astronomy. Terabytes of data for a limited number of galaxies / black holes / etc.

4 High-dimensional data: practical challenges Inference Which genes are differentially expressed between cancer and normal tissue? Visualization How to visualize high-dimensional observations (samples) and discover subgroups? Prediction Probability of tumor recurrence given the genomic profile of the primary tumor (baseline) Functional relationships Which brain regions interact functionally? Expression refers to a (relative) quantification of the gene in a cell/tissue/sample.

5 High-dimensional data: statistical challenges Inference Statistical models; multiple testing; shrinkage: borrowing information across features Visualization Clustering, principal component analysis Prediction Fitting ordinary regression models is not feasible: penalized regression; machine learning approaches Functional relationships Construction of networks which describe such relationships

6 Slides: Wessel van Wieringen Exploratory analysis I: Hierarchical clustering

7 Hierarchical clustering Objective of cluster analysis Cluster analysis seeks meaningful data-determined groupings of samples, such that samples are more similar within than across groups; this similarity in gene expression profiles is assumed to imply some form of phenotypic similarity of the samples. Cluster analysis is also known as: unsupervised learning, unsupervised classification, class discovery, and data segmentation.

8 Hierarchical clustering Hierarchical clustering produces a nested sequence of clusters. It starts with all objects apart, and at each step two clusters are merged until only one is left. The nested sequence can be represented by a dendrogram. A dendrogram is a two-dimensional diagram, a tree. Each fusion of clusters is plotted at a height equal to the dissimilarity of the two clusters which are joined.
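
To make the merging procedure concrete, here is a minimal sketch in Python using SciPy's agglomerative clustering on a small simulated expression matrix (the data and the choice of average linkage are illustrative assumptions, not taken from the slides):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Toy expression matrix: 6 samples (rows) x 20 genes (columns); values are simulated.
rng = np.random.default_rng(0)
expr = rng.normal(size=(6, 20))

# Agglomerative clustering: start with every sample apart and repeatedly merge
# the two closest clusters until only one is left.
Z = linkage(expr, method="average", metric="euclidean")

# The dendrogram plots each fusion at a height equal to the dissimilarity
# of the two clusters being joined.
dendrogram(Z, labels=[f"sample {i + 1}" for i in range(6)])
plt.ylabel("dissimilarity at fusion")
plt.show()
```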

9 Hierarchical clustering Building a dendrogram (loosely): Find the samples that have the most similar gene expression profiles. [Figure: gene expression profiles of samples 1-6 and the dendrogram under construction]

10 Hierarchical clustering Building a dendrogram (loosely): Samples 1 and 3 have the most similar gene expression profiles. Let these samples form a cluster. Repeat this exercise. [Figure: samples 1 and 3 joined at the lowest height of the dendrogram]

11 Hierarchical clustering Building a dendrogram (loosely): Look for the samples or clusters that have the most similar gene expression profiles. [Figure: expression profiles and partially built dendrogram]

12 Hierarchical clustering Building a dendrogram (loosely): New clusters may form: samples 2 and 6. [Figure: samples 2 and 6 joined in the dendrogram]

13 Hierarchical clustering Building a dendrogram (loosely). [Figure: further merges of samples and clusters in the dendrogram]

14 Hierarchical clustering Building a dendrogram (loosely): Finally, all samples/clusters are merged into one big cluster. [Figure: completed dendrogram for samples 1-6]

15 Hierarchical clustering Heatmap A dendrogram is often used in combination with a heatmap. A heatmap is a graphical representation of data where the values taken by a variable in a two-dimensional map are represented as colors. [Figure: heatmap of an expression matrix]

16 Hierarchical clustering Expression matrix and its heatmap. [Figure: expression matrix with genes g_1, ..., g_j, ... in rows and samples S_1, ..., S_i, ..., S_50 in columns; entry (j, i) = expression of gene j in sample i; shown next to the corresponding heatmap]

17 Hierarchical clustering Visualization of hierarchical clustering results: dendrogram and heatmap combined. [Figure: heatmap with genes in rows and samples in columns, with the sample dendrogram attached]

18 Hierarchical clustering Hierarchical clustering of genes Genes that cluster together are believed to be functionally related (modules / pathways / GO nodes). This may help to characterize unknown genes. One may also cluster samples and genes simultaneously, as in the sketch below. [Figure: heatmap with dendrograms for both genes (rows) and samples (columns)]
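
A combined heatmap with dendrograms for both genes and samples can be drawn with seaborn's clustermap; the sketch below uses simulated data and arbitrary settings (average linkage, correlation distance) purely for illustration:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Simulated expression matrix: genes g_1..g_50 in rows, samples S_1..S_10 in columns.
rng = np.random.default_rng(1)
expr = pd.DataFrame(
    rng.normal(size=(50, 10)),
    index=[f"g_{j + 1}" for j in range(50)],
    columns=[f"S_{i + 1}" for i in range(10)],
)

# clustermap clusters rows (genes) and columns (samples) simultaneously and shows
# the reordered matrix as a heatmap with both dendrograms attached.
sns.clustermap(expr, method="average", metric="correlation", cmap="RdBu_r")
plt.show()
```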

19 Hierarchical clustering Distance Central to cluster analysis is the notion of distance (or dissimilarity) between the objects being clustered. Distance measures take on values between 0 and ∞: 0 reflects maximum similarity between two samples, ∞ means that two samples are not similar at all, and values in between indicate various degrees of resemblance.

20 Hierarchical clustering Some distance measures (for continuous data). Data: $Y_{ij}$, column vectors (samples): $Y_{\cdot j}$.
Euclidean distance: $d_E(Y_{\cdot j}, Y_{\cdot k}) = \sqrt{\sum_{i=1}^{p} (Y_{ij} - Y_{ik})^2}$
Manhattan distance: $d_M(Y_{\cdot j}, Y_{\cdot k}) = \sum_{i=1}^{p} |Y_{ij} - Y_{ik}|$
(Pearson) correlation distance: $d_C(Y_{\cdot j}, Y_{\cdot k}) = 1 - \frac{\sum_{i=1}^{p} (Y_{ij} - \bar{Y}_{\cdot j})(Y_{ik} - \bar{Y}_{\cdot k})}{\sqrt{\sum_{i=1}^{p} (Y_{ij} - \bar{Y}_{\cdot j})^2 \sum_{i=1}^{p} (Y_{ik} - \bar{Y}_{\cdot k})^2}}$
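
The three dissimilarities above can be computed with SciPy; note that pdist's "correlation" metric already returns 1 minus the Pearson correlation. The data below are simulated:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(2)
Y = rng.normal(size=(100, 6))        # 100 genes (rows) x 6 samples (columns), simulated
samples = Y.T                        # pdist expects observations in rows, so transpose

d_euclid = squareform(pdist(samples, metric="euclidean"))    # sqrt of sum of squared differences
d_manhat = squareform(pdist(samples, metric="cityblock"))    # sum of absolute differences
d_correl = squareform(pdist(samples, metric="correlation"))  # 1 - Pearson correlation

print(d_euclid.shape, d_manhat.shape, d_correl.shape)        # each is a 6 x 6 distance matrix
```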

21 Hierarchical clustering Distance between clusters Distance measures are defined between two samples. In hierarchical clustering, the distance between groups of samples (clusters) also needs to be assessed. Linkage tells us how to do that. [Figure: two clusters, A and B]

22 Hierarchical clustering Linkage between Cluster A and Cluster B: single linkage uses the minimum distance between their members, average linkage the average distance, and complete linkage the maximum distance.

23 Hierarchical clustering Effects of linkage Complete linkage yields a more compact clustering. [Figure: clusterings of the same data under complete, single and average linkage]
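
A quick way to see the effect of the linkage choice is to cut the tree under each linkage and compare the resulting groups; the sketch below uses SciPy on simulated data and a cut into two clusters, both arbitrary choices:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
expr = rng.normal(size=(6, 20))      # 6 samples x 20 genes, simulated

for method in ("single", "average", "complete"):
    Z = linkage(expr, method=method, metric="euclidean")
    # Cut the dendrogram into 2 clusters; complete linkage tends to give the most
    # compact groups, single linkage the most "chained" ones.
    labels = fcluster(Z, t=2, criterion="maxclust")
    print(method, labels, "height of final merge:", round(float(Z[-1, 2]), 2))
```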

24 Exploratory analysis II: Principal Component Analysis (PCA)

25 Principal component analysis (PCA) [Figure: data matrix with samples/individuals (e.g. n=8) in columns and features/genes (p large) in rows; the samples belong to Group 1 and Group 2] Now suppose we pretend not to see the groups...

26 Principal component analysis (PCA) [Figure: the same data matrix of samples/individuals by features/genes, with the group labels hidden] Challenge: if the genomics data are relevant for the underlying grouping, we should be able to observe this after visualization. Solution 1: clustering (but this depends a lot on ad hoc choices...). Solution 2: Principal Component Analysis (PCA).

27 Principal component analysis (PCA) Two-gene world [Figure: scatter plots of the samples in the (Gene 1, Gene 2) plane] But how to obtain a similar visualisation for gene dimension p=25,000?

28 Principal component analysis (PCA) Principal components (PC). $Y_{ij}$: data. The $k$-th PC is a linear combination $Z_j^k = \sum_{i=1}^{p} w_i^k Y_{ij}$. First PC: $\operatorname{argmax}_{w} \operatorname{Var}\big(\sum_{i=1}^{p} w_i Y_{ij}\big)$ s.t. $\|w\|^2 = \sum_{i=1}^{p} (w_i)^2 = 1$. $k$-th PC: as above, but with an additional orthogonality constraint: $\operatorname{argmax}_{w^k} \operatorname{Var}\big(\sum_{i=1}^{p} w_i^k Y_{ij}\big)$ s.t. I) $\|w^k\|^2 = \sum_{i=1}^{p} (w_i^k)^2 = 1$ and II) $w^k \cdot w^h = 0$ for $h = 1, \ldots, k-1$.

29 Principal component analysis (PCA) $\operatorname{Var}\big(\sum_{i=1}^{p} w_i Y_{ij}\big) = w^T \Sigma_{p \times p} w$, so we solve $\max_w (w^T \Sigma_{p \times p} w)$ s.t. $w^T w = 1$. Introduce a Lagrange multiplier to deal with the constraint: $L(w) = w^T \Sigma_{p \times p} w - \lambda (w^T w - 1)$. Setting $\frac{dL(w)}{dw} = 2 w^T \Sigma_{p \times p} - 2 \lambda w^T = 0$ gives $\Sigma_{p \times p} w = \lambda w$. Eigenvectors $z = w$ are the solutions; $z_{\max}$, corresponding to the maximum eigenvalue $\lambda_{\max}$, renders the global maximum (simply substitute: $w^T \Sigma_{p \times p} w = \lambda$).
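
A small numerical check of this derivation, assuming p is small enough to form Σ directly: the leading eigenvector of the sample covariance matrix maximizes the variance of the linear combination, and that variance equals the largest eigenvalue. The data are simulated:

```python
import numpy as np

rng = np.random.default_rng(4)
Y = rng.normal(size=(5, 200))             # p = 5 genes (rows) x n = 200 samples (columns), simulated
Y = Y - Y.mean(axis=1, keepdims=True)     # center each gene across samples

Sigma = np.cov(Y)                         # p x p sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)  # symmetric eigendecomposition, eigenvalues ascending
w1 = eigvecs[:, -1]                       # eigenvector of the largest eigenvalue

# The variance of the first PC scores equals the largest eigenvalue (up to rounding).
pc1_scores = w1 @ Y
print(np.var(pc1_scores, ddof=1), eigvals[-1])
```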

30 Principal component analysis (PCA) Efficient computation of PCs (1). Required: eigenvalues of Σ and orthonormal eigenvectors z. Solution: singular value decomposition (SVD). First, let $X = Y^T$, an $n \times p$ matrix¹; then $X^T X = Y Y^T = (n-1)\, \Sigma_{p \times p}$. The SVD is a factorisation of X into U: orthonormal n×n matrix, D: rectangular n×p diagonal matrix², and W: orthonormal p×p matrix: $X = U D W^T$, so that $X^T X = W D^T D\, W^T$. ¹ Assume wlog that Y is centered: each gene has mean 0 across samples. ² A matrix consisting of (p−n) zero columns of length n and an n×n diagonal matrix.

31 Principal component analysis (PCA) Efficient computation of PCs (2). From $X = U D W^T$ we get $X^T X = W D^T D\, W^T$: the eigenvalue decomposition of the symmetric p×p matrix $X^T X$. Problem: p is large. However, $Y = X^T = (U D W^T)^T = W D^T U^T$, so $Y^T Y = U D D^T U^T$, and $Y^T Y$ is of dimension n×n, with n small. So the solution is: 1. The eigenvalue decomposition of $Y^T Y$ renders D and U¹. 2. $Y U = W D^T$, so the $k$-th PC loading is the column $W_{\cdot k} = [Y U]_{\cdot k} / D_{kk}$, where k corresponds to the $k$-th largest eigenvalue. ¹ Using standard algorithms for finding eigenvalues and eigenvectors.
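
A sketch of this n×n shortcut in NumPy, under the centering assumption of the previous slide; the dimensions are made up, and the result is checked against the singular values from a direct SVD:

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 2000, 8
Y = rng.normal(size=(p, n))                   # p genes x n samples, simulated
Y = Y - Y.mean(axis=1, keepdims=True)         # center each gene

# Step 1: eigendecompose the small n x n matrix Y^T Y instead of the p x p matrix X^T X.
G = Y.T @ Y
eigvals, U = np.linalg.eigh(G)
order = np.argsort(eigvals)[::-1]             # sort eigenvalues in descending order
eigvals, U = eigvals[order], U[:, order]
d = np.sqrt(np.clip(eigvals, 0.0, None))      # singular values of X = Y^T

# Step 2: recover the p-dimensional loading vectors, W_.k = [Y U]_.k / D_kk.
keep = d > 1e-10
W = (Y @ U[:, keep]) / d[keep]

# Sanity checks: loadings are orthonormal, singular values match a direct SVD of X.
print(np.allclose(W.T @ W, np.eye(W.shape[1])))
d_svd = np.linalg.svd(Y.T, compute_uv=False)
print(np.allclose(d[: d_svd.size], d_svd))
```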

32 Principal component analysis (PCA) Principal components (PC). $Y_{ij}$: data. The $k$-th PC is a linear combination $Z_j^k = \sum_{i=1}^{p} w_i^k Y_{ij}$. First PC: $\operatorname{argmax}_{w} \operatorname{Var}\big(\sum_{i=1}^{p} w_i Y_{ij}\big)$ s.t. $\|w\|^2 = \sum_{i=1}^{p} (w_i)^2 = 1$. $k$-th PC: as above, but with an additional orthogonality constraint: $\operatorname{argmax}_{w^k} \operatorname{Var}\big(\sum_{i=1}^{p} w_i^k Y_{ij}\big)$ s.t. I) $\|w^k\|^2 = \sum_{i=1}^{p} (w_i^k)^2 = 1$ and II) $w^k \cdot w^h = 0$ for $h = 1, \ldots, k-1$.

33 Principal component analysis (PCA) Visualization The $k$-th principal component for individual j: $PC_k(j) = \sum_{i=1}^{p} w_i^k Y_{ij}$, where $Y_{ij}$ is the data for individual j. Plot $PC_1(j)$ vs $PC_2(j)$ for all individuals j = 1, ..., n. In words: for each individual, plot the coordinates of those two orthogonal summaries of the p-dimensional data that explain most of the variation between individuals. If a group label associates strongly with the p-dimensional data, one may expect the groups to be separated by (a combination of) the two PCs.
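
The resulting plot can be produced, for instance, with scikit-learn's PCA on a simulated two-group data set; the group labels are used only to color the points, not to compute the PCs:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
n, p = 8, 100
groups = np.array([0] * 4 + [1] * 4)
X = rng.normal(size=(n, p))               # individuals x genes, simulated
X[groups == 1, :10] += 2.0                # the first 10 genes differ between the groups

scores = PCA(n_components=2).fit_transform(X)   # columns: PC1 and PC2 per individual

plt.scatter(scores[:, 0], scores[:, 1], c=groups)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
```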

34 Principal component analysis (PCA) Application: colon cancer. Black: healthy colon tissue; green: tumor colon tissue. Measurements: ~2000 microRNA expressions¹. ¹ microRNAs: small pieces of RNA, which can degrade the mRNAs of genes.

35 Efficient parameter estimation in p models: shrinkage Mark van de Wiel mark.vdwiel@vumc.nl Department of Epidemiology and Biostatistics, VUmc & Department of Mathematics, VU University Many slides courtesy of Wessel van Wieringen

36 Data: Setting [Figure: data matrix $X^T$ with samples/individuals (e.g. n=8) in columns and features/genes in rows; samples divided into Group 1 and Group 2] Model per gene: $X_{ij} = \beta_{0j} + \beta_j Z_i$ (i = sample, j = gene; $Z_i = 0$ when sample i is in group 1 and $Z_i = 1$ otherwise). How to efficiently estimate $\beta_j$?
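
For this two-group design, the least-squares fit of the per-gene model is simply the difference of group means; a minimal NumPy sketch on simulated data (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 8, 500
Z = np.array([0] * 4 + [1] * 4)                  # group indicator per sample (0 = group 1)
X = rng.normal(size=(n, p))                      # expression: samples x genes, simulated

# Least-squares fit of X_ij = beta0_j + beta_j * Z_i, gene by gene:
beta0_hat = X[Z == 0].mean(axis=0)               # per-gene mean in group 1
beta_hat = X[Z == 1].mean(axis=0) - beta0_hat    # per-gene difference of group means

print(beta_hat[:5].round(2))                     # unshrunken per-gene effect estimates
```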

37 James-Stein estimator

38 James-Stein estimator JS example Let $X_i = (X_{i1}, \ldots, X_{ip})^T$ be a p-variate normally distributed random variable. The mean vector μ is estimated from a random sample of size n using the quadratic loss function $L(\mu, \hat{\mu}) = \sum_{j=1}^{p} (\hat{\mu}_j - \mu_j)^2$. Then, the least squares (LS) estimate of μ is the vector of sample means: $\hat{\mu}_j = \bar{X}_j = \frac{1}{n} \sum_{i=1}^{n} X_{ij}$.

39 James-Stein estimator JS example (continued) The (total) mean squared error (MSE) of this estimator is $MSE(\hat{\mu}) = E\|\hat{\mu} - \mu\|^2 = \sum_{j=1}^{p} \operatorname{Var}(\bar{X}_j)$ for independent $X_j$'s. This does not involve μ. Recall that, in general, for any estimator of μ: MSE = variance + squared bias. Hence, the MSE is a measure of the quality of the estimator.

40 James-Stein estimator The James-Stein (JS) estimator is an estimator that outperforms the ML estimator, in the sense that it yields a smaller MSE. The JS estimator is of the form $\hat{\theta}(\lambda) = (1-\lambda)\, \hat{\theta} + \lambda\, \hat{\theta}_{target}$, where $\hat{\theta}$ is the original estimator, $\hat{\theta}_{target}$ is a target estimator, and λ is the shrinkage parameter, which determines how much the two estimators are pooled.

41 James-Stein estimator The JS estimator is of the form $\hat{\theta}(\lambda) = (1-\lambda)\, \hat{\theta} + \lambda\, \hat{\theta}_{target}$, with e.g. $\hat{\theta}$: the sample variance for a given gene, and $\hat{\theta}_{target}$: a pooled variance estimate across all genes. [Figure: the shrinkage estimate lies between the per-gene estimate and the pooled estimate, with λ controlling the amount of shrinkage]
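
As a concrete instance of this form, the sketch below shrinks per-gene sample variances towards their pooled average for a fixed, arbitrarily chosen λ; how to choose λ is the topic of the following slides:

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 8, 1000
X = rng.normal(size=(n, p))                      # samples x genes, simulated

s2_gene = X.var(axis=0, ddof=1)                  # per-gene sample variances (noisy for small n)
s2_pooled = s2_gene.mean()                       # pooled variance across all genes (the target)

lam = 0.5                                        # shrinkage parameter, fixed here only for illustration
s2_shrunk = (1 - lam) * s2_gene + lam * s2_pooled

# The shrunken estimates vary much less around the pooled value.
print(s2_gene.std().round(3), s2_shrunk.std().round(3))
```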

42 James-Stein estimator The MSE of the JS estimator can be expressed as $MSE(\hat{\theta}(\lambda)) = \sum_{j=1}^{p} MSE(\hat{\theta}_j(\lambda))$, with $MSE(\hat{\theta}_j(\lambda)) = E\big[\big((\hat{\theta}_j - \theta_j) - \lambda (\hat{\theta}_j - \hat{\theta}_{target,j})\big)^2\big] = MSE(\hat{\theta}_j) + \lambda^2 E\big[(\hat{\theta}_j - \hat{\theta}_{target,j})^2\big] - 2\lambda \big\{ E\hat{\theta}_j^2 - E[\hat{\theta}_j \hat{\theta}_{target,j}] - \theta_j E\hat{\theta}_j + \theta_j E\hat{\theta}_{target,j} \big\}$. This is a parabola in λ, whose parameters are determined by the first two moments of both estimators.

43 James-Stein estimator [Figure: MSE as a function of λ, a parabola showing the interval of λ that leads to an MSE decrease, the optimal amount of shrinkage, λ = 0 (no shrinkage) and λ = 1 (full shrinkage)]

44 James-Stein estimator Simulation: n samples, p genes, $X_{ij} \sim N(\mu_j, 1)$, $\mu_j \sim N(0, \tau^2)$. Investigate the shrinkage effect under 3 different scenarios: I: vary τ (p = 100, n = 40, τ = 0.1, 0.2, 0.4); II: vary n (p = 10n, n = 10, 100, 200, τ = 0.1); III: vary p/n (p = 1000, n = 20, 50, 300, τ = 0.1).

45 James-Stein estimator Simulation (continued) Estimators: the per-gene mean $\hat{\theta}_j = \bar{X}_j$, shrunken towards the overall mean $\hat{\theta}_{target} = \bar{X}$. Now study the MSE of the JS estimator in relation to λ.
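
A sketch of scenario I in NumPy: simulate the model, shrink the per-gene means towards the overall mean for a grid of λ values, and average the squared error over replications. The resulting curve is the parabola of the earlier slide, with its minimum near λ = 1/(nτ² + 1); the number of replications and the λ grid are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(9)
p, n, tau = 100, 40, 0.1                      # scenario I with tau = 0.1
lambdas = np.linspace(0.0, 1.0, 21)
mse = np.zeros_like(lambdas)
reps = 200

for _ in range(reps):
    mu = rng.normal(0.0, tau, size=p)         # mu_j ~ N(0, tau^2)
    X = rng.normal(mu, 1.0, size=(n, p))      # X_ij ~ N(mu_j, 1)
    theta_hat = X.mean(axis=0)                # per-gene mean (no shrinkage)
    target = X.mean()                         # pooled target: overall mean
    for k, lam in enumerate(lambdas):
        mse[k] += np.sum(((1 - lam) * theta_hat + lam * target - mu) ** 2) / reps

print("MSE at lambda = 0:", round(float(mse[0]), 3))
print("minimal MSE:", round(float(mse.min()), 3), "at lambda ~", float(lambdas[mse.argmin()]))
```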

46 James-Stein estimator Simulation (continued): scenario I. Shrinkage yields a larger MSE reduction if the genes are more alike.

47 James-Stein estimator Simulation (continued): scenario II. Shrinkage yields a larger MSE reduction if n is small.

48 James-Stein estimator Simulation (continued): scenario III. Shrinkage yields a larger MSE reduction with larger p/n ratios.

49 James-Stein estimator Crucial question: how to determine λ in $\hat{\theta}(\lambda) = (1-\lambda)\, \hat{\theta} + \lambda\, \hat{\theta}_{target}$? Remember the simulation: n samples, p genes, $X_{ij} \sim N(\mu_j, 1)$, $\mu_j \sim N(0, \tau^2)$. The latter can be regarded as a prior. We need to know this prior to estimate λ: empirical Bayes.

50 Empirical Bayes

51 Empirical Bayes The JS estimator can be motivated from an empirical Bayes perspective. Empirical Bayes methods are Bayesian methods with a twist. In an empirical Bayes setting, the parameters at the top level of a hierarchical model are set to their optimal values (as determined from the data), instead of being integrated out. Roughly: the priors are estimated rather than assumed.

52 Empirical Bayes JS example (continued) Recall: $X_{ij} \sim N(\mu_j, 1)$. The $\mu_j$ are a sample from a prior distribution: $\mu_j \sim N(\theta, \tau^2)$. The Bayes estimator of the $\mu_j$ given the data is their posterior mean.

53 Empirical Bayes JS example (continued) The posterior mean is given by¹ $E[\mu_j \mid X] = \theta + \big[1 - (n\tau^2 + 1)^{-1}\big](\bar{X}_j - \theta)$, of the same form as the JS estimator... ¹ Standard Bayesian calculations.

54 Empirical Bayes Rewrite with $\theta_j = \theta_{target}$ and $\hat{\theta} = \bar{X}_j$: the posterior mean is of the James-Stein form $\theta_j + \big[1 - (n\tau^2 + 1)^{-1}\big](\bar{X}_j - \theta_j) = \bar{X}_j - (n\tau^2 + 1)^{-1}(\bar{X}_j - \theta_j)$, with $\lambda = (n\tau^2 + 1)^{-1}$. If n or τ is large, there is little shrinkage towards the target.

55 Empirical Bayes JS example (continued) Remember: $\mu_j \sim N(\theta_j, \tau^2)$. Typically, use $\theta_j = \theta$. The prior mean $\theta_j$ plays the role of the target. How to estimate θ and τ?

56 Empirical Bayes Marginal likelihood The marginal likelihood is the likelihood integrated w.r.t. all prior(s): $p(X; \alpha) = \int p(X \mid \lambda)\, p_\alpha(\lambda)\, d\lambda$. Parametric empirical Bayes: maximize $p(X; \alpha)$ w.r.t. the parameters α. Example: $X_{ij} \sim_{iid} N(\mu_j, 1)$, $\mu_j \sim_{iid} N(\theta, \tau^2)$, so α = {θ, τ}: $p(X; \theta, \tau) = \prod_j \int \prod_i N(x_{ij}; \mu_j, 1)\, N(\mu_j; \theta, \tau^2)\, d\mu_j$.

57 Empirical Bayes $p(X; \theta, \tau) = \prod_j p(X_{\cdot j}; \theta, \tau) = \prod_j \int \prod_i N(x_{ij}; \mu_j, 1)\, N(\mu_j; \theta, \tau^2)\, d\mu_j$. The integral reduces to a product-of-Gaussians form (conjugacy). Example: $X_{ij} \sim_{iid} N(\mu_j, 1)$, $\mu_j \sim_{iid} N(\theta, \tau^2)$. What is the unconditional density of $X_j = (X_{1j}, \ldots, X_{nj})$: $p(X_j) = p(X_j; \theta, \tau)$?

58 Bayesian inference, conjugate priors, example $P(X) = \int P(X \mid \mu)\, P(\mu)\, d\mu$¹ $= C \int \exp\big(-\sum_{i=1}^{n} (X_i - \mu)^2 / 2\big) \exp\big(-(\mu - \theta)^2 / (2\tau^2)\big)\, d\mu = C' \exp\big(-\sum_{i=1}^{n} (X_i - A)^2 / B\big) \int \exp\big(-(\mu - D)^2 / E\big)\, d\mu$, where A and B do not depend on μ (but do depend on {θ, τ}). The remaining integrand is a Gaussian form in μ, so the integral contributes only a constant. The first exponential is also a Gaussian form: a product of Gaussians. ¹ Dropping the index j.

59 Empirical Bayes $p(X; \theta, \tau) = \prod_j p(X_{\cdot j}; \theta, \tau) = \prod_j \int \prod_i N(x_{ij}; \mu_j, 1)\, N(\mu_j; \theta, \tau^2)\, d\mu_j$. Each $p(X_j; \theta, \tau)$ reduces to a product of Gaussians, and the outer product is then a product of products of Gaussians. Hence, solving $\operatorname{argmax}_{\theta, \tau}\, p(X; \theta, \tau)$ reduces to maximum likelihood estimation, which is equivalent to moment estimation in a Gaussian setting: $E[X_{ij}] = E\{E[X_{ij} \mid \mu_j]\} = E\{\mu_j\} = \theta$, so set $\hat{\theta} = \bar{X}$; $V[X_{ij}] = V\{E[X_{ij} \mid \mu_j]\} + E\{V[X_{ij} \mid \mu_j]\} = V\{\mu_j\} + 1 = \tau^2 + 1$, so set $\hat{\tau}^2 + 1 = \frac{1}{pn - 1} \sum_{i,j} (X_{ij} - \bar{X})^2$.

60 Back to the James-Stein estimator Moreover, we derived $\lambda = (n\tau^2 + 1)^{-1}$. We obtain an estimate of λ by substituting the empirical Bayes estimate $\hat{\tau}^2 = \frac{1}{pn - 1} \sum_{i,j} (X_{ij} - \bar{X})^2 - 1$.
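
Putting the pieces together, a minimal sketch of the empirical Bayes shrinkage estimator under the model of slide 56; the truncation of the τ² estimate at zero is a practical guard, not something stated on the slides:

```python
import numpy as np

rng = np.random.default_rng(10)
p, n, theta, tau = 1000, 20, 0.0, 0.1
mu = rng.normal(theta, tau, size=p)               # mu_j ~ N(theta, tau^2)
X = rng.normal(mu, 1.0, size=(n, p))              # X_ij ~ N(mu_j, 1), simulated data

# Empirical Bayes (moment) estimates of the prior parameters:
theta_hat = X.mean()                               # overall mean estimates theta
tau2_hat = ((X - theta_hat) ** 2).sum() / (p * n - 1) - 1.0
tau2_hat = max(tau2_hat, 0.0)                      # guard against a negative estimate (assumption)

# Plug-in shrinkage parameter and the resulting James-Stein-type estimator:
lam_hat = 1.0 / (n * tau2_hat + 1.0)
gene_means = X.mean(axis=0)
shrunk = gene_means - lam_hat * (gene_means - theta_hat)

print("tau2_hat:", round(float(tau2_hat), 4), "lambda_hat:", round(float(lam_hat), 3))
print("squared error, unshrunken:", round(float(np.sum((gene_means - mu) ** 2)), 2))
print("squared error, shrunken:  ", round(float(np.sum((shrunk - mu) ** 2)), 2))
```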

61 Beneficial effect of shrinkage 5 repeated studies. Estimates of parameter of interest +/- sd. Solid: no shrinkage; dashed: shrinkage. (a): n=5, (b): n=40.

62 Beneficial effects of shrinkage (more to come) Better testing in a multiple testing setting. Shrinkage causes bias, but under selection pressure (e.g. picking the 5 genes with the largest parameter) the bias is generally smaller than for the unshrunken estimate. In a regression setting, shrinking a nuisance parameter can render higher power for the parameter of interest. To be continued.
