A Framework for Feature Selection in Clustering

1 A Framework for Feature Selection in Clustering Daniela M. Witten and Robert Tibshirani October 10, 2011

2 Outline: Problem; Past work; Proposed (sparse) clustering framework; Sparse K-means clustering; Sparse hierarchical clustering; Applications to simulated and real data

3 Problem: Clustering. Given data X_{n×p}, identify K clusters (unsupervised learning).

4 Problem: Clustering. Given data X_{n×p}, identify K clusters (unsupervised learning). Sparse Clustering: the clusters differ only with respect to q < p features; benefits are improved accuracy, better interpretability, and cheaper prediction.

5 Problem - Motivating Example: X_1, ..., X_500 indep. bivariate normal; the two clusters differ only in variable 1. Sparse 2-means clustering selects only the first feature and gives a superior result to standard 2-means (example from Witten and Tibshirani, Journal of the American Statistical Association, June 2010).

6 Problem - set-up. Measure of dissimilarity, additive assumption: d(x_i, x_{i'}) = \sum_{j=1}^p d_{i,i',j}; squared Euclidean distance: d_{i,i',j} = (x_{ij} - x_{i'j})^2.
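To make the set-up concrete, the per-feature dissimilarities d_{i,i',j} can be precomputed directly from the data matrix. A minimal NumPy sketch under the squared-Euclidean choice above (the function name is mine, not from the paper):

```python
import numpy as np

def per_feature_dissimilarities(X):
    """Return d[i, i', j] = (X[i, j] - X[i', j])**2 as an (n, n, p) array."""
    # Broadcasting: (n, 1, p) - (1, n, p) -> (n, n, p)
    diff = X[:, None, :] - X[None, :, :]
    return diff ** 2

# Example: the total dissimilarity d(x_i, x_i') is the sum over features j.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))              # n = 6 observations, p = 4 features
d = per_feature_dissimilarities(X)       # shape (6, 6, 4)
total = d.sum(axis=2)                    # ordinary squared Euclidean distances
assert np.allclose(total, ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
```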

7 Past Work - matrix decomposition: approximate X_{n×p} ≈ A_{n×q} B_{q×p} and cluster the rows of A. Examples: principal component analysis (PCA), Ghosh and Chinnaiyan (2002), Liu et al. (2003); nonnegative matrix factorization, Tamayo et al. (2007). Drawbacks: not truly sparse (A is a function of all p features); A is not guaranteed to contain the signal of interest; PCA is not justified for clustering, Chang (1983).

8 Past Work - model-based clustering: model the rows as independent multivariate observations from a mixture model,
LL = \sum_{i=1}^n \log[ \sum_{k=1}^K \pi_k f_k(X_i; \mu_k, \Sigma_k) ],
where K is the number of components and f_k is the component density, typically Gaussian; the model is fit with the EM algorithm. Problem: estimating \Sigma_k is difficult when p is close to n or p >> n.

9 Past Work - model-based clustering: maximize the log-likelihood subject to a sparsity-inducing penalty. Pan and Shen (2007), assuming the features are centered at 0:
PLL = \sum_{i=1}^n \log[ \sum_{k=1}^K \pi_k f_k(X_i; \mu_k, \Sigma_k) ] - \lambda \sum_{k=1}^K \sum_{j=1}^p |\mu_{kj}|.
Sparsity in feature j is induced when \mu_{kj} = 0 for all k.

10 Past Work - model-based clustering. Raftery and Dean (2006): recast the variable selection problem as a model selection problem, choosing between models with nested subsets of variables.

11 Past Work - COSA. Friedman and Meulman (2004): clustering objects on subsets of attributes (COSA),
minimize_{C_1,...,C_K, w} \sum_{k=1}^K a_k \sum_{i,i' \in C_k} \sum_{j=1}^p ( w_j d_{i,i',j} + \lambda w_j \log w_j )
subject to \sum_{j=1}^p w_j = 1, w_j ≥ 0.
Here C_k is the set of indices of observations in cluster k, a_k is a function of C_k, w ∈ R^p is a vector of feature weights, and \lambda ≥ 0 is a tuning parameter. The above is a simplified version; full COSA allows cluster-specific weights w_k.

12 Proposed Framework. Clustering:
maximize_{\Theta \in D} \sum_{j=1}^p f_j(X_j, \Theta),
where X_j ∈ R^n is feature j and f_j is a function of only the jth feature.

13 Proposed Framework. Sparse Clustering:
maximize_{\Theta \in D, w} \sum_{j=1}^p w_j f_j(X_j, \Theta)
subject to ||w||_2 ≤ 1, ||w||_1 ≤ s, w_j ≥ 0,
where 1 ≤ s ≤ \sqrt{p} is a tuning parameter.

14 Proposed Framework.
maximize_{\Theta \in D, w} \sum_{j=1}^p w_j f_j(X_j, \Theta)
subject to ||w||_2 ≤ 1, ||w||_1 ≤ s, w_j ≥ 0.
Sparse Clustering, observations: taking w_1 = ... = w_p reduces to the usual clustering criterion; the L_1 penalty induces sparsity, and without the L_2 penalty the solution is trivial (all of the weight is placed on a single feature); w_j is the contribution of feature j to the clustering; in general, f_j(X_j, \Theta) > 0 for some or all j.

15 Proposed Framework. Sparse Clustering, optimization (iterative):
(1) holding w fixed, optimize with respect to \Theta: clustering on weighted data;
(2) holding \Theta fixed, optimize with respect to w: rewrite as maximize_w {w^T a}, where a_j = f_j(X_j, \Theta).
Proposition: by the KKT conditions, w = S(a_+, \Delta) / ||S(a_+, \Delta)||_2, where S(·, \Delta) denotes soft-thresholding and a_+ is the positive part of a; \Delta = 0 if this results in ||w||_1 ≤ s, and otherwise \Delta > 0 is chosen to yield ||w||_1 = s.
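The proposition gives the w-update in closed form up to the scalar \Delta: soft-threshold the positive part of a, renormalize to unit L_2 norm, and pick \Delta by a one-dimensional search so that ||w||_1 = s whenever the unthresholded solution violates the L_1 constraint. A hedged NumPy sketch of this step (assuming s ≥ 1 and at least one positive a_j; the function names are mine, not the authors' implementation):

```python
import numpy as np

def soft_threshold(a, delta):
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def update_weights(a, s, tol=1e-8):
    """Solve max_w w^T a  s.t.  ||w||_2 <= 1, ||w||_1 <= s, w_j >= 0 (s >= 1)."""
    a_plus = np.maximum(a, 0.0)                  # assumes at least one a_j > 0
    w = a_plus / np.linalg.norm(a_plus)          # Delta = 0 candidate
    if np.sum(np.abs(w)) <= s:
        return w
    # Otherwise binary-search Delta > 0 so that ||w||_1 = s.
    lo, hi = 0.0, np.max(a_plus)
    while hi - lo > tol:
        delta = 0.5 * (lo + hi)
        w = soft_threshold(a_plus, delta)
        w = w / np.linalg.norm(w)
        if np.sum(w) > s:
            lo = delta        # threshold harder: L1 norm still too large
        else:
            hi = delta        # threshold less
    return w
```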

16 K-means Clustering: find C_1, ..., C_K to minimize the within-cluster sum of squares (WCSS),
minimize_{C_1,...,C_K} \sum_{k=1}^K (1/n_k) \sum_{i,i' \in C_k} \sum_{j=1}^p d_{i,i',j},
or, equivalently, maximize the between-cluster sum of squares (BCSS),
maximize_{C_1,...,C_K} \sum_{j=1}^p ( (1/n) \sum_{i=1}^n \sum_{i'=1}^n d_{i,i',j} - \sum_{k=1}^K (1/n_k) \sum_{i,i' \in C_k} d_{i,i',j} ),
which has the form maximize_{\Theta \in D} \sum_{j=1}^p f_j(X_j, \Theta).
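For squared Euclidean dissimilarities, the quantity in parentheses is a per-feature between-cluster sum of squares, which plays the role of f_j(X_j, \Theta). A small sketch computing it straight from the pairwise definition above (helper name is my own):

```python
import numpy as np

def per_feature_bcss(X, labels):
    """BCSS_j = (1/n) sum_{i,i'} d_{i,i',j} - sum_k (1/n_k) sum_{i,i' in C_k} d_{i,i',j}."""
    n, p = X.shape
    d = (X[:, None, :] - X[None, :, :]) ** 2          # (n, n, p) per-feature dissimilarities
    total = d.sum(axis=(0, 1)) / n                    # first term, one value per feature
    within = np.zeros(p)
    for k in np.unique(labels):
        idx = np.where(labels == k)[0]
        within += d[np.ix_(idx, idx)].sum(axis=(0, 1)) / len(idx)
    return total - within
```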

17 Sparse K-means Clustering:
maximize_{C_1,...,C_K, w} \sum_{j=1}^p w_j ( (1/n) \sum_{i=1}^n \sum_{i'=1}^n d_{i,i',j} - \sum_{k=1}^K (1/n_k) \sum_{i,i' \in C_k} d_{i,i',j} )
subject to ||w||_2 ≤ 1, ||w||_1 ≤ s, w_j ≥ 0.
Optimization: solve using the earlier algorithm: (1) K-means using \tilde{d}_{i,i',j} = w_j d_{i,i',j} to find C_1, ..., C_K; (2) the proposition to find w; iterate until \sum_j |w_j^r - w_j^{r-1}| / \sum_j |w_j^{r-1}| < 10^{-4}. The algorithm tends to converge within 5 to 10 iterations but generally will not converge to the global optimum (the problem is non-convex).
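A hedged sketch of this alternation, using scikit-learn's KMeans for step (1): K-means on the data with each column scaled by \sqrt{w_j} is equivalent to K-means under the dissimilarities w_j d_{i,i',j}. It reuses per_feature_bcss and update_weights from the earlier sketches and is not the authors' implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

def sparse_kmeans(X, K, s, max_iter=20, tol=1e-4, random_state=0):
    """Alternate (1) K-means on sqrt(w)-scaled data and (2) the weight update."""
    n, p = X.shape
    w = np.ones(p) / np.sqrt(p)                      # equal starting weights, ||w||_2 = 1
    for _ in range(max_iter):
        w_old = w.copy()
        # (1) K-means under dissimilarities w_j * d_{i,i',j}  <=>  K-means on X * sqrt(w)
        labels = KMeans(n_clusters=K, n_init=10,
                        random_state=random_state).fit_predict(X * np.sqrt(w))
        # (2) closed-form weight update with a_j = per-feature BCSS
        a = per_feature_bcss(X, labels)
        w = update_weights(a, s)
        if np.sum(np.abs(w - w_old)) / np.sum(np.abs(w_old)) < tol:
            break
    return labels, w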

18 Sparse K-means Clustering, tuning s: the objective function increases with s, so a permutation approach is used.
(1) Obtain permuted datasets X^1, ..., X^B by independently permuting the observations within each feature.
(2a) Compute O(s), the objective obtained by performing sparse K-means with tuning parameter s.
(2b) Similarly compute O_b(s) for b = 1, ..., B.
(2c) Calculate Gap(s) = log(O(s)) - (1/B) \sum_b log(O_b(s)).
(3) Choose s* = argmax_s Gap(s); alternately, choose s* to be the smallest value for which Gap(s*) is within one standard deviation (computed over the log(O_b(s*)) values) of the largest value of Gap(s).
Intuition: the gap statistic measures the strength of the clustering obtained on the real data relative to the clustering obtained on null data that contains no subgroups.
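A sketch of the permutation approach, reusing sparse_kmeans and per_feature_bcss from above; permuting each column independently destroys any cluster structure while preserving the marginal distribution of each feature. All names, the grid of s values, and the small default B are illustrative choices, not the paper's defaults:

```python
import numpy as np

def objective(X, K, s, random_state=0):
    """Weighted BCSS: the sparse K-means criterion value at the fitted solution."""
    labels, w = sparse_kmeans(X, K, s, random_state=random_state)
    return np.sum(w * per_feature_bcss(X, labels))

def gap_statistic(X, K, s_grid, B=10, seed=0):
    rng = np.random.default_rng(seed)
    gaps = []
    for s in s_grid:
        o = np.log(objective(X, K, s))
        o_perm = []
        for _ in range(B):
            # Permute observations independently within each feature (column).
            Xb = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
            o_perm.append(np.log(objective(Xb, K, s)))
        gaps.append(o - np.mean(o_perm))
    best = s_grid[int(np.argmax(gaps))]
    return best, np.array(gaps)
```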

19 Sparse Hierarchical Clustering: produce a dendrogram that represents a nested set of clusters from an n×n dissimilarity matrix U and cut at the desired height. Linkage (between-cluster measure):
complete: d(C, C') = max{ d(x_i, x_{i'}) : x_i ∈ C, x_{i'} ∈ C' };
average: d(C, C') = (1 / (|C| |C'|)) \sum_{x_i ∈ C, x_{i'} ∈ C'} d(x_i, x_{i'});
single: d(C, C') = min{ d(x_i, x_{i'}) : x_i ∈ C, x_{i'} ∈ C' }.

20 Sparse Hierarchical Clustering: place the selection of the dissimilarity matrix U in the proposed framework. The dissimilarity matrix U = { \sum_j d_{i,i',j} }_{i,i'} results in the usual hierarchical clustering. U can be shown to be proportional to the solution of
maximize_U \sum_{j=1}^p \sum_{i,i'} d_{i,i',j} U_{i,i'} subject to \sum_{i,i'} U_{i,i'}^2 ≤ 1,
and the shape of the dendrogram is invariant to scaling of U, so standard hierarchical clustering can be viewed as resulting from this criterion.

21 Sparse Hierarchical Clustering:
maximize_{w, U} \sum_{j=1}^p w_j \sum_{i,i'} d_{i,i',j} U_{i,i'}
subject to \sum_{i,i'} U_{i,i'}^2 ≤ 1, ||w||_2 ≤ 1, ||w||_1 ≤ s, w_j ≥ 0.
The solution is proportional to U = { \sum_j w_j d_{i,i',j} }_{i,i'}, so clustering with the reweighted dissimilarity matrix U produces sparsity.

22 Sparse Hierarchical Clustering:
maximize_{w, U} \sum_{j=1}^p w_j \sum_{i,i'} d_{i,i',j} U_{i,i'}.
Optimization: the objective is bi-convex in U and w. Notation: D ∈ R^{n^2 × p} has the dissimilarities {d_{i,i',j}}_{i,i'} strung out into its columns, and u ∈ R^{n^2} is U strung out into a vector. Rewrite as maximize_{w,u} {u^T D w}.

23 Sparse Hierarchical Clustering:
maximize_{w, u} {u^T D w}.
Optimization: solve using the earlier algorithm: (1) update u = Dw / ||Dw||_2; (2) use the proposition to find w; iterate until \sum_j |w_j^r - w_j^{r-1}| / \sum_j |w_j^{r-1}| < 10^{-4}; then perform hierarchical clustering using U. The algorithm tends to converge within 10 iterations but is not guaranteed to converge to the global optimum.
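A hedged sketch of this alternation: build D with one column per feature, iterate the u- and w-updates, and hand the reweighted dissimilarity matrix U to SciPy for the dendrogram. update_weights refers to the earlier sketch, and the complete-linkage choice is just an illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def sparse_hierarchical(X, s, max_iter=15, tol=1e-4):
    n, p = X.shape
    # D has one column per feature: the n^2 dissimilarities d_{i,i',j} strung out.
    D = ((X[:, None, :] - X[None, :, :]) ** 2).reshape(n * n, p)
    w = np.ones(p) / np.sqrt(p)
    for _ in range(max_iter):
        w_old = w.copy()
        u = D @ w
        u = u / np.linalg.norm(u)                    # update u = Dw / ||Dw||_2
        a = D.T @ u                                  # a_j = sum_{i,i'} d_{i,i',j} u_{i,i'}
        w = update_weights(a, s)                     # weight update from the earlier sketch
        if np.sum(np.abs(w - w_old)) / np.sum(np.abs(w_old)) < tol:
            break
    # Hierarchical clustering on the reweighted dissimilarities U = {sum_j w_j d_{i,i',j}}.
    U = (D @ w).reshape(n, n)
    np.fill_diagonal(U, 0.0)
    Z = linkage(squareform(U, checks=False), method='complete')
    return Z, w
```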

24 Sparse Hierarchical Clustering, a simple underlying model: consider two classes C_1, C_2 that differ only with respect to the first q features,
X_{ij} ~ N(\mu_j + c, \sigma^2) if j ≤ q and i ∈ C_1, and X_{ij} ~ N(\mu_j, \sigma^2) otherwise.
For i ≠ i', X_{ij} - X_{i'j} ~ N(±c, 2\sigma^2) if j ≤ q and i, i' are in different classes, and N(0, 2\sigma^2) otherwise. Hence
d_{i,i',j} = (X_{ij} - X_{i'j})^2 ~ 2\sigma^2 \chi^2_1(c^2 / (2\sigma^2)) if j ≤ q and i, i' are in different classes, and 2\sigma^2 \chi^2_1 otherwise.

25 Sparse Hierarchical Clustering, a simple underlying model. Consider the overall dissimilarity:
\sum_j d_{i,i',j} ~ 2\sigma^2 \chi^2_p(qc^2 / (2\sigma^2)) if i, i' are in different classes, and 2\sigma^2 \chi^2_p otherwise.
Now consider the weighted dissimilarity \sum_j w_j d_{i,i',j} under the ideal situation where w_j ∝ 1_{j ≤ q}:
\sum_j w_j d_{i,i',j} ~ 2\sigma^2 \chi^2_q(qc^2 / (2\sigma^2)) if i, i' are in different classes, and 2\sigma^2 \chi^2_q otherwise.
The dissimilarity matrix for sparse clustering is a denoised version of the dissimilarity matrix used for standard clustering.

26 Sparse Hierarchical Clustering, a simple underlying model. For standard clustering,
E(\sum_j d_{i,i',j}) = 2\sigma^2 p + c^2 q if i, i' are in different classes, and 2\sigma^2 p otherwise.
For sparse clustering,
E(\sum_j w_j d_{i,i',j}) = 2\sigma^2 \sum_j w_j + c^2 \sum_{j ≤ q} w_j if i, i' are in different classes, and 2\sigma^2 \sum_j w_j otherwise.
Notes: (1) w is taken to be the first sparse principal component (SPC) of E(D), not of D; (2) w_1 = ... = w_q > w_{q+1} = ... = w_p.

27 Sparse Hierarchical Clustering, tuning s: same approach as for sparse K-means, the permutation method.

28 Sparse Hierarchical Clustering, complementary clustering: find a secondary clustering after removing the signal found in the primary clustering. Identify U_1, w_1, then solve for U_2, w_2:
maximize_{u_2, w_2} {u_2^T D w_2}
subject to ||u_2||_2 ≤ 1, u_1^T u_2 = 0, ||w_2||_2 ≤ 1, ||w_2||_1 ≤ s, w_{2j} ≥ 0.
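Given w_2, the u_2-update under the extra constraint u_1^T u_2 = 0 amounts to projecting Dw_2 onto the orthogonal complement of u_1 before normalizing, while the w-update is unchanged. A small sketch of that step (my own phrasing of the projected update, not necessarily the authors' exact implementation):

```python
import numpy as np

def complementary_update_u(D, w2, u1):
    """Maximize u2^T (D w2) subject to ||u2||_2 <= 1 and u1^T u2 = 0:
    remove the component of D w2 along u1, then normalize."""
    v = D @ w2
    v = v - u1 * (u1 @ v) / (u1 @ u1)     # projection onto the complement of u1
    return v / np.linalg.norm(v)
```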

29 Sparse K-means - simulated data. Simulation 1: comparison of sparse and standard K-means. K = 3, q = 50, X_{ij} ~ N(\mu_{ij}, 1) with \mu_{ij} = \mu (1_{i ∈ C_1, j ≤ q} - 1_{i ∈ C_2, j ≤ q}), 20 observations per class; the values of p and \mu are varied. Classification error rate (CER): \sum_{i > i'} |1_{P(i,i')} - 1_{Q(i,i')}| / \binom{n}{2}.
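The CER compares two partitions P and Q through pairwise co-membership, so it is invariant to relabeling of the clusters (it equals one minus the Rand index). A small sketch (the function name is mine):

```python
from itertools import combinations

def cer(labels_p, labels_q):
    """Classification error rate between two partitions: the fraction of pairs
    (i, i') on which they disagree about whether i and i' share a cluster."""
    n = len(labels_p)
    disagree = 0
    for i, j in combinations(range(n), 2):
        same_p = labels_p[i] == labels_p[j]
        same_q = labels_q[i] == labels_q[j]
        disagree += int(same_p != same_q)
    return disagree / (n * (n - 1) / 2)

# Example: identical partitions (up to relabeling) have CER 0.
assert cer([0, 0, 1, 1], [1, 1, 0, 0]) == 0.0
```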

30 Sparse K-means - simulated data. Simulation 1: comparison of sparse and standard K-means. Table 1 of the paper reports standard 3-means results for Simulation 1: the mean (and standard error) of the CER over 20 simulations, for p = 50, 200, 500, 1000 and several values of \mu, with the \mu/p combinations for which standard 3-means has significantly lower CER than sparse 3-means (at level \alpha = 0.05) shown in bold. Table 2 reports the corresponding sparse 3-means results, with bold marking the \mu/p combinations for which sparse 3-means has significantly lower CER than standard 3-means.

31 Sparse K-means - simulated data. Simulation 1: comparison of sparse and standard K-means. Table 3 of the paper shows, for sparse 3-means, the mean number of nonzero feature weights (with standard errors) resulting from the tuning parameter selection method of Section 3.2, for p = 50, 200, 500, 1000; note that 50 features differ between the three classes. Fewer features with nonzero weights would result from selecting the tuning parameter at the smallest value that is within one standard deviation of the maximal gap statistic.

32 Sparse K-means - simulated data. Simulation 2: comparison of sparse K-means with other approaches. K = 3, X_{ij} ~ N(\mu_{ij}, 1) with \mu_{ij} = \mu (1_{i ∈ C_1, j ≤ q} - 1_{i ∈ C_2, j ≤ q}). Two scenarios: a small simulation with p = 25, q = 5, and 10 observations per class, and a larger simulation with p = 500, q = 50, and 20 observations per class. Other methods: (1) COSA + PAM/hierarchical clustering, Friedman and Meulman (2004); (2) model-based clustering, Raftery and Dean (2006); (3) penalized log-likelihood, Pan and Shen (2007); (4) PCA + 3-means clustering.

33 Sparse K-means - simulated data. Simulation 2: comparison of sparse K-means with other approaches. Table 4 of the paper reports the mean and standard error (in parentheses) of the CER and of the number of nonzero coefficients over 25 simulated datasets, for the small simulation (p = 25, q = 5, 10 observations per class) and the large simulation (p = 500, q = 50, 20 observations per class), comparing sparse K-means, standard K-means, Pan and Shen, COSA with hierarchical clustering, COSA with K-medoids, Raftery and Dean, and PCA with K-means.

34 Sparse Hierarchical - breast cancer data, Perou et al. (2000): 65 surgical specimens of human breast tumors; 1753 genes, of which 496 are "intrinsic" genes; hierarchical clustering using Eisen linkage; 4 classes identified, with 3 samples unassigned. (Figure 4 of the paper: using the intrinsic gene set, hierarchical clustering was performed on all 65 observations and on only the 62 observations assigned to one of the four classes; the classes identified using all 65 observations are largely lost in the dendrogram obtained using just 62 observations. The four classes are basal-like, Erb-B2, normal breast-like, and ER+.)

35 Sparse Hierarchical - breast cancer data, Perou et al. (2000). The dendrogram obtained using the intrinsic gene set was used to identify four classes (basal-like, Erb-B2, normal breast-like, and ER+) to which 62 of the 65 samples belong; the remaining three observations were determined not to belong to any of the four classes. These four classes are not visible in the dendrogram obtained using the full set of genes. Four versions of hierarchical clustering with Eisen linkage were performed on the 62 assigned observations: (1) sparse hierarchical clustering of all 1753 genes, with the tuning parameter chosen to yield 496 nonzero genes; (2) standard hierarchical clustering using all 1753 genes; (3) standard hierarchical clustering using the 496 genes with the highest marginal variance; (4) COSA hierarchical clustering using all 1753 genes. (Figure 5 of the paper: sparse clustering results in the best separation between the four classes.)

36 Sparse Hierarchical - breast cancer data, Perou et al. (2000). (Figure 6 of the paper: the gap statistic was used to determine the optimal value of the tuning parameter; the largest value of the gap statistic corresponds to 93 genes with nonzero weights, and the corresponding dendrogram is shown.) Although the nonzero w_j tend to correspond to genes with high marginal variance, sparse clustering does not simply cluster on the genes with the highest marginal variances; rather, it weights each gene-wise dissimilarity matrix.

37 Sparse Hierarchical - breast cancer data, Perou et al. (2000). The automated tuning parameter selection method resulted in 93 genes having nonzero weights (Figure 6). Figure 7 of the paper shows that the gene weights obtained by sparse clustering are highly correlated with the marginal variances of the genes, although the results differ from simply clustering on the high-variance genes. Complementary sparse clustering was also performed, with the tuning parameters for the initial and complementary clusterings selected to yield 496 genes with nonzero weights; the complementary dendrogram (Figure 8 of the paper, along with a plot of w_1 and w_2 for the initial and complementary clusterings) reveals a previously unknown pattern in the data. In response to a reviewer's inquiry about the stability of sparse hierarchical clustering, the 62 observations belonging to one of the four classes were resampled (with a small jitter added to avoid samples having correlation one) and sparse hierarchical clustering was repeated; for the most part, the resulting clusterings reflected the true class labels.

38 Sparse Hierarchical - SNP data. A SNP (single nucleotide polymorphism) is a position in a DNA sequence at which genetic variation exists in a population. Phase III SNP data for chromosome 22 from the publicly available HapMap data (International HapMap Consortium); 3 populations: African ancestry in southwest U.S.A., Utah residents with European ancestry, and Han Chinese from Beijing; n = 315, p = 17,026.

39 Sparse Hierarchical - SNP data. With 198 and with 2809 SNPs having nonzero weights, sparse clustering resulted in a slightly lower CER than standard 3-means clustering. The main improvement of sparse clustering over standard clustering is interpretability, since the nonzero elements of w identify the SNPs involved in the clustering. The one-standard-deviation rule of Section 3.2 yields a tuning parameter with 7160 SNPs having nonzero weights; the fact that the gap statistic seemingly overestimates the number of features with nonzero weights may reflect the need for a more accurate method of tuning parameter selection. (Figure 9 of the paper. Left: the gap statistics as a function of the number of SNPs with nonzero weights. Center: the CERs obtained using sparse and standard 3-means clustering, for a range of tuning parameter values. Right: sparse clustering using the tuning parameter that yields 198 nonzero SNPs, the smallest number of SNPs that resulted in minimal CER in the center panel; chromosome 22 was split into 500 segments of equal length, and the average weights of the SNPs in each segment are shown as a function of nucleotide position.)
