MACHINE LEARNING 2012

MACHINE LEARNING: kernel CCA, kernel K-means, Spectral Clustering

Change in timetable: we have a practical session next week!

Structure of today's and next week's class:
1) Briefly go through some extensions or variants of the principle of kernel PCA, namely kernel CCA.
2) Look at one particular set of clustering algorithms for structure discovery, kernel K-means.
3) Describe the general concept of Spectral Clustering, highlighting the equivalence between kernel PCA, ISOMAP, etc. and their use for clustering.
4) Introduce the notions of unsupervised and semi-supervised learning and how these can be used to evaluate clustering methods.
5) Compare kernel K-means and the Gaussian Mixture Model for unsupervised and semi-supervised clustering.

Canonical Correlation Analysis (CCA)

$$\max_{w_x, w_y}\ \mathrm{corr}\left(w_x^T X,\ w_y^T Y\right)$$

[Figure: video description and audio description of the same data.]

Determine features in two (or more) separate descriptions of the dataset that best explain each datapoint. Extract hidden structure that maximizes the correlation across the two different projections.

Canonical Correlation Analysis (CCA)

Pair of multidimensional zero-mean variables $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^q$. We have M instances of the pairs: $\{(x^i, y^i)\}_{i=1}^M$.

Search for two projections $w_x$ and $w_y$, with $X' = w_x^T X$ and $Y' = w_y^T Y$, solutions of:

$$\max_{w_x, w_y}\ \mathrm{corr}(X', Y') = \max_{w_x, w_y}\ \mathrm{corr}\left(w_x^T X,\ w_y^T Y\right)$$

Canonical Correlation Analysis (CCA)

$$\max_{w_x, w_y}\ \mathrm{corr}(X', Y') = \max_{w_x, w_y} \frac{w_x^T\, E\!\left[XY^T\right] w_y}{\sqrt{w_x^T\, E\!\left[XX^T\right] w_x}\ \sqrt{w_y^T\, E\!\left[YY^T\right] w_y}} = \max_{w_x, w_y} \frac{w_x^T C_{xy}\, w_y}{\sqrt{w_x^T C_x w_x}\ \sqrt{w_y^T C_y w_y}}$$

The cross-covariance matrix $C_{xy} = E\!\left[XY^T\right]$ is $N \times q$ and measures the cross-correlation between X and Y. The covariance matrices are $C_x = E\!\left[XX^T\right]$ ($N \times N$) and $C_y = E\!\left[YY^T\right]$ ($q \times q$).

Canonical Correlation Analysis (CCA)

Since the correlation is not affected by rescaling the norm of the vectors, we can require $w_x^T C_x w_x = w_y^T C_y w_y = 1$, giving the constrained problem:

$$\max_{w_x, w_y}\ w_x^T C_{xy}\, w_y \quad \text{s.t.} \quad w_x^T C_x w_x = w_y^T C_y w_y = 1$$

Canonical Correlation Analysis (CCA)

To determine the optimum (maximum), solve by Lagrange multipliers:

$$\max_{w_x, w_y}\ w_x^T C_{xy}\, w_y - \frac{\lambda_x}{2}\left(w_x^T C_x w_x - 1\right) - \frac{\lambda_y}{2}\left(w_y^T C_y w_y - 1\right)$$

Setting the derivatives to zero leads to $C_{xy} C_y^{-1} C_{yx}\, w_x = \lambda^2\, C_x w_x$: a generalized eigenvalue problem, which can be reduced to a classical eigenvalue problem if $C_x$ is invertible.
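
As a concrete illustration, here is a minimal numpy/scipy sketch of linear CCA solved through this generalized eigenvalue problem; the function name, the small ridge term and the toy two-view data are illustrative assumptions, not part of the lecture material.

```python
import numpy as np
from scipy.linalg import eigh

def linear_cca(X, Y, reg=1e-6):
    """Linear CCA; X is (N x M), Y is (q x M), columns are samples."""
    X = X - X.mean(axis=1, keepdims=True)          # zero-mean the two views
    Y = Y - Y.mean(axis=1, keepdims=True)
    M = X.shape[1]
    Cx = X @ X.T / M + reg * np.eye(X.shape[0])    # C_x (small ridge for invertibility)
    Cy = Y @ Y.T / M + reg * np.eye(Y.shape[0])    # C_y
    Cxy = X @ Y.T / M                              # C_xy (cross-covariance)
    # Generalized eigenvalue problem: C_xy C_y^{-1} C_yx w_x = rho^2 C_x w_x
    A = Cxy @ np.linalg.solve(Cy, Cxy.T)
    vals, Wx = eigh(A, Cx)                         # symmetric generalized eigensolver
    order = np.argsort(vals)[::-1]
    rho = np.sqrt(np.clip(vals[order], 0.0, 1.0))  # canonical correlations
    Wx = Wx[:, order]
    Wy = np.linalg.solve(Cy, Cxy.T @ Wx)           # matching y-projections (up to scale)
    return rho, Wx, Wy

# Toy usage: two noisy views sharing one latent signal.
rng = np.random.default_rng(0)
z = rng.normal(size=(1, 200))
X = np.vstack([z, rng.normal(size=(2, 200))])
Y = np.vstack([0.5 * z + 0.1 * rng.normal(size=(1, 200)), rng.normal(size=(1, 200))])
rho, Wx, Wy = linear_cca(X, Y)
print(rho)   # the first canonical correlation should be close to 1
```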

Kernel Canonical Correlation Analysis

- CCA finds basis vectors such that the correlation between the projections is mutually maximized.
- It is a generalized version of PCA for two or more multi-dimensional datasets.
- CCA depends on the coordinate system in which the variables are described: even if there is a strong linear relationship between the variables, depending on the coordinate system used this relationship might not be visible as a correlation; hence kernel CCA.

Principle of Kernel Methods

Determine a metric which brings out features of the data so as to make subsequent computation easier.

[Figure: original space vs. data lifted into feature space.]

The data become linearly separable when using an RBF kernel and projecting onto the first 2 principal components of kernel PCA.

Principle of Kernel Methods (Recall)

Idea: send the data $X = \{x^i\}_{i=1}^M$, $x^i \in \mathbb{R}^N$, into a feature space $H$ through the nonlinear map $\phi$: $\phi(X) = \{\phi(x^1), \ldots, \phi(x^M)\}$. In feature space, perform the classical linear computation.

[Figure: original space mapped by $\phi$ into $H$, where the linear transformation is performed.]

Kernel CCA

Data: $X = \{x^i\}_{i=1}^M$, $x^i \in \mathbb{R}^N$, and $Y = \{y^i\}_{i=1}^M$, $y^i \in \mathbb{R}^q$.

Project into a feature space: $\phi_x(x^i)$ and $\phi_y(y^i)$, with $\sum_{i=1}^M \phi_x(x^i) = 0$ and $\sum_{i=1}^M \phi_y(y^i) = 0$ (centered features).

Construct the associated kernel matrices $K_x = F_x^T F_x$ and $K_y = F_y^T F_y$, where the columns of $F_x$, $F_y$ are the $\phi_x(x^i)$, $\phi_y(y^i)$.

The projection vectors can be expressed as linear combinations in feature space: $w_x = F_x \alpha$ and $w_y = F_y \beta$.

Kernel CCA

Kernel CCA then becomes the optimization problem:

$$\max_{\alpha, \beta} \frac{\alpha^T K_x K_y\, \beta}{\left(\alpha^T K_x^2\, \alpha\right)^{1/2} \left(\beta^T K_y^2\, \beta\right)^{1/2}} \quad \text{s.t.} \quad \alpha^T K_x^2\, \alpha = \beta^T K_y^2\, \beta = 1$$

In linear CCA, we were solving:

$$\max_{w_x, w_y} \frac{w_x^T C_{xy}\, w_y}{\sqrt{w_x^T C_x w_x}\ \sqrt{w_y^T C_y w_y}} \quad \text{s.t.} \quad w_x^T C_x w_x = w_y^T C_y w_y = 1$$

Kernel CCA

$$\max_{\alpha, \beta} \frac{\alpha^T K_x K_y\, \beta}{\left(\alpha^T K_x^2\, \alpha\right)^{1/2} \left(\beta^T K_y^2\, \beta\right)^{1/2}} \quad \text{s.t.} \quad \alpha^T K_x^2\, \alpha = \beta^T K_y^2\, \beta = 1$$

In practice, when the intersection between the spaces spanned by $K_x$ and $K_y$ is non-zero, the problem has a trivial solution, since a perfect correlation $\cos(K_x\alpha, K_y\beta) \approx 1$ can then be achieved.

Kernel CCA

Generalized eigenvalue problem:

$$\begin{pmatrix} 0 & K_x K_y \\ K_y K_x & 0 \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \lambda \begin{pmatrix} K_x^2 & 0 \\ 0 & K_y^2 \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \end{pmatrix}$$

Add a regularization term to increase the rank of the matrix and make it invertible: replace $K_x^2$ by $\left(K_x + \frac{M\kappa}{2} I\right)^2$ (and similarly for $K_y$).

Several methods have been proposed to choose the regularization term carefully, so as to get projections that are as close as possible to the true projections.

Kernel CCA

$$\underbrace{\begin{pmatrix} 0 & K_x K_y \\ K_y K_x & 0 \end{pmatrix}}_{A}\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \lambda \underbrace{\begin{pmatrix} \left(K_x + \frac{M\kappa}{2} I\right)^2 & 0 \\ 0 & \left(K_y + \frac{M\kappa}{2} I\right)^2 \end{pmatrix}}_{B}\begin{pmatrix} \alpha \\ \beta \end{pmatrix}$$

Set $B = C C^T$ (e.g. by Cholesky decomposition); the problem then becomes a classical eigenvalue problem for $C^{-1} A C^{-T}$.
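
A compact sketch of this regularized two-view kernel CCA with RBF kernels follows; the kernel widths, the regularization constant kappa and the toy paired data are illustrative assumptions, and for simplicity the generalized problem is handed directly to a generalized symmetric eigensolver rather than being reduced through a Cholesky factor.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def rbf_gram(X, width):
    """RBF Gram matrix; X is (M x d), rows are samples."""
    return np.exp(-cdist(X, X, 'sqeuclidean') / (2.0 * width**2))

def center_gram(K):
    """Center the data in feature space (zero-mean features)."""
    M = K.shape[0]
    H = np.eye(M) - np.ones((M, M)) / M
    return H @ K @ H

def kernel_cca(Kx, Ky, kappa=0.1):
    """Regularized two-view kernel CCA as a generalized eigenvalue problem."""
    M = Kx.shape[0]
    Z = np.zeros((M, M))
    Rx = Kx + (M * kappa / 2.0) * np.eye(M)        # regularized kernel matrices
    Ry = Ky + (M * kappa / 2.0) * np.eye(M)
    A = np.block([[Z, Kx @ Ky], [Ky @ Kx, Z]])     # left-hand-side matrix
    B = np.block([[Rx @ Rx, Z], [Z, Ry @ Ry]])     # regularized block-diagonal right-hand side
    vals, vecs = eigh(A, B)                        # generalized symmetric eigensolver
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:M, order], vecs[M:, order]   # correlations, alpha, beta

# Usage on toy paired data: two nonlinearly related views of the same angle t.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 100)
X = np.c_[np.cos(t), np.sin(t)] + 0.05 * rng.normal(size=(100, 2))
Y = np.c_[t, np.cos(2 * t)] + 0.05 * rng.normal(size=(100, 2))
corr, alpha, beta = kernel_cca(center_gram(rbf_gram(X, 1.0)),
                               center_gram(rbf_gram(Y, 1.0)))
print(corr[:3])    # leading (regularized) canonical correlations
```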

Kernel CCA

The two-dataset case ($X = \{x^i\}_{i=1}^M$, $x^i \in \mathbb{R}^N$; $Y = \{y^i\}_{i=1}^M$, $y^i \in \mathbb{R}^q$) can be extended to multiple datasets: L datasets $X_1, \ldots, X_L$ with M observations each and dimensions $N_1, \ldots, N_L$, i.e. $X_i : N_i \times M$. Applying a non-linear transformation $\phi_i$ to each $X_i$, construct L Gram matrices $K_1, \ldots, K_L$.

Kernel CCA

The two-dataset case was formulated as the generalized eigenvalue problem:

$$\begin{pmatrix} 0 & K_x K_y \\ K_y K_x & 0 \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \lambda \begin{pmatrix} \left(K_x + \frac{M\kappa}{2} I\right)^2 & 0 \\ 0 & \left(K_y + \frac{M\kappa}{2} I\right)^2 \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \end{pmatrix}$$

It can be extended to multiple datasets:

$$\begin{pmatrix} 0 & K_1 K_2 & \cdots & K_1 K_L \\ K_2 K_1 & 0 & \cdots & K_2 K_L \\ \vdots & & \ddots & \vdots \\ K_L K_1 & K_L K_2 & \cdots & 0 \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_L \end{pmatrix} = \lambda \begin{pmatrix} \left(K_1 + \frac{M\kappa}{2} I\right)^2 & & 0 \\ & \ddots & \\ 0 & & \left(K_L + \frac{M\kappa}{2} I\right)^2 \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_L \end{pmatrix}$$

Kernel CCA

M. Kuss and T. Graepel, The Geometry of Kernel Canonical Correlation Analysis, Technical Report, Max Planck Institute, 2003.

Kernel CCA: Matlab Example. File: demo_cca-kcca.m

Applications of Kernel CCA

Goal: measure the correlation between heterogeneous datasets and extract sets of genes which share similarities with respect to multiple biological attributes. The kernel matrices K1, K2 and K3 correspond to gene-gene similarities in pathways, genome position, and microarray expression data respectively. An RBF kernel with fixed kernel width is used.

[Figure: correlation scores in MKCCA, pathway vs. genome vs. expression.]

Y. Yamanishi, J.-P. Vert, A. Nakaya, M. Kanehisa, Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis, Bioinformatics.

Applications of Kernel CCA

Goal: measure the correlation between heterogeneous datasets and extract sets of genes which share similarities with respect to multiple biological attributes.

[Figure: pairwise correlations between K1 and K2 and between K1 and K3; correlation scores in MKCCA, pathway vs. genome vs. expression.]

Two clusters correspond to genes close to each other with respect to their positions in the pathways, in the genome, and to their expression. A readout of the entries with equal projection onto the first canonical vectors gives the genes which belong to each cluster.

Y. Yamanishi, J.-P. Vert, A. Nakaya, M. Kanehisa, Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis, Bioinformatics.

Applications of Kernel CCA

Goal: construct appearance models for estimating an object's pose from raw brightness images. X: set of images; Y: pose parameters (pan and tilt angle of the object w.r.t. the camera, in degrees).

[Figure: example of two image datapoints with different poses.]

A linear kernel is used on X and an RBF kernel on Y, and the performance is compared to applying PCA to the (X, Y) dataset directly.

T. Melzer, M. Reiter and H. Bischof, Appearance models based on kernel canonical correlation analysis, Pattern Recognition 36 (2003).

Applications of Kernel CCA

Goal: construct appearance models for estimating an object's pose from raw brightness images.

[Figure: pose estimation error vs. testing/training ratio.]

Kernel CCA performs better than PCA, especially for a small testing/training ratio k (i.e., for larger training sets). The kernel CCA estimators tend to produce fewer outliers (gross errors) and consequently yield a smaller standard deviation of the pose estimation error than their PCA-based counterparts. For very small training sets, the performance of both approaches becomes similar.

T. Melzer, M. Reiter and H. Bischof, Appearance models based on kernel canonical correlation analysis, Pattern Recognition 36 (2003).

Kernel K-means and Spectral Clustering

Structure Discovery: Clustering

Groups pairs of points according to how similar they are. Density-based clustering methods (soft K-means, kernel K-means, Gaussian Mixture Models) compare the relative distributions.

K-means

K-means is a hard partitioning of the space into K clusters, equidistant according to the 2-norm. The distribution of data within each cluster is encapsulated in a sphere.

[Figure: three cluster centroids m1, m2, m3.]

K-means Algorithm

1. Initialization: pick K centroids $m^k$, $k = 1 \ldots K$.

K-means Algorithm: Iterative Method (variant of Expectation-Maximization)

2. Calculate the distance from each data point to each centroid: $d(x^j, m^k) = \|x^j - m^k\|_p$.
3. Assignment Step (E-step): assign the responsibility of each data point to its closest centroid, $k^j = \arg\min_k d(x^j, m^k)$. If a tie happens (i.e. two centroids are equidistant from a data point), assign the data point to the winning centroid with the smallest index.

K-means Algorithm: Iterative Method (variant of Expectation-Maximization)

2. Calculate the distance from each data point to each centroid: $d(x^j, m^k) = \|x^j - m^k\|_p$.
3. Assignment Step (E-step): assign the responsibility of each data point to its closest centroid, $k^j = \arg\min_k d(x^j, m^k)$, breaking ties towards the centroid with the smallest index.
4. Update Step (M-step): adjust the centroids to be the means of all data points assigned to them.
5. Go back to step 2 and repeat the process until the clusters are stable.
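
A short numpy sketch of this loop (steps 1-5) for the standard 2-norm case; the toy three-blob data, the iteration cap and the random initialization are illustrative choices.

```python
import numpy as np

def kmeans(X, K, n_iter=100, rng=None):
    """Plain K-means with the 2-norm; X is (M x N). Returns labels and centroids."""
    rng = np.random.default_rng(rng)
    centroids = X[rng.choice(len(X), K, replace=False)]    # 1. initialization
    for _ in range(n_iter):
        # 2-3. E-step: distance to every centroid, assign to the closest
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)                          # ties -> smallest index
        # 4. M-step: centroids become the mean of their assigned points
        new_centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                  else centroids[k] for k in range(K)])
        if np.allclose(new_centroids, centroids):          # 5. stop when stable
            break
        centroids = new_centroids
    return labels, centroids

# Toy usage: three well-separated blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in ([0, 0], [3, 0], [0, 3])])
labels, centroids = kmeans(X, K=3, rng=0)
print(centroids)
```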

K-means Clustering: Weaknesses

Two hyperparameters: the number of clusters K and the power p of the metric. Very sensitive to the choice of the number of clusters K and to the initialization.

K-means Clustering: Hyperparameters

Two hyperparameters: the number of clusters K and the power p of the metric. The choice of the power determines the form of the decision boundaries.

[Figure: decision boundaries for p = 1, 2, 3, 4.]

Kernel K-means

The K-means algorithm consists of the minimization of:

$$J(m^1, \ldots, m^K) = \sum_{k=1}^K \sum_{x^j \in C_k} \left\|x^j - m^k\right\|^p, \qquad m^k = \frac{1}{|C_k|}\sum_{x^i \in C_k} x^i,$$

where $|C_k|$ is the number of datapoints in cluster k.

Projecting into a feature space:

$$J(m^1, \ldots, m^K) = \sum_{k=1}^K \sum_{x^j \in C_k} \left\|\phi(x^j) - m_\phi^k\right\|^2$$

We cannot observe the mean in feature space; we construct the mean in feature space using the images of the points in the same cluster.

Kernel K-means

$$J(m^1, \ldots, m^K) = \sum_{k=1}^K \sum_{x^j \in C_k} \left\|\phi(x^j) - m_\phi^k\right\|^2 = \sum_{k=1}^K \sum_{x^j \in C_k}\left( k(x^j, x^j) - \frac{2}{|C_k|}\sum_{x^l \in C_k} k(x^j, x^l) + \frac{1}{|C_k|^2}\sum_{x^i, x^l \in C_k} k(x^i, x^l)\right)$$

with the feature-space mean $m_\phi^k = \frac{1}{|C_k|}\sum_{x^i \in C_k} \phi(x^i)$.

Kernel K-means

The kernel K-means algorithm is also an iterative procedure:

1. Initialization: pick K clusters.
2. Assignment Step (E-step): assign each data point to its closest centroid (with ties broken towards the centroid with the smallest index), computing the distance in feature space:

$$\min_k d^2\!\left(\phi(x^i), m_\phi^k\right) = \min_k \left( k(x^i, x^i) - \frac{2}{|C_k|}\sum_{x^j \in C_k} k(x^i, x^j) + \frac{1}{|C_k|^2}\sum_{x^j, x^l \in C_k} k(x^j, x^l) \right)$$

3. Update Step (M-step): update the list of points belonging to each centroid.
4. Go back to step 2 and repeat the process until the clusters are stable.
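
Below is a sketch of this procedure operating directly on a precomputed Gram matrix (e.g. an RBF Gram matrix); the random initialization by labels and the empty-cluster guard are implementation choices, not part of the slides.

```python
import numpy as np

def kernel_kmeans(K, n_clusters, n_iter=100, rng=None):
    """Kernel K-means on a precomputed Gram matrix K (M x M)."""
    M = K.shape[0]
    rng = np.random.default_rng(rng)
    labels = rng.integers(n_clusters, size=M)             # 1. random initial clusters
    for _ in range(n_iter):
        # 2. E-step: squared feature-space distance of each point to each cluster mean,
        #    d^2 = k(x,x) - 2/|C| sum_j k(x, x_j) + 1/|C|^2 sum_{j,l} k(x_j, x_l)
        d2 = np.zeros((M, n_clusters))
        for k in range(n_clusters):
            mask = labels == k
            nk = max(mask.sum(), 1)                        # guard against empty clusters
            d2[:, k] = (np.diag(K)
                        - 2.0 * K[:, mask].sum(axis=1) / nk
                        + K[np.ix_(mask, mask)].sum() / nk**2)
        new_labels = d2.argmin(axis=1)                     # ties -> smallest index
        if np.array_equal(new_labels, labels):             # 4. stop when stable
            break
        labels = new_labels                                # 3. M-step: update memberships
    return labels

# Usage (hypothetical): labels = kernel_kmeans(rbf_gram_matrix, n_clusters=2, rng=0)
```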

Kernel K-means

$$d^2\!\left(\phi(x^i), m_\phi^k\right) = k(x^i, x^i) - \frac{2}{|C_k|}\sum_{x^j \in C_k} k(x^i, x^j) + \frac{1}{|C_k|^2}\sum_{x^j, x^l \in C_k} k(x^j, x^l)$$

With an RBF kernel: the first term is a constant of value 1; if $x^i$ is close to all points in cluster k, the second sum is close to 1; if the points are well grouped in cluster k, the third sum is close to 1.

What about a homogeneous polynomial kernel?

Kernel K-means

$$d^2\!\left(\phi(x^i), m_\phi^k\right) = k(x^i, x^i) - \frac{2}{|C_k|}\sum_{x^j \in C_k} k(x^i, x^j) + \frac{1}{|C_k|^2}\sum_{x^j, x^l \in C_k} k(x^j, x^l)$$

With a polynomial kernel: the first term is a positive value; some of the other terms change sign depending on their position with respect to the origin. If the points are aligned in the same quadrant, the sum is maximal.

Kernel K-means: examples. RBF kernel, 2 clusters.

Kernel K-means: examples. RBF kernel, 2 clusters.

Kernel K-means: examples. RBF kernel, 2 clusters. [Figure: kernel width 0.5 vs. kernel width 0.05.]

Kernel K-means: examples. Polynomial kernel, 2 clusters.

Kernel K-means: examples. Polynomial kernel (p=8), 2 clusters.

Kernel K-means: examples. Polynomial kernel, 2 clusters. [Figure: polynomial kernels of order 2, 4 and 6.]

Kernel K-means: examples. Polynomial kernel, 2 clusters.

The separating line will always be perpendicular to the line passing through the origin (which is located at the mean of the datapoints) and parallel to the axis of the ordinates (because of the change of sign of the cosine in the inner product). No better than linear K-means!

Kernel K-means: examples. Polynomial kernel, 4 clusters.

[Figure: solutions found with kernel K-means vs. solutions found with K-means.]

It can only group datapoints that do not overlap across quadrants with respect to the origin (careful: the data are centered!). No better than linear K-means, except that it is less sensitive to random initialization.

Kernel K-means: Limitations. The choice of the number of clusters in kernel K-means is important.

Kernel K-means: Limitations. The choice of the number of clusters in kernel K-means is important.

Kernel K-means: Limitations. The choice of the number of clusters in kernel K-means is important.

Limitations of kernel K-means: Raw Data

Limitations of kernel K-means: kernel K-means with K=3, RBF kernel

From Non-Linear Manifolds (Laplacian Eigenmaps, Isomap) to Spectral Clustering

Non-Linear Manifolds

PCA and kernel PCA belong to a more general class of methods that create non-linear manifolds based on spectral decomposition. (Spectral decomposition of matrices is more frequently referred to as an eigenvalue decomposition.) Depending on which matrix we decompose, we get a different set of projections:
- PCA decomposes the covariance matrix of the dataset: it generates a rotation and projection in the original space.
- Kernel PCA decomposes the Gram matrix: it partitions or regroups the datapoints.
- The Laplacian matrix is a matrix representation of a graph; its spectral decomposition can be used for clustering.

Embed Data in a Graph

[Figure: original dataset and its graph representation.]

Build a similarity graph: each vertex of the graph is a datapoint.

Measure Distances in the Graph

Construct the similarity matrix S, which denotes whether points are close or far away, to weight the edges of the graph.

Disconnected Graphs

Disconnected graph: two data points are connected if a) the similarity between them is higher than a threshold, or b) they are k-nearest neighbors (according to the similarity metric).

Graph Laplacian

Given the similarity matrix S, construct the diagonal matrix D composed of the sum of each row of S:

$$D = \begin{pmatrix} \sum_i S_{1i} & 0 & \cdots & 0 \\ 0 & \sum_i S_{2i} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sum_i S_{Mi} \end{pmatrix}$$

and then build the Laplacian matrix $L = D - S$. L is positive semi-definite, so a spectral decomposition is possible.

Graph Laplacian

Eigenvalue decomposition of the Laplacian matrix: $L = U \Lambda U^T$. All eigenvalues of L are non-negative and the smallest eigenvalue of L is zero. If we order the eigenvalues by increasing order: $0 = \lambda_1 \le \lambda_2 \le \ldots \le \lambda_M$. If the graph has k connected components, then the eigenvalue $\lambda = 0$ has multiplicity k.

Spectral Clustering

The multiplicity of the eigenvalue 0 determines the number of connected components in the graph, and the associated eigenvectors identify these connected components: for an eigenvalue $\lambda_i = 0$, the corresponding eigenvector $e^i$ takes the same value for all vertices within a component and a different value for each of the other components. Identifying the clusters is then trivial when the similarity matrix is composed of zeros and ones (as when using k-nearest neighbors). What happens when the similarity matrix is full?
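
The claim about the multiplicity of the zero eigenvalue can be checked numerically on a tiny hand-built graph (a triangle plus an isolated edge); this toy graph is an illustration, not taken from the slides.

```python
import numpy as np

# Binary similarity matrix of a 5-vertex graph with two connected components:
# vertices {0,1,2} form a triangle, vertices {3,4} form an isolated edge.
S = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

D = np.diag(S.sum(axis=1))           # degree matrix (row sums of S)
L = D - S                            # unnormalized graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)

# Eigenvalues are approximately [0, 0, 2, 3, 3]: eigenvalue 0 has multiplicity 2,
# matching the two connected components (up to numerical precision).
print(np.round(eigvals, 6))

# Any basis of the zero-eigenvalue eigenspace is piecewise constant per component,
# so equal entries in these eigenvectors identify the clusters.
print(np.round(eigvecs[:, :2], 3))
```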

Spectral Clustering

The similarity map $S : \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}$ can either be binary (k-nearest neighbors) or continuous, with a Gaussian kernel:

$$S(x^i, x^j) = e^{-\frac{\|x^i - x^j\|^2}{2\sigma^2}}$$

Spectral Clustering

1) Build the Laplacian matrix $L = D - S$.
2) Do the eigenvalue decomposition of the Laplacian matrix: $L = U \Lambda U^T$.
3) Order the eigenvalues by increasing order: $0 = \lambda_1 \le \lambda_2 \le \ldots \le \lambda_M$.

The first eigenvalue is still zero, but now with multiplicity 1 only (fully connected graph)! Idea: the smallest eigenvalues are close to zero and hence also provide information on the partitioning of the graph (see exercise session). As before, the similarity matrix S can be binary (k-nearest neighbors) or continuous with a Gaussian kernel.

Spectral Clustering

Eigenvalue decomposition of the Laplacian matrix: $L = U \Lambda U^T$, with $U = \left(e^1\ e^2\ \ldots\ e^M\right)$.

Construct an embedding of each of the $i = 1 \ldots M$ datapoints through $y^i = \left(e^1_i, \ldots, e^K_i\right)$, i.e. reduce the dimensionality by picking only the K < M first projections.

With a clear partitioning of the graph, the entries in $y$ are split into sets of equal values. Each group of points with the same value belongs to the same partition (cluster).

Spectral Clustering

Eigenvalue decomposition of the Laplacian matrix: $L = U \Lambda U^T$, with $U = \left(e^1\ e^2\ \ldots\ e^M\right)$. Construct an embedding of each of the $i = 1 \ldots M$ datapoints through $y^i = \left(e^1_i, \ldots, e^K_i\right)$, picking only the K < M first projections.

When we have a fully connected graph, the entries in $y$ can take any real value.

Spectral Clustering

Example: 3 datapoints in a graph composed of 2 partitions (the first two datapoints are connected; the third forms its own component). L has eigenvalue $\lambda = 0$ with multiplicity two. Several choices are possible for the two associated eigenvectors (the null space is two-dimensional), but for any such choice the entries of the eigenvectors are equal for the two first datapoints, which therefore belong to the same partition.

Spectral Clustering

Example: 3 datapoints in a fully connected graph. L has eigenvalue $\lambda = 0$ with multiplicity 1. The second eigenvalue is small (0.04), whereas the 3rd one is large. The entries in the 2nd eigenvector for the two first datapoints are almost equal, so the first two points have almost the same coordinates in the embedding. Reduce the dimensionality by considering the smallest eigenvalues.

Spectral Clustering

Example: 3 datapoints in a fully connected graph. L has eigenvalue $\lambda = 0$ with multiplicity 1. The second and third eigenvalues are now both large (≈ 2.23): the 3rd point is now closer to the two other points, the entries in the 2nd eigenvector for the two first datapoints are no longer equal, and the first two points no longer have the same coordinates in the embedding.

Spectral Clustering

Idea: points close to one another have almost the same coordinates on the eigenvectors of L with small eigenvalues.

Step 1: Do an eigenvalue decomposition of the Laplacian matrix L and project the datapoints onto the first K eigenvectors with smallest eigenvalues (hence reducing the dimensionality).

Spectral Clustering

Step 2: Perform K-means on the set of embedded vectors $y^1, \ldots, y^M$; cluster the datapoints according to their clustering in the embedding space.
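
Putting steps 1 and 2 together, here is a compact numpy/scikit-learn sketch of this unnormalized spectral clustering pipeline; the Gaussian-similarity graph, the kernel width and the two-rings toy data are illustrative choices.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def spectral_clustering(X, n_clusters, width=1.0):
    """Unnormalized spectral clustering: Gaussian similarity -> Laplacian
    -> embed on the eigenvectors with smallest eigenvalues -> K-means."""
    S = np.exp(-cdist(X, X, 'sqeuclidean') / (2 * width**2))   # full similarity graph
    np.fill_diagonal(S, 0.0)
    L = np.diag(S.sum(axis=1)) - S                              # L = D - S
    eigvals, eigvecs = np.linalg.eigh(L)                        # ascending eigenvalues
    Y = eigvecs[:, :n_clusters]                                 # step 1: embedding y^i
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Y)  # step 2: K-means on y^i

# Toy usage: two concentric rings, which plain K-means cannot separate.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
r = np.r_[np.full(100, 1.0), np.full(100, 3.0)] + 0.05 * rng.normal(size=200)
X = np.c_[r * np.cos(theta), r * np.sin(theta)]
labels = spectral_clustering(X, n_clusters=2, width=0.5)   # expected to split the rings
```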

Equivalence to other non-linear Embeddings

Spectral decomposition of the similarity matrix (which is already positive semi-definite): the embedding weights each eigenvector by its eigenvalue, $y^i = \left(\lambda_1 e^1_i, \ldots, \lambda_K e^K_i\right)$. In Isomap, the embedding is normalized by the eigenvalues and uses the geodesic distance to build the similarity matrix; see the supplementary material.

Laplacian Eigenmaps

Solve the generalized eigenvalue problem $L y = \lambda D y$. This is the solution to the optimization problem $\min_y y^T L y$ such that $y^T D y = 1$, which ensures minimal distortion while preventing arbitrary scaling.

[Figure: Swiss-roll example, projections on each Laplacian eigenvector. Image courtesy of A. Singh.]

The vectors $y^i$, $i = 1 \ldots M$, form an embedding of the datapoints.

Equivalence to other non-linear Embeddings

Kernel PCA: eigenvalue decomposition of the similarity matrix S, $S = U D U^T$. The choice of parameters in kernel K-means can be initialized by doing a readout of the Gram matrix after kernel PCA.

Kernel K-means and Kernel PCA

The optimization problem of kernel K-means is equivalent to:

$$\max_H\ \mathrm{tr}\left(H^T K H\right), \qquad H = Y D^{-\frac{1}{2}}$$

(see the paper by M. Welling, supplementary document on the website), where $Y : M \times K$ has entry 1 if the datapoint belongs to cluster k and 0 otherwise, and $D : K \times K$ is diagonal, with the number of datapoints in cluster k as the diagonal element.

Since $\mathrm{tr}\left(H^T K H\right) \le \sum_{i=1}^K \lambda_i$, with $\lambda_1, \ldots, \lambda_M$ the eigenvalues resulting from the eigenvalue decomposition of the Gram matrix, look at the eigenvalues to determine the optimal number of clusters.

Kernel PCA projections can also help determine the kernel width. [Figure, from top to bottom: kernel widths of 0.8, 1.5, 2.5.]

Kernel PCA projections can help determine the kernel width: the sum of the eigenvalues grows as we get a better clustering.
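
A small sketch of this readout, computing the spectrum of the centered RBF Gram matrix for a few candidate kernel widths; the helper name and the example widths (echoing the values shown on the previous slide) are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gram_spectrum(X, width):
    """Eigenvalues of the centered RBF Gram matrix (the kernel PCA spectrum)."""
    K = np.exp(-cdist(X, X, 'sqeuclidean') / (2 * width**2))
    M = K.shape[0]
    H = np.eye(M) - np.ones((M, M)) / M            # centering in feature space
    return np.linalg.eigvalsh(H @ K @ H)[::-1]     # descending eigenvalues

# Read out the spectrum for a few candidate kernel widths: a clear gap after the
# first K eigenvalues hints at K clusters, and a larger sum of the leading
# eigenvalues indicates a better clustering structure (X is assumed given).
# for w in (0.8, 1.5, 2.5):
#     ev = gram_spectrum(X, w)
#     print(w, ev[:5], ev[:3].sum())
```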

Quick Recap of the Gaussian Mixture Model

Clustering with Mixtures of Gaussians

An alternative to K-means: soft partitioning with elliptic clusters instead of spheres.

[Figure: clustering with mixtures of Gaussians using spherical Gaussians (left) and non-spherical Gaussians, i.e. with full covariance matrices (right). Notice how the clusters become elongated along the direction of the data; the grey circles represent the first and second variances of the distributions.]

Gaussian Mixture Model (GMM)

Using a set of M N-dimensional training datapoints $X = \{x^j\}_{j=1}^M$, $x^j \in \mathbb{R}^N$, the pdf of X is modeled through a mixture of K Gaussians:

$$p(x) = \sum_{i=1}^K \alpha_i\, p\!\left(x \mid \mu^i, \Sigma^i\right), \qquad p\!\left(x \mid \mu^i, \Sigma^i\right) = \mathcal{N}\!\left(x;\, \mu^i, \Sigma^i\right),$$

where $\mu^i, \Sigma^i$ are the mean and covariance matrix of Gaussian i, and the mixing coefficients satisfy $\sum_{i=1}^K \alpha_i = 1$.

The probability that the data was explained by Gaussian i:

$$p(i) = \frac{1}{M}\sum_{j=1}^M p\!\left(i \mid x^j\right)$$

Gaussian Mixture Modeling

The parameters of a GMM are the means, covariance matrices and prior pdfs: $\left\{\mu^1, \ldots, \mu^K,\ \Sigma^1, \ldots, \Sigma^K,\ \alpha_1, \ldots, \alpha_K\right\}$.

Estimation of all the parameters can be done through Expectation-Maximization (EM). EM tries to find the optimum of the likelihood of the model given the data, i.e.:

$$\max_\Theta\ L(\Theta \mid X) = \max_\Theta\ p(X \mid \Theta)$$

See the lecture notes for details.
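
For reference, a minimal scikit-learn sketch of fitting such a mixture by EM; the two-blob toy data, K=2 and the number of restarts are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a GMM with full covariance matrices by Expectation-Maximization
# on toy two-blob data (the data and K=2 are illustrative).
rng = np.random.default_rng(0)
X = np.vstack([rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], 150),
               rng.multivariate_normal([4, 1], [[0.5, 0.0], [0.0, 0.5]], 150)])

gmm = GaussianMixture(n_components=2, covariance_type='full', n_init=5).fit(X)

print(gmm.weights_)           # mixing coefficients alpha_i
print(gmm.means_)             # means mu_i
print(gmm.covariances_)       # covariance matrices Sigma_i
resp = gmm.predict_proba(X)   # p(i | x^j): soft responsibilities from the E-step
```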

Gaussian Mixture Model

Gaussian Mixture Model: GMM using 4 Gaussians with random initialization.

Gaussian Mixture Model: Expectation-Maximization is very sensitive to initial conditions. GMM using 4 Gaussians with a new random initialization.

Gaussian Mixture Model: very sensitive to the choice of the number of Gaussians. The number of Gaussians can be optimized iteratively using AIC or BIC, as for K-means. Here, a GMM using 8 Gaussians.

Evaluation of Clustering Methods

Evaluation of Clustering Methods

Clustering methods rely on hyperparameters (number of clusters, kernel parameters); we need to determine the goodness of these choices. Clustering is unsupervised classification: we do not know the real number of clusters nor the data labels, so it is difficult to evaluate these choices without ground truth.

Evaluation of Clustering Methods

Two types of measures: internal versus external measures. Internal measures rely on a measure of similarity (e.g. intra-cluster distance versus inter-cluster distance). For example, the Residual Sum of Squares is an internal measure (available in mldemos); it gives the squared distance of each vector from its centroid, summed over all vectors:

$$\mathrm{RSS} = \sum_{k=1}^K \sum_{x \in C_k} \left\|x - m^k\right\|^2$$

Internal measures are problematic, as the metric of similarity is often already optimized by the clustering algorithm.

Evaluation of Clustering Methods

K-means, soft K-means and GMM have several hyperparameters (fixed number of clusters, beta, number of Gaussian functions). We need a measure to determine how well the choice of hyperparameters fits the dataset (a maximum-likelihood measure). With X the dataset, M the number of datapoints and B the number of free parameters:

- Akaike Information Criterion: $\mathrm{AIC} = -2 \ln L + 2B$
- Bayesian Information Criterion: $\mathrm{BIC} = -2 \ln L + B \ln M$

where L is the maximum likelihood of the model given the B parameters; the $2B$ and $B \ln M$ terms penalize the increase in model complexity and hence in computational cost.

Choosing AIC versus BIC depends on the application: is the purpose of the analysis to make predictions, or to decide which model best represents reality? AIC may have better predictive ability than BIC, but BIC finds a computationally more efficient solution. A lower BIC implies either fewer explanatory variables, a better fit, or both. As the number of datapoints (observations) increases, BIC assigns more weight to simpler models than AIC.
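
A minimal scikit-learn sketch of using these criteria to pick the number of Gaussians (GaussianMixture exposes aic() and bic() directly); the search range and random seed are illustrative.

```python
from sklearn.mixture import GaussianMixture

def select_k_by_bic(X, k_max=10):
    """Pick the number of Gaussians by the lowest BIC (AIC reported for comparison)."""
    scores = []
    for k in range(1, k_max + 1):
        gmm = GaussianMixture(n_components=k, covariance_type='full',
                              n_init=5, random_state=0).fit(X)
        # BIC = -2 ln L + B ln M, AIC = -2 ln L + 2B
        scores.append((k, gmm.bic(X), gmm.aic(X)))
    best_k = min(scores, key=lambda s: s[1])[0]      # lowest BIC wins
    return best_k, scores

# Usage (X assumed given): best_k, scores = select_k_by_bic(X)
```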

Evaluation of Clustering Methods

Two types of measures: internal versus external measures. External measures assume that a subset of datapoints has class labels and measure how well these datapoints are clustered. This requires having an idea of the classes and having labeled some datapoints; it is interesting mainly in cases where labeling is highly time-consuming because the dataset is very large (e.g. in speech recognition).

Evaluation of Clustering Methods: Raw Data

Semi-Supervised Learning

Clustering F-Measure (careful: similar to, but not the same as, the F-measure we will see for classification!). It trades off clustering all datapoints of the same class correctly into the same cluster against making sure that each cluster contains points of only one class.

Notation: M is the number of datapoints, C the set of classes, K the number of clusters, $n_{c_i}$ the number of members of class $c_i$, $n_k$ the number of members of cluster k, and $n_{c_i k}$ the number of members of both class $c_i$ and cluster k.

$$F(C, K) = \sum_{c_i \in C} \frac{n_{c_i}}{M} \max_k F(c_i, k), \qquad F(c_i, k) = \frac{2\, R(c_i, k)\, P(c_i, k)}{R(c_i, k) + P(c_i, k)}, \qquad R(c_i, k) = \frac{n_{c_i k}}{n_{c_i}}, \quad P(c_i, k) = \frac{n_{c_i k}}{n_k}$$

The measure picks for each class the cluster with the maximal number of its datapoints. Recall: proportion of the class's datapoints correctly clustered. Precision: proportion of datapoints of the same class in the cluster.
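
A short numpy sketch of this measure evaluated on a labeled subset; the function and variable names are illustrative.

```python
import numpy as np

def clustering_f_measure(class_labels, cluster_labels):
    """Clustering F-measure: for each class take the best-matching cluster,
    then average the per-class F1 weighted by class size."""
    class_labels = np.asarray(class_labels)
    cluster_labels = np.asarray(cluster_labels)
    M = len(class_labels)
    total = 0.0
    for c in np.unique(class_labels):
        in_class = class_labels == c
        n_c = in_class.sum()
        best = 0.0
        for k in np.unique(cluster_labels):
            in_cluster = cluster_labels == k
            n_ck = np.logical_and(in_class, in_cluster).sum()
            if n_ck == 0:
                continue
            recall = n_ck / n_c                     # fraction of class c captured by cluster k
            precision = n_ck / in_cluster.sum()     # purity of cluster k w.r.t. class c
            best = max(best, 2 * recall * precision / (recall + precision))
        total += (n_c / M) * best
    return total

# Usage on a labeled subset (semi-supervised evaluation):
# f = clustering_f_measure(true_labels_subset, predicted_clusters_subset)
```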

Evaluation of Clustering Methods

RSS with K-means can find the true optimal number of clusters, but it is very sensitive to random initialization (left and right: two different runs). RSS finds an optimum for K=4 and for K=5 in the right run.

Evaluation of Clustering Methods: BIC (left) and AIC (right) perform very poorly here, splitting some clusters into two halves.

Evaluation of Clustering Methods: BIC (left) and AIC (right) perform much better at picking the right number of clusters in GMM.

Evaluation of Clustering Methods: Raw Data

Evaluation of Clustering Methods: optimization with BIC using K-means

Evaluation of Clustering Methods: optimization with AIC using K-means. AIC tends to find more clusters.

Evaluation of Clustering Methods: Raw Data

Evaluation of Clustering Methods: optimization with AIC using kernel K-means with an RBF kernel

Evaluation of Clustering Methods: optimization with BIC using kernel K-means with an RBF kernel

Semi-Supervised Learning: Raw Data, 3 classes

Semi-Supervised Learning: clustering with RBF kernel K-means after optimization with BIC

Semi-Supervised Learning: after semi-supervised learning

Summary

We have seen several methods for extracting structure in data using the notion of kernel, and discussed their similarities, differences and complementarity:
- Kernel CCA is a generalization of kernel PCA to determine partial groupings in different dimensions.
- Kernel PCA can be used to bootstrap the choice of hyperparameters in kernel K-means.
- We have compared the geometrical division of space yielded by RBF and polynomial kernels.
- We have seen that techniques simpler than kernel K-means, such as K-means with the p-norm and mixtures of Gaussians, can yield complex non-linear clustering.

Summary

When to use what? E.g. K-means versus kernel K-means versus GMM. When using any machine learning algorithm, you have to balance a number of factors:
- Computing time at training and at testing
- Number of open parameters
- Curse of dimensionality (order of growth with the number of datapoints, the dimension, etc.)
- Robustness to initial conditions, optimality of the solution
