MACHINE LEARNING 2012
1. MACHINE LEARNING: kernel CCA, kernel K-Means, Spectral Clustering
2. Change in timetable: we have a practical session next week!
3. Structure of today's and next week's class: 1) Briefly go through some extensions or variants of the principle of kernel PCA, namely kernel CCA. 2) Look at one particular set of clustering algorithms for structure discovery: kernel K-Means. 3) Describe the general concept of Spectral Clustering, highlighting the equivalence between kernel PCA, ISOMAP, etc., and their use for clustering. 4) Introduce the notions of unsupervised and semi-supervised learning and how they can be used to evaluate clustering methods. 5) Compare kernel K-Means and the Gaussian Mixture Model for unsupervised and semi-supervised clustering.
4. Canonical Correlation Analysis (CCA): given two (or more) separate descriptions of the same dataset (e.g. a video description and an audio description), determine the features in each description that best explain each datapoint. Extract the hidden structure that maximizes the correlation across the two projections: max_{w_x, w_y} corr(w_x^T x, w_y^T y).
5. Canonical Correlation Analysis (CCA): consider a pair of zero-mean multidimensional variables x in R^N and y in R^q, with M instances of the pairs {(x^i, y^i)}_{i=1..M}. Search for two projections w_x and w_y, with X' = w_x^T X and Y' = w_y^T Y, solutions of: max_{w_x, w_y} corr(X', Y') = max_{w_x, w_y} corr(w_x^T X, w_y^T Y).
6. Canonical Correlation Analysis (CCA): max_{w_x, w_y} corr(X', Y') = max_{w_x, w_y} (w_x^T C_xy w_y) / sqrt(w_x^T C_xx w_x * w_y^T C_yy w_y). The cross-covariance matrix C_xy = E[X Y^T] is N x q and measures the cross-correlation between X and Y. The covariance matrices are C_xx = E[X X^T] (N x N) and C_yy = E[Y Y^T] (q x q).
7. Canonical Correlation Analysis (CCA): the correlation is not affected by rescaling the norm of the vectors, so we can require that w_x^T C_xx w_x = w_y^T C_yy w_y = 1, which gives: max_{w_x, w_y} w_x^T C_xy w_y, subject to w_x^T C_xx w_x = w_y^T C_yy w_y = 1.
8. Canonical Correlation Analysis (CCA): to determine the optimum (maximum), solve by Lagrange multipliers: max w_x^T C_xy w_y, subject to w_x^T C_xx w_x = w_y^T C_yy w_y = 1. This yields the generalized eigenvalue problem C_xy C_yy^{-1} C_yx w_x = lambda^2 C_xx w_x, which can be reduced to a classical eigenvalue problem if C_xx is invertible.
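The reduction of slide 8 can be checked numerically. Below is a minimal sketch of linear CCA in NumPy; the function name and the toy data are ours, not the lecture's:

```python
import numpy as np

def linear_cca(X, Y):
    """First pair of canonical vectors and canonical correlation.

    X is N x M, Y is q x M, both zero-mean (columns are samples)."""
    M = X.shape[1]
    Cxx = X @ X.T / M
    Cyy = Y @ Y.T / M
    Cxy = X @ Y.T / M
    # Reduce the generalized eigenproblem to an ordinary one, assuming
    # Cxx and Cyy are invertible (as on slide 8):
    # Cxx^{-1} Cxy Cyy^{-1} Cyx w_x = rho^2 w_x
    A = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    vals, vecs = np.linalg.eig(A)
    i = np.argmax(vals.real)
    wx = vecs[:, i].real
    wy = np.linalg.solve(Cyy, Cxy.T @ wx)   # matching direction for Y
    wy /= np.linalg.norm(wy)
    rho = np.sqrt(max(vals[i].real, 0.0))   # canonical correlation
    return wx, wy, rho

# Toy data: both views share one latent signal z in their first dimension.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = np.vstack([z + 0.1 * rng.normal(size=200), rng.normal(size=200)])
Y = np.vstack([z + 0.1 * rng.normal(size=200), rng.normal(size=200)])
X -= X.mean(axis=1, keepdims=True)
Y -= Y.mean(axis=1, keepdims=True)
wx, wy, rho = linear_cca(X, Y)   # rho should be close to 1
```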
9. Kernel Canonical Correlation Analysis: CCA finds basis vectors such that the correlations between the projections are mutually maximized; it is a generalized version of PCA for two or more multi-dimensional datasets. CCA depends on the coordinate system in which the variables are described: even if there is a strong linear relationship between the variables, depending on the coordinate system used this relationship might not be visible as a correlation. Hence: Kernel CCA.
10. Principle of Kernel Methods: determine a metric which brings out features of the data so as to make subsequent computation easier. [Figure: original space vs. data in feature space after lifting.] The data become linearly separable when using an RBF kernel and projecting onto the first 2 principal components of kernel PCA.
11. Principle of Kernel Methods (Recall). Idea: send the data X = {x^i}_{i=1..M}, x^i in R^N, into a feature space H through the nonlinear map phi: x -> phi(x). In feature space, perform a classical linear computation. [Figure: original space mapped into H, where the transformation is linear.]
12. Kernel CCA: given {(x^i, y^i)}_{i=1..M}, x^i in R^N, y^i in R^q, project into a feature space through phi_x and phi_y, with sum_i phi_x(x^i) = 0 and sum_i phi_y(y^i) = 0 (centered features). Construct the associated kernel matrices K_x = F_x^T F_x and K_y = F_y^T F_y, where the columns of F_x, F_y are the phi_x(x^i), phi_y(y^i). The projection vectors can be expressed as linear combinations in feature space: w_x = F_x alpha and w_y = F_y beta.
13. Kernel CCA then becomes the optimization problem: max_{alpha, beta} (alpha^T K_x K_y beta) / sqrt(alpha^T K_x^2 alpha * beta^T K_y^2 beta), subject to alpha^T K_x^2 alpha = beta^T K_y^2 beta = 1. In linear CCA, we were solving: max_{w_x, w_y} w_x^T C_xy w_y, subject to w_x^T C_xx w_x = w_y^T C_yy w_y = 1.
14. Kernel CCA: in practice, if the intersection between the spaces spanned by K_x and K_y is non-zero, the problem has a trivial solution, as the cosine between K_x alpha and K_y beta can be driven to 1.
15. Kernel CCA: generalized eigenvalue problem: [0, K_x K_y; K_y K_x, 0] (alpha; beta) = lambda [K_x^2, 0; 0, K_y^2] (alpha; beta). Add a regularization term to increase the rank of the matrix and make it invertible: K_x^2 -> (K_x + (M eps / 2) I)^2. Several methods have been proposed to choose the regularizing term carefully so as to get projections that are as close as possible to the true projections.
16. Kernel CCA: with A = [0, K_x K_y; K_y K_x, 0] and B = [(K_x + (M eps / 2) I)^2, 0; 0, (K_y + (M eps / 2) I)^2], the problem A v = lambda B v becomes a classical eigenvalue problem: set B = C C^T; then solve C^{-1} A C^{-T} u = lambda u.
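A compact numerical sketch of regularized kernel CCA as set up on slides 13-16: RBF Gram matrices, centering in feature space, and the 2M x 2M generalized eigenproblem, here solved directly via B^{-1}A rather than the Cholesky route. All names and the toy data are ours, not the lecture's demo_cca-kcca.m:

```python
import numpy as np

def rbf_gram(X, sigma):
    # Gram matrix of the RBF kernel for rows of X
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def center(K):
    # centering in feature space: K -> H K H with H = I - 11^T / M
    M = K.shape[0]
    H = np.eye(M) - np.ones((M, M)) / M
    return H @ K @ H

def kernel_cca(Kx, Ky, reg):
    M = Kx.shape[0]
    Z = np.zeros((M, M))
    A = np.block([[Z, Kx @ Ky], [Ky @ Kx, Z]])
    # regularized denominator blocks (here K^2 + reg*I, a simple variant)
    B = np.block([[Kx @ Kx + reg * np.eye(M), Z],
                  [Z, Ky @ Ky + reg * np.eye(M)]])
    vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
    i = np.argmax(vals.real)
    return vals[i].real, vecs[:M, i].real, vecs[M:, i].real

# Toy data: x and y are nonlinearly related through a shared parameter t.
rng = np.random.default_rng(1)
t = rng.uniform(-1, 1, size=40)
X = np.column_stack([t, t ** 2]) + 0.05 * rng.normal(size=(40, 2))
Y = np.column_stack([np.sin(np.pi * t)]) + 0.05 * rng.normal(size=(40, 1))
Kx, Ky = center(rbf_gram(X, 0.5)), center(rbf_gram(Y, 0.5))
rho, alpha, beta = kernel_cca(Kx, Ky, reg=1.0)
```

Without the regularization term the full-rank Gram matrices would give the trivial solution rho = 1 mentioned on slide 14.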
17. Kernel CCA: the two-dataset case {(x^i, y^i)}_{i=1..M} can be extended to multiple datasets: L datasets X_1, ..., X_L with M observations each and dimensions N_1, ..., N_L (i.e. X_i is N_i x M). Applying a non-linear transformation phi_i to each X_i, construct L Gram matrices K_1, ..., K_L.
18. Kernel CCA: the two-dataset case was formulated as the generalized eigenvalue problem A v = lambda B v above. It can be extended to multiple datasets: A has blocks K_i K_j off the diagonal and zeros on the diagonal, i.e. A = [0, K_1 K_2, ..., K_1 K_L; K_2 K_1, 0, ..., K_2 K_L; ...; K_L K_1, K_L K_2, ..., 0], and B = diag((K_1 + (M eps / 2) I)^2, ..., (K_L + (M eps / 2) I)^2).
19. Kernel CCA: M. Kuss and T. Graepel, "The Geometry of Kernel Canonical Correlation Analysis", Tech. Report, Max Planck Institute, 2003.
20. Kernel CCA: Matlab Example. File: demo_cca-kcca.m
21. Applications of Kernel CCA. Goal: to measure the correlation between heterogeneous datasets and to extract sets of genes which share similarities with respect to multiple biological attributes. Kernel matrices K1, K2 and K3 correspond to gene-gene similarities in pathways, genome position, and microarray expression data respectively. Use an RBF kernel with a fixed kernel width. [Figure: correlation scores in MKCCA: pathway vs. genome vs. expression.] Y. Yamanishi, J.-P. Vert, A. Nakaya, M. Kanehisa, "Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis", Bioinformatics.
22. Applications of Kernel CCA (continued). The projections give the pairwise correlation between K1 and K2, and between K1 and K3. Two clusters correspond to genes close to each other with respect to their positions in the pathways, in the genome, and with respect to their expression. A readout of the entries with equal projection onto the first canonical vectors gives the genes which belong to each cluster. [Figure: correlation scores in MKCCA: pathway vs. genome vs. expression.] Y. Yamanishi, J.-P. Vert, A. Nakaya, M. Kanehisa, "Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis", Bioinformatics.
23. Applications of Kernel CCA. Goal: to construct appearance models for estimating an object's pose from raw brightness images. X: set of images; Y: pose parameters (pan and tilt angle of the object w.r.t. the camera, in degrees). [Figure: two image datapoints with different poses.] Use a linear kernel on X and an RBF kernel on Y, and compare performance to applying PCA on the (X, Y) dataset directly. T. Melzer, M. Reiter and H. Bischof, "Appearance models based on kernel canonical correlation analysis", Pattern Recognition 36 (2003).
24. Applications of Kernel CCA (continued). [Figure: pose estimation error vs. testing/training ratio.] Kernel CCA performs better than PCA, especially for a small k = testing/training ratio (i.e., for larger training sets). The kernel-CCA estimators tend to produce fewer outliers, i.e. gross errors, and consequently yield a smaller standard deviation of the pose estimation error than their PCA-based counterparts. For very small training sets, the performance of both approaches becomes similar. T. Melzer, M. Reiter and H. Bischof, "Appearance models based on kernel canonical correlation analysis", Pattern Recognition 36 (2003).
25. Kernel K-means and Spectral Clustering
26. Structure Discovery: Clustering. Groups pairs of points according to how similar they are. Density-based clustering methods (soft K-means, kernel K-means, Gaussian Mixture Models) compare the relative distributions.
27. K-means: K-means is a hard partitioning of the space through K clusters, equidistant according to a norm-2 measure. The distribution of data within each cluster is encapsulated in a sphere. [Figure: three cluster centers m1, m2, m3.]
28. K-means Algorithm: 1. Initialization: pick K initial centroids m_k, k = 1...K.
29-30. K-means Algorithm, iterative method (a variant of Expectation-Maximization): 2. Calculate the distance from each data point to each centroid: d(x^j, m_k) = ||x^j - m_k||^p. 3. Assignment step (E-step): assign each data point to its closest centroid, k* = argmin_k d(x^j, m_k); if a tie happens (i.e. two centroids are equidistant to a data point), assign the data point to the centroid with the smallest index. 4. Update step (M-step): adjust the centroids to be the means of all data points assigned to them. 5. Go back to step 2 and repeat the process until the clusters are stable.
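Steps 2-5 above can be sketched as a short Lloyd iteration; the function and variable names below are ours, not the lecture's:

```python
import numpy as np

def kmeans(X, K, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialize the K centroids from random data points.
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(iters):
        # Steps 2-3 (E-step): distances to every centroid, assign to the
        # closest; argmin breaks ties by the smallest centroid index.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 4 (M-step): move each centroid to the mean of its points.
        for k in range(K):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    return labels, centroids

# Two well-separated toy blobs: the algorithm should recover them.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(3.0, 0.1, size=(20, 2))])
labels, C = kmeans(X, K=2)
```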
31. K-means Clustering: Weaknesses. Two hyperparameters: the number of clusters K and the power p of the metric. Very sensitive to the choice of the number of clusters K and to the initialization.
32. K-means Clustering: Hyperparameters. Two hyperparameters: the number of clusters K and the power p of the metric. The choice of power determines the form of the decision boundaries. [Figure: boundaries for p = 1, 2, 3, 4.]
33. Kernel K-means: the K-means algorithm consists of the minimization of J(m_1, ..., m_K) = sum_{k=1..K} sum_{x^j in C_k} ||x^j - m_k||^p, with m_k = (1/|C_k|) sum_{x^i in C_k} x^i, where |C_k| is the number of datapoints in cluster C_k. Projecting into a feature space: J(mu_1, ..., mu_K) = sum_{k=1..K} sum_{x^j in C_k} ||phi(x^j) - mu_k||^2. We cannot observe the mean in feature space; construct the mean in feature space using the images of the points in the same cluster.
34. Kernel K-means: ||phi(x^j) - mu_k||^2 = k(x^j, x^j) - (2/|C_k|) sum_{x^l in C_k} k(x^j, x^l) + (1/|C_k|^2) sum_{x^l, x^m in C_k} k(x^l, x^m), so J can be evaluated from kernel values alone.
35. The kernel K-means algorithm is also an iterative procedure: 1. Initialization: pick K clusters. 2. Assignment step (E-step): assign each data point to its closest centroid by computing the distance in feature space, min_k ||phi(x^i) - mu_k||^2 = min_k [ k(x^i, x^i) - (2/|C_k|) sum_{x^j in C_k} k(x^i, x^j) + (1/|C_k|^2) sum_{x^j, x^l in C_k} k(x^j, x^l) ]; if a tie happens, assign the data point to the centroid with the smallest index. 3. Update step (M-step): update the list of points belonging to each centroid. 4. Go back to step 2 and repeat the process until the clusters are stable.
36. Kernel K-means with an RBF kernel: in the distance above, k(x^i, x^i) is a constant of value 1; if x^i is close to all points in cluster k, the second term is close to 1; if the points are well grouped in cluster k, the third sum is close to 1. What happens with a homogeneous polynomial kernel?
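The feature-space distance used in the assignment step depends only on kernel evaluations. A sketch below (names ours), sanity-checked with a linear kernel, where the expansion must equal the ordinary squared distance to the cluster mean:

```python
import numpy as np

def feature_space_dist2(K, i, members):
    """||phi(x_i) - mu_k||^2 computed from the Gram matrix K alone,
    for the cluster whose member indices are `members`."""
    m = len(members)
    return (K[i, i]
            - 2.0 * K[i, members].sum() / m
            + K[np.ix_(members, members)].sum() / m ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
K = X @ X.T                      # linear kernel: phi is the identity
members = [1, 4, 7]
mu = X[members].mean(axis=0)     # explicit cluster mean, for comparison
d_feat = feature_space_dist2(K, 0, members)
d_lin = ((X[0] - mu) ** 2).sum()
# d_feat and d_lin agree for the linear kernel
```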
37. Kernel K-means with a polynomial kernel: k(x^i, x^i) is a positive value, but some of the terms change sign depending on their position with respect to the origin. If the points are aligned in the same quadrant, the sum is maximal.
38-39. Kernel K-means: examples. RBF kernel, 2 clusters.
40. Kernel K-means: examples. RBF kernel, 2 clusters; kernel width 0.5 vs. kernel width 0.05.
41. Kernel K-means: examples. Polynomial kernel, 2 clusters.
42. Kernel K-means: examples. Polynomial kernel (p = 8), 2 clusters.
43. Kernel K-means: examples. Polynomial kernel, 2 clusters; order 2, order 4, order 6.
44. Kernel K-means: examples. Polynomial kernel, 2 clusters. The separating line will always be perpendicular to the line passing through the origin (which is located at the mean of the datapoints) and parallel to the axis of the ordinates, because of the change in sign of the cosine function in the inner product. No better than linear K-means!
45. Kernel K-means: examples. Polynomial kernel, 4 clusters. Solutions found with kernel K-means vs. solutions found with K-means. It can only group datapoints that do not overlap across quadrants with respect to the origin (careful: the data are centered!). No better than linear K-means (except less sensitive to random initialization)!
46-48. Kernel K-means: Limitations. The choice of the number of clusters in kernel K-means is important.
49. Limitations of kernel K-means: Raw Data.
50. Limitations of kernel K-means: kernel K-means with K = 3, RBF kernel.
51. From Non-Linear Manifolds (Laplacian Eigenmaps, Isomap) to Spectral Clustering.
52. Non-Linear Manifolds: PCA and kernel PCA belong to a more general class of methods that create non-linear manifolds based on spectral decomposition. (The spectral decomposition of matrices is more frequently referred to as eigenvalue decomposition.) Depending on which matrix we decompose, we get a different set of projections. PCA decomposes the covariance matrix of the dataset, generating a rotation and projection in the original space. Kernel PCA decomposes the Gram matrix, partitioning or regrouping the datapoints. The Laplacian matrix is a matrix representation of a graph; its spectral decomposition can be used for clustering.
53. Embed Data in a Graph: build a similarity graph from the original dataset; each vertex of the graph is a datapoint. [Figure: original dataset and its graph representation.]
54. Measure Distances in the Graph: construct the similarity matrix S to denote whether points are close or far away and to weight the edges of the graph.
55. Disconnected Graphs: two datapoints are connected if a) the similarity between them is higher than a threshold, or b) they are k-nearest neighbors (according to the similarity metric).
56. Graph Laplacian: given the similarity matrix S, construct the diagonal matrix D composed of the sum of each row of S, D_ii = sum_j S_ij, and then build the Laplacian matrix L = D - S. L is positive semi-definite, so a spectral decomposition is possible.
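The construction L = D - S and its spectral properties can be checked on a small graph with two connected components; the example matrix below is ours:

```python
import numpy as np

# Binary similarity matrix of a 5-vertex graph with two connected
# components: a triangle {0, 1, 2} and a pair {3, 4}.
S = np.array([[0., 1., 1., 0., 0.],
              [1., 0., 1., 0., 0.],
              [1., 1., 0., 0., 0.],
              [0., 0., 0., 0., 1.],
              [0., 0., 0., 1., 0.]])
D = np.diag(S.sum(axis=1))   # row sums on the diagonal
L = D - S                    # unnormalized graph Laplacian
vals = np.sort(np.linalg.eigvalsh(L))
# L is positive semi-definite, and the eigenvalue 0 appears with
# multiplicity equal to the number of connected components (here 2).
```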
57. Graph Laplacian: eigenvalue decomposition of the Laplacian matrix, L = U Lambda U^T. All eigenvalues of L are non-negative and the smallest eigenvalue of L is zero. Order the eigenvalues by increasing order: 0 = lambda_1 <= lambda_2 <= ... <= lambda_M. If the graph has k connected components, then the eigenvalue lambda = 0 has multiplicity k.
58. Spectral Clustering: the multiplicity of the eigenvalue 0 determines the number of connected components in a graph, and the associated eigenvectors identify these connected components. For an eigenvalue lambda_i = 0, the corresponding eigenvector e^i has the same value for all vertices in one component and a different value on each of the other components. Identifying the clusters is then trivial when the similarity matrix is composed of zeros and ones (as when using k-nearest neighbors). What happens when the similarity matrix is full?
59. Spectral Clustering: the similarity matrix S can either be binary (k-nearest neighbors) or continuous, e.g. with a Gaussian kernel: S(x^i, x^j) = exp(-||x^i - x^j||^2 / (2 sigma^2)).
60. Spectral Clustering: 1) Build the Laplacian matrix L = D - S. 2) Do the eigenvalue decomposition of the Laplacian matrix, L = U Lambda U^T. 3) Order the eigenvalues by increasing order: lambda_1 <= lambda_2 <= ... <= lambda_M. The first eigenvalue is still zero, but now with multiplicity 1 only (fully connected graph)! Idea: the smallest eigenvalues are close to zero and hence also provide information on the partitioning of the graph (see exercise session).
61. Spectral Clustering: from the eigenvalue decomposition L = U Lambda U^T, take the first K eigenvectors e^1, ..., e^K (columns of U) and construct an embedding of each of the M datapoints through y^i = (e^1_i, ..., e^K_i). Reduce the dimensionality by picking K < M projections. With a clear partitioning of the graph, the entries in y are split into sets of equal values; each group of points with the same value belongs to the same partition (cluster).
62. Spectral Clustering: with the same embedding y^i = (e^1_i, ..., e^K_i), when we have a fully connected graph the entries in y take any real value.
63. Spectral Clustering, example: 3 datapoints in a graph composed of 2 partitions, for instance points 1 and 2 connected and point 3 isolated, with binary similarity matrix S = [[0,1,0],[1,0,0],[0,0,0]]. L has eigenvalue lambda = 0 with multiplicity two. One solution for the two associated eigenvectors is e^1 = (1,1,0)^T, e^2 = (0,0,1)^T; another solution is, for instance, e^1 = (1,1,1)^T / sqrt(3), e^2 = (1,1,-2)^T / sqrt(6). In both cases, the entries in the eigenvectors for the two first datapoints are equal.
64. Spectral Clustering, example: 3 datapoints in a fully connected graph. L has eigenvalue lambda = 0 with multiplicity 1. The second eigenvalue is small (around 0.04), whereas the third one is large. The entries in the 2nd eigenvector for the two first datapoints are almost equal, so the first two points have almost the same coordinates in the embedding. Reduce the dimensionality by considering only the smallest eigenvalues.
65. Spectral Clustering, example: 3 datapoints in a fully connected graph. The second and third eigenvalues are now both large (around 2.23). The entries in the 2nd eigenvector for the two first datapoints are no longer equal: the 3rd point is now closer to the two other points, and the first two points no longer have the same coordinates in the embedding.
66. Spectral Clustering, step 1 (embedding). Idea: points close to one another have almost the same coordinates on the eigenvectors of L with small eigenvalues. Step 1: do an eigenvalue decomposition of the Laplacian matrix L and project the datapoints onto the first K eigenvectors with the smallest eigenvalues (hence reducing the dimensionality).
67. Spectral Clustering, step 2: perform K-means on the set of embedded vectors y^1, ..., y^M, and cluster the datapoints according to their clustering in the embedding.
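The two steps can be sketched end-to-end. For two clusters, the sketch below thresholds the eigenvector of the second-smallest eigenvalue instead of running K-means on the K-dimensional embedding, a common simplification of the same idea; all names and the toy data are ours:

```python
import numpy as np

def spectral_bipartition(X, sigma):
    # Gaussian similarity matrix (slide 59)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    # unnormalized Laplacian and its spectrum (slides 56-57)
    L = np.diag(S.sum(axis=1)) - S
    _, vecs = np.linalg.eigh(L)
    # the eigenvector of the second-smallest eigenvalue separates the
    # two weakly connected groups; split at its median
    f = vecs[:, 1]
    return (f > np.median(f)).astype(int)

# Two distant toy blobs: the graph is nearly disconnected.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.2, size=(25, 2)),
               rng.normal(5.0, 0.2, size=(25, 2))])
labels = spectral_bipartition(X, sigma=1.0)
```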
68. Equivalence to other non-linear embeddings: spectral decomposition of the similarity matrix (which is already positive semi-definite) yields an embedding built from the leading eigenvector entries e^1_i, ..., e^K_i. In Isomap, the embedding is normalized by the eigenvalues and uses the geodesic distance to build the similarity matrix; see supplementary material.
69. Laplacian Eigenmaps: solve the generalized eigenvalue problem L y = lambda D y, the solution to the optimization problem min_y y^T L y such that y^T D y = 1, which ensures minimal distortion while preventing arbitrary scaling. [Swiss-roll example: projections on each Laplacian eigenvector.] The vectors y^i, i = 1...M, form an embedding of the datapoints. Image courtesy of A. Singh.
70. Equivalence to other non-linear embeddings, kernel PCA: eigenvalue decomposition of the similarity matrix, S = U D U^T. The choice of parameters in kernel K-means can be initialized by doing a readout of the Gram matrix after kernel PCA.
71. Kernel K-means and Kernel PCA: the optimization problem of kernel K-means is equivalent to max_H tr(H^T K H), with H = Y D^{-1/2} (see the paper by M. Welling, supplementary document on the website). Since tr(H^T K H) <= sum_{i=1..K} lambda_i, where lambda_1 >= ... >= lambda_M are the eigenvalues resulting from the eigenvalue decomposition of the Gram matrix, one can look at the eigenvalues to determine the optimal number of clusters. Y is M x K, with each entry of Y equal to 1 if the datapoint belongs to cluster k and zero otherwise; D is a K x K diagonal matrix whose diagonal elements are the number of datapoints in cluster k.
72. Kernel PCA projections can also help determine the kernel width. [Figure, from top to bottom: kernel widths 0.8, 1.5, 2.5.]
73. Kernel PCA projections can help determine the kernel width: the sum of the eigenvalues grows as we get a better clustering.
74. Quick Recap of the Gaussian Mixture Model
75. Clustering with a Mixture of Gaussians: an alternative to K-means; soft partitioning with elliptic clusters instead of spheres. [Figure: clustering with mixtures of Gaussians using spherical Gaussians (left) and non-spherical Gaussians, i.e. with full covariance matrices (right). Notice how the clusters become elongated along the direction of the clusters; the grey circles represent the first and second variances of the distributions.]
76. Gaussian Mixture Model (GMM): given a set of M N-dimensional training datapoints {x^j}_{j=1..M}, the pdf of X is modeled through a mixture of K Gaussians: p(x) = sum_{i=1..K} pi_i p(x | mu^i, Sigma^i), with p(x | mu^i, Sigma^i) = N(mu^i, Sigma^i), where mu^i and Sigma^i are the mean and covariance matrix of Gaussian i. The mixing coefficients satisfy sum_{i=1..K} pi_i = 1. The probability that a datapoint x^j was explained by Gaussian i is p(i | x^j) = pi_i p(x^j | mu^i, Sigma^i) / sum_{l=1..K} pi_l p(x^j | mu^l, Sigma^l).
77. Gaussian Mixture Modeling: the parameters of a GMM are the means, covariance matrices and prior pdf: mu^1, ..., mu^K; Sigma^1, ..., Sigma^K; pi_1, ..., pi_K. Estimation of all the parameters can be done through Expectation-Maximization (E-M). E-M tries to find the optimum of the likelihood of the model given the data, i.e. max L(Theta | X) = max p(X | Theta). See the lecture notes for details.
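A compact E-M loop for a spherical GMM, following the E- and M-steps described above; a sketch with our own naming, not mldemos code. The initial means are set deterministically to one point from each generated blob so the demo does not depend on a lucky random draw:

```python
import numpy as np

def em_gmm(X, K, mu0, iters=60):
    M, N = X.shape
    mu = mu0.copy().astype(float)
    var = np.full(K, X.var())          # one spherical variance per Gaussian
    pi = np.full(K, 1.0 / K)           # mixing coefficients
    for _ in range(iters):
        # E-step: responsibilities p(i | x_j), in log space for stability
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)          # (M, K)
        logp = np.log(pi) - 0.5 * N * np.log(2 * np.pi * var) - d2 / (2 * var)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate means, variances and mixing coefficients
        Nk = r.sum(axis=0)
        mu = (r.T @ X) / Nk[:, None]
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        var = (r * d2).sum(axis=0) / (N * Nk)
        pi = Nk / M
    return pi, mu, var

# Two toy blobs around (0, 0) and (4, 4); initial means: one point of each.
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 0.3, size=(40, 2)),
               rng.normal(4.0, 0.3, size=(40, 2))])
pi, mu, var = em_gmm(X, K=2, mu0=X[[0, 40]])
```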
78. Gaussian Mixture Model
79. Gaussian Mixture Model: GMM using 4 Gaussians with random initialization.
80. Gaussian Mixture Model: Expectation-Maximization is very sensitive to initial conditions: GMM using 4 Gaussians with a new random initialization.
81. Gaussian Mixture Model: very sensitive to the choice of the number of Gaussians. The number of Gaussians can be optimized iteratively using AIC or BIC, as for K-means. Here: GMM using 8 Gaussians.
82. Evaluation of Clustering Methods
83. Evaluation of Clustering Methods: clustering methods rely on hyperparameters (number of clusters, kernel parameters), and we need to determine the goodness of these choices. Clustering is unsupervised classification: we do not know the real number of clusters nor the data labels, so it is difficult to evaluate these choices without ground truth.
84. Evaluation of Clustering Methods: two types of measures, internal versus external. Internal measures rely on a measure of similarity (e.g. intra-cluster versus inter-cluster distances). E.g. the Residual Sum of Squares is an internal measure (available in mldemos): it gives the squared distance of each vector from its centroid, summed over all vectors: RSS = sum_{k=1..K} sum_{x in C_k} ||x - m_k||^2. Internal measures are problematic, as the metric of similarity is often already optimized by the clustering algorithm.
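The RSS above is straightforward to compute; a minimal sketch (names and the toy configuration are ours):

```python
import numpy as np

def rss(X, labels, centroids):
    # squared distance of each vector to its centroid, summed over all vectors
    return sum(((X[labels == k] - centroids[k]) ** 2).sum()
               for k in range(len(centroids)))

# Four points, two clusters, each point at distance 1 from its centroid.
X = np.array([[0., 0.], [0., 2.], [10., 0.], [10., 2.]])
labels = np.array([0, 0, 1, 1])
C = np.array([[0., 1.], [10., 1.]])
val = rss(X, labels, C)   # 1 + 1 + 1 + 1 = 4
```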
85. Evaluation of Clustering Methods: K-means, soft K-means and GMM have several hyperparameters (fixed number of clusters, beta, number of Gaussian functions). There are measures to determine how well the choice of hyperparameters fits the dataset (maximum-likelihood measures). With X the dataset, M the number of datapoints, B the number of free parameters, and L the maximum likelihood of the model given B parameters: Akaike Information Criterion: AIC = -2 ln L + 2B; Bayesian Information Criterion: BIC = -2 ln L + B ln M. Choosing AIC versus BIC depends on the application: is the purpose of the analysis to make predictions, or to decide which model best represents reality? AIC may have better predictive ability than BIC, but BIC finds a computationally more efficient solution (its penalty term limits the increase in computational cost). A lower BIC implies either fewer explanatory variables, a better fit, or both. As the number of datapoints (observations) increases, BIC assigns more weight to simpler models than AIC.
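The two criteria differ only in the penalty term. The sketch below uses made-up likelihood values, chosen so that the two criteria disagree and illustrate the trade-off described above:

```python
import numpy as np

def aic(loglik, B):
    # Akaike Information Criterion: penalty of 2 per free parameter
    return -2.0 * loglik + 2.0 * B

def bic(loglik, B, M):
    # Bayesian Information Criterion: penalty of ln(M) per free parameter
    return -2.0 * loglik + B * np.log(M)

M = 100                              # number of datapoints (made up)
ll_simple, B_simple = -250.0, 4      # simpler model: fewer free parameters
ll_complex, B_complex = -238.0, 12   # better fit, but more parameters
# AIC (penalty 2 per parameter) prefers the complex model here, while
# BIC (penalty ln(100) ~ 4.6 per parameter) prefers the simpler one.
```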
86. Evaluation of Clustering Methods: two types of measures, internal versus external. External measures assume that a subset of datapoints has class labels, and measure how well these datapoints are clustered. One needs to have an idea of the classes and to have labeled some datapoints; this is interesting only in cases where labeling is highly time-consuming and the data is very large (e.g. in speech recognition).
87. Evaluation of Clustering Methods: Raw Data
88. Semi-Supervised Learning, clustering F-measure (careful: similar to but not the same as the F-measure we will see for classification!). It trades off clustering all datapoints of the same class into the same cluster against making sure that each cluster contains points of only one class. With M the number of datapoints, C the set of classes, K the number of clusters, and n_ik the number of members of class c_i that are in cluster k: F(C, K) = sum_{c_i in C} (|c_i| / M) max_k F(c_i, k), where F(c_i, k) = 2 R(c_i, k) P(c_i, k) / (R(c_i, k) + P(c_i, k)), with recall R(c_i, k) = n_ik / |c_i| and precision P(c_i, k) = n_ik / |k|. The max picks, for each class, the cluster with the maximal number of its datapoints. Recall: proportion of datapoints correctly clustered; precision: proportion of datapoints of the same class in the cluster.
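The clustering F-measure above can be written directly; a sketch (names ours):

```python
import numpy as np

def clustering_f_measure(classes, clusters):
    classes, clusters = np.asarray(classes), np.asarray(clusters)
    M = len(classes)
    total = 0.0
    for c in np.unique(classes):
        n_c = (classes == c).sum()
        best = 0.0
        for k in np.unique(clusters):
            n_k = (clusters == k).sum()
            n_ck = ((classes == c) & (clusters == k)).sum()
            if n_ck:
                R = n_ck / n_c      # recall: class members found in cluster k
                P = n_ck / n_k      # precision: cluster members from class c
                best = max(best, 2 * R * P / (R + P))
        total += (n_c / M) * best   # weight by class size, keep best cluster
    return total

# A perfect clustering, up to a relabelling of the clusters, scores 1.
perfect = clustering_f_measure([0] * 5 + [1] * 5, [1] * 5 + [0] * 5)
```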
89. Evaluation of Clustering Methods: RSS with K-means can find the true optimal number of clusters but is very sensitive to random initialization (left and right: two different runs). RSS finds an optimum at K = 4, and at K = 5 for the right run.
90. Evaluation of Clustering Methods: BIC (left) and AIC (right) perform very poorly here, splitting some clusters into two halves.
91. Evaluation of Clustering Methods: BIC (left) and AIC (right) perform much better at picking the right number of clusters in GMM.
92. Evaluation of Clustering Methods: Raw Data.
93. Evaluation of Clustering Methods: optimization with BIC using K-means.
94. Evaluation of Clustering Methods: optimization with AIC using K-means; AIC tends to find more clusters.
95. Evaluation of Clustering Methods: Raw Data.
96. Evaluation of Clustering Methods: optimization with AIC using kernel K-means with an RBF kernel.
97. Evaluation of Clustering Methods: optimization with BIC using kernel K-means with an RBF kernel.
98. Semi-Supervised Learning: Raw Data, 3 classes.
99. Semi-Supervised Learning: clustering with RBF kernel K-means after optimization with BIC.
100. Semi-Supervised Learning: after semi-supervised learning.
101. Summary: we have seen several methods for extracting structure in data using the notion of a kernel, and discussed their similarities, differences and complementarity. Kernel CCA is a generalization of kernel PCA that determines partial groupings across different dimensions. Kernel PCA can be used to bootstrap the choice of hyperparameters in kernel K-means. We have compared the geometrical division of space yielded by RBF and polynomial kernels. We have seen that techniques simpler than kernel K-means, such as K-means with a p-norm and mixtures of Gaussians, can yield complex non-linear clusterings.
102. Summary: when to use what? E.g. K-means versus kernel K-means versus GMM. When using any machine learning algorithm, you have to balance a number of factors: computing time at training and at testing; number of open parameters; curse of dimensionality (order of growth with the number of datapoints, dimension, etc.); robustness to initial conditions and optimality of the solution.
More informationNonlinear Dimensionality Reduction
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Kernel PCA 2 Isomap 3 Locally Linear Embedding 4 Laplacian Eigenmap
More informationPreprocessing & dimensionality reduction
Introduction to Data Mining Preprocessing & dimensionality reduction CPSC/AMTH 445a/545a Guy Wolf guy.wolf@yale.edu Yale University Fall 2016 CPSC 445 (Guy Wolf) Dimensionality reduction Yale - Fall 2016
More informationFace Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi
Face Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi Overview Introduction Linear Methods for Dimensionality Reduction Nonlinear Methods and Manifold
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction
More informationNonlinear Dimensionality Reduction
Nonlinear Dimensionality Reduction Piyush Rai CS5350/6350: Machine Learning October 25, 2011 Recap: Linear Dimensionality Reduction Linear Dimensionality Reduction: Based on a linear projection of the
More informationNon-linear Dimensionality Reduction
Non-linear Dimensionality Reduction CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Introduction Laplacian Eigenmaps Locally Linear Embedding (LLE)
More informationSection 3.1. ; X = (0, 1]. (i) f : R R R, f (x, y) = x y
Paul J. Bruillard MATH 0.970 Problem Set 6 An Introduction to Abstract Mathematics R. Bond and W. Keane Section 3.1: 3b,c,e,i, 4bd, 6, 9, 15, 16, 18c,e, 19a, 0, 1b Section 3.: 1f,i, e, 6, 1e,f,h, 13e,
More information1 History of statistical/machine learning. 2 Supervised learning. 3 Two approaches to supervised learning. 4 The general learning procedure
Overview Breiman L (2001). Statistical modeling: The two cultures Statistical learning reading group Aleander L 1 Histor of statistical/machine learning 2 Supervised learning 3 Two approaches to supervised
More informationBiostatistics in Research Practice - Regression I
Biostatistics in Research Practice - Regression I Simon Crouch 30th Januar 2007 In scientific studies, we often wish to model the relationships between observed variables over a sample of different subjects.
More informationConnection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis
Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Alvina Goh Vision Reading Group 13 October 2005 Connection of Local Linear Embedding, ISOMAP, and Kernel Principal
More informationNotes on Discriminant Functions and Optimal Classification
Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem
More informationPrincipal Component Analysis
Principal Component Analysis Introduction Consider a zero mean random vector R n with autocorrelation matri R = E( T ). R has eigenvectors q(1),,q(n) and associated eigenvalues λ(1) λ(n). Let Q = [ q(1)
More informationLinear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights
Linear Discriminant Functions and Support Vector Machines Linear, threshold units CSE19, Winter 11 Biometrics CSE 19 Lecture 11 1 X i : inputs W i : weights θ : threshold 3 4 5 1 6 7 Courtesy of University
More informationFor each problem, place the letter choice of your answer in the spaces provided on this page.
Math 6 Final Exam Spring 6 Your name Directions: For each problem, place the letter choice of our answer in the spaces provided on this page...... 6. 7. 8. 9....... 6. 7. 8. 9....... B signing here, I
More informationUnit 12 Study Notes 1 Systems of Equations
You should learn to: Unit Stud Notes Sstems of Equations. Solve sstems of equations b substitution.. Solve sstems of equations b graphing (calculator). 3. Solve sstems of equations b elimination. 4. Solve
More informationLinear Algebra Methods for Data Mining
Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 Linear Discriminant Analysis Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki Principal
More informationMachine Learning - MT Clustering
Machine Learning - MT 2016 15. Clustering Varun Kanade University of Oxford November 28, 2016 Announcements No new practical this week All practicals must be signed off in sessions this week Firm Deadline:
More information6. Vector Random Variables
6. Vector Random Variables In the previous chapter we presented methods for dealing with two random variables. In this chapter we etend these methods to the case of n random variables in the following
More informationUnsupervised Learning: K- Means & PCA
Unsupervised Learning: K- Means & PCA Unsupervised Learning Supervised learning used labeled data pairs (x, y) to learn a func>on f : X Y But, what if we don t have labels? No labels = unsupervised learning
More informationLecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan
Lecture 3: Latent Variables Models and Learning with the EM Algorithm Sam Roweis Tuesday July25, 2006 Machine Learning Summer School, Taiwan Latent Variable Models What to do when a variable z is always
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationA Least Squares Formulation for Canonical Correlation Analysis
A Least Squares Formulation for Canonical Correlation Analysis Liang Sun, Shuiwang Ji, and Jieping Ye Department of Computer Science and Engineering Arizona State University Motivation Canonical Correlation
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Clustering Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu May 2, 2017 Announcements Homework 2 due later today Due May 3 rd (11:59pm) Course project
More informationLecture Notes 1: Vector spaces
Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector
More informationCS534 Machine Learning - Spring Final Exam
CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the
More information1 Kernel methods & optimization
Machine Learning Class Notes 9-26-13 Prof. David Sontag 1 Kernel methods & optimization One eample of a kernel that is frequently used in practice and which allows for highly non-linear discriminant functions
More informationBayes Decision Theory - I
Bayes Decision Theory - I Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Statistical Learning from Data Goal: Given a relationship between a feature vector and a vector y, and iid data samples ( i,y i ), find
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationPrincipal Component Analysis (PCA)
Principal Component Analysis (PCA) Salvador Dalí, Galatea of the Spheres CSC411/2515: Machine Learning and Data Mining, Winter 2018 Michael Guerzhoy and Lisa Zhang Some slides from Derek Hoiem and Alysha
More informationMathematics 309 Conic sections and their applicationsn. Chapter 2. Quadric figures. ai,j x i x j + b i x i + c =0. 1. Coordinate changes
Mathematics 309 Conic sections and their applicationsn Chapter 2. Quadric figures In this chapter want to outline quickl how to decide what figure associated in 2D and 3D to quadratic equations look like.
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationStatistical and Computational Analysis of Locality Preserving Projection
Statistical and Computational Analysis of Locality Preserving Projection Xiaofei He xiaofei@cs.uchicago.edu Department of Computer Science, University of Chicago, 00 East 58th Street, Chicago, IL 60637
More information5. Zeros. We deduce that the graph crosses the x-axis at the points x = 0, 1, 2 and 4, and nowhere else. And that s exactly what we see in the graph.
. Zeros Eample 1. At the right we have drawn the graph of the polnomial = ( 1) ( 2) ( 4). Argue that the form of the algebraic formula allows ou to see right awa where the graph is above the -ais, where
More informationWhat is semi-supervised learning?
What is semi-supervised learning? In many practical learning domains, there is a large supply of unlabeled data but limited labeled data, which can be expensive to generate text processing, video-indexing,
More informationRandomized Algorithms
Randomized Algorithms Saniv Kumar, Google Research, NY EECS-6898, Columbia University - Fall, 010 Saniv Kumar 9/13/010 EECS6898 Large Scale Machine Learning 1 Curse of Dimensionality Gaussian Mixture Models
More informationMRC: The Maximum Rejection Classifier for Pattern Detection. With Michael Elad, Renato Keshet
MRC: The Maimum Rejection Classifier for Pattern Detection With Michael Elad, Renato Keshet 1 The Problem Pattern Detection: Given a pattern that is subjected to a particular type of variation, detect
More informationUnsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto
Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian
More information6.867 Machine learning
6.867 Machine learning Mid-term eam October 8, 6 ( points) Your name and MIT ID: .5.5 y.5 y.5 a).5.5 b).5.5.5.5 y.5 y.5 c).5.5 d).5.5 Figure : Plots of linear regression results with different types of
More informationPCA and admixture models
PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1
More informationMachine Learning - MT & 14. PCA and MDS
Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)
More informationNonlinear Dimensionality Reduction. Jose A. Costa
Nonlinear Dimensionality Reduction Jose A. Costa Mathematics of Information Seminar, Dec. Motivation Many useful of signals such as: Image databases; Gene expression microarrays; Internet traffic time
More informationStatistical Learning. Dong Liu. Dept. EEIS, USTC
Statistical Learning Dong Liu Dept. EEIS, USTC Chapter 6. Unsupervised and Semi-Supervised Learning 1. Unsupervised learning 2. k-means 3. Gaussian mixture model 4. Other approaches to clustering 5. Principle
More informationf x, y x 2 y 2 2x 6y 14. Then
SECTION 11.7 MAXIMUM AND MINIMUM VALUES 645 absolute minimum FIGURE 1 local maimum local minimum absolute maimum Look at the hills and valles in the graph of f shown in Figure 1. There are two points a,
More informationCross-validation for detecting and preventing overfitting
Cross-validation for detecting and preventing overfitting A Regression Problem = f() + noise Can we learn f from this data? Note to other teachers and users of these slides. Andrew would be delighted if
More informationL26: Advanced dimensionality reduction
L26: Advanced dimensionality reduction The snapshot CA approach Oriented rincipal Components Analysis Non-linear dimensionality reduction (manifold learning) ISOMA Locally Linear Embedding CSCE 666 attern
More informationData Analysis and Manifold Learning Lecture 7: Spectral Clustering
Data Analysis and Manifold Learning Lecture 7: Spectral Clustering Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture 7 What is spectral
More informationMachine Learning Lecture 3
Announcements Machine Learning Lecture 3 Eam dates We re in the process of fiing the first eam date Probability Density Estimation II 9.0.207 Eercises The first eercise sheet is available on L2P now First
More informationROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015
ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti
More informationCh 3 Alg 2 Note Sheet.doc 3.1 Graphing Systems of Equations
Ch 3 Alg Note Sheet.doc 3.1 Graphing Sstems of Equations Sstems of Linear Equations A sstem of equations is a set of two or more equations that use the same variables. If the graph of each equation =.4
More informationDimensionality Reduc1on
Dimensionality Reduc1on contd Aarti Singh Machine Learning 10-601 Nov 10, 2011 Slides Courtesy: Tom Mitchell, Eric Xing, Lawrence Saul 1 Principal Component Analysis (PCA) Principal Components are the
More informationMachine Learning. 1. Linear Regression
Machine Learning Machine Learning 1. Linear Regression Lars Schmidt-Thieme Information Sstems and Machine Learning Lab (ISMLL) Institute for Business Economics and Information Sstems & Institute for Computer
More informationPCA, Kernel PCA, ICA
PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per
More informationLESSON #12 - FORMS OF A LINE COMMON CORE ALGEBRA II
LESSON # - FORMS OF A LINE COMMON CORE ALGEBRA II Linear functions come in a variet of forms. The two shown below have been introduced in Common Core Algebra I and Common Core Geometr. TWO COMMON FORMS
More informationLinear Dimensionality Reduction
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Principal Component Analysis 3 Factor Analysis
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More information4 The Cartesian Coordinate System- Pictures of Equations
The Cartesian Coordinate Sstem- Pictures of Equations Concepts: The Cartesian Coordinate Sstem Graphs of Equations in Two Variables -intercepts and -intercepts Distance in Two Dimensions and the Pthagorean
More informationMidterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so.
CS 89 Spring 07 Introduction to Machine Learning Midterm Please do not open the exam before you are instructed to do so. The exam is closed book, closed notes except your one-page cheat sheet. Electronic
More informationAPPENDIX D Rotation and the General Second-Degree Equation
APPENDIX D Rotation and the General Second-Degree Equation Rotation of Aes Invariants Under Rotation After rotation of the - and -aes counterclockwise through an angle, the rotated aes are denoted as the
More informationLEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach
LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach Dr. Guangliang Chen February 9, 2016 Outline Introduction Review of linear algebra Matrix SVD PCA Motivation The digits
More informationMachine Learning Lecture 2
Machine Perceptual Learning and Sensory Summer Augmented 6 Computing Announcements Machine Learning Lecture 2 Course webpage http://www.vision.rwth-aachen.de/teaching/ Slides will be made available on
More informationLecture 6: Methods for high-dimensional problems
Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,
More informationHYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH
HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi
More informationClustering. CSL465/603 - Fall 2016 Narayanan C Krishnan
Clustering CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Supervised vs Unsupervised Learning Supervised learning Given x ", y " "%& ', learn a function f: X Y Categorical output classification
More informationPolynomial and Rational Functions
Polnomial and Rational Functions Figure -mm film, once the standard for capturing photographic images, has been made largel obsolete b digital photograph. (credit film : modification of work b Horia Varlan;
More informationNON-NEGATIVE MATRIX FACTORIZATION FOR PARAMETER ESTIMATION IN HIDDEN MARKOV MODELS. Balaji Lakshminarayanan and Raviv Raich
NON-NEGATIVE MATRIX FACTORIZATION FOR PARAMETER ESTIMATION IN HIDDEN MARKOV MODELS Balaji Lakshminarayanan and Raviv Raich School of EECS, Oregon State University, Corvallis, OR 97331-551 {lakshmba,raich@eecs.oregonstate.edu}
More informationThe Nyström Extension and Spectral Methods in Learning
Introduction Main Results Simulation Studies Summary The Nyström Extension and Spectral Methods in Learning New bounds and algorithms for high-dimensional data sets Patrick J. Wolfe (joint work with Mohamed-Ali
More informationClustering, K-Means, EM Tutorial
Clustering, K-Means, EM Tutorial Kamyar Ghasemipour Parts taken from Shikhar Sharma, Wenjie Luo, and Boris Ivanovic s tutorial slides, as well as lecture notes Organization: Clustering Motivation K-Means
More information20 Unsupervised Learning and Principal Components Analysis (PCA)
116 Jonathan Richard Shewchuk 20 Unsupervised Learning and Principal Components Analysis (PCA) UNSUPERVISED LEARNING We have sample points, but no labels! No classes, no y-values, nothing to predict. Goal:
More informationPCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani
PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given
More informationAdvances in Manifold Learning Presented by: Naku Nak l Verm r a June 10, 2008
Advances in Manifold Learning Presented by: Nakul Verma June 10, 008 Outline Motivation Manifolds Manifold Learning Random projection of manifolds for dimension reduction Introduction to random projections
More informationReview Topics for MATH 1400 Elements of Calculus Table of Contents
Math 1400 - Mano Table of Contents - Review - page 1 of 2 Review Topics for MATH 1400 Elements of Calculus Table of Contents MATH 1400 Elements of Calculus is one of the Marquette Core Courses for Mathematical
More informationLearning From Data Lecture 7 Approximation Versus Generalization
Learning From Data Lecture 7 Approimation Versus Generalization The VC Dimension Approimation Versus Generalization Bias and Variance The Learning Curve M. Magdon-Ismail CSCI 4100/6100 recap: The Vapnik-Chervonenkis
More information5.6 RATIOnAl FUnCTIOnS. Using Arrow notation. learning ObjeCTIveS
CHAPTER PolNomiAl ANd rational functions learning ObjeCTIveS In this section, ou will: Use arrow notation. Solve applied problems involving rational functions. Find the domains of rational functions. Identif
More informationOverview. IAML: Linear Regression. Examples of regression problems. The Regression Problem
3 / 38 Overview 4 / 38 IAML: Linear Regression Nigel Goddard School of Informatics Semester The linear model Fitting the linear model to data Probabilistic interpretation of the error function Eamples
More information