Machine learning for pervasive systems: Classification in high-dimensional spaces

1 Machine learning for pervasive systems: Classification in high-dimensional spaces. Department of Communications and Networking, Aalto University, School of Electrical Engineering. Version 1.1

2 Lecture overview
- Introduction
- Decision Trees
- Regression
- Training Principles, Performance Evaluation
- PCA, LSI and SVM
- Instance-based learning
- Artificial Neural Networks
- No lecture (ELEC Christmas party)
- Probabilistic Graphical Models
- Unsupervised learning
- Anomaly detection, online-learning, recommender systems
- Trend mining and Topic Models
- Feature selection
- Presentations

3 Outline: Principal Component Analysis, Latent Semantic Indexing, Support Vector Machines (Cost function, Hypothesis, Kernels)

4 Principal Component Analysis: Find a lower-dimensional surface onto which to project the data.

8 Principal Component Analysis: PCA finds $k$ vectors $v^{(1)}, \dots, v^{(k)}$ onto which to project the data such that the projection error is minimized. In particular, we find values $z^{(i)}$ to represent the $x^{(i)}$ in the $k$-dimensional vector space spanned by these vectors.

10 Principal Component Analysis

1. Compute the covariance matrix from the $x^{(i)}$:

   $C = \frac{1}{m} \sum_{i=1}^{n} \underbrace{x^{(i)}}_{m \times 1} \underbrace{(x^{(i)})^T}_{1 \times m}$,

   an $m \times m$ matrix. (We assume that all features are mean-normalized and scaled into $[0, 1]$.)

   Covariance: a measure of the spread of a set of points around their center of mass.

2. The principal components are found by computing the eigenvectors and eigenvalues of $C$ (solving $(C - \lambda I_m)u = 0$).

   When a matrix $C$ is multiplied with a vector $u$, this usually results in a new vector $Cu$ with a direction different from that of $u$. There are a few vectors $u$, however, which keep their direction ($Cu = \lambda u$). These are the eigenvectors of $C$, and the $\lambda$ are the eigenvalues. The (orthogonal) eigenvectors are sorted by their eigenvalues with respect to the direction of greatest variance in the data.

3. Choose the $k$ eigenvectors with the largest eigenvalues to form the projection matrix $U$.

4. These $k$ eigenvectors in $U$ are used to transform the inputs $x^{(i)}$ to $z^{(i)}$: $z^{(i)} = U^T x^{(i)}$.
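
The four steps above can be sketched in a few lines of NumPy (not part of the slides; the variable names X, k and the simple [0, 1] scaling are assumptions):

```python
import numpy as np

def pca(X, k):
    """PCA as in the four steps above: X holds one sample per row."""
    # Preprocessing assumed by the slide: scale each feature into [0, 1], then mean-normalize.
    span = X.max(axis=0) - X.min(axis=0)
    X = (X - X.min(axis=0)) / np.where(span == 0, 1, span)
    X = X - X.mean(axis=0)

    # 1. Covariance matrix: average of the outer products x_i x_i^T.
    C = (X.T @ X) / X.shape[0]

    # 2. Eigenvectors and eigenvalues of the symmetric matrix C.
    eigvals, eigvecs = np.linalg.eigh(C)                 # returned in ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort descending

    # 3. Keep the k eigenvectors with the largest eigenvalues as projection matrix U.
    U = eigvecs[:, :k]

    # 4. Transform the inputs: z_i = U^T x_i (done for all samples at once).
    Z = X @ U
    return Z, U, eigvals

# Usage (random placeholder data): Z, U, lam = pca(np.random.rand(100, 5), k=2)
```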

18 Principal Component Analysis: How to choose the number $k$ of dimensions? We can calculate

   $\frac{\text{Average squared projection error}}{\text{Total variation in the data}} = \frac{\frac{1}{m}\sum_{i=1}^{m} \|x^{(i)} - x^{(i)}_{\text{approx}}\|^2}{\frac{1}{m}\sum_{i=1}^{m} \|x^{(i)}\|^2}$

as a measure of the accuracy of the projection using $k$ principal components; expressed through the eigenvalues, this ratio is

   $1 - \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{m} \lambda_j} = d.$

We say that $100(1-d)\%$ of the variance is retained. (Typically $d \in [0.01, 0.05]$.)
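
A small sketch of this criterion, reusing the eigenvalues from the PCA sketch above (the threshold d and the example values are made up):

```python
import numpy as np

def choose_k(eigvals, d=0.01):
    """Smallest k with 1 - sum(top-k eigenvalues) / sum(all eigenvalues) <= d."""
    eigvals = np.sort(eigvals)[::-1]
    retained = np.cumsum(eigvals) / eigvals.sum()    # retained variance for k = 1, 2, ...
    return int(np.argmax(1.0 - retained <= d)) + 1   # first k where the error ratio drops below d

# Example (made-up eigenvalues): choose_k([5.0, 2.0, 0.02, 0.01], d=0.05) returns 2,
# i.e. two components already retain more than 95% of the variance.
```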

20 Outline: Principal Component Analysis, Latent Semantic Indexing, Support Vector Machines (Cost function, Hypothesis, Kernels)

21 Latent Semantic Indexing: Motivation. In information retrieval, a common task is to obtain, from many documents, the subset that best matches a query. Typical feature representations of documents are term-document matrices. These matrices are typically huge but sparse. How can we identify those feature dimensions (or combinations thereof) which are most meaningful?

26 Latent Semantic Indexing: Singular Value Decomposition. Any $m \times n$ matrix $C$ can be represented as a singular value decomposition of the form $C = U \Sigma V^T$, where

   U: $m \times m$ matrix whose columns are the orthogonal eigenvectors of $CC^T$
   V: $n \times n$ matrix whose columns are the orthogonal eigenvectors of $C^T C$
   $\Sigma$: diagonal matrix with the singular values $\Sigma_{ii} = \sqrt{\lambda_i}$ and $\Sigma_{ij} = 0$ for $i \neq j$

The first $k$ eigenvectors map document vectors to a lower-dimensional representation. It can be shown that this mapping results in the $k$-dimensional space with the smallest distance to the original space.
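
A brief NumPy illustration of the rank-k truncation; the toy term-document matrix and the choice k = 2 are invented for the example:

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents (entries are term counts).
C = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(C, full_matrices=False)   # C = U diag(s) Vt

k = 2                                              # keep the k largest singular values
C_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # best rank-k approximation of C

# Representation of the documents in the k-dimensional latent space (one column per document):
docs_k = np.diag(s[:k]) @ Vt[:k, :]
```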

29 Latent Semantic Indexing: Example. (The slides show a worked example: a term-document matrix is decomposed into $U$, $\Sigma$ and $V^T$; truncating $\Sigma$ yields the low-rank approximation $C_2$, and documents are then compared by cosine similarity in the reduced space.)
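
Continuing that sketch, similarity between documents in the reduced space can be computed with the cosine measure; this is only an illustration, not the numbers from the slides:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# With docs_k from the previous snippet (columns = documents in the latent space):
# cosine_similarity(docs_k[:, 0], docs_k[:, 2]) compares documents 0 and 2 after reduction.
```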

33 Outline: Principal Component Analysis, Latent Semantic Indexing, Support Vector Machines (Cost function, Hypothesis, Kernels)

34 For our previous classifier, we designed an objective function of sufficient dimension. An alternative to designing complex non-linear functions: change the dimension of the input space so that a linear separator is again possible.

37 The SVM pre-processes the data to represent patterns in a high-dimensional space, often of much higher dimension than the original feature space. A hyperplane is then inserted to separate the data.

38 The goal of support vector machines is to find a separating hyperplane with the largest margin to the outer points of all classes. If no such hyperplane exists, map all points into a higher-dimensional space until such a plane exists.

39 Simple application to several classes by an iterative one-versus-all approach: does a sample belong to class 1 or not? to class 2 or not? ... The search for an optimal mapping between input space and feature space is complicated (no optimal approach is known).
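
A minimal sketch of the one-versus-all scheme using linear SVMs from scikit-learn (the library choice and the random placeholder data are assumptions, not part of the lecture):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 2))
y = rng.integers(0, 3, size=90)                  # three classes: 0, 1, 2 (placeholder labels)

# One binary classifier per class: "belongs to class c or not?"
classes = sorted(np.unique(y))
clfs = {c: LinearSVC(C=1.0).fit(X, (y == c).astype(int)) for c in classes}

# Predict the class whose classifier is most confident about "yes".
scores = np.column_stack([clfs[c].decision_function(X) for c in classes])
y_pred = np.array(classes)[scores.argmax(axis=1)]
```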

40 Outline: Principal Component Analysis, Latent Semantic Indexing, Support Vector Machines (Cost function, Hypothesis, Kernels)

41 Cost function: contribution of a single sample to the overall cost.

Logistic regression: $-y \log \frac{1}{1+e^{-W^T x}} - (1-y) \log\left(1 - \frac{1}{1+e^{-W^T x}}\right)$

SVM: $y \cdot \text{cost}_{y=1}(W^T x) + (1-y) \cdot \text{cost}_{y=0}(W^T x)$

44 Cost function

Logistic regression: $\min_W \frac{1}{m} \left[ \sum_{i=1}^{m} -y_i \log \frac{1}{1+e^{-W^T x_i}} - (1-y_i) \log\left(1 - \frac{1}{1+e^{-W^T x_i}}\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2$

SVM: $\min_W C \sum_{i=1}^{m} \left[ y_i \, \text{cost}_{y=1}(W^T x_i) + (1-y_i) \, \text{cost}_{y=0}(W^T x_i) \right] + \frac{1}{2} \sum_{j=1}^{n} w_j^2$

$C$ here plays a similar role as $\frac{1}{\lambda}$.
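
A short NumPy sketch contrasting the two per-sample costs. The concrete shape of cost_{y=1} and cost_{y=0} (the hinge-like pieces below) is an assumption consistent with the usual SVM cost plots, since the text above only references them by name:

```python
import numpy as np

def logistic_cost(z, y):
    """Per-sample logistic-regression cost, with z = W^T x."""
    h = 1.0 / (1.0 + np.exp(-z))
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

def svm_cost(z, y):
    """Per-sample SVM cost; cost_{y=1} and cost_{y=0} taken as hinge-like pieces."""
    cost1 = np.maximum(0.0, 1.0 - z)   # cost_{y=1}(z): zero once z >= 1
    cost0 = np.maximum(0.0, 1.0 + z)   # cost_{y=0}(z): zero once z <= -1
    return y * cost1 + (1 - y) * cost0

z = np.linspace(-3, 3, 7)
print(np.round(logistic_cost(z, 1), 3))   # smooth, never exactly zero
print(svm_cost(z, 1))                     # exactly zero for z >= 1
```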

45 Outline: Principal Component Analysis, Latent Semantic Indexing, Support Vector Machines (Cost function, Hypothesis, Kernels)

46 SVM hypothesis

$\min_W C \sum_{i=1}^{m} \left[ y_i \, \text{cost}_{y=1}(W^T x_i) + (1-y_i) \, \text{cost}_{y=0}(W^T x_i) \right] + \frac{1}{2} \sum_{j=1}^{n} w_j^2$

For classification, $W^T x \ge 0$ vs. $W^T x < 0$ would be sufficient; the SVM instead demands $W^T x \ge 1$ vs. $W^T x \le -1$ for confidence.

Outliers: elastic decision boundary. A large $C$ gives a stricter boundary at the cost of a smaller margin; a small $C$ tolerates outliers.
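
The role of C can be explored by training the same linear SVM with different values; a hedged scikit-learn sketch with placeholder data:

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping point clouds (placeholder data), so some outliers are unavoidable.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: training accuracy {clf.score(X, y):.2f}, "
          f"support vectors {len(clf.support_)}")
```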

54 Large margin classifier

$\min_W C \sum_{i=1}^{m} \left[ y_i \, \text{cost}_{y=1}(W^T x_i) + (1-y_i) \, \text{cost}_{y=0}(W^T x_i) \right] + \frac{1}{2} \sum_{j=1}^{n} w_j^2$

Rewrite the SVM optimisation problem as

$\min_W \frac{1}{2} \sum_{j=1}^{n} w_j^2 = \frac{1}{2}\left(w_1^2 + \dots + w_n^2\right) = \frac{1}{2}\|W\|^2$

s.t. $W^T x_i \ge 1$ if $y_i = 1$ (equivalently $\|W\| p_i \ge 1$)
     $W^T x_i \le -1$ if $y_i = 0$ (equivalently $\|W\| p_i \le -1$),

where $p_i$ is the projection of $x_i$ onto $W$; in two dimensions, $W^T x = w_1 x_1 + w_2 x_2 = \|W\| p$.

Which decision boundary is found? With $h(x) = w_1 x_1 + w_2 x_2$, $W$ is orthogonal to all $x$ with $h(x) = 0$. Minimising $\frac{1}{2}\|W\|^2$ while satisfying $\|W\| p_i \ge 1$ necessitates larger projections $p_i$, i.e. a larger margin.
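
The link between minimising $\frac{1}{2}\|W\|^2$ and the margin can be checked numerically: for a trained linear SVM, the distance between the hyperplanes $W^T x = 1$ and $W^T x = -1$ is $2 / \|W\|$. A sketch under the assumption that scikit-learn and toy data are acceptable stand-ins:

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated point clouds (toy data).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2.0, 0.4, size=(40, 2)),
               rng.normal(+2.0, 0.4, size=(40, 2))])
y = np.array([0] * 40 + [1] * 40)

clf = SVC(kernel="linear", C=1000.0).fit(X, y)   # large C: essentially a hard margin
w, b = clf.coef_[0], clf.intercept_[0]

print("||W||          :", np.linalg.norm(w))
print("margin 2/||W|| :", 2.0 / np.linalg.norm(w))
# Support vectors sit (approximately) on the confidence hyperplanes |W^T x + b| = 1.
print("|W^T x + b| at support vectors:", np.abs(X[clf.support_] @ w + b))
```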

69 Outline: Principal Component Analysis, Latent Semantic Indexing, Support Vector Machines (Cost function, Hypothesis, Kernels)

70 Kernels: non-linear decision boundary

Hypothesis = 1 if $w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + \dots \ge 0$

Instead of the polynomial terms, use kernel features: $w_0 + w_1 k_1 + w_2 k_2 + w_3 k_3 + \dots$

Kernel: define the kernel via landmarks $l_i$. Gaussian kernel: $k(x, l_i) = e^{-\frac{\|x - l_i\|^2}{2\sigma^2}}$. For $x \approx l_i$, $k(x, l_i) \approx 1$ (towards 0 otherwise); $\sigma$ controls the width of the Gaussian.

Example: $w_0 = -0.5$, $w_1 = 1$, $w_2 = 1$, $w_3 = 0$, with $h(x) = w_0 + w_1 k(x, l_1) + w_2 k(x, l_2) + w_3 k(x, l_3)$. (The slides show the resulting decision regions for $\sigma = 1$, $\sigma = 0.5$ and $\sigma = 2$.)
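
The example above can be reproduced directly; the landmark positions are not given on the slides, so the ones below are placeholders:

```python
import numpy as np

def gaussian_kernel(x, l, sigma=1.0):
    """k(x, l) = exp(-||x - l||^2 / (2 sigma^2)): close to 1 near the landmark, towards 0 far away."""
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2))

# Weights from the example above; the landmark positions l1, l2, l3 are placeholders.
w0, w1, w2, w3 = -0.5, 1.0, 1.0, 0.0
l1, l2, l3 = np.array([1.0, 1.0]), np.array([3.0, 1.0]), np.array([2.0, 3.0])

def h(x, sigma=1.0):
    return (w0 + w1 * gaussian_kernel(x, l1, sigma)
               + w2 * gaussian_kernel(x, l2, sigma)
               + w3 * gaussian_kernel(x, l3, sigma))

print(h(np.array([1.1, 0.9])) >= 0)   # near l1: predict 1
print(h(np.array([5.0, 5.0])) >= 0)   # far from all landmarks: predict 0
# Varying sigma widens or narrows the region where the hypothesis predicts 1.
```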

83 Kernels: placement of landmarks. A possible choice of initial landmarks: all training-set samples. For training the $w_i$, each sample is mapped to the feature vector $f_i = \left(k(x_i, l_1), \dots, k(x_i, l_m)\right)^T$ and we solve

$\min_W C \sum_{i=1}^{m} \left[ y_i \, \text{cost}_{y_i=1}(W^T f_i) + (1-y_i) \, \text{cost}_{y_i=0}(W^T f_i) \right] + \frac{1}{2} \sum_{j=1}^{m} w_j^2$
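
A sketch of this construction with every training sample used as a landmark; scikit-learn's LinearSVC stands in for minimising the cost above, and the data is an invented non-linear toy problem:

```python
import numpy as np
from sklearn.svm import LinearSVC

def kernel_features(X, landmarks, sigma=1.0):
    """Row i is f_i = (k(x_i, l_1), ..., k(x_i, l_m)) for the Gaussian kernel."""
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)   # pairwise squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Invented toy problem with a non-linear (circular) class boundary.
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)

F = kernel_features(X, landmarks=X)                 # landmarks = all training samples (m x m)
clf = LinearSVC(C=1.0, max_iter=10000).fit(F, y)    # linear classifier on the kernel features
print("training accuracy:", clf.score(F, y))
```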

84 Outline: Principal Component Analysis, Latent Semantic Indexing, Support Vector Machines (Cost function, Hypothesis, Kernels)

85 Questions? Le Nguyen

86 Literature
C.M. Bishop: Pattern Recognition and Machine Learning, Springer.
R.O. Duda, P.E. Hart, D.G. Stork: Pattern Classification, Wiley.
