Machine learning for pervasive systems: Classification in high-dimensional spaces
Department of Communications and Networking, Aalto University, School of Electrical Engineering. Version 1.1
Lecture overview
- Introduction
- Decision Trees
- Regression
- Training Principles, Performance Evaluation
- PCA, LSI and SVM
- Instance-based learning
- Artificial Neural Networks
- No lecture (ELEC Christmas party)
- Probabilistic Graphical Models
- Unsupervised learning
- Anomaly detection, online learning, recommender systems
- Trend mining and Topic Models
- Feature selection
- Presentations
Outline
- Principal Component Analysis
- Latent Semantic Indexing
- Support Vector Machines: Cost function, Hypothesis, Kernels
Principal Component Analysis
Find a lower-dimensional surface onto which to project the data.
PCA finds k vectors v^(1), ..., v^(k) onto which to project the data such that the projection error is minimized. In particular, we find values z^(i) that represent the x^(i) in the k-dimensional vector space spanned by the v^(j).
Principal Component Analysis

1. Compute the covariance matrix from the x^(i):

   C = (1/m) Σ_{i=1}^{m} x^{(i)} (x^{(i)})^T

   where each x^{(i)} is an n×1 column vector and (x^{(i)})^T is 1×n, so C is an n×n matrix. (We assume that all features are mean-normalized and scaled into [0, 1].)

   Covariance: a measure of the spread of a set of points around their center of mass.

2. The principal components are found by computing the eigenvectors and eigenvalues of C, i.e. by solving (C − λI_n)u = 0.

   When a matrix C is multiplied with a vector u, this usually results in a new vector Cu with a different direction than u. There are a few vectors u, however, which keep their direction (Cu = λu). These are the eigenvectors of C, and the λ are the eigenvalues.

   Eigenvectors and eigenvalues: the (orthogonal) eigenvectors are sorted by their eigenvalues with respect to the direction of greatest variance in the data.

3. Choose the k eigenvectors with the largest eigenvalues to represent the projection space U.

4. These k eigenvectors in U are used to transform the inputs x^{(i)} to z^{(i)}: z^{(i)} = U^T x^{(i)}.
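As an illustration, here is a minimal NumPy sketch of these four steps, assuming the samples are stored as rows of an already mean-normalized matrix X; the function and variable names are my own, not from the lecture.

```python
import numpy as np

def pca_transform(X, k):
    """Project mean-normalized samples (rows of X) onto the top-k principal components."""
    m = X.shape[0]
    C = (X.T @ X) / m                      # covariance matrix, n x n
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh: C is symmetric; eigenvalues ascending
    order = np.argsort(eigvals)[::-1]      # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    U = eigvecs[:, :k]                     # n x k projection matrix
    Z = X @ U                              # z^(i) = U^T x^(i), stacked as rows
    return Z, U, eigvals

# tiny usage example with random data
X = np.random.randn(100, 5)
X = X - X.mean(axis=0)                     # mean-normalize
Z, U, eigvals = pca_transform(X, k=2)
print(Z.shape)                             # (100, 2)
```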
How do we choose the number k of dimensions? We can calculate the ratio

  d = (average squared projection error) / (total variation in the data)
    = [ (1/m) Σ_{i=1}^{m} ||x^{(i)} − x^{(i)}_approx||² ] / [ (1/m) Σ_{i=1}^{m} ||x^{(i)}||² ]

as the accuracy of the projection using k principal components. It can be expressed as a function of the eigenvalues:

  d = 1 − ( Σ_{i=1}^{k} λ_i ) / ( Σ_{j=1}^{n} λ_j )

We say that 100·(1 − d)% of the variance is retained. (Typically, d ∈ [0.01, 0.05].)
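A small sketch of this criterion, assuming the eigenvalues of the covariance matrix are already available; the example spectrum and thresholds are placeholders.

```python
import numpy as np

def choose_k(eigvals, d_max=0.01):
    """Smallest k such that d = 1 - sum(top-k eigenvalues) / sum(all eigenvalues) <= d_max."""
    eigvals = np.sort(eigvals)[::-1]           # largest first
    retained = np.cumsum(eigvals) / np.sum(eigvals)
    d = 1.0 - retained                          # projection error ratio for k = 1, 2, ...
    return int(np.argmax(d <= d_max)) + 1       # first k satisfying the criterion

# e.g. eigenvalues of a covariance matrix
eigvals = np.array([5.2, 2.1, 0.4, 0.05, 0.01])
print(choose_k(eigvals, d_max=0.05))            # -> 3 for this spectrum
```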
Latent Semantic Indexing

Motivation: In information retrieval, a common task is to obtain, from many documents, the subset that best matches a query. Typical feature representations of documents are term-document matrices. These matrices are typically huge but sparse. How can we identify those feature dimensions (or combinations thereof) which are most meaningful?
Singular Value Decomposition

Any m×n matrix C can be represented as a singular value decomposition of the form C = UΣV^T, where
- U: m×m matrix whose columns are the orthogonal eigenvectors of CC^T
- V: n×n matrix whose columns are the orthogonal eigenvectors of C^T C
- Σ: diagonal matrix with Σ_ii = √λ_i and Σ_ij = 0 for i ≠ j

The first k eigenvectors map the document vectors to a lower-dimensional representation. It can be shown that this mapping results in the k-dimensional space with the smallest distance to the original space.
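A minimal sketch of a rank-k approximation of a term-document matrix via the SVD, assuming NumPy; the toy matrix and the choice k = 2 are placeholders.

```python
import numpy as np

# toy term-document matrix: rows = terms, columns = documents
C = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(C, full_matrices=False)   # C = U diag(s) V^T

k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
C_k = U_k @ np.diag(s_k) @ Vt_k                    # best rank-k approximation of C

# low-dimensional document representations (one column per document)
docs_k = np.diag(s_k) @ Vt_k
print(docs_k.shape)                                # (2, 4)
```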
Latent Semantic Indexing: Example
(The slides show a worked example: a term-document matrix is decomposed into U, Σ and V^T; keeping only the two largest singular values yields a rank-2 approximation C_2, and documents are then compared in this reduced space via cosine similarity.)
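Documents are compared in the reduced space with cosine similarity; a minimal sketch with hypothetical 2-dimensional document representations.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# hypothetical concept-space representations of two documents
d0 = np.array([1.8, 0.3])
d1 = np.array([1.6, -0.2])
print(cosine_similarity(d0, d1))
```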
Support Vector Machines
For our previous classifiers, we designed an objective function of sufficient dimension. An alternative to designing complex non-linear functions: change the dimension of the input space so that a linear separator is again possible.
An SVM pre-processes the data to represent the patterns in a high-dimensional space, often of much higher dimension than the original feature space. Then, a hyperplane is inserted in order to separate the data.
The goal of a support vector machine is to find a separating hyperplane with the largest margin to the outer points of all classes. If no such hyperplane exists, map all points into a higher-dimensional space until such a plane exists.
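A toy illustration of this idea (my own example, not from the slides): one-dimensional data that is not linearly separable becomes separable after mapping each point x to (x, x²).

```python
import numpy as np

# class 0 lies between the two clusters of class 1: no threshold on x separates them
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([1,     1,    0,   0,   0,   1,   1])

# map into a higher-dimensional space: x -> (x, x^2)
phi = np.column_stack([x, x ** 2])

# in the mapped space the classes are separated by the hyperplane x2 = 2
predictions = (phi[:, 1] >= 2.0).astype(int)
print(np.array_equal(predictions, y))   # True
```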
Simple application to several classes by an iterative approach (one-versus-all): does a sample belong to class 1 or not? Does it belong to class 2 or not? ... The search for the optimal mapping between input space and feature space is complicated (no optimal approach is known).
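A sketch of the one-versus-all scheme, assuming scikit-learn is available; the synthetic data and the linear kernel are placeholders.

```python
import numpy as np
from sklearn.svm import SVC

def one_vs_all_fit(X, y):
    """Train one binary SVM per class: 'belongs to class c or not?'."""
    classifiers = {}
    for c in np.unique(y):
        clf = SVC(kernel="linear")
        clf.fit(X, (y == c).astype(int))
        classifiers[c] = clf
    return classifiers

def one_vs_all_predict(classifiers, X):
    """Predict the class whose binary classifier gives the largest decision value."""
    classes = sorted(classifiers)
    scores = np.column_stack([classifiers[c].decision_function(X) for c in classes])
    return np.array(classes)[np.argmax(scores, axis=1)]

# synthetic 3-class data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.5, size=(20, 2)) for loc in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 20)
clfs = one_vs_all_fit(X, y)
print((one_vs_all_predict(clfs, X) == y).mean())   # training accuracy, close to 1.0
```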
Cost function

Contribution of a single sample to the overall cost:

Logistic regression:
  −y log( 1 / (1 + e^{−W^T x}) ) − (1 − y) log( 1 − 1 / (1 + e^{−W^T x}) )

SVM:
  y · cost_{y=1}(W^T x) + (1 − y) · cost_{y=0}(W^T x)

The complete optimisation problems:

Logistic regression:
  min_W (1/m) Σ_{i=1}^{m} [ −y_i log( 1 / (1 + e^{−W^T x_i}) ) − (1 − y_i) log( 1 − 1 / (1 + e^{−W^T x_i}) ) ] + (λ/2m) Σ_{j=1}^{n} w_j²

SVM:
  min_W C Σ_{i=1}^{m} [ y_i cost_{y=1}(W^T x_i) + (1 − y_i) cost_{y=0}(W^T x_i) ] + (1/2) Σ_{j=1}^{n} w_j²

C here plays a similar role as 1/λ.
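The slides do not give cost_{y=1} and cost_{y=0} in closed form; a common choice, assumed here, is the hinge-shaped costs sketched below, which upper-bound the corresponding logistic-regression terms.

```python
import numpy as np

def cost_1(z):
    """SVM cost for a positive sample (y = 1): zero once z = W^T x >= 1 (hinge-shaped)."""
    return np.maximum(0.0, 1.0 - z)

def cost_0(z):
    """SVM cost for a negative sample (y = 0): zero once z = W^T x <= -1."""
    return np.maximum(0.0, 1.0 + z)

def logistic_cost_1(z):
    """Logistic-regression cost for y = 1: -log(1 / (1 + exp(-z)))."""
    return np.log1p(np.exp(-z))

z = np.linspace(-3, 3, 7)
print(cost_1(z))           # decreases to 0 as z grows past 1
print(logistic_cost_1(z))  # smooth counterpart that the SVM cost approximates
```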
SVM hypothesis

  min_W C Σ_{i=1}^{m} [ y_i cost_{y=1}(W^T x_i) + (1 − y_i) cost_{y=0}(W^T x_i) ] + (1/2) Σ_{j=1}^{n} w_j²

For a correct classification, W^T x ≥ 0 (respectively < 0) would be sufficient. The SVM instead demands W^T x ≥ 1 (respectively ≤ −1): classification with confidence.

Outliers: elastic decision boundary
- a large C gives a stricter boundary at the cost of a smaller margin
- a small C tolerates outliers
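A small sketch of the role of C (assuming scikit-learn is available): the synthetic dataset contains one outlier, and comparing the margins for a large and a small C illustrates the strict versus tolerant behaviour described above.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-2, 0], 0.4, size=(20, 2)),
               rng.normal([+2, 0], 0.4, size=(20, 2)),
               [[1.8, 0.0]]])                 # an outlier of class 0 inside class 1
y = np.array([0] * 20 + [1] * 20 + [0])

for C in (100.0, 0.1):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_)  # margin width of a linear SVM
    print(f"C={C}: margin width {margin:.2f}, training accuracy {clf.score(X, y):.2f}")
```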
Large margin classifier

  min_W C Σ_{i=1}^{m} [ y_i cost_{y=1}(W^T x_i) + (1 − y_i) cost_{y=0}(W^T x_i) ] + (1/2) Σ_{j=1}^{n} w_j²

Rewrite the SVM optimisation problem as

  min_W (1/2) Σ_{j=1}^{n} w_j²
  s.t. W^T x_i ≥ 1 if y_i = 1
       W^T x_i ≤ −1 if y_i = 0

Note that (1/2) Σ_{j=1}^{n} w_j² = (1/2) ( √(w_1² + ... + w_n²) )² = (1/2) ||W||².

Which decision boundary is found? With two features, h(x) = w_1 x_1 + w_2 x_2, so W^T x = w_1 x_1 + w_2 x_2 = ||W|| · p, where p is the (signed) projection of x onto W. W is orthogonal to all x with h(x) = 0, i.e. to the decision boundary. The constraints can therefore be written as ||W|| p_i ≥ 1 if y_i = 1 and ||W|| p_i ≤ −1 if y_i = 0. Minimising (1/2) ||W||² while requiring ||W|| |p_i| ≥ 1 necessitates larger projections p_i; hence the separating hyperplane with the largest margin is found.
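A tiny numeric check of the geometric fact used above, with made-up numbers: W^T x equals ||W|| times the signed projection p of x onto the direction of W.

```python
import numpy as np

W = np.array([2.0, 1.0])
x = np.array([1.5, 3.0])

p = (W @ x) / np.linalg.norm(W)   # signed length of the projection of x onto W
print(W @ x)                       # W^T x = 2*1.5 + 1*3.0 = 6.0
print(np.linalg.norm(W) * p)       # ||W|| * p = 6.0 as well
```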
Kernels

Non-linear decision boundary. Hypothesis: predict 1 if
  w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1² + ... ≥ 0

Replace the polynomial terms by kernel features:
  w_0 + w_1 k_1 + w_2 k_2 + w_3 k_3 + ...

Kernel: define the kernel via landmarks l_i.

Gaussian kernel: k(x, l_i) = exp( −||x − l_i||² / (2σ²) )

For x ≈ l_i, k(x, l_i) ≈ 1 (and towards 0 otherwise). σ controls the width of the Gaussian.

Example: w_0 = −0.5, w_1 = 1, w_2 = 1, w_3 = 0
  h(x) = w_0 + w_1 k(x, l_1) + w_2 k(x, l_2) + w_3 k(x, l_3)
(The slides illustrate the resulting decision regions for σ = 1, σ = 0.5 and σ = 2.)
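A sketch of the example hypothesis with the Gaussian kernel, assuming NumPy; the landmark positions and query points are made up for illustration, while the weights are those of the example above.

```python
import numpy as np

def gaussian_kernel(x, l, sigma=1.0):
    """k(x, l) = exp(-||x - l||^2 / (2 sigma^2)); close to 1 when x is near the landmark l."""
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2))

# hypothetical landmarks in 2-D
l1, l2, l3 = np.array([1.0, 1.0]), np.array([3.0, 1.0]), np.array([2.0, 3.0])
w0, w1, w2, w3 = -0.5, 1.0, 1.0, 0.0    # weights from the example

def h(x, sigma=1.0):
    """Predict 1 if the kernel expansion is non-negative."""
    score = (w0 + w1 * gaussian_kernel(x, l1, sigma)
                + w2 * gaussian_kernel(x, l2, sigma)
                + w3 * gaussian_kernel(x, l3, sigma))
    return int(score >= 0)

print(h(np.array([1.1, 0.9])))   # near l1 -> 1
print(h(np.array([6.0, 6.0])))   # far from all landmarks -> 0
```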
Kernels: placement of landmarks

A possible choice of initial landmarks: all training-set samples. For the training of the w_j, each sample x_i is then represented by its kernel feature vector

  f_i = ( k(x_i, l_1), ..., k(x_i, l_m) )^T

and the SVM optimisation becomes

  min_W C Σ_{i=1}^{m} [ y_i cost_{y=1}(W^T f_i) + (1 − y_i) cost_{y=0}(W^T f_i) ] + (1/2) Σ_{j=1}^{m} w_j²
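A sketch of how the kernel feature vectors f_i can be built when the landmarks are the training samples themselves; the variable names and data are placeholders.

```python
import numpy as np

def kernel_features(X, landmarks, sigma=1.0):
    """Row i holds f_i = (k(x_i, l_1), ..., k(x_i, l_m)) for the Gaussian kernel."""
    # squared Euclidean distances between every sample and every landmark
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

X = np.random.randn(50, 2)          # training samples
F = kernel_features(X, X)           # landmarks = all training samples -> 50 x 50 matrix
print(F.shape, np.allclose(np.diag(F), 1.0))   # each f_i has k(x_i, x_i) = 1
```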
Questions? Le Nguyen
Literature
- C.M. Bishop: Pattern Recognition and Machine Learning, Springer
- R.O. Duda, P.E. Hart, D.G. Stork: Pattern Classification, Wiley