Linear and Non-Linear Dimensionality Reduction

1 Linear and Non-Linear Dimensionality Reduction Alexander Schulz aschulz(at)techfak.uni-bielefeld.de University of Pisa, Pisa and

2 Overview
Dimensionality Reduction: Motivation, Linear Projections, Linear Mappings in Feature Space, Neighbor Embedding, Manifold Learning
Advances in Dimensionality Reduction: Speedup for Neighbor Embeddings, Quality Assessment of DR, Feature Relevance for DR, Visualization of Classifiers, Supervised Dimensionality Reduction

3 Curse of Dimensionality [Lee and Verleysen, 2007]

4 Curse of Dimensionality: high-dimensional spaces are almost empty [Lee and Verleysen, 2007]

5 Curse of Dimensionality: high-dimensional spaces are almost empty; hypervolume concentrates in a thin shell close to the surface [Lee and Verleysen, 2007]

6 Why use Dimensionality Reduction?

7 Why use Dimensionality Reduction?

8 Why use Dimensionality Reduction?

9 Principal Component Analysis (PCA) variance maximization: $\mathrm{var}(w^\top x_i)$ with $\|w\| = 1$

10 Principal Component Analysis (PCA) variance maximization: $\mathrm{var}(w^\top x_i)$ with $\|w\| = 1$, where $\mathrm{var}(w^\top x_i) = \frac{1}{N}\sum_i (w^\top x_i)^2 = \frac{1}{N}\sum_i w^\top x_i x_i^\top w$

11 Principal Component Analysis (PCA) variance maximization: $\mathrm{var}(w^\top x_i)$ with $\|w\| = 1$, where $\mathrm{var}(w^\top x_i) = \frac{1}{N}\sum_i (w^\top x_i)^2 = \frac{1}{N}\sum_i w^\top x_i x_i^\top w = w^\top C w$

12 Principal Component Analysis (PCA) variance maximization: $\mathrm{var}(w^\top x_i)$ with $\|w\| = 1$, where $\mathrm{var}(w^\top x_i) = \frac{1}{N}\sum_i (w^\top x_i)^2 = \frac{1}{N}\sum_i w^\top x_i x_i^\top w = w^\top C w$; eigenvectors of the covariance matrix are optimal
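
To make the variance-maximization view concrete, here is a minimal NumPy sketch of PCA via an eigendecomposition of the covariance matrix; the function name and the toy data are illustrative, not part of the slides.

```python
import numpy as np

def pca(X, n_components=2):
    """Project the rows of X onto the leading eigenvectors of the covariance matrix C."""
    Xc = X - X.mean(axis=0)                  # center the data
    C = (Xc.T @ Xc) / len(Xc)                # C = (1/N) sum_i x_i x_i^T
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns ascending eigenvalues
    W = eigvecs[:, ::-1][:, :n_components]   # directions w maximizing w^T C w
    return Xc @ W                            # low-dimensional coordinates

X = np.random.randn(200, 5)
print(pca(X).shape)  # (200, 2)
```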

13 PCA in Action [Gisbrecht and Hammer, 2015]

14 Distance Preservation (CMDS) distances $\delta_{ij} = \|x_i - x_j\|_2$, $d_{ij} = \|y_i - y_j\|_2$; objective: $\delta_{ij} \approx d_{ij}$

15 Distance Preservation (CMDS) distances $\delta_{ij} = \|x_i - x_j\|_2$, $d_{ij} = \|y_i - y_j\|_2$; objective: $\delta_{ij} \approx d_{ij}$; distances $d_{ij}$ and similarities $s_{ij}$ can be transformed into each other; $S = U \Lambda U^\top$ is the matrix of pairwise similarities

16 Distance Preservation (CMDS) distances $\delta_{ij} = \|x_i - x_j\|_2$, $d_{ij} = \|y_i - y_j\|_2$; objective: $\delta_{ij} \approx d_{ij}$; distances $d_{ij}$ and similarities $s_{ij}$ can be transformed into each other; $S = U \Lambda U^\top$ is the matrix of pairwise similarities; the best low-rank approximation of $S$ in the Frobenius norm is $\hat S = U' \Lambda' U'^\top$, where $\Lambda'$ keeps the largest eigenvalues
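
As a sketch of the distance-preservation route, the following classical MDS implementation converts squared distances into the similarity matrix $S$ by double centering and keeps its largest eigenvalues; it assumes Euclidean input distances and uses illustrative names.

```python
import numpy as np

def cmds(D, n_components=2):
    """D: (N, N) matrix of pairwise distances delta_ij."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N      # centering matrix
    S = -0.5 * J @ (D ** 2) @ J              # similarities from distances
    eigvals, eigvecs = np.linalg.eigh(S)     # S = U Lambda U^T
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam = np.maximum(eigvals[idx], 0)        # keep the largest eigenvalues
    return eigvecs[:, idx] * np.sqrt(lam)    # embedding from the best low-rank approximation

X = np.random.randn(100, 5)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(cmds(D).shape)  # (100, 2)
```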

17 Manifold Learning learn a linear manifold: represent the data as projections onto an unknown $w$: $C = \frac{1}{2N}\sum_i \|x_i - y_i w\|^2$

18 Manifold Learning learn a linear manifold: represent the data as projections onto an unknown $w$: $C = \frac{1}{2N}\sum_i \|x_i - y_i w\|^2$. What are the best parameters $y_i$ and $w$?

19 PCA three ways to obtain PCA: maximize the variance of a linear projection; preserve distances; find a linear manifold such that the errors are minimal in an L2 sense

20 Kernel PCA [Schölkopf et al., 1998] Idea: apply a fixed nonlinear preprocessing $\phi(x)$, then perform standard PCA in feature space

21 Kernel PCA [Schölkopf et al., 1998] Idea: apply a fixed nonlinear preprocessing $\phi(x)$, then perform standard PCA in feature space. How to apply the kernel trick here?
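
One way to answer the kernel-trick question is to eigendecompose the centered kernel matrix instead of the feature-space covariance. The sketch below uses an RBF kernel; the bandwidth gamma and the toy data are assumptions for illustration.

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq)                  # k(x_i, x_j) = phi(x_i)^T phi(x_j)
    N = K.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N
    Kc = J @ K @ J                           # center phi(x) implicitly via the kernel matrix
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    alphas = eigvecs[:, idx] / np.sqrt(np.maximum(eigvals[idx], 1e-12))
    return Kc @ alphas                       # projections onto the feature-space principal axes

print(kernel_pca(np.random.randn(150, 3)).shape)  # (150, 2)
```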

22 Kernel PCA in Action [Gisbrecht and Hammer, 2015]

23 Stochastic Neighbor Embedding (SNE) [Hinton and Roweis, 2002] introduce a probabilistic neighborhood in the input space: $p_{j|i} = \frac{\exp(-0.5\,\|x_i - x_j\|^2 / \sigma_i^2)}{\sum_{k, k \neq i} \exp(-0.5\,\|x_i - x_k\|^2 / \sigma_i^2)}$

24 Stochastic Neighbor Embedding (SNE) [Hinton and Roweis, 2002] introduce a probabilistic neighborhood in the input space: $p_{j|i} = \frac{\exp(-0.5\,\|x_i - x_j\|^2 / \sigma_i^2)}{\sum_{k, k \neq i} \exp(-0.5\,\|x_i - x_k\|^2 / \sigma_i^2)}$, and in the output space: $q_{j|i} = \frac{\exp(-0.5\,\|y_i - y_j\|^2)}{\sum_{k, k \neq i} \exp(-0.5\,\|y_i - y_k\|^2)}$

25 Stochastic Neighbor Embedding (SNE) [Hinton and Roweis, 2002] introduce a probabilistic neighborhood in the input space: $p_{j|i} = \frac{\exp(-0.5\,\|x_i - x_j\|^2 / \sigma_i^2)}{\sum_{k, k \neq i} \exp(-0.5\,\|x_i - x_k\|^2 / \sigma_i^2)}$, and in the output space: $q_{j|i} = \frac{\exp(-0.5\,\|y_i - y_j\|^2)}{\sum_{k, k \neq i} \exp(-0.5\,\|y_i - y_k\|^2)}$; optimize the sum of Kullback-Leibler divergences: $C = \sum_i KL(P_i, Q_i) = \sum_i \sum_{j \neq i} p_{j|i} \log\left(\frac{p_{j|i}}{q_{j|i}}\right)$
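
The quantities above translate directly into code. A minimal sketch follows, with a fixed bandwidth per point instead of the perplexity-based tuning of sigma_i that SNE normally uses:

```python
import numpy as np

def conditional_probs(Z, sigma2=None):
    """Row i holds p_{j|i} (or q_{j|i} if no bandwidth is given)."""
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    if sigma2 is None:
        sigma2 = np.ones(len(Z))             # output space: unit bandwidth
    P = np.exp(-0.5 * sq / sigma2[:, None])
    np.fill_diagonal(P, 0.0)                 # exclude k = i from the sums
    return P / P.sum(axis=1, keepdims=True)

def sne_cost(X, Y, sigma2):
    P = conditional_probs(X, sigma2)         # p_{j|i} in the input space
    Q = conditional_probs(Y)                 # q_{j|i} in the output space
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))   # sum_i KL(P_i, Q_i)

X, Y = np.random.randn(50, 10), np.random.randn(50, 2)
print(sne_cost(X, Y, sigma2=np.full(50, 2.0)))
```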

26 Neighbor Retrieval Visualizer (NeRV) [Venna et al., 2010]

27 Neighbor Retrieval Visualizer (NeRV) [Venna et al., 2010] $\mathrm{precision}(i) = \frac{N_{TP,i}}{k_i} = 1 - \frac{N_{FP,i}}{k_i}$, $\mathrm{recall}(i) = \frac{N_{TP,i}}{r_i} = 1 - \frac{N_{MISS,i}}{r_i}$

28 Neighbor Retrieval Visualizer (NeRV) $KL(P_i, Q_i)$ generalizes recall; $KL(Q_i, P_i)$ generalizes precision

29 Neighbor Retrieval Visualizer (NeRV) $KL(P_i, Q_i)$ generalizes recall; $KL(Q_i, P_i)$ generalizes precision. NeRV optimizes $C = \lambda \sum_i KL(P_i, Q_i) + (1 - \lambda) \sum_i KL(Q_i, P_i)$

30 t-distributed SNE (t-SNE) [van der Maaten and Hinton, 2008] symmetrized probabilities $p$ and $q$; uses a Student-t distribution in the output space

31 t-distributed SNE (t-SNE) [van der Maaten and Hinton, 2008] symmetrized probabilities $p$ and $q$; uses a Student-t distribution in the output space
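
For reference, a minimal sketch of the t-SNE output similarities, replacing the Gaussian of SNE by a Student-t kernel with one degree of freedom and normalizing jointly over all pairs:

```python
import numpy as np

def tsne_q(Y):
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    num = 1.0 / (1.0 + sq)                   # Student-t kernel, one degree of freedom
    np.fill_diagonal(num, 0.0)
    return num / num.sum()                   # joint probabilities q_ij

Q = tsne_q(np.random.randn(50, 2))
print(Q.sum())  # 1.0
```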

32 Neighbor Embedding in Action [Gisbrecht and Hammer, 2015]

33 Maximum Variance Unfolding (MVU) [Weinberger and Saul, 2006] goal: unfold a given manifold while keeping all the local distances and angles fixed

34 Maximum Variance Unfolding (MVU) [Weinberger and Saul, 2006] goal: unfold a given manifold while keeping all the local distances and angles fixed. Maximize $\sum_{ij} \|y_i - y_j\|^2$ s.t. $\sum_i y_i = 0$ and $\|y_i - y_j\|^2 = \|x_i - x_j\|^2$ for all neighbors
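
The MVU problem above is a semidefinite program over the Gram matrix of the embedding. A sketch, assuming the cvxpy solver is available and that `neighbors` lists the index pairs whose distances must be preserved (practical only for small N):

```python
import numpy as np
import cvxpy as cp

def mvu(X, neighbors, n_components=2):
    N = len(X)
    K = cp.Variable((N, N), PSD=True)        # Gram matrix of the embedding, K_ij = y_i^T y_j
    constraints = [cp.sum(K) == 0]           # centering: sum_i y_i = 0
    for i, j in neighbors:
        d2 = float(np.sum((X[i] - X[j]) ** 2))
        constraints.append(K[i, i] - 2 * K[i, j] + K[j, j] == d2)  # keep local distances
    cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve()      # maximize sum_ij ||y_i - y_j||^2
    eigvals, eigvecs = np.linalg.eigh(K.value)
    idx = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))
```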

35 Manifold Learner in Action [Gisbrecht and Hammer, 2015]

36 Overview
Dimensionality Reduction: Motivation, Linear Projections, Linear Mappings in Feature Space, Neighbor Embedding, Manifold Learning
Advances in Dimensionality Reduction: Speedup for Neighbor Embeddings, Quality Assessment of DR, Feature Relevance for DR, Visualization of Classifiers, Supervised Dimensionality Reduction

37 Barnes-Hut Approach [Yang et al., 2013, van der Maaten, 2013] Complexity of NE: Neighbor Embeddings have complexity $O(N^2)$

38 Barnes-Hut Approach [Yang et al., 2013, van der Maaten, 2013] Complexity of NE: Neighbor Embeddings have complexity $O(N^2)$; the matrices $P$ and $Q$ are of size $N \times N$

39 Barnes-Hut Approach [Yang et al., 2013, van der Maaten, 2013] Complexity of NE: Neighbor Embeddings have complexity $O(N^2)$; the matrices $P$ and $Q$ are of size $N \times N$; the gradient requires a quadratic summation: $\frac{\partial C}{\partial y_i} = \sum_{j \neq i} g_{ij} (y_i - y_j)$

40 [van der Maaten, 2013]

41 Barnes-Hut Approach Barnes-Hut: approximate the gradient $\frac{\partial C}{\partial y_i} = \sum_{j \neq i} g_{ij}(y_i - y_j)$ as $\sum_{j \neq i} g_{ij}(y_i - y_j) \approx \sum_t G_i^t \, g_{i\hat{t}} \, (y_i - \hat{y}^t)$

42 Barnes-Hut Approach Barnes-Hut: approximate the gradient $\frac{\partial C}{\partial y_i} = \sum_{j \neq i} g_{ij}(y_i - y_j)$ as $\sum_{j \neq i} g_{ij}(y_i - y_j) \approx \sum_t G_i^t \, g_{i\hat{t}} \, (y_i - \hat{y}^t)$; approximate $P$ as a sparse matrix; results in an $O(N \log N)$ algorithm

43 Barnes-Hut SNE [van der Maaten, 2013]

44 Kernel t-sne 14 O(N) algorithm: apply NE to a fixed subset, map remainder with out of sample projection 14 [Gisbrecht et al., 215]

45 Kernel t-sne 14 O(N) algorithm: apply NE to a fixed subset, map remainder with out of sample projection how to obtain an out of sample extension? 14 [Gisbrecht et al., 215]

46 Kernel t-sne 14 O(N) algorithm: apply NE to a fixed subset, map remainder with out of sample projection how to obtain an out of sample extension? use kernel mapping x y(x) = j α j k(x, x j ) l k(x, x l) = Ak 14 [Gisbrecht et al., 215]

47 Kernel t-sne 14 O(N) algorithm: apply NE to a fixed subset, map remainder with out of sample projection how to obtain an out of sample extension? use kernel mapping x y(x) = j α j k(x, x j ) l k(x, x l) = Ak minimization of y i y(x i ) 2 yields A = Y K 1 i 14 [Gisbrecht et al., 215]

48 Kernel t-sne [Gisbrecht et al., 215]

49 Quality Assessment of DR [Lee et al., 2013] most popular measure: $Q_k(X, Y) = \sum_i |N_k(x_i) \cap N_k(y_i)| \,/\, (N k)$

50 Quality Assessment of DR [Lee et al., 2013] most popular measure: $Q_k(X, Y) = \sum_i |N_k(x_i) \cap N_k(y_i)| \,/\, (N k)$; a rescaling was added recently

51 Quality Assessment of DR [Lee et al., 2013] most popular measure: $Q_k(X, Y) = \sum_i |N_k(x_i) \cap N_k(y_i)| \,/\, (N k)$; a rescaling was added recently; recently used to compare many DR techniques
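
A minimal sketch of this neighborhood-preservation measure: the average overlap between the k nearest neighbors of each point in the input space and in the embedding (the recent rescaling is not included):

```python
import numpy as np

def knn_sets(Z, k):
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)             # a point is not its own neighbor
    return [set(np.argsort(row)[:k]) for row in sq]

def quality_qk(X, Y, k):
    NX, NY = knn_sets(X, k), knn_sets(Y, k)
    N = len(X)
    return sum(len(a & b) for a, b in zip(NX, NY)) / (N * k)   # Q_k(X, Y)

X = np.random.randn(100, 10)
print(quality_qk(X, X[:, :2], k=10))         # toy "embedding": first two coordinates
```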

52 Quality Assessment of DR [Peluffo-Ordóñez et al., 2014]

53 Feature Relevance for DR Which features are important for a given projection?

54 data set: ESANN participants
Partic. | University | ESANN paper | #publications | ... | likes beer | ...
1 | A | ...
2 | A | ...
3 | B | ...
4 | C | ...
5 | C | ...
6 | D | ...

55 Visualization of ESANN participants (2D scatter plot; axes Dim 1 and Dim 2)

56 Relevance of features for the projection

57 Relevance of features for the projection

58 Visualization of ESANN participants, colored by the beer feature (2D scatter plot; axes Dim 1 and Dim 2)

59 Aim: estimate the relevance of single features for non-linear dimensionality reductions

60 Aim: estimate the relevance of single features for non-linear dimensionality reductions. Idea: change the influence of a single feature and observe the change in the quality

61 Evaluation of Feature Relevance for DR [Schulz et al., 2014a] the NeRV cost function $Q_k^{NeRV}$ [Venna et al., 2010] has an interpretation from an information retrieval perspective

62 Evaluation of Feature Relevance for DR [Schulz et al., 2014a] the NeRV cost function $Q_k^{NeRV}$ [Venna et al., 2010] has an interpretation from an information retrieval perspective; $d(x_i, x_j)^2 = \sum_l (x_i^l - x_j^l)^2$ becomes $\sum_l \lambda_l^2 (x_i^l - x_j^l)^2$

63 Evaluation of Feature Relevance for DR [Schulz et al., 2014a] the NeRV cost function $Q_k^{NeRV}$ [Venna et al., 2010] has an interpretation from an information retrieval perspective; $d(x_i, x_j)^2 = \sum_l (x_i^l - x_j^l)^2$ becomes $\sum_l \lambda_l^2 (x_i^l - x_j^l)^2$; $\lambda_{NeRV}^k(l) := \lambda_l^2$, where $\lambda$ optimizes $Q_k^{NeRV}(X_\lambda, Y) + \delta \sum_l \lambda_l^2$
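
The weighted metric is simple to realize: scaling feature l by lambda_l before computing distances gives exactly the modified d(x_i, x_j)^2 above; how Q_k^NeRV reacts to such scalings then indicates the relevance of each feature (the optimization of lambda itself is not shown in this sketch).

```python
import numpy as np

def weighted_sq_distances(X, lam):
    """d(x_i, x_j)^2 = sum_l lambda_l^2 (x_i^l - x_j^l)^2 as an N x N matrix."""
    Xw = X * lam                              # scale feature l by lambda_l
    diff = Xw[:, None, :] - Xw[None, :, :]
    return np.sum(diff ** 2, axis=-1)

X = np.random.randn(80, 3)
lam = np.array([1.0, 0.5, 0.0])               # e.g. the third feature switched off
print(weighted_sq_distances(X, lam).shape)    # (80, 80)
```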

64 Toy data set 1 (3D scatter plot of classes C1, C2, C3 over Dim 1, Dim 2, Dim 3)

65 Toy data set 1 (3D scatter plot of classes C1, C2, C3 and its 2D projection; axes DimI and DimII)

66 Toy data set 1 (2D projection of classes C1, C2, C3 and the feature relevances $\lambda_{NeRV}$ for Dim 1, Dim 2, Dim 3 as a function of the neighborhood size)

67 Toy data set 2 (3D scatter plot of classes C1, C2, C3)

68 Toy data set 2 (3D scatter plot of classes C1, C2, C3 and a 2D projection; axes DimI and DimII)

69 Toy data set 2 (3D scatter plot of classes C1, C2, C3 and two different 2D projections)

70 Relevances for different projections (two 2D projections of classes C1, C2, C3 and the feature relevances $\lambda_{NeRV}$ for Dim 1, Dim 2, Dim 3 as a function of the neighborhood size)

71 Relevances for different projections (two 2D projections of classes C1, C2, C3 and their respective feature relevances $\lambda_{NeRV}$ for Dim 1, Dim 2, Dim 3 as a function of the neighborhood size)

72 98.7%

73 Why visualize Classifiers?

74 Why visualize Classifiers?

75 Why visualize Classifiers?

76 Where is the Difficulty? Class borders are often non-linear, often not given in an explicit functional form (e.g. SVM), and high-dimensional, which makes it infeasible to sample them for a projection

77 An illustration of the approach (3D data set with class 1 and class 2 over dim1, dim2, dim3)

78 An illustration of the approach (3D data set with class 1 and class 2): project the data to 2D

79 An illustration: dimensionality reduction (2D projection showing class 1, class 2, and the SVs; axes dimI and dimII)

80 An illustration: dimensionality reduction (2D projection): sample the 2D data space, project the samples up

81 An illustration: dimensionality reduction (2D projection and 3D view with the 2D samples projected up): sample the 2D data space, project the samples up

82 An illustration: dimensionality reduction (2D projection and 3D view with the 2D samples projected up): sample the 2D data space, project the samples up, classify them

83 An illustration: border visualization (color intensity codes the certainty of the classifier)

84 Classifier visualization [Schulz et al., 2014b]: project the data to 2D, sample the 2D data space, project the samples up, classify them
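
A sketch of this recipe under simplifying assumptions: the 2D projection is plain PCA (instead of a discriminative NE mapping), so "projecting samples up" is just the linear inverse map, and `clf` is assumed to be a trained classifier exposing scikit-learn's predict_proba.

```python
import numpy as np

def classifier_view(X, clf, grid_steps=100):
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:2].T                              # project the data to 2D
    Y = Xc @ W
    xs = np.linspace(Y[:, 0].min(), Y[:, 0].max(), grid_steps)
    ys = np.linspace(Y[:, 1].min(), Y[:, 1].max(), grid_steps)
    grid = np.array([[a, b] for b in ys for a in xs])   # sample the 2D data space
    up = grid @ W.T + mean                               # project the samples up
    proba = clf.predict_proba(up)                        # classify them
    labels = proba.argmax(axis=1).reshape(grid_steps, grid_steps)
    certainty = proba.max(axis=1).reshape(grid_steps, grid_steps)
    return Y, xs, ys, labels, certainty       # plot labels shaded by certainty behind Y
```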

85 Toy Data Set 1 (3D plot of classes C1 and C2 over dim1, dim2, dim3)

86 Toy Data Set 1 (3D plot of classes C1 and C2 over dim1, dim2, dim3)

87 Toy Data Set 2 (3D plot of classes C1 and C2 over dim1, dim2, dim3)

88 Toy Data Set 2 (3D plot of classes C1 and C2 over dim1, dim2, dim3)

89 Toy Data Set 2 with NE (3D plot of classes C1 and C2 over dim1, dim2, dim3)

90 Toy Data Set 2 with NE (3D plot of classes C1 and C2 over dim1, dim2, dim3)

91 DR: intrinsically 3D data (two plots of classes C1 and C2)

92 DR: NE projection

93 Supervised dimensionality reduction Use the Fisher metric [Peltonen et al., 2004, Gisbrecht et al., 2015]: $d(x, x + dx) = dx^\top J(x)\, dx$ with $J(x) = E_{p(c|x)} \{ (\nabla_x \log p(c|x)) (\nabla_x \log p(c|x))^\top \}$

94 Supervised dimensionality reduction Use the Fisher metric [Peltonen et al., 2004, Gisbrecht et al., 2015]: $d(x, x + dx) = dx^\top J(x)\, dx$ with $J(x) = E_{p(c|x)} \{ (\nabla_x \log p(c|x)) (\nabla_x \log p(c|x))^\top \}$
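
A minimal sketch of the Fisher information matrix J(x), assuming `log_p` returns the vector log p(c|x) for all classes; the gradient is taken numerically here purely for illustration.

```python
import numpy as np

def fisher_matrix(log_p, x, eps=1e-5):
    p = np.exp(log_p(x))                      # class probabilities p(c|x)
    d, C = len(x), len(p)
    grads = np.zeros((C, d))
    for l in range(d):                        # numerical gradient of log p(c|x) w.r.t. x
        e = np.zeros(d)
        e[l] = eps
        grads[:, l] = (log_p(x + e) - log_p(x - e)) / (2 * eps)
    # J(x) = E_{p(c|x)} [ grad log p(c|x) grad log p(c|x)^T ]
    return sum(p[c] * np.outer(grads[c], grads[c]) for c in range(C))
```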

95 DR: supervised NE projection

96 DR: supervised NE projection

97 prototypes in original space (classes C1 and C2)

98 Summary different objectives of dimensionality reduction

99 Summary different objectives of dimensionality reduction; a new approach to get insight into trained classification models; discriminative information can yield major improvements

100 Thank You For Your Attention! Alexander Schulz aschulz (at) techfak.uni-bielefeld.de

101 Literature I
Gisbrecht, A. and Hammer, B. (2015). Data visualization by nonlinear dimensionality reduction. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 5(2).
Gisbrecht, A., Schulz, A., and Hammer, B. (2015). Parametric nonlinear dimensionality reduction using kernel t-SNE. Neurocomputing, 147.
Hinton, G. and Roweis, S. (2002). Stochastic neighbor embedding. In Advances in Neural Information Processing Systems 15. MIT Press.
Lee, J. A., Renard, E., Bernard, G., Dupont, P., and Verleysen, M. (2013). Type 1 and 2 mixtures of Kullback-Leibler divergences as cost functions in dimensionality reduction based on similarity preservation. Neurocomputing, 112.
Lee, J. A. and Verleysen, M. (2007). Nonlinear dimensionality reduction. Springer.
Peltonen, J., Klami, A., and Kaski, S. (2004). Improved learning of Riemannian metrics for exploratory analysis. Neural Networks, 17.

102 Literature II
Peluffo-Ordóñez, D. H., Lee, J. A., and Verleysen, M. (2014). Recent methods for dimensionality reduction: A brief comparative analysis. In 22nd European Symposium on Artificial Neural Networks, ESANN 2014, Bruges, Belgium, April 23-25, 2014.
Schölkopf, B., Smola, A., and Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5).
Schulz, A., Gisbrecht, A., and Hammer, B. (2014a). Relevance learning for dimensionality reduction. In 22nd European Symposium on Artificial Neural Networks, ESANN 2014, Bruges, Belgium, April 23-25, 2014.
Schulz, A., Gisbrecht, A., and Hammer, B. (2014b). Using discriminative dimensionality reduction to visualize classifiers. Neural Processing Letters.
van der Maaten, L. (2013). Barnes-Hut-SNE. CoRR, abs/.
van der Maaten, L. and Hinton, G. (2008). Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research, 9.
Venna, J., Peltonen, J., Nybo, K., Aidos, H., and Kaski, S. (2010). Information retrieval perspective to nonlinear dimensionality reduction for data visualization. Journal of Machine Learning Research, 11.

103 Literature III
Weinberger, K. Q. and Saul, L. K. (2006). An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In Proceedings of the 21st National Conference on Artificial Intelligence. AAAI.
Yang, Z., Peltonen, J., and Kaski, S. (2013). Scalable optimization of neighbor embedding for visualization. In Dasgupta, S. and McAllester, D., editors, Proceedings of the 30th International Conference on Machine Learning (ICML-13), volume 28. JMLR Workshop and Conference Proceedings.
