Support vector machines, Kernel methods, and Applications in bioinformatics

Size: px
Start display at page:

Download "Support vector machines, Kernel methods, and Applications in bioinformatics"

Transcription

1 1 Support vector machines, Kernel methods, and Applications in bioinformatics Ecole des Mines de Paris Computational Biology group Machine Learning in Bioinformatics conference, October 17th, 2003, Brussels, Belgium

2 2 Overview 1. Support Vector Machines and kernel methods 2. Application: Protein remote homology detection 3. Application: Extracting pathway activity from gene expression data

3 3 Partie 1 Support Vector Machines (SVM) and Kernel Methods

4 The pattern recognition problem 4

5 5 The pattern recognition problem Learn from labelled examples a discrimination rule

6 6 The pattern recognition problem Learn from labelled examples a discrimination rule Use it to predict the class of new points

7 7 Pattern recognition examples Medical diagnosis (e.g., from microarrays) Drugability/activity of chemical compouds Gene function, structure, localization Protein interactions

8 8 Support Vector Machines for pattern recognition φ Object x represented by the vector Φ(x) (feature space)

9 9 Support Vector Machines for pattern recognition φ Object x represented by the vector Φ(x) (feature space) Linear separation in the feature space

10 10 Support Vector Machines for pattern recognition φ Object x represented by the vector Φ(x) (feature space) Linear separation with large margin in the feature space

11 Large margin separation 11

12 12 Large margin separation H

13 13 Large margin separation m H

14 14 Large margin separation m e1 e3 e2 e4 e5 H

15 15 Large margin separation m e1 e3 e2 e4 e5 H min H,m { 1 m 2 + C i e i }

16 16 Dual formulation The classification of a new point x is the sign of: f(x) = i α i K(x, x i ) + b, where α i solves: n max α i=1 α i 1 n 2 i,j=1 α iα j y i y j K(x i, x j ) i = 1,..., n 0 α i C n i=1 α iy i = 0 with the notation: K(x, x ) = Φ(x). Φ(x )

17 17 The kernel trick for SVM The separation can be found without knowing Φ(x). kernel matters: K(x, y) = Φ(x). Φ(y) Only the Simple kernels K(x, y) can correspond to complex Φ SVM work with any sort of data as soon as a kernel is defined

18 18 Kernel examples Linear : Polynomial : K(x, x ) = x.x K(x, x ) = (x.x + c) d Gaussian RBf : K(x, x ) = exp ( x x 2 ) 2σ 2

19 19 Kernels For any set X, a function K : X X R is a kernel iff: it is symetric : K(x, y) = K(y, x), it is positive semi-definite: a i a j K(x i, x j ) 0 i,j for all a i R and x i X

20 20 Advantages of SVM Works well on real-world applications Large dimensions, noise OK (?) Can be applied to any kind of data as soon as a kernel is available

21 21 Examples: SVM in bioinformatics Gene functional classification from microarry: Brown et al. (2000), Pavlidis et al. (2001) Tissue classification from microarray: Mukherje et al. (1999), Furey et al. (2000), Guyon et al. (2001) Protein family prediction from sequence: Jaakkoola et al. (1998) Protein secondary structure prediction: Hua et al. (2001) Protein subcellular localization prediction from sequence: Hua et al. (2001)

22 22 Kernel methods Let K(x, y) be a given kernel. Then is it possible to perform other linear algorithms implicitly in the feature space such as: Compute the distance between points Principal component analysis (PCA) Canonical correlation analysis (CCA)

23 23 Compute the distance between objects φ( g 1 ) d 0 φ( g 2 ) d(g 1, g 2 ) 2 = Φ(g 1 ) Φ(g 2 ) 2 ( = Φ(g1 ) Φ(g ) ( 2 ). Φ(g1 ) Φ(g ) 2 ) = Φ(g 1 ). Φ(g 1 ) + Φ(g 2 ). Φ(g 2 ) 2Φ(g 1 ). Φ(g 2 ) d(g 1, g 2 ) 2 = K(g 1, g 1 ) + K(g 2, g 2 ) 2K(g 1, g 2 )

24 24 Distance to the center of mass φ( g 1 ) m Center of mass: m = 1 N N i=1 Φ(g i ), hence: Φ(g 1 ) m 2 = Φ(g 1 ). Φ(g 1 ) 2 Φ(g 1 ). m + m. m = K(g 1, g 1 ) 2 N N i=1 K(g 1, g i ) + 1 N 2 N i,j=1 K(g i, g j )

25 25 Principal component analysis PC2 PC1 It is equivalent to find the eigenvectors of ( K = Φ(gi ). Φ(g ) j ) i,j=1...n = ( K(g i, g j ) ) i,j=1...n Useful to project the objects on small-dimensional spaces (feature extraction).

26 26 Canonical correlation analysis CCA2 CCA1 CCA2 CCA1 K 1 and K 2 are two kernels for the same objects. CCA can be performed by solving the following generalized eigenvalue problem: ( ) ( ) 0 K1 K 2 K 2 ξ = ρ 1 0 ξ K 2 K K2 2 Useful to find correlations between different representations of the same objects (ex: genes,...)

27 27 Part 2 Local alignment kernel for strings (with S. Hiroto, N. Ueda, T. Akutsu, preprint 2003)

28 28 Motivations Develop a kernel for strings adapted to protein / DNA sequences Several methods have been adopted in bioinformatics to measure the similarity between sequences... but are not valid kernels How to mimic them?

29 29 Related work Spectrum kernel (Leslie et al.): K(x 1... x m, y 1... y n ) = m k i=1 n k j=1 δ(x i... x i+k, y j... y j+k ).

30 29 Related work Spectrum kernel (Leslie et al.): K(x 1... x m, y 1... y n ) = m k i=1 n k j=1 δ(x i... x i+k, y j... y j+k ). Fisher kernel (Jaakkola et al.): given a statistical model ( pθ, θ Θ R d) : φ(x) = θ log p θ (x) and use the Fisher information matrix.

31 30 Local alignment For two strings x and y, a local alignment π with gaps is: ABCD EF G HI JKL MNO EEPQRGS I TUVWX The score is: s(x, y, π) = s(e, E) + s(f, F ) + s(g, G) + s(i, I) s(gaps)

32 31 Smith-Waterman (SW) score SW (x, y) = max s(x, y, π) π Π(x,y) Computed by dynamic programming Not a kernel in general

33 32 Convolution kernels (Haussler 99) Let K 1 and K 2 be two kernels for strings Their convolution is the following valid kernel: K 1 K 2 (x, y) = K 1 (x 1, y 1 )K 2 (x 2, y 2 ) x 1 x 2 =x,y 1 y 2 =y

34 33 3 basic kernels For the unaligned parts: K 0 (x, y) = 1.

35 33 3 basic kernels For the unaligned parts: K 0 (x, y) = 1. For aligned residues: K (β) a (x, y) = { 0 if x 1 or y 1, exp (βs(x, y)) otherwise

36 33 3 basic kernels For the unaligned parts: K 0 (x, y) = 1. For aligned residues: K (β) a (x, y) = { 0 if x 1 or y 1, exp (βs(x, y)) otherwise For gaps: K g (β) (x, y) = exp [β (g( x ) + g( y ))]

37 34 Combining the kernels Detecting local alignments of exactly n residues: K (β) (n) (x, y) = K 0 ( K (β) a ) (n 1) K g (β) K (β) a K 0.

38 34 Combining the kernels Detecting local alignments of exactly n residues: K (β) (n) (x, y) = K 0 ( K (β) a ) (n 1) K g (β) K (β) a K 0. Considering all possible local alignments: K (β) LA = i=0 K (β) (i).

39 35 Properties K (β) LA (x, y) = π Π(x,y) exp (βs(x, y, π)),

40 35 Properties K (β) LA (x, y) = π Π(x,y) exp (βs(x, y, π)), lim β + 1 β ln K(β) LA (x, y) = SW (x, y).

41 36 Kernel computation e X0 d X d X2 B M E d Y0 Y Y2 e

42 37 Application: remote homology detection Unrelated proteins Twilight zone Close homologs Sequence similarity Same structure/function but sequence diverged Remote homology can not be found by direct sequence similarity

43 38 SCOP database SCOP Fold Superfamily Family Remote homologs Close homologs

44 39 A benchmark experiment Can we predict the superfamily of a domain if we have not seen any member of its family before?

45 39 A benchmark experiment Can we predict the superfamily of a domain if we have not seen any member of its family before? During learning: remove a family and learn the difference between the superfamily and the rest

46 39 A benchmark experiment Can we predict the superfamily of a domain if we have not seen any member of its family before? During learning: remove a family and learn the difference between the superfamily and the rest Then, use the model to test each domain of the family removed

47 40 SCOP superfamily recognition benchmark No. of families with given performance SVM-LA SVM-pairwise SVM-Mismatch SVM-Fisher ROC50

48 41 Part 3 Detecting pathway activity from microarray data (with M. Kanehisa, ECCB 2003)

49 42 Genes encode proteins which can catalyse chemical reations Nicotinamide Mononucleotide Adenylyltransferase With Bound Nad+

50 43 Chemical reactions are often parts of pathways From

51 44 Microarray technology monitors mrna quantity (From Spellman et al., 1998)

52 45 Comparing gene expression and pathway databases VS Detect active pathways? Denoise expression data? Denoise pathway database? Find new pathways? Are there correlations?

53 46 A useful first step and g1 g8 g5 g6 g2 g7 g3 g4 g6 g1 g3 g8 g2 g7 g5 g4

54 47 Using microarray only PC1 g1 g8 g5 g6 g2 g7 g3 g4 PCA finds the directions (profiles) explaining the largest amount of variations among expression profiles.

55 48 PCA formulation Let f v (i) be the projection of the i-th profile onto v. The amount of variation captured by f v is: h 1 (v) = N i=1 f v (i) 2 PCA finds an orthonormal basis by solving successively: max v h 1 (v)

56 49 Issues with PCA PCA is useful if there is a small number of strong signal In concrete applications, we observe a noisy superposition of many events Using a prior knowledge of metabolic networks can help denoising the information detected by PCA

57 50 The metabolic gene network GAL10 Glucose HKA, HKB, GLK1 Glucose 6P PGT1 Fructose 6P PFK1,PFK2 Fructose 1,6P2 FBA1 Glucose 1P PGM1, PGM2 FBP1 HKA1 PGM1 PFK1 GAL10 HKA2 PGT1 FBP1 FBA1 GLK1 PGM2 PFK2 Link two genes when they can catalyze two successive reactions

58 51 Mapping f v to the metabolic gene network 0.8 g1 0.7g2 0.4 g3 g g8 g7+0.5 g6 g g1 g8 g5 g6 g2 g7 g3 g Does it look interesting or not?

59 52 Important hypothesis If v is related to a metabolic activity, then f v should vary smoothly on the graph Smooth Rugged

60 53 Graph Laplacian L = D A 1 2 L =

61 54 Smoothness quantification h 2 (f) = f exp( βl)f f f is large when f is smooth h(f) = 2.5 h(f) = 34.2

62 55 Motivation For a candidate profile v, h 1 (f v ) is large when v captures a lot of natural variation among profiles h 2 (f v ) is large when f v is smooth on the graph Try to maximize both terms in the same time

63 56 Problem reformulation Find a function f v and a function f 2 such that: h 1 (f v ) be large h 2 (f 2 ) be large corr(f v, f 2 ) be large by solving: max (f v,f 2 ) corr(f v, f 2 ) h 1(f v ) h 1 (f v ) + δ h 2(f 2 ) h 2 (f 2 ) + δ

64 57 Solving the problem This formultation is equivalent to a generalized form of CCA (Kernel-CCA, Bach and Jordan, 2002), which is solved by the following generalized eigenvector problem ( 0 K1 K 2 K 2 K 1 0 ) ( α β ) ( K 2 = ρ 1 + δk K2 2 + δk 2 ) ( α β ) where [K 1 ] i,j = e i e j and K 2 = exp( L). Then, f v = K 1 α and f 2 = K 2 β.

65 58 The kernel point of view... g1 g2 g3 g4 g8 g7 g5 g6 g1 g8 g5 g6 g2 g7 g3 g4 Diffusion kernel Linear kernel g7 g3 g4 g1 g2 g5 g8 g6 Kernel CCA g2 g4 g1 g3 g6 g8 g7 g5

66 59 Data Gene network: two genes are linked if the catalyze successive reactions in the KEGG database (669 yeast genes) Expression profiles: 18 time series measures for the 6,000 genes of yeast, during two cell cycles

67 60 First pattern of expression Expression Time

68 61 Related metabolic pathways 50 genes with highest s 2 s 1 belong to: Oxidative phosphorylation (10 genes) Citrate cycle (7) Purine metabolism (6) Glycerolipid metabolism (6) Sulfur metabolism (5) Selenoaminoacid metabolism (4), etc...

69 Related genes 62

70 Related genes 63

71 Related genes 64

72 65 Opposite pattern Expression Time

73 66 Related genes RNA polymerase (11 genes) Pyrimidine metabolism (10) Aminoacyl-tRNA biosynthesis (7) Urea cycle and metabolism of amino groups (3) Oxidative phosphorlation (3) ATP synthesis(3), etc...

74 Related genes 67

75 Related genes 68

76 Related genes 69

77 70 Second pattern Expression Time

78 71 Extensions Can be used to extract features from expression profiles (preprint 2002) Can be generalized to more than 2 datasets and other kernels Can be used to extract clusters of genes (e.g., operon detection, ISMB 03 with Y. Yamanishi, A. Nakaya and M. Kanehisa)

79 Conclusion 72

80 73 Conclusion Kernels offer a versatile framework to represent biological data SVM and kernel methods work well on real-life problems, in particular in high dimension and with noise Encouraging results on real-world applications Many opportunities in developping kernels for particular applications

Metabolic networks: Activity detection and Inference

Metabolic networks: Activity detection and Inference 1 Metabolic networks: Activity detection and Inference Jean-Philippe.Vert@mines.org Ecole des Mines de Paris Computational Biology group Advanced microarray analysis course, Elsinore, Denmark, May 21th,

More information

Statistical learning theory, Support vector machines, and Bioinformatics

Statistical learning theory, Support vector machines, and Bioinformatics 1 Statistical learning theory, Support vector machines, and Bioinformatics Jean-Philippe.Vert@mines.org Ecole des Mines de Paris Computational Biology group ENS Paris, november 25, 2003. 2 Overview 1.

More information

Support Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM

Support Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM 1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan Jean-Philippe.Vert@mines.org Human Genome Center, University

More information

Kernel methods, kernel SVM and ridge regression

Kernel methods, kernel SVM and ridge regression Kernel methods, kernel SVM and ridge regression Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Collaborative Filtering 2 Collaborative Filtering R: rating matrix; U: user factor;

More information

Canonical Correlation Analysis with Kernels

Canonical Correlation Analysis with Kernels Canonical Correlation Analysis with Kernels Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Computational Diagnostics Group Seminar 2003 Mar 10 1 Overview

More information

Perceptron Revisited: Linear Separators. Support Vector Machines

Perceptron Revisited: Linear Separators. Support Vector Machines Support Vector Machines Perceptron Revisited: Linear Separators Binary classification can be viewed as the task of separating classes in feature space: w T x + b > 0 w T x + b = 0 w T x + b < 0 Department

More information

Statistical learning on graphs

Statistical learning on graphs Statistical learning on graphs Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr ParisTech, Ecole des Mines de Paris Institut Curie INSERM U900 Seminar of probabilities, Institut Joseph Fourier, Grenoble,

More information

Kernel Methods. Konstantin Tretyakov MTAT Machine Learning

Kernel Methods. Konstantin Tretyakov MTAT Machine Learning Kernel Methods Konstantin Tretyakov (kt@ut.ee) MTAT.03.227 Machine Learning So far Supervised machine learning Linear models Non-linear models Unsupervised machine learning Generic scaffolding So far Supervised

More information

Kernel Methods. Konstantin Tretyakov MTAT Machine Learning

Kernel Methods. Konstantin Tretyakov MTAT Machine Learning Kernel Methods Konstantin Tretyakov (kt@ut.ee) MTAT.03.227 Machine Learning So far Supervised machine learning Linear models Least squares regression, SVR Fisher s discriminant, Perceptron, Logistic model,

More information

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012 Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2

More information

A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie

A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie Computational Biology Program Memorial Sloan-Kettering Cancer Center http://cbio.mskcc.org/leslielab

More information

Classifier Complexity and Support Vector Classifiers

Classifier Complexity and Support Vector Classifiers Classifier Complexity and Support Vector Classifiers Feature 2 6 4 2 0 2 4 6 8 RBF kernel 10 10 8 6 4 2 0 2 4 6 Feature 1 David M.J. Tax Pattern Recognition Laboratory Delft University of Technology D.M.J.Tax@tudelft.nl

More information

Outline. Motivation. Mapping the input space to the feature space Calculating the dot product in the feature space

Outline. Motivation. Mapping the input space to the feature space Calculating the dot product in the feature space to The The A s s in to Fabio A. González Ph.D. Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogotá April 2, 2009 to The The A s s in 1 Motivation Outline 2 The Mapping the

More information

Analysis of N-terminal Acetylation data with Kernel-Based Clustering

Analysis of N-terminal Acetylation data with Kernel-Based Clustering Analysis of N-terminal Acetylation data with Kernel-Based Clustering Ying Liu Department of Computational Biology, School of Medicine University of Pittsburgh yil43@pitt.edu 1 Introduction N-terminal acetylation

More information

Support Vector Machine & Its Applications

Support Vector Machine & Its Applications Support Vector Machine & Its Applications A portion (1/3) of the slides are taken from Prof. Andrew Moore s SVM tutorial at http://www.cs.cmu.edu/~awm/tutorials Mingyue Tan The University of British Columbia

More information

Applied Machine Learning Annalisa Marsico

Applied Machine Learning Annalisa Marsico Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 29 April, SoSe 2015 Support Vector Machines (SVMs) 1. One of

More information

Support Vector Machines.

Support Vector Machines. Support Vector Machines www.cs.wisc.edu/~dpage 1 Goals for the lecture you should understand the following concepts the margin slack variables the linear support vector machine nonlinear SVMs the kernel

More information

Support'Vector'Machines. Machine(Learning(Spring(2018 March(5(2018 Kasthuri Kannan

Support'Vector'Machines. Machine(Learning(Spring(2018 March(5(2018 Kasthuri Kannan Support'Vector'Machines Machine(Learning(Spring(2018 March(5(2018 Kasthuri Kannan kasthuri.kannan@nyumc.org Overview Support Vector Machines for Classification Linear Discrimination Nonlinear Discrimination

More information

Machine learning methods to infer drug-target interaction network

Machine learning methods to infer drug-target interaction network Machine learning methods to infer drug-target interaction network Yoshihiro Yamanishi Medical Institute of Bioregulation Kyushu University Outline n Background Drug-target interaction network Chemical,

More information

A Least Squares Formulation for Canonical Correlation Analysis

A Least Squares Formulation for Canonical Correlation Analysis A Least Squares Formulation for Canonical Correlation Analysis Liang Sun, Shuiwang Ji, and Jieping Ye Department of Computer Science and Engineering Arizona State University Motivation Canonical Correlation

More information

Linear Classification and SVM. Dr. Xin Zhang

Linear Classification and SVM. Dr. Xin Zhang Linear Classification and SVM Dr. Xin Zhang Email: eexinzhang@scut.edu.cn What is linear classification? Classification is intrinsically non-linear It puts non-identical things in the same class, so a

More information

Linear vs Non-linear classifier. CS789: Machine Learning and Neural Network. Introduction

Linear vs Non-linear classifier. CS789: Machine Learning and Neural Network. Introduction Linear vs Non-linear classifier CS789: Machine Learning and Neural Network Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Linear classifier is in the

More information

c 4, < y 2, 1 0, otherwise,

c 4, < y 2, 1 0, otherwise, Fundamentals of Big Data Analytics Univ.-Prof. Dr. rer. nat. Rudolf Mathar Problem. Probability theory: The outcome of an experiment is described by three events A, B and C. The probabilities Pr(A) =,

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2016 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

Linear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights

Linear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights Linear Discriminant Functions and Support Vector Machines Linear, threshold units CSE19, Winter 11 Biometrics CSE 19 Lecture 11 1 X i : inputs W i : weights θ : threshold 3 4 5 1 6 7 Courtesy of University

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x

More information

Discriminative Direction for Kernel Classifiers

Discriminative Direction for Kernel Classifiers Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering

More information

Lecture 10: Support Vector Machine and Large Margin Classifier

Lecture 10: Support Vector Machine and Large Margin Classifier Lecture 10: Support Vector Machine and Large Margin Classifier Applied Multivariate Analysis Math 570, Fall 2014 Xingye Qiao Department of Mathematical Sciences Binghamton University E-mail: qiao@math.binghamton.edu

More information

Kernel Methods. Jean-Philippe Vert Last update: Jan Jean-Philippe Vert (Mines ParisTech) 1 / 444

Kernel Methods. Jean-Philippe Vert Last update: Jan Jean-Philippe Vert (Mines ParisTech) 1 / 444 Kernel Methods Jean-Philippe Vert Jean-Philippe.Vert@mines.org Last update: Jan 2015 Jean-Philippe Vert (Mines ParisTech) 1 / 444 What we know how to solve Jean-Philippe Vert (Mines ParisTech) 2 / 444

More information

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

CISC 636 Computational Biology & Bioinformatics (Fall 2016) CISC 636 Computational Biology & Bioinformatics (Fall 2016) Predicting Protein-Protein Interactions CISC636, F16, Lec22, Liao 1 Background Proteins do not function as isolated entities. Protein-Protein

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA. Tobias Scheffer

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA. Tobias Scheffer Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA Tobias Scheffer Overview Principal Component Analysis (PCA) Kernel-PCA Fisher Linear Discriminant Analysis t-sne 2 PCA: Motivation

More information

SVMs: nonlinearity through kernels

SVMs: nonlinearity through kernels Non-separable data e-8. Support Vector Machines 8.. The Optimal Hyperplane Consider the following two datasets: SVMs: nonlinearity through kernels ER Chapter 3.4, e-8 (a) Few noisy data. (b) Nonlinearly

More information

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

More information

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015 EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,

More information

Introduction to Support Vector Machines

Introduction to Support Vector Machines Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction

More information

Foundation of Intelligent Systems, Part I. SVM s & Kernel Methods

Foundation of Intelligent Systems, Part I. SVM s & Kernel Methods Foundation of Intelligent Systems, Part I SVM s & Kernel Methods mcuturi@i.kyoto-u.ac.jp FIS - 2013 1 Support Vector Machines The linearly-separable case FIS - 2013 2 A criterion to select a linear classifier:

More information

Kernel methods for comparing distributions, measuring dependence

Kernel methods for comparing distributions, measuring dependence Kernel methods for comparing distributions, measuring dependence Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Principal component analysis Given a set of M centered observations

More information

Introduction to Support Vector Machines

Introduction to Support Vector Machines Introduction to Support Vector Machines Shivani Agarwal Support Vector Machines (SVMs) Algorithm for learning linear classifiers Motivated by idea of maximizing margin Efficient extension to non-linear

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

Modifying A Linear Support Vector Machine for Microarray Data Classification

Modifying A Linear Support Vector Machine for Microarray Data Classification Modifying A Linear Support Vector Machine for Microarray Data Classification Prof. Rosemary A. Renaut Dr. Hongbin Guo & Wang Juh Chen Department of Mathematics and Statistics, Arizona State University

More information

Bayesian Data Fusion with Gaussian Process Priors : An Application to Protein Fold Recognition

Bayesian Data Fusion with Gaussian Process Priors : An Application to Protein Fold Recognition Bayesian Data Fusion with Gaussian Process Priors : An Application to Protein Fold Recognition Mar Girolami 1 Department of Computing Science University of Glasgow girolami@dcs.gla.ac.u 1 Introduction

More information

CS798: Selected topics in Machine Learning

CS798: Selected topics in Machine Learning CS798: Selected topics in Machine Learning Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Jakramate Bootkrajang CS798: Selected topics in Machine Learning

More information

CIS 520: Machine Learning Oct 09, Kernel Methods

CIS 520: Machine Learning Oct 09, Kernel Methods CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed

More information

Lecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University

Lecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University Lecture 4: Principal Component Analysis Aykut Erdem May 016 Hacettepe University This week Motivation PCA algorithms Applications PCA shortcomings Autoencoders Kernel PCA PCA Applications Data Visualization

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

More information

Machine Learning. Support Vector Machines. Manfred Huber

Machine Learning. Support Vector Machines. Manfred Huber Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data

More information

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture

More information

SUPPORT VECTOR MACHINE

SUPPORT VECTOR MACHINE SUPPORT VECTOR MACHINE Mainly based on https://nlp.stanford.edu/ir-book/pdf/15svm.pdf 1 Overview SVM is a huge topic Integration of MMDS, IIR, and Andrew Moore s slides here Our foci: Geometric intuition

More information

Cluster Kernels for Semi-Supervised Learning

Cluster Kernels for Semi-Supervised Learning Cluster Kernels for Semi-Supervised Learning Olivier Chapelle, Jason Weston, Bernhard Scholkopf Max Planck Institute for Biological Cybernetics, 72076 Tiibingen, Germany {first. last} @tuebingen.mpg.de

More information

Outline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22

Outline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels SVM primal/dual problems Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels Basic concepts: SVM and kernels SVM primal/dual problems

More information

K-means-based Feature Learning for Protein Sequence Classification

K-means-based Feature Learning for Protein Sequence Classification K-means-based Feature Learning for Protein Sequence Classification Paul Melman and Usman W. Roshan Department of Computer Science, NJIT Newark, NJ, 07102, USA pm462@njit.edu, usman.w.roshan@njit.edu Abstract

More information

Kernel PCA, clustering and canonical correlation analysis

Kernel PCA, clustering and canonical correlation analysis ernel PCA, clustering and canonical correlation analsis Le Song Machine Learning II: Advanced opics CSE 8803ML, Spring 2012 Support Vector Machines (SVM) 1 min w 2 w w + C j ξ j s. t. w j + b j 1 ξ j,

More information

December 2, :4 WSPC/INSTRUCTION FILE jbcb-profile-kernel. Profile-based string kernels for remote homology detection and motif extraction

December 2, :4 WSPC/INSTRUCTION FILE jbcb-profile-kernel. Profile-based string kernels for remote homology detection and motif extraction Journal of Bioinformatics and Computational Biology c Imperial College Press Profile-based string kernels for remote homology detection and motif extraction Rui Kuang 1, Eugene Ie 1,3, Ke Wang 1, Kai Wang

More information

Review: Support vector machines. Machine learning techniques and image analysis

Review: Support vector machines. Machine learning techniques and image analysis Review: Support vector machines Review: Support vector machines Margin optimization min (w,w 0 ) 1 2 w 2 subject to y i (w 0 + w T x i ) 1 0, i = 1,..., n. Review: Support vector machines Margin optimization

More information

Motivating the need for optimal sequence alignments...

Motivating the need for optimal sequence alignments... 1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use

More information

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY

More information

Universal Learning Technology: Support Vector Machines

Universal Learning Technology: Support Vector Machines Special Issue on Information Utilizing Technologies for Value Creation Universal Learning Technology: Support Vector Machines By Vladimir VAPNIK* This paper describes the Support Vector Machine (SVM) technology,

More information

Advanced Introduction to Machine Learning CMU-10715

Advanced Introduction to Machine Learning CMU-10715 Advanced Introduction to Machine Learning CMU-10715 Principal Component Analysis Barnabás Póczos Contents Motivation PCA algorithms Applications Some of these slides are taken from Karl Booksh Research

More information

Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.

Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods

More information

Support Vector Machines. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Support Vector Machines. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington Support Vector Machines CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 A Linearly Separable Problem Consider the binary classification

More information

Unsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto

Unsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian

More information

Principal Component Analysis

Principal Component Analysis B: Chapter 1 HTF: Chapter 1.5 Principal Component Analysis Barnabás Póczos University of Alberta Nov, 009 Contents Motivation PCA algorithms Applications Face recognition Facial expression recognition

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)

More information

PCA, Kernel PCA, ICA

PCA, Kernel PCA, ICA PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per

More information

Advanced Machine Learning & Perception

Advanced Machine Learning & Perception Advanced Machine Learning & Perception Instructor: Tony Jebara Topic 1 Introduction, researchy course, latest papers Going beyond simple machine learning Perception, strange spaces, images, time, behavior

More information

Multiple Kernel Learning

Multiple Kernel Learning CS 678A Course Project Vivek Gupta, 1 Anurendra Kumar 2 Sup: Prof. Harish Karnick 1 1 Department of Computer Science and Engineering 2 Department of Electrical Engineering Indian Institute of Technology,

More information

Machine learning for pervasive systems Classification in high-dimensional spaces

Machine learning for pervasive systems Classification in high-dimensional spaces Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

Support Vector Machines

Support Vector Machines Support Vector Machines INFO-4604, Applied Machine Learning University of Colorado Boulder September 28, 2017 Prof. Michael Paul Today Two important concepts: Margins Kernels Large Margin Classification

More information

Combining pairwise sequence similarity and support vector machines for remote protein homology detection

Combining pairwise sequence similarity and support vector machines for remote protein homology detection Combining pairwise sequence similarity and support vector machines for remote protein homology detection Li Liao Central Research & Development E. I. du Pont de Nemours Company li.liao@usa.dupont.com William

More information

Deviations from linear separability. Kernel methods. Basis expansion for quadratic boundaries. Adding new features Systematic deviation

Deviations from linear separability. Kernel methods. Basis expansion for quadratic boundaries. Adding new features Systematic deviation Deviations from linear separability Kernel methods CSE 250B Noise Find a separator that minimizes a convex loss function related to the number of mistakes. e.g. SVM, logistic regression. Systematic deviation

More information

1 Diffusion Kernels 2003/08/19 22:08

1 Diffusion Kernels 2003/08/19 22:08 1 Diffusion Kernels Risi Kondor Columbia University, Department of Computer Science Mail Code 0401, 1214 Amsterdam Avenue, New York, NY 10027, U.S.A. risi@cs.columbia.edu Jean-Philippe Vert Centre de Géostatistique,

More information

Basis Expansion and Nonlinear SVM. Kai Yu

Basis Expansion and Nonlinear SVM. Kai Yu Basis Expansion and Nonlinear SVM Kai Yu Linear Classifiers f(x) =w > x + b z(x) = sign(f(x)) Help to learn more general cases, e.g., nonlinear models 8/7/12 2 Nonlinear Classifiers via Basis Expansion

More information

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given

More information

Learning with kernels and SVM

Learning with kernels and SVM Learning with kernels and SVM Šámalova chata, 23. května, 2006 Petra Kudová Outline Introduction Binary classification Learning with Kernels Support Vector Machines Demo Conclusion Learning from data find

More information

Kernels A Machine Learning Overview

Kernels A Machine Learning Overview Kernels A Machine Learning Overview S.V.N. Vishy Vishwanathan vishy@axiom.anu.edu.au National ICT of Australia and Australian National University Thanks to Alex Smola, Stéphane Canu, Mike Jordan and Peter

More information

Machine Learning, Midterm Exam

Machine Learning, Midterm Exam 10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have

More information

Kernel Methods. Barnabás Póczos

Kernel Methods. Barnabás Póczos Kernel Methods Barnabás Póczos Outline Quick Introduction Feature space Perceptron in the feature space Kernels Mercer s theorem Finite domain Arbitrary domain Kernel families Constructing new kernels

More information

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR

More information

Chapter 9. Support Vector Machine. Yongdai Kim Seoul National University

Chapter 9. Support Vector Machine. Yongdai Kim Seoul National University Chapter 9. Support Vector Machine Yongdai Kim Seoul National University 1. Introduction Support Vector Machine (SVM) is a classification method developed by Vapnik (1996). It is thought that SVM improved

More information

CSCE555 Bioinformatics. Protein Function Annotation

CSCE555 Bioinformatics. Protein Function Annotation CSCE555 Bioinformatics Protein Function Annotation Why we need to do function annotation? Fig from: Network-based prediction of protein function. Molecular Systems Biology 3:88. 2007 What s function? The

More information

Dimension Reduction (PCA, ICA, CCA, FLD,

Dimension Reduction (PCA, ICA, CCA, FLD, Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models) Yi Zhang 10-701, Machine Learning, Spring 2011 April 6 th, 2011 Parts of the PCA slides are from previous 10-701 lectures 1 Outline Dimension reduction

More information

Kernel methods CSE 250B

Kernel methods CSE 250B Kernel methods CSE 250B Deviations from linear separability Noise Find a separator that minimizes a convex loss function related to the number of mistakes. e.g. SVM, logistic regression. Deviations from

More information

Microarray Data Analysis: Discovery

Microarray Data Analysis: Discovery Microarray Data Analysis: Discovery Lecture 5 Classification Classification vs. Clustering Classification: Goal: Placing objects (e.g. genes) into meaningful classes Supervised Clustering: Goal: Discover

More information

Support Vector Machine (continued)

Support Vector Machine (continued) Support Vector Machine continued) Overlapping class distribution: In practice the class-conditional distributions may overlap, so that the training data points are no longer linearly separable. We need

More information

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

More information

Gene Expression Data Classification with Revised Kernel Partial Least Squares Algorithm

Gene Expression Data Classification with Revised Kernel Partial Least Squares Algorithm Gene Expression Data Classification with Revised Kernel Partial Least Squares Algorithm Zhenqiu Liu, Dechang Chen 2 Department of Computer Science Wayne State University, Market Street, Frederick, MD 273,

More information

Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis

Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Alvina Goh Vision Reading Group 13 October 2005 Connection of Local Linear Embedding, ISOMAP, and Kernel Principal

More information

Chemometrics: Classification of spectra

Chemometrics: Classification of spectra Chemometrics: Classification of spectra Vladimir Bochko Jarmo Alander University of Vaasa November 1, 2010 Vladimir Bochko Chemometrics: Classification 1/36 Contents Terminology Introduction Big picture

More information

Training algorithms for fuzzy support vector machines with nois

Training algorithms for fuzzy support vector machines with nois Training algorithms for fuzzy support vector machines with noisy data Presented by Josh Hoak Chun-fu Lin 1 Sheng-de Wang 1 1 National Taiwan University 13 April 2010 Prelude Problem: SVMs are particularly

More information

Kernel Methods. Machine Learning A W VO

Kernel Methods. Machine Learning A W VO Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance

More information

Infinite Ensemble Learning with Support Vector Machinery

Infinite Ensemble Learning with Support Vector Machinery Infinite Ensemble Learning with Support Vector Machinery Hsuan-Tien Lin and Ling Li Learning Systems Group, California Institute of Technology ECML/PKDD, October 4, 2005 H.-T. Lin and L. Li (Learning Systems

More information

CS4495/6495 Introduction to Computer Vision. 8C-L3 Support Vector Machines

CS4495/6495 Introduction to Computer Vision. 8C-L3 Support Vector Machines CS4495/6495 Introduction to Computer Vision 8C-L3 Support Vector Machines Discriminative classifiers Discriminative classifiers find a division (surface) in feature space that separates the classes Several

More information

Combining pairwise sequence similarity and support vector machines for remote protein homology detection

Combining pairwise sequence similarity and support vector machines for remote protein homology detection Combining pairwise sequence similarity and support vector machines for remote protein homology detection Li Liao Central Research & Development E. I. du Pont de Nemours Company li.liao@usa.dupont.com William

More information

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms

More information