Support vector machines, Kernel methods, and Applications in bioinformatics
1 1 Support vector machines, Kernel methods, and Applications in bioinformatics Ecole des Mines de Paris Computational Biology group Machine Learning in Bioinformatics conference, October 17th, 2003, Brussels, Belgium
2 2 Overview 1. Support Vector Machines and kernel methods 2. Application: Protein remote homology detection 3. Application: Extracting pathway activity from gene expression data
3 3 Part 1 Support Vector Machines (SVM) and Kernel Methods
6 6 The pattern recognition problem Learn a discrimination rule from labelled examples, then use it to predict the class of new points
7 7 Pattern recognition examples Medical diagnosis (e.g., from microarrays) Drugability/activity of chemical compounds Gene function, structure, localization Protein interactions
10 10 Support Vector Machines for pattern recognition An object x is represented by the vector Φ(x) in a feature space; the SVM finds a linear separation with large margin in the feature space
15 15 Large margin separation [Figure: separating hyperplane H with margin m and slack variables e1, ..., e5.] The SVM solves: $\min_{H,m} \left\{ \frac{1}{m^2} + C \sum_i e_i \right\}$
16 16 Dual formulation The classification of a new point x is the sign of $f(x) = \sum_i \alpha_i K(x, x_i) + b$, where $\alpha$ solves:
$\max_\alpha \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j K(x_i, x_j)$
subject to $0 \le \alpha_i \le C$ for $i = 1, \dots, n$ and $\sum_{i=1}^n \alpha_i y_i = 0$,
with the notation $K(x, x') = \Phi(x) \cdot \Phi(x')$.
17 17 The kernel trick for SVM The separation can be found without knowing Φ(x): only the kernel matters, $K(x, y) = \Phi(x) \cdot \Phi(y)$. Simple kernels K(x, y) can correspond to complex Φ. SVMs work with any sort of data as soon as a kernel is defined.
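The identity $K(x, y) = \Phi(x) \cdot \Phi(y)$ can be checked numerically. A minimal numpy sketch (the degree-2 kernel and its explicit feature map are standard examples, not taken from the slides):

```python
import numpy as np

# Degree-2 polynomial kernel K(x, y) = (x . y)^2 on R^2.
# Its explicit feature map is Phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2) in R^3,
# so K(x, y) = Phi(x) . Phi(y): the "kernel trick" evaluates this
# 3-dimensional dot product without ever building Phi explicitly.

def poly2_kernel(x, y):
    return np.dot(x, y) ** 2

def phi(x):
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

assert np.isclose(poly2_kernel(x, y), np.dot(phi(x), phi(y)))
```

The same trick scales: for a degree-d kernel the explicit feature space has dimension growing combinatorially, while the kernel evaluation stays one dot product.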
18 18 Kernel examples Linear: $K(x, x') = x \cdot x'$. Polynomial: $K(x, x') = (x \cdot x' + c)^d$. Gaussian RBF: $K(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$.
19 19 Kernels For any set X, a function $K : X \times X \to \mathbb{R}$ is a kernel iff: it is symmetric, $K(x, y) = K(y, x)$, and it is positive semi-definite: $\sum_{i,j} a_i a_j K(x_i, x_j) \ge 0$ for all $a_i \in \mathbb{R}$ and $x_i \in X$.
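Both conditions can be checked numerically on any finite sample: the Gram matrix must be symmetric with non-negative eigenvalues. A sketch for the Gaussian RBF kernel on random data (the data and σ are arbitrary choices for the demo):

```python
import numpy as np

# A valid kernel yields a symmetric, positive semi-definite Gram matrix
# on any finite set of points. We verify this for the Gaussian RBF kernel.

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))   # 20 points in R^5
sigma = 1.5

# pairwise squared distances, then the RBF Gram matrix
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2 * sigma ** 2))

assert np.allclose(K, K.T)                 # symmetry
eigvals = np.linalg.eigvalsh(K)
assert eigvals.min() > -1e-10              # PSD up to rounding error
```

A negative eigenvalue here would prove the candidate function is not a kernel; passing the check on one sample is of course only necessary, not sufficient.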
20 20 Advantages of SVM Works well on real-world applications; copes with large dimensions and noise; can be applied to any kind of data as soon as a kernel is available
21 21 Examples: SVM in bioinformatics Gene functional classification from microarray: Brown et al. (2000), Pavlidis et al. (2001) Tissue classification from microarray: Mukherjee et al. (1999), Furey et al. (2000), Guyon et al. (2001) Protein family prediction from sequence: Jaakkola et al. (1998) Protein secondary structure prediction: Hua et al. (2001) Protein subcellular localization prediction from sequence: Hua et al. (2001)
22 22 Kernel methods Let K(x, y) be a given kernel. Then it is possible to perform other linear algorithms implicitly in the feature space, such as: computing the distance between points, principal component analysis (PCA), canonical correlation analysis (CCA)
23 23 Compute the distance between objects
$d(g_1, g_2)^2 = \|\Phi(g_1) - \Phi(g_2)\|^2 = (\Phi(g_1) - \Phi(g_2)) \cdot (\Phi(g_1) - \Phi(g_2)) = \Phi(g_1) \cdot \Phi(g_1) + \Phi(g_2) \cdot \Phi(g_2) - 2\, \Phi(g_1) \cdot \Phi(g_2)$
hence $d(g_1, g_2)^2 = K(g_1, g_1) + K(g_2, g_2) - 2 K(g_1, g_2)$
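The identity is easy to sanity-check in code. For the linear kernel, Φ is the identity, so the kernel distance must equal the ordinary Euclidean distance (the vectors below are arbitrary):

```python
import numpy as np

# Feature-space distance from kernel values only:
#   d(g1, g2)^2 = K(g1, g1) + K(g2, g2) - 2 K(g1, g2)

def kernel_dist_sq(k, g1, g2):
    return k(g1, g1) + k(g2, g2) - 2 * k(g1, g2)

linear = lambda u, v: np.dot(u, v)   # linear kernel: Phi = identity
g1 = np.array([1.0, 0.0, 2.0])
g2 = np.array([0.0, 3.0, 1.0])

# must match the squared Euclidean distance in input space
assert np.isclose(kernel_dist_sq(linear, g1, g2), np.sum((g1 - g2) ** 2))
```

With a nonlinear kernel (RBF, string kernels, ...) the same three kernel evaluations give distances in a feature space that may be infinite-dimensional.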
24 24 Distance to the center of mass Center of mass: $m = \frac{1}{N} \sum_{i=1}^N \Phi(g_i)$, hence:
$\|\Phi(g_1) - m\|^2 = \Phi(g_1) \cdot \Phi(g_1) - 2\, \Phi(g_1) \cdot m + m \cdot m = K(g_1, g_1) - \frac{2}{N} \sum_{i=1}^N K(g_1, g_i) + \frac{1}{N^2} \sum_{i,j=1}^N K(g_i, g_j)$
25 25 Principal component analysis PCA in the feature space is equivalent to finding the eigenvectors of $K = (\Phi(g_i) \cdot \Phi(g_j))_{i,j=1,\dots,N} = (K(g_i, g_j))_{i,j=1,\dots,N}$. Useful to project the objects onto small-dimensional spaces (feature extraction).
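The equivalence can be verified for the linear kernel: the nonzero eigenvalues of the centered Gram matrix coincide with those of the centered scatter matrix used by ordinary PCA. A small numpy sketch (random data, dimensions chosen arbitrarily):

```python
import numpy as np

# Kernel PCA sketch: eigendecompose the centered Gram matrix
# K_c = H K H with H = I - (1/n) 1 1^T.
# For the linear kernel K = X X^T we have K_c = X_c X_c^T, whose nonzero
# eigenvalues equal those of X_c^T X_c, so kernel PCA reduces to PCA.

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 4))
n = X.shape[0]

K = X @ X.T
H = np.eye(n) - np.ones((n, n)) / n     # centering matrix
Kc = H @ K @ H

gram_eig = np.sort(np.linalg.eigvalsh(Kc))[-4:]     # top 4 eigenvalues
Xc = X - X.mean(axis=0)
scatter_eig = np.sort(np.linalg.eigvalsh(Xc.T @ Xc))

assert np.allclose(gram_eig, scatter_eig)
```

For a nonlinear kernel one simply replaces K by the chosen Gram matrix; the projections of the objects on the principal axes are obtained from the eigenvectors of K_c.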
26 26 Canonical correlation analysis $K_1$ and $K_2$ are two kernels for the same objects. CCA can be performed by solving the following generalized eigenvalue problem:
$\begin{pmatrix} 0 & K_1 K_2 \\ K_2 K_1 & 0 \end{pmatrix} \xi = \rho \begin{pmatrix} K_1^2 & 0 \\ 0 & K_2^2 \end{pmatrix} \xi$
Useful to find correlations between different representations of the same objects (e.g., genes, ...)
27 27 Part 2 Local alignment kernel for strings (with H. Saigo, N. Ueda, T. Akutsu, preprint 2003)
28 28 Motivations Develop a kernel for strings adapted to protein/DNA sequences. Several similarity measures between sequences are widely used in bioinformatics... but they are not valid kernels. How can we mimic them?
30 29 Related work Spectrum kernel (Leslie et al.): $K(x_1 \dots x_m, y_1 \dots y_n) = \sum_{i=1}^{m-k} \sum_{j=1}^{n-k} \delta(x_i \dots x_{i+k},\, y_j \dots y_{j+k})$. Fisher kernel (Jaakkola et al.): given a statistical model $(p_\theta,\ \theta \in \Theta \subset \mathbb{R}^d)$, set $\phi(x) = \nabla_\theta \log p_\theta(x)$ and use the Fisher information matrix.
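The spectrum kernel is simple enough to implement directly: it counts, over all position pairs, the matching k-mers of the two strings, which equals a sum over k-mers of the product of their occurrence counts. A short sketch:

```python
from collections import Counter

# k-spectrum kernel (Leslie et al.): K(x, y) counts all position pairs
# (i, j) with x[i:i+k] == y[j:j+k]; equivalently, for each k-mer,
# multiply its number of occurrences in x by its number in y, and sum.

def spectrum_kernel(x, y, k):
    cx = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    cy = Counter(y[j:j + k] for j in range(len(y) - k + 1))
    return sum(cx[kmer] * cy[kmer] for kmer in cx)

# "abab" has 2-mers {ab: 2, ba: 1}; "ab" has {ab: 1}, so K = 2*1 = 2
assert spectrum_kernel("abab", "ab", 2) == 2
```

Because it is an explicit dot product in the space of k-mer counts, the spectrum kernel is positive semi-definite by construction.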
31 30 Local alignment For two strings x and y, a local alignment π with gaps matches some letters of x with some letters of y, in order, allowing gaps (e.g., aligning letters of x = ABCDEFGHIJKL with matching letters of y = MNOEEPQRGSITUVWX). Its score is the sum of the substitution scores of the aligned letters minus the gap penalties: $s(x, y, \pi) = s(E, E) + s(F, F) + s(G, G) + s(I, I) - s(\text{gaps})$
32 31 Smith-Waterman (SW) score $SW(x, y) = \max_{\pi \in \Pi(x, y)} s(x, y, \pi)$. Computed by dynamic programming. Not a kernel in general.
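The dynamic program behind SW can be sketched in a few lines. The scoring scheme below (match +2, mismatch -1, linear gap -1) is an illustrative assumption; real protein scoring would use a substitution matrix such as BLOSUM62 and affine gaps:

```python
# Smith-Waterman local alignment score by dynamic programming.
# H[i][j] = best score of a local alignment ending at x[i-1], y[j-1];
# the max(0, ...) lets an alignment restart anywhere, which is what
# makes the score "local".

def smith_waterman(x, y, match=2, mismatch=-1, gap=-1):
    m, n = len(x), len(y)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align x[i-1] with y[j-1]
                          H[i - 1][j] + gap,     # gap in y
                          H[i][j - 1] + gap)     # gap in x
            best = max(best, H[i][j])
    return best

assert smith_waterman("ACGT", "ACGT") == 8   # four matches at +2 each
assert smith_waterman("AAAA", "CCCC") == 0   # no positive local alignment
```

The max over alignments in the recursion is exactly what breaks positive semi-definiteness; the local alignment kernel below replaces it by a (log-)sum.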
33 32 Convolution kernels (Haussler, 1999) Let $K_1$ and $K_2$ be two kernels for strings. Their convolution is the following valid kernel: $K_1 \star K_2(x, y) = \sum_{x_1 x_2 = x,\ y_1 y_2 = y} K_1(x_1, y_1) K_2(x_2, y_2)$
36 33 3 basic kernels For the unaligned parts: $K_0(x, y) = 1$. For aligned residues: $K_a^{(\beta)}(x, y) = 0$ if $|x| \neq 1$ or $|y| \neq 1$, and $\exp(\beta s(x, y))$ otherwise. For gaps: $K_g^{(\beta)}(x, y) = \exp\left[\beta \left(g(|x|) + g(|y|)\right)\right]$
38 34 Combining the kernels Detecting local alignments of exactly n residues: $K_{(n)}^{(\beta)}(x, y) = K_0 \star \left(K_a^{(\beta)} \star K_g^{(\beta)}\right)^{(n-1)} \star K_a^{(\beta)} \star K_0$. Considering all possible local alignments: $K_{LA}^{(\beta)} = \sum_{i=0}^{\infty} K_{(i)}^{(\beta)}$
40 35 Properties $K_{LA}^{(\beta)}(x, y) = \sum_{\pi \in \Pi(x, y)} \exp\left(\beta s(x, y, \pi)\right)$, and $\lim_{\beta \to +\infty} \frac{1}{\beta} \ln K_{LA}^{(\beta)}(x, y) = SW(x, y)$
41 36 Kernel computation [Figure: the kernel is computed by dynamic programming over a finite-state diagram with begin (B), match (M) and end (E) states and insertion states X, X2, Y, Y2, with transitions weighted by gap parameters d and e.]
42 37 Application: remote homology detection [Figure: sequence similarity axis from close homologs through the twilight zone to unrelated proteins.] Remote homologs share the same structure/function but their sequences have diverged; remote homology cannot be found by direct sequence similarity.
43 38 SCOP database SCOP hierarchy: Fold > Superfamily > Family. Members of the same family are close homologs; members of the same superfamily but different families are remote homologs.
46 39 A benchmark experiment Can we predict the superfamily of a domain if we have not seen any member of its family before? During learning: remove a family and learn the difference between the superfamily and the rest. Then use the model to test each domain of the removed family.
47 40 SCOP superfamily recognition benchmark [Figure: number of families with given performance vs. ROC50 score, comparing SVM-LA, SVM-pairwise, SVM-Mismatch and SVM-Fisher.]
48 41 Part 3 Detecting pathway activity from microarray data (with M. Kanehisa, ECCB 2003)
49 42 Genes encode proteins, which can catalyse chemical reactions [Figure: nicotinamide mononucleotide adenylyltransferase with bound NAD+.]
50 43 Chemical reactions are often parts of pathways [Figure: pathway map.]
51 44 Microarray technology monitors mRNA quantity (from Spellman et al., 1998)
52 45 Comparing gene expression and pathway databases Are there correlations? Can we detect active pathways? Denoise expression data? Denoise the pathway database? Find new pathways?
53 46 A useful first step: map the same genes onto both representations [Figure: genes g1, ..., g8 shown both as expression profiles and as nodes of a pathway graph.]
54 47 Using microarray only [Figure: genes g1, ..., g8 projected on PC1.] PCA finds the directions (profiles) explaining the largest amount of variation among expression profiles.
55 48 PCA formulation Let $f_v(i)$ be the projection of the i-th profile onto v. The amount of variation captured by $f_v$ is $h_1(v) = \sum_{i=1}^N f_v(i)^2$. PCA finds an orthonormal basis by successively solving $\max_v h_1(v)$
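The maximizer of $h_1$ is the top eigenvector of the scatter matrix of the centered profiles, and the maximum is the top eigenvalue. A quick numpy check on synthetic anisotropic data (the data itself is arbitrary):

```python
import numpy as np

# h1(v) = sum_i f_v(i)^2 with f_v(i) = <x_i - mean, v>.
# In matrix form h1(v) = v^T (Xc^T Xc) v, so over unit vectors it is
# maximized by the top eigenvector of the scatter matrix Xc^T Xc.

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 3)) * np.array([3.0, 1.0, 0.3])  # anisotropic
Xc = X - X.mean(axis=0)

def h1(v):
    return np.sum((Xc @ v) ** 2)

eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)
pc1 = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue

v_rand = rng.standard_normal(3)
v_rand /= np.linalg.norm(v_rand)        # a random unit direction

assert h1(pc1) >= h1(v_rand) - 1e-9     # PC1 captures at least as much
assert np.isclose(h1(pc1), eigvals[-1]) # and exactly the top eigenvalue
```

Subsequent components are found the same way under orthogonality to the previous ones, i.e. the remaining eigenvectors.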
56 49 Issues with PCA PCA is useful if there is a small number of strong signals. In concrete applications, we observe a noisy superposition of many events. Using prior knowledge of metabolic networks can help denoise the information detected by PCA.
57 50 The metabolic gene network Link two genes when they can catalyze two successive reactions. [Figure: yeast glycolysis genes (HKA, HKB, GLK1, PGT1, PFK1, PFK2, FBA1, FBP1, PGM1, PGM2, GAL10) linked along the pathway Glucose → Glucose-6P → Fructose-6P → Fructose-1,6P2, with a branch to Glucose-1P.]
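The construction rule above is easy to express in code: two genes are linked when the reaction one catalyzes produces a compound that the other's reaction consumes. The reaction list below is a hypothetical toy excerpt, not actual KEGG data:

```python
# Build the metabolic gene network: link genes g1 and g2 whenever the
# reaction catalyzed by g1 produces the substrate of the reaction
# catalyzed by g2 (i.e., the two reactions are successive).
# Toy reactions: (substrate, product, catalyzing gene) -- illustrative only.

reactions = [
    ("glucose", "glucose-6P", "HXK1"),
    ("glucose", "glucose-6P", "GLK1"),
    ("glucose-6P", "fructose-6P", "PGI1"),
    ("fructose-6P", "fructose-1,6P2", "PFK1"),
]

edges = set()
for (s1, p1, g1) in reactions:
    for (s2, p2, g2) in reactions:
        if g1 != g2 and p1 == s2:           # reaction of g1 feeds reaction of g2
            edges.add(tuple(sorted((g1, g2))))

assert ("GLK1", "PGI1") in edges            # successive reactions: linked
assert ("GLK1", "HXK1") not in edges        # parallel reactions: not linked
```

On real data the same loop runs over KEGG reaction pairs, yielding the 669-gene yeast network used later in the slides.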
58 51 Mapping $f_v$ to the metabolic gene network [Figure: the loadings of a candidate profile (e.g., 0.8 for g1, 0.7 for g2, 0.4 for g3, ...) displayed on the gene network.] Does it look interesting or not?
59 52 Important hypothesis If v is related to a metabolic activity, then $f_v$ should vary smoothly on the graph. [Figure: a smooth vs. a rugged function on the network.]
60 53 Graph Laplacian $L = D - A$, where D is the diagonal degree matrix and A the adjacency matrix. [Example Laplacian matrix of a small graph omitted.]
61 54 Smoothness quantification $h_2(f) = \frac{f^\top \exp(-\beta L) f}{f^\top f}$ is large when f is smooth. [Figure: two example functions on the network, with h(f) = 2.5 and h(f) = 34.2.]
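Both the Laplacian and the smoothness score can be computed in a few lines; since L is symmetric, exp(-βL) follows from its eigendecomposition. A sketch on a 4-node path graph (graph and test functions chosen for illustration):

```python
import numpy as np

# Graph Laplacian L = D - A for a path graph 1-2-3-4, and the smoothness
# score h2(f) = f^T exp(-beta L) f / f^T f. Expanding f on the
# eigenvectors of L shows h2 downweights high-frequency components,
# so smooth functions score higher than rugged ones.

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
assert np.allclose(L.sum(axis=1), 0)        # each row of D - A sums to zero

def h2(f, beta=1.0):
    w, U = np.linalg.eigh(L)                # L symmetric: L = U diag(w) U^T
    expL = U @ np.diag(np.exp(-beta * w)) @ U.T
    return f @ expL @ f / (f @ f)

smooth = np.array([1.0, 2.0, 3.0, 4.0])     # varies slowly along the path
rugged = np.array([1.0, -1.0, 1.0, -1.0])   # alternates at every edge

assert h2(smooth) > h2(rugged)
```

The parameter β controls how strongly high frequencies are penalized; exp(-βL) is also the diffusion kernel used as K₂ below.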
62 55 Motivation For a candidate profile v: $h_1(f_v)$ is large when v captures a lot of natural variation among profiles; $h_2(f_v)$ is large when $f_v$ is smooth on the graph. Try to maximize both terms at the same time.
63 56 Problem reformulation Find a function $f_v$ and a function $f_2$ such that $h_1(f_v)$ is large, $h_2(f_2)$ is large, and $\mathrm{corr}(f_v, f_2)$ is large, by solving:
$\max_{(f_v, f_2)} \ \mathrm{corr}(f_v, f_2)\, \frac{h_1(f_v)}{h_1(f_v) + \delta}\, \frac{h_2(f_2)}{h_2(f_2) + \delta}$
64 57 Solving the problem This formulation is equivalent to a generalized form of CCA (kernel-CCA, Bach and Jordan, 2002), which is solved by the following generalized eigenvector problem:
$\begin{pmatrix} 0 & K_1 K_2 \\ K_2 K_1 & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \rho \begin{pmatrix} K_1^2 + \delta K_1 & 0 \\ 0 & K_2^2 + \delta K_2 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$
where $[K_1]_{i,j} = e_i \cdot e_j$ and $K_2 = \exp(-L)$. Then $f_v = K_1 \alpha$ and $f_2 = K_2 \beta$.
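A generalized eigenproblem A v = ρ B v with B positive definite can be solved with numpy alone by whitening with B^(-1/2). The sketch below uses random full-rank Gram matrices standing in for the expression kernel K₁ and diffusion kernel K₂ of the slides:

```python
import numpy as np

# Generalized eigenproblem of kernel CCA:
#   [ 0      K1 K2 ] v = rho [ K1^2 + d K1      0        ] v
#   [ K2 K1  0     ]         [      0      K2^2 + d K2   ]
# Solved by whitening: eigendecompose B^(-1/2) A B^(-1/2).

rng = np.random.default_rng(3)
n, delta = 8, 0.1
X1 = rng.standard_normal((n, n)); K1 = X1 @ X1.T   # full-rank Gram matrix
X2 = rng.standard_normal((n, n)); K2 = X2 @ X2.T

Z = np.zeros((n, n))
A = np.block([[Z, K1 @ K2], [K2 @ K1, Z]])
B = np.block([[K1 @ K1 + delta * K1, Z], [Z, K2 @ K2 + delta * K2]])

w, U = np.linalg.eigh(B)                       # B symmetric positive definite
B_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T
rho, V = np.linalg.eigh(B_inv_sqrt @ A @ B_inv_sqrt)
v = B_inv_sqrt @ V[:, -1]                      # eigenvector of the largest rho

# the computed pair satisfies the generalized eigen-equation A v = rho B v
assert np.allclose(A @ v, rho[-1] * B @ v, atol=1e-6)
```

Splitting v into (α, β) then gives the pair of correlated profiles f_v = K₁α and f₂ = K₂β.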
65 58 The kernel point of view [Figure: expression profiles mapped through a linear kernel and the gene network through a diffusion kernel; kernel CCA extracts correlated directions in the two feature spaces.]
66 59 Data Gene network: two genes are linked if they catalyze successive reactions in the KEGG database (669 yeast genes). Expression profiles: 18 time-series measurements for the 6,000 genes of yeast, during two cell cycles.
67 60 First pattern of expression [Figure: expression vs. time.]
68 61 Related metabolic pathways The 50 genes with highest $s_2 - s_1$ belong to: Oxidative phosphorylation (10 genes), Citrate cycle (7), Purine metabolism (6), Glycerolipid metabolism (6), Sulfur metabolism (5), Selenoamino acid metabolism (4), etc.
69 Related genes [figures, slides 62-64]
72 65 Opposite pattern [Figure: expression vs. time.]
73 66 Related genes RNA polymerase (11 genes), Pyrimidine metabolism (10), Aminoacyl-tRNA biosynthesis (7), Urea cycle and metabolism of amino groups (3), Oxidative phosphorylation (3), ATP synthesis (3), etc.
74 Related genes [figures, slides 67-69]
77 70 Second pattern [Figure: expression vs. time.]
78 71 Extensions Can be used to extract features from expression profiles (preprint 2002). Can be generalized to more than 2 datasets and other kernels. Can be used to extract clusters of genes (e.g., operon detection; ISMB'03 with Y. Yamanishi, A. Nakaya and M. Kanehisa).
80 73 Conclusion Kernels offer a versatile framework to represent biological data. SVM and kernel methods work well on real-life problems, in particular in high dimension and with noise. Encouraging results on real-world applications. Many opportunities in developing kernels for particular applications.
More informationLearning with kernels and SVM
Learning with kernels and SVM Šámalova chata, 23. května, 2006 Petra Kudová Outline Introduction Binary classification Learning with Kernels Support Vector Machines Demo Conclusion Learning from data find
More informationKernels A Machine Learning Overview
Kernels A Machine Learning Overview S.V.N. Vishy Vishwanathan vishy@axiom.anu.edu.au National ICT of Australia and Australian National University Thanks to Alex Smola, Stéphane Canu, Mike Jordan and Peter
More informationMachine Learning, Midterm Exam
10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have
More informationKernel Methods. Barnabás Póczos
Kernel Methods Barnabás Póczos Outline Quick Introduction Feature space Perceptron in the feature space Kernels Mercer s theorem Finite domain Arbitrary domain Kernel families Constructing new kernels
More informationMACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA
1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR
More informationChapter 9. Support Vector Machine. Yongdai Kim Seoul National University
Chapter 9. Support Vector Machine Yongdai Kim Seoul National University 1. Introduction Support Vector Machine (SVM) is a classification method developed by Vapnik (1996). It is thought that SVM improved
More informationCSCE555 Bioinformatics. Protein Function Annotation
CSCE555 Bioinformatics Protein Function Annotation Why we need to do function annotation? Fig from: Network-based prediction of protein function. Molecular Systems Biology 3:88. 2007 What s function? The
More informationDimension Reduction (PCA, ICA, CCA, FLD,
Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models) Yi Zhang 10-701, Machine Learning, Spring 2011 April 6 th, 2011 Parts of the PCA slides are from previous 10-701 lectures 1 Outline Dimension reduction
More informationKernel methods CSE 250B
Kernel methods CSE 250B Deviations from linear separability Noise Find a separator that minimizes a convex loss function related to the number of mistakes. e.g. SVM, logistic regression. Deviations from
More informationMicroarray Data Analysis: Discovery
Microarray Data Analysis: Discovery Lecture 5 Classification Classification vs. Clustering Classification: Goal: Placing objects (e.g. genes) into meaningful classes Supervised Clustering: Goal: Discover
More informationSupport Vector Machine (continued)
Support Vector Machine continued) Overlapping class distribution: In practice the class-conditional distributions may overlap, so that the training data points are no longer linearly separable. We need
More informationCISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)
CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST
More informationBioinformatics Chapter 1. Introduction
Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!
More informationBioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment
Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value
More informationGene Expression Data Classification with Revised Kernel Partial Least Squares Algorithm
Gene Expression Data Classification with Revised Kernel Partial Least Squares Algorithm Zhenqiu Liu, Dechang Chen 2 Department of Computer Science Wayne State University, Market Street, Frederick, MD 273,
More informationConnection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis
Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Alvina Goh Vision Reading Group 13 October 2005 Connection of Local Linear Embedding, ISOMAP, and Kernel Principal
More informationChemometrics: Classification of spectra
Chemometrics: Classification of spectra Vladimir Bochko Jarmo Alander University of Vaasa November 1, 2010 Vladimir Bochko Chemometrics: Classification 1/36 Contents Terminology Introduction Big picture
More informationTraining algorithms for fuzzy support vector machines with nois
Training algorithms for fuzzy support vector machines with noisy data Presented by Josh Hoak Chun-fu Lin 1 Sheng-de Wang 1 1 National Taiwan University 13 April 2010 Prelude Problem: SVMs are particularly
More informationKernel Methods. Machine Learning A W VO
Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance
More informationInfinite Ensemble Learning with Support Vector Machinery
Infinite Ensemble Learning with Support Vector Machinery Hsuan-Tien Lin and Ling Li Learning Systems Group, California Institute of Technology ECML/PKDD, October 4, 2005 H.-T. Lin and L. Li (Learning Systems
More informationCS4495/6495 Introduction to Computer Vision. 8C-L3 Support Vector Machines
CS4495/6495 Introduction to Computer Vision 8C-L3 Support Vector Machines Discriminative classifiers Discriminative classifiers find a division (surface) in feature space that separates the classes Several
More informationCombining pairwise sequence similarity and support vector machines for remote protein homology detection
Combining pairwise sequence similarity and support vector machines for remote protein homology detection Li Liao Central Research & Development E. I. du Pont de Nemours Company li.liao@usa.dupont.com William
More informationBiological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor
Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms
More information