Compressed Sensing in Cancer Biology? (A Work in Progress)


1 Compressed Sensing in Cancer Biology? (A Work in Progress) M. Vidyasagar FRS Cecil & Ida Green Chair The University of Texas at Dallas m.vidyasagar University of Cambridge, 23 November 2012

2 A Cautionary Statement Most talks are finished products: they cover completed research. This talk is an attempt to share our current thinking. Like James Joyce's stream of consciousness, it represents work in progress. The questions that are currently occupying my mind are discussed. Six months from now, some or all of these questions might be deemed irrelevant!

3 Outline 1 Introduction 2 3 Compressed Sensing Techniques The Lone Star Algorithm 4 Some Open Problems A New Algorithm?

4 What is the Problem? Biologists are now able to generate massive amounts of data: Data forms are of different types: real numbers, integers, and Boolean. Data forms are of variable quality and repeatability; different error models are needed for different data forms. What my students and I hope to contribute to cancer research: Development of predictors (classifiers and regressors) based on the integration of multiple types of data. Along the way, we hope to: understand and analyze the behavior of various algorithms.

5 Broad Themes of Current Research 1 Understand the nature of biological data: What is being measured, how is it being measured, and, most importantly, what are the potential sources and nature of error? Understand the types of questions to which biologists would like answers, e.g.: prognosis for lung cancer patients; effects of applying multiple drug combinations when we know only the effects of applying one drug at a time; identifying possible upstream genes when the gene of interest is not directly druggable; and so on.

6 Broad Themes of Current Research 2 And then Devise appropriate algorithms to integrate multiple types of data and make predictions Validate the algorithms on existing data where outcomes are known Make predictions on outcomes of new experiments Most important: Persuade biologists to undertake those experiments Use outcomes of new experiments to fine-tune / reject algorithms Repeat

7 An Abstract Problem Formulation The data consists of labelled samples of the form $(x_{ij}, c_i)$, where $i = 1, \dots, n$, $j = 1, \dots, p$, and $p \gg n$. Here $n$ is the number of samples on which experiments are conducted, and $p$ is the number of entities that are measured for each sample. $c_i$ is the class label; it depends only on the sample and not on the entities being measured. $x_{ij}$, the measurement, can be one of three types: a real number, a nonnegative integer, or a Boolean variable. $c_i$, the class label, can be a real number or an integer label drawn from a set with at least two elements. Objective: Construct a function $f$ such that $f(x_i)$ is a reasonably good predictor of $c_i$, where $x_i = (x_{ij}, j = 1, \dots, p)$.

8 An Abstract Problem Formulation 2 In classification problems one often wants a posterior probability of belonging to a class. Define $S_k = \{ v \in \mathbb{R}^k_+ : \sum_{i=1}^k v_i = 1 \}$, the set of probability distributions on a set with $k$ elements. If $c_i \in \{1, \dots, k\}$, instead of seeking a function $f : x \mapsto f(x) \in \{1, \dots, k\}$, one can ask that $f(x) \in S_k$. Then $f(x)$ is a $k$-dimensional vector where $f_l(x) = \Pr\{c(x) = l\}$, $l = 1, \dots, k$.

9 Some Terminology Suppose $x_i \in \mathbb{R}^p$. (Note that this may not be true!) If $c_i \in \mathbb{R}$, then the problem is one of regression, or fitting a function to measured values. If $c_i \in \{1, \dots, k\}$, then the problem is one of $k$-class classification. If $k = 2$ and $c_i \in \{-1, 1\}$ (after relabeling), and if we find a function $d : \mathbb{R}^p \to \mathbb{R}$ and set $f(x) = \mathrm{sign}[d(x)]$, then $d(\cdot)$ is called the discriminant function. Classification is an easier problem than regression. Also, a discriminant function need not be unique. Often $f$ (or $d$) is a linear function of $x$ (greater tolerance to noisy measurements, simpler interpretation, etc.).

10 Outline 1 Introduction 2 3 Compressed Sensing Techniques The Lone Star Algorithm 4 Some Open Problems A New Algorithm?

11 Outline 1 Introduction 2 3 Compressed Sensing Techniques The Lone Star Algorithm 4 Some Open Problems A New Algorithm?

12 Conventional Least-Squares Regression Define $X = [x_{ij}] \in \mathbb{R}^{n \times p}$ and $y = [c_i] \in \mathbb{R}^n$, and try to fit $y$ with a linear function $X\beta$. Different ways of measuring the error of the fit lead to different methods. Standard least-squares regression is $\min_\beta \|y - X\beta\|_2^2$. If $X^T X \in \mathbb{R}^{p \times p}$ has rank $p$, then the solution is $\hat\beta = (X^T X)^{-1} X^T y$. This is the standard procedure when $n > p$ (more measurements than parameters).
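As a concrete illustration of the closed form above, here is a minimal Python/NumPy sketch on synthetic data; the dimensions, noise level, and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 10                       # more samples than parameters (n > p)
X = rng.standard_normal((n, p))      # predictor matrix
beta_true = rng.standard_normal(p)   # "true" coefficients (toy example)
y = X @ beta_true + 0.1 * rng.standard_normal(n)  # noisy measurements

# Closed-form least-squares solution: beta = (X^T X)^{-1} X^T y.
# lstsq is preferred numerically over forming the inverse explicitly.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.linalg.norm(beta_hat - beta_true))   # small estimation error
```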

13 Ridge Regression If $p > n$, then $X^T X$ is singular. Ridge regression is $\min_\beta \|y - X\beta\|_2^2$ s.t. $\|\beta\|_2 \le a$, or, in Lagrangian formulation, $J_{\text{ridge}} = \min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$, where $\lambda$ is the Lagrangian parameter.

14 Ridge Regression 2 The Hessian of $J_{\text{ridge}}$, namely $X^T X + \lambda I_p$, is always positive definite; so an optimal solution always exists and is unique (that is a good thing). In fact $\hat\beta_{\text{ridge}} = (X^T X + \lambda I_p)^{-1} X^T y$. In general all components of $\hat\beta_{\text{ridge}}$ will be nonzero, which is not a good thing. Why does this happen? Because the penalty term $\lambda \|\beta\|_2^2$ penalizes the norm of the coefficient vector $\beta$, not the number of nonzero components of $\beta$.
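A small sketch of the ridge closed form in the $p > n$ regime, checked against scikit-learn's Ridge (with the intercept disabled so the two objectives match); all parameter values are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, p, lam = 30, 100, 1.0             # p > n: X^T X is singular
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Closed form: beta_ridge = (X^T X + lambda I_p)^{-1} X^T y
beta_closed = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Same estimate via scikit-learn (alpha plays the role of lambda)
beta_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_

print(np.allclose(beta_closed, beta_sklearn, atol=1e-6))  # agree up to solver tolerance
print(np.sum(np.abs(beta_closed) > 1e-8))  # typically all p components are nonzero
```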

15 An NP-Hard Problem Given $\beta \in \mathbb{R}^p$, define its $\ell_0$-norm $\|\beta\|_0$ as the number of nonzero components of $\beta$. Given a fixed integer $k$, the problem is $\min_\beta \|y - X\beta\|_2^2$ s.t. $\|\beta\|_0 \le k$, or, in words: find the best fit in the $\ell_2$-norm that uses $k$ or fewer nonzero components. Bad news! This problem is NP-hard! So we need to try some other approach.

16 LASSO Regression LASSO (or lasso) = least absolute shrinkage and selection operator. The problem formulation is (Tibshirani (1996)): $\min_\beta \|y - X\beta\|_2^2$ s.t. $\|\beta\|_1 \le a$, or, in Lagrangian formulation, $J_{\text{lasso}} = \min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_1$, where $\lambda$ is the Lagrangian parameter.

17 LASSO Regression 2 Lasso regression is a quadratic programming problem, and hence very easy to solve. Unlike in ridge regression, in lasso the optimal $\hat\beta_{\text{lasso}}$ has at most $n$ nonzero components, where $n$ is the number of measurements. General approach: Start with some choice of the Lagrange multiplier $\lambda$, compute $\hat\beta_\lambda$, and then let $\lambda$ approach infinity. The resulting map $\lambda \mapsto \hat\beta_\lambda$ is called the trajectory.
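The sparsity of the lasso solution is easy to see on synthetic data. The following sketch uses scikit-learn's Lasso, whose objective is the Lagrangian form above up to a 1/(2n) scaling of the fit term; all parameter choices are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 30, 200                        # p >> n
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = [3.0, -2.0, 1.5, 2.5, -1.0]   # only 5 relevant features
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Lasso: min (1/(2n)) ||y - X b||_2^2 + alpha ||b||_1  (scikit-learn's scaling)
lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X, y)
nonzero = np.flatnonzero(lasso.coef_)
print(len(nonzero), nonzero)          # far fewer than n nonzero coefficients
```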

18 Elastic Net Regression The elastic net formulation combines ridge and lasso regression. The problem formulation is (Zou and Hastie (2005)): $\min_\beta \|y - X\beta\|_2^2$ s.t. $\alpha\|\beta\|_2 + (1-\alpha)\|\beta\|_1 \le a$, or, in Lagrangian formulation, $J_{\text{en}} = \min_\beta \|y - X\beta\|_2^2 + \lambda[\alpha\|\beta\|_2 + (1-\alpha)\|\beta\|_1]$, where $\lambda$ is the Lagrangian parameter and $\alpha \in (0, 1)$ is another parameter. The elastic net solution also has at most $n$ nonzero entries, and convergence to the optimum is supposedly faster.
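A similar sketch with scikit-learn's ElasticNet, which is analogous to (but not identical with) the formulation above: it penalizes the squared $\ell_2$ norm and uses its own (alpha, l1_ratio) parameterization.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(1)
n, p = 30, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = [3.0, -2.0, 1.5, 2.5, -1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# scikit-learn's ElasticNet penalizes
#   alpha * l1_ratio * ||b||_1 + 0.5 * alpha * (1 - l1_ratio) * ||b||_2^2,
# i.e. it uses the squared l2 norm; l1_ratio roughly plays the role of (1 - alpha) above.
enet = ElasticNet(alpha=0.1, l1_ratio=0.7, fit_intercept=False).fit(X, y)
print(np.count_nonzero(enet.coef_))   # still a sparse solution
```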

19 Outline 1 Introduction 2 3 Compressed Sensing Techniques The Lone Star Algorithm 4 Some Open Problems A New Algorithm?

20 Support Vector Machines The data is said to be linearly separable if there exists a weight vector $w \in \mathbb{R}^p$ and a bias $b$ such that $x_i^T w - b > 0$ if $c_i = 1$, and $x_i^T w - b < 0$ if $c_i = -1$. The data might not be linearly separable. If the data is linearly separable, then there exist infinitely many choices of $w, b$. Support Vector Machines (SVMs), due to Cortes and Vapnik (1995), give an elegant solution.

21 Linear SVM: Illustration SVM (support vector machine): Maximize the minimum distance to the separating hyperplane. Equivalent to: Minimize $\|w\|$ while achieving a gap of 1.

22 Linear SVM: Some Properties Quadratic programming problem: very easy to solve even for huge data sets. The optimal weight $w$ is supported on just a few of the vectors $x_i$ (the support vectors), so adding more vectors $x_i$ often does not change the optimal choice of $w$. In general, however, all components of the optimal $w$ are nonzero. This is OK if $n \gg p$, but not if $p \gg n$. Linear separability is not an issue if $p \ge n - 1$.
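A minimal sketch of a standard $\ell_2$-penalized linear SVM on synthetic two-class data with $p \gg n$, illustrating that the optimal weight vector is generally dense; the data-generating choices below are illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, p = 40, 100                       # p >> n, as in the biological setting
X = rng.standard_normal((n, p))
c = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)   # labels driven by 2 features only

# Standard soft-margin linear SVM (l2 penalty on w, squared hinge loss by default)
svm = LinearSVC(C=1.0, max_iter=10000).fit(X, c)
w = svm.coef_.ravel()
print(np.sum(np.abs(w) > 1e-6), "of", p, "weights are (essentially) nonzero")
```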

23 Generalized SVMs Due to Bradley and Mangasarian (1998). Measure distances in feature space using some norm, and measure distances in weight space using the dual norm $\|w\|_d = \max_{\|x\| \le 1} x^T w$. In the traditional SVM, the $\ell_2$-norm is used, which is its own dual; in this case, in general all components of the optimal $w$ are nonzero. If we use the $\ell_1$-norm on $x$, $\|x\|_1 = \sum_i |x_i|$, and the $\ell_\infty$-norm on $w$, $\|w\|_\infty = \max_i |w_i|$, then the optimal $w$ has at most $m$ nonzero entries ($m$ = number of samples). What if the data is not linearly separable? Settle for misclassifying as few samples as possible.
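The Bradley-Mangasarian formulation is a linear program; as a rough stand-in that exhibits the same sparsity of $w$, the sketch below uses scikit-learn's $\ell_1$-penalized LinearSVC (squared hinge loss), which is not literally the LP above.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, p = 40, 100
X = rng.standard_normal((n, p))
c = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)

# l1 penalty on w encourages a sparse weight vector
# (scikit-learn requires squared hinge loss and dual=False with penalty="l1")
svm_l1 = LinearSVC(penalty="l1", loss="squared_hinge", dual=False,
                   C=1.0, max_iter=10000).fit(X, c)
w = svm_l1.coef_.ravel()
print(np.sum(np.abs(w) > 1e-6), "nonzero weights out of", p)
```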

24 Modified $\ell_1$-Norm SVM To trade off penalties for false positives and false negatives, incorporate ideas from Veropoulos et al. (1999): Choose a constant $\lambda$ close to zero and $\alpha \in (0, 1)$. Let $M_1, M_2$ denote the two classes, $m_1 = |M_1|$, $m_2 = |M_2|$, and let $e$ denote a column vector of all ones. Then solve $\min_{w,b,y,z} (1-\lambda)[\alpha\, e_{n_1}^T y + (1-\alpha)\, e_{n_2}^T z] + \lambda \|w\|_\infty$ s.t. $x_i^T w \ge b + 1 - y_i$ for $1 \le i \le n_1$, $x_i^T w \le b - 1 + z_i$ for $n_1 + 1 \le i \le n$, $y \ge 0_{m_1}$, $z \ge 0_{m_2}$. If $\alpha > 0.5$, more emphasis is placed on correctly classifying entries in $M_1$; the opposite holds if $\alpha < 0.5$.
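The modified problem, as reconstructed above, is itself a linear program. The sketch below sets it up with scipy's linprog, with the $\ell_\infty$ norm on $w$ taken from the pairing on the previous slide; the class sizes, shifts, and parameter values are chosen purely for illustration.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n1, n2, p = 10, 15, 20
X1 = rng.standard_normal((n1, p)) + 0.8    # class 1 samples
X2 = rng.standard_normal((n2, p)) - 0.8    # class 2 samples
lam, alpha = 0.05, 0.7

# Variables (in order): w (p, free), b (free), y (n1 >= 0), z (n2 >= 0), t >= 0
# Objective: (1 - lam) * [alpha * sum(y) + (1 - alpha) * sum(z)] + lam * t,
# where t bounds ||w||_inf.
nvar = p + 1 + n1 + n2 + 1
cost = np.zeros(nvar)
cost[p + 1 : p + 1 + n1] = (1 - lam) * alpha
cost[p + 1 + n1 : p + 1 + n1 + n2] = (1 - lam) * (1 - alpha)
cost[-1] = lam

A, rhs = [], []
for i in range(n1):          # x_i^T w >= b + 1 - y_i  ->  -x_i^T w + b - y_i <= -1
    row = np.zeros(nvar)
    row[:p] = -X1[i]; row[p] = 1.0; row[p + 1 + i] = -1.0
    A.append(row); rhs.append(-1.0)
for i in range(n2):          # x_i^T w <= b - 1 + z_i  ->  x_i^T w - b - z_i <= -1
    row = np.zeros(nvar)
    row[:p] = X2[i]; row[p] = -1.0; row[p + 1 + n1 + i] = -1.0
    A.append(row); rhs.append(-1.0)
for j in range(p):           # |w_j| <= t
    for sign in (1.0, -1.0):
        row = np.zeros(nvar)
        row[j] = sign; row[-1] = -1.0
        A.append(row); rhs.append(0.0)

bounds = [(None, None)] * (p + 1) + [(0, None)] * (n1 + n2 + 1)
res = linprog(cost, A_ub=np.vstack(A), b_ub=np.array(rhs), bounds=bounds, method="highs")
w, b = res.x[:p], res.x[p]
print("solved:", res.success, "| nonzero weights:", np.count_nonzero(np.abs(w) > 1e-8))
```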

25 Outline 1 Introduction 2 3 Compressed Sensing Techniques The Lone Star Algorithm 4 Some Open Problems A New Algorithm?

26 Outline 1 Introduction 2 3 Compressed Sensing Techniques The Lone Star Algorithm 4 Some Open Problems A New Algorithm?

27 S-Sparse Vectors Model: As before, we have a predictor matrix $X \in \mathbb{R}^{n \times p}$. The measurement vector $y \in \mathbb{R}^n$ is given by $y = X\beta + z$, where $\beta \in \mathbb{R}^p$ is the parameter of interest and $z_1, \dots, z_n$ are i.i.d. $N(0, \sigma)$. The vector $\beta$ is said to be $S$-sparse, where $S$ is a specified integer, if $\beta_j = 0$ for all except at most $S$ entries. Premise: The data is generated by a true but unknown $S$-sparse parameter vector $\beta$, and the measurements are corrupted by i.i.d. Gaussian noise.

28 The Dantzig Selector Solution (the Dantzig selector): Define $r := y - X\beta$ to be the residual. The problem (Candes and Tao (2007)) is: $\min_\beta \|\beta\|_1$ s.t. $\|X^T r\|_\infty \le c$, where $c$ is some constant. This is equivalent to: $\min_\beta \|X^T r\|_\infty$ s.t. $\|\beta\|_1 \le c$, where $c$ is some (possibly different) constant. Clearly this is a linear programming problem.
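A sketch of the Dantzig selector written as a linear program with scipy's linprog, on synthetic S-sparse data; the constraint level delta and the problem sizes are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p, S, delta = 40, 80, 4, 0.5
X = rng.standard_normal((n, p)) / np.sqrt(n)       # roughly unit-norm columns
beta_true = np.zeros(p)
beta_true[rng.choice(p, S, replace=False)] = rng.choice([-2.0, 2.0], S)
y = X @ beta_true + 0.05 * rng.standard_normal(n)

# Dantzig selector: min ||beta||_1  s.t.  ||X^T (y - X beta)||_inf <= delta.
# Variables: beta (p, free) and u (p), with |beta_j| <= u_j; minimize sum(u).
G, g = X.T @ X, X.T @ y
cost = np.concatenate([np.zeros(p), np.ones(p)])
A_ub = np.block([
    [ G, np.zeros((p, p))],     #  X^T X beta <= delta + X^T y
    [-G, np.zeros((p, p))],     # -X^T X beta <= delta - X^T y
    [ np.eye(p), -np.eye(p)],   #  beta_j - u_j <= 0
    [-np.eye(p), -np.eye(p)],   # -beta_j - u_j <= 0
])
b_ub = np.concatenate([delta + g, delta - g, np.zeros(2 * p)])
res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * p + [(0, None)] * p, method="highs")
beta_hat = res.x[:p]
print("estimated support:", np.flatnonzero(np.abs(beta_hat) > 0.25))
print("true support:     ", np.flatnonzero(beta_true))
```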

29 Behavior of Dantzig Algorithm: Noiseless Case Suppose the matrix $X$ satisfies the uniform uncertainty principle (UUP), basically a statement about the maximum correlation between columns of $X$, and that $z = 0$. Then $\min_\beta \|\beta\|_1$ s.t. $X\beta = y$ recovers the unknown $\beta$ exactly.

30 Behavior of Dantzig Algorithm: Noisy Case In the noisy case, we need to introduce a probability measure P on the set of all S-sparse vectors; all conclusions are with respect to P. [Figure: the red lines depict the set of 1-sparse vectors $\beta$ in $\mathbb{R}^2$, i.e. the two coordinate axes in the $(\beta_1, \beta_2)$-plane.] P is a probability measure on the set of S-sparse vectors in $\mathbb{R}^p$.

31 Behavior of Dantzig Algorithm: Noisy Case 2 There exist strictly increasing functions $f, g : \mathbb{R}_+ \to \mathbb{R}_+$ with $f(0) = g(0) = 0$ such that: for all $S$-sparse $\beta$ except those belonging to a set of volume $f(\sigma)$, the output $\hat\beta$ of the Dantzig selector satisfies $\|\hat\beta - \beta\|_2 \le g(\sigma)$. Roughly speaking, for almost all true but unknown $\beta$, the output $\hat\beta$ of the Dantzig algorithm almost equals the true $\beta$. Explicit formulas are available for the functions $f(\cdot)$ and $g(\cdot)$.

32 Outline 1 Introduction 2 3 Compressed Sensing Techniques The Lone Star Algorithm 4 Some Open Problems A New Algorithm?

33 Problem Formulation We focus on two-class classification: $c_i \in \{-1, 1\}$. Without loss of generality, suppose $c_i = 1$ for $1 \le i \le n_1$ and $c_i = -1$ for $n_1 + 1 \le i \le n$, and define $n_2 = n - n_1$. Objective: Given $x_i \in \mathbb{R}^p$, $i = 1, \dots, n$, and the class labels $c_i$, find a function $f : \mathbb{R}^p \to \mathbb{R}$ such that $f(x_i) > 0$ if $c_i = 1$ and $f(x_i) < 0$ if $c_i = -1$. Or, if this is not possible, misclassify relatively few samples.

34 The Lone Star Algorithm 1 Preliminary step: Define $N_1 = \{1, \dots, n_1\}$, $N_2 = \{n_1 + 1, \dots, n\}$. For each $j = 1, \dots, p$, compute the class means $\mu_{j,l} = \frac{1}{n_l} \sum_{i \in N_l} x_{ij}$, $l = 1, 2$. Eliminate all $j$ for which the difference $\mu_{j,1} - \mu_{j,2}$ is not statistically significant according to the Student t-test. Let $p$ again denote the reduced number of features. Choose integers $k_1, k_2$, the number of training samples for each run, such that $k_1 \le n_1/2$, $k_2 \le n_2/2$, $k_1 \approx k_2$.

35 The Lone Star Algorithm 2 1 Choose $k_1$ samples from $N_1$ and $k_2$ samples from $N_2$ at random. Run the $\ell_1$-norm SVM to compute the optimal $w$ on the reduced feature set. Repeat many times. 2 Each optimal $w$ has at most $k_1 + k_2$ nonzero entries, but in different locations. Let $k$ denote the number of nonzero entries per run. Average all optimal weights, and retain only the features with the $k$ highest weights. 3 Repeat. This is similar to RFE = Recursive Feature Elimination (Guyon et al., 2002). Algorithm: $\ell_1$-norm SVM with t-test and RFE, where SVM = Support Vector Machine and RFE = Recursive Feature Elimination. Acronym: $\ell_1$-StaR, pronounced "lone star".
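A rough sketch, in Python, of the procedure as described on these slides: t-test filtering, repeated $\ell_1$-SVM runs on random subsamples, averaging of weights, and retention of the top-weighted features. The significance threshold, repeat counts, and the use of scikit-learn's $\ell_1$-penalized LinearSVC are my own illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.svm import LinearSVC

def lone_star_sketch(X, c, k1, k2, n_rounds=3, n_runs=50, p_thresh=0.05, seed=0):
    """Rough sketch of the l1-StaR idea: t-test filter, then repeated
    l1-SVM runs on random subsamples, keeping the top-weighted features."""
    rng = np.random.default_rng(seed)
    feats = np.arange(X.shape[1])
    idx1, idx2 = np.flatnonzero(c == 1), np.flatnonzero(c == -1)

    # Preliminary step: drop features whose class means do not differ significantly
    _, pvals = ttest_ind(X[idx1], X[idx2], axis=0)
    feats = feats[pvals < p_thresh]

    for _ in range(n_rounds):
        w_sum, k_sum = np.zeros(len(feats)), 0
        for _ in range(n_runs):
            s1 = rng.choice(idx1, size=k1, replace=False)
            s2 = rng.choice(idx2, size=k2, replace=False)
            sub = np.concatenate([s1, s2])
            svm = LinearSVC(penalty="l1", loss="squared_hinge", dual=False,
                            C=1.0, max_iter=20000).fit(X[np.ix_(sub, feats)], c[sub])
            w = svm.coef_.ravel()
            w_sum += np.abs(w)
            k_sum += np.count_nonzero(np.abs(w) > 1e-6)
        k = max(1, k_sum // n_runs)            # average number of nonzero entries per run
        feats = feats[np.argsort(-w_sum)[:k]]  # keep the k highest-averaged weights
    return feats
```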

36 The Lone Star Algorithm 3 The lone star algorithm has been applied to several data sets from various forms of cancer. Since it is based on an $\ell_1$-norm SVM, the optimal weight vector is guaranteed to have no more nonzero entries than the number of training samples. In all examples run thus far, the actual number of features used is many times smaller than the number of training samples! The question is: Is this a general property, or just a fluke?

37 Outline 1 Introduction 2 3 Compressed Sensing Techniques The Lone Star Algorithm 4 Some Open Problems A New Algorithm?

38 Outline 1 Introduction 2 3 Compressed Sensing Techniques The Lone Star Algorithm 4 Some Open Problems A New Algorithm?

39 Regression and Classification with Mixed Measurements Given data $\{(x_{ij}, y_i)\}$, suppose the data is of mixed type: for a fixed sample $i$, some $x_{ij} \in \mathbb{R}$ while other $x_{ij} \in \{0, 1\}$. How can we carry out regression and classification in this case?

40 Behavior of Lone Star Algorithm In all examples of cancer data on which the lone star algorithm has been tried (about 20), the number of final features selected is much less than the number of training samples. Can this be proved, or can situations be identified in which this can be expected to happen? What is the behavior of the Dantzig selector if there is no true but unknown parameter vector $\beta$? The current proof collapses irretrievably; does something else take its place?

41 Outline 1 Introduction 2 3 Compressed Sensing Techniques The Lone Star Algorithm 4 Some Open Problems A New Algorithm?

42 Student t Test: Objective Given two sets of samples $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_m)$, let $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ and $\bar{y} = \frac{1}{m}\sum_{j=1}^m y_j$ denote the means of the two sets of samples. The Student t test can be used to determine whether the difference between the means is statistically significant.

43 Combining Features to Improve t Statistic: Depiction Given multiple measurements on two classes, we wish to combine them so as to increase the value of the t statistic (i.e., enhance discriminative ability). [Depiction: measurement matrices $[x_{ij}]$ ($n$ samples of Class 1) and $[y_{ij}]$ ($m$ samples of Class 2), each with $k$ feature columns, and a weight vector $(\lambda_1, \dots, \lambda_k)$ applied to the columns.]

44 Combining Features to Improve t Statistic Questions: Given $k$ different sets of (real-valued) measurements $x_{1j}, \dots, x_{nj}$ and $y_{1j}, \dots, y_{mj}$, $j = 1, \dots, k$, is it possible to form linear combinations $u_i = \sum_{j=1}^k x_{ij}\lambda_j$, $v_i = \sum_{j=1}^k y_{ij}\lambda_j$, such that, for the right choice of the weights $\lambda$, we have $t(u; v) > \max_j t(x_j; y_j)$? (The combined statistic has greater discrimination than any individual statistic.) What is the optimal choice of the weight vector $\lambda$?

45 Student t Test: Details Quick recap: Let $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ and $\bar{y} = \frac{1}{m}\sum_{j=1}^m y_j$ denote the two means, and define the pooled standard deviation $S_P = \left[ \frac{1}{n+m-2} \left( \sum_{i=1}^n (x_i - \bar{x})^2 + \sum_{i=1}^m (y_i - \bar{y})^2 \right) \right]^{1/2}$. Then the quantity $t(x; y) = \dfrac{\bar{x} - \bar{y}}{\left( \frac{1}{n} + \frac{1}{m} \right)^{1/2} S_P}$ satisfies the t distribution with $n + m - 2$ degrees of freedom.
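A quick numerical check of the formula above: the hand-computed pooled t statistic should agree with scipy's equal-variance two-sample t-test; the sample sizes and distributions are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(0.5, 1.0, size=30)     # class 1 samples
y = rng.normal(0.0, 1.0, size=40)     # class 2 samples
n, m = len(x), len(y)

# Pooled standard deviation S_P and the two-sample t statistic
S_P = np.sqrt((np.sum((x - x.mean())**2) + np.sum((y - y.mean())**2)) / (n + m - 2))
t_manual = (x.mean() - y.mean()) / (np.sqrt(1/n + 1/m) * S_P)

t_scipy, p_value = stats.ttest_ind(x, y, equal_var=True)
print(t_manual, t_scipy, p_value)     # the two t values coincide
```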

46 Optimal Choice of Weights Define $X = [x_{ij}] \in \mathbb{R}^{n \times k}$, $Y = [y_{ij}] \in \mathbb{R}^{m \times k}$. For each index $j$, compute the associated means $\bar{x}_j = \frac{1}{n}\sum_{i=1}^n x_{ij}$, $\bar{y}_j = \frac{1}{m}\sum_{i=1}^m y_{ij}$, and the vectors and matrices $\bar{x} = [\bar{x}_j] \in \mathbb{R}^k$, $\bar{y} = [\bar{y}_j] \in \mathbb{R}^k$, $c = \bar{x} - \bar{y} \in \mathbb{R}^k$, $C = \mathrm{Diag}(c_1, \dots, c_k) \in \mathbb{R}^{k \times k}$.

47 Optimal Choice of Weights 2 Define the constants $a = \left(\frac{1}{n} + \frac{1}{m}\right)^{1/2}\frac{n-1}{n+m-2}$ and $b = \left(\frac{1}{n} + \frac{1}{m}\right)^{1/2}\frac{m-1}{n+m-2}$, the covariance matrices $\Sigma_X = (X - e_n\bar{x}^T)^T(X - e_n\bar{x}^T)$, $\Sigma_Y = (Y - e_m\bar{y}^T)^T(Y - e_m\bar{y}^T)$, where $e_n$ is a column vector of $n$ ones, and finally $M = C^{-1}(a\Sigma_X + b\Sigma_Y)C^{-1}$. Then the optimal choice of $\lambda$ is $\lambda = \dfrac{1}{e_k^T M^{-1} e_k}\, e_k^T M^{-1}$.
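A numerical sketch of the recipe as reconstructed above (the centering and transpose conventions in $\Sigma_X$, $\Sigma_Y$ are my reading of the garbled slide); it computes $\lambda$ and compares the t statistic of the combined feature with the individual t statistics on toy data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 30, 40, 5
shift = np.linspace(0.5, 1.0, k)               # per-feature class difference (toy)
X = rng.standard_normal((n, k)) + shift        # class 1: n samples, k features
Y = rng.standard_normal((m, k))                # class 2: m samples, k features

def t_stat(x, y):
    """Two-sample pooled t statistic, as on the previous slide."""
    nn, mm = len(x), len(y)
    S_P = np.sqrt((np.sum((x - x.mean())**2) + np.sum((y - y.mean())**2)) / (nn + mm - 2))
    return (x.mean() - y.mean()) / (np.sqrt(1/nn + 1/mm) * S_P)

xbar, ybar = X.mean(axis=0), Y.mean(axis=0)
c = xbar - ybar
C = np.diag(c)
a = np.sqrt(1/n + 1/m) * (n - 1) / (n + m - 2)
b = np.sqrt(1/n + 1/m) * (m - 1) / (n + m - 2)
Sigma_X = (X - xbar).T @ (X - xbar)            # (X - e_n xbar^T)^T (X - e_n xbar^T)
Sigma_Y = (Y - ybar).T @ (Y - ybar)
M = np.linalg.inv(C) @ (a * Sigma_X + b * Sigma_Y) @ np.linalg.inv(C)
e = np.ones(k)
lam = np.linalg.solve(M, e)
lam = lam / (e @ lam)                          # lambda = e^T M^{-1} / (e^T M^{-1} e)

u, v = X @ lam, Y @ lam                        # combined statistics for each class
print("individual t's:", [round(t_stat(X[:, j], Y[:, j]), 2) for j in range(k)])
print("combined  t   :", round(t_stat(u, v), 2))
```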

48 Application Each measurement indicates some difference between the two classes. By combining measurements so as to optimize the t statistic, we can maximize the discriminatory ability, and also obtain a posterior probability of belonging to the two classes. Question: This work could have been done 70 or 80 years ago. Was it?

49 And Finally: A Confession SIAM has asked me to write a book, and I have chosen a tentative title: Computational Cancer Biology: A Machine Learning Approach. What you have seen is an extended abstract of what I hope to cover. Feedback? Comments? Questions?
