Compressed Sensing in Cancer Biology? (A Work in Progress)
1 Compressed Sensing in Cancer Biology? (A Work in Progress)
M. Vidyasagar FRS
Cecil & Ida Green Chair, The University of Texas at Dallas
m.vidyasagar
University of Cambridge, 23 November 2012
2 A Cautionary Statement
Most talks are finished products: they cover completed research. This talk is an attempt to share our current thinking. Like James Joyce's stream of consciousness, it represents work in progress. The questions that are currently occupying my mind are discussed here. Six months from now, some or all of these questions might be deemed irrelevant!
3 Outline
1 Introduction
2
3 Compressed Sensing Techniques; The Lone Star Algorithm
4 Some Open Problems; A New Algorithm?
4 What is the Problem?
Biologists are now able to generate massive amounts of data. The data come in different forms: real numbers, integers, and Boolean values. The data are of variable quality and repeatability; different error models are needed for different data forms.
What my students and I hope to contribute to cancer research: the development of predictors (classifiers and regressors) based on the integration of multiple types of data. Along the way, we hope to understand and analyze the behavior of various algorithms.
5 Broad Themes of Current Research 1
Understand the nature of biological data: what is being measured, how is it being measured, and, most important, what are the potential sources and nature of error?
Understand the types of questions to which biologists would like answers, e.g.:
Prognosis for lung cancer patients
Effects of applying multiple drug combinations, when we know only the effects of applying one drug at a time
Identifying possible upstream genes when the gene of interest is not directly druggable
and so on.
6 Broad Themes of Current Research 2
And then:
Devise appropriate algorithms to integrate multiple types of data and make predictions
Validate the algorithms on existing data where outcomes are known
Make predictions on the outcomes of new experiments
Most important: persuade biologists to undertake those experiments
Use the outcomes of the new experiments to fine-tune or reject the algorithms
Repeat
7 An Abstract Problem Formulation
The data consist of labelled samples of the form $(x_{ij}, c_i)$, where $i = 1, \dots, n$, $j = 1, \dots, p$, and $p \gg n$. Here $n$ is the number of samples on which experiments are conducted, and $p$ is the number of entities that are measured for each sample. $c_i$ is the class label; it depends only on the sample and not on the entities being measured. The measurement $x_{ij}$ can be one of three types: a real number, a nonnegative integer, or a Boolean variable. The class label $c_i$ can be a real number, or an integer in $\{1, \dots, k\}$ with $k \geq 2$.
Objective: Construct a function $f$ such that $f(x_i)$ is a reasonably good predictor of $c_i$, where $x_i = (x_{ij}, j = 1, \dots, p)$.
8 An Abstract Problem Formulation 2
In classification problems one often wants a posterior probability of belonging to a class. Define
$$S_k = \{v \in \mathbb{R}^k_+ : \sum_{i=1}^k v_i = 1\},$$
the set of probability distributions on a set with $k$ elements. If $c_i \in \{1, \dots, k\}$, instead of seeking a function $f : x \mapsto f(x) \in \{1, \dots, k\}$, one can ask that $f(x) \in S_k$. Then $f(x)$ is a $k$-dimensional vector, where $f_l(x) = \Pr\{c(x) = l\}$, $l = 1, \dots, k$.
9 Some Terminology
Suppose $x_i \in \mathbb{R}^p$. (Note that this may not be true!) If $c_i \in \mathbb{R}$, then the problem is one of regression, or fitting a function to measured values. If $c_i \in \{1, \dots, k\}$, then the problem is one of $k$-class classification. If $k = 2$ and $c_i \in \{-1, 1\}$ (after relabelling), and if we find a function $d : \mathbb{R}^p \to \mathbb{R}$ and set $f(x) = \mathrm{sign}[d(x)]$, then $d(\cdot)$ is called the discriminant function.
Classification is an easier problem than regression. Also, a discriminant function need not be unique. Often $f$ (or $d$) is taken to be a linear function of $x$ (greater tolerance to noisy measurements, simpler interpretation, etc.).
12 Conventional Least-Squares Regression
Define $X = [x_{ij}] \in \mathbb{R}^{n \times p}$ and $y = [c_i] \in \mathbb{R}^n$, and try to fit $y$ with a linear function $X\beta$. Different ways of measuring the error of the fit lead to different methods. Standard least-squares regression is
$$\min_\beta \|y - X\beta\|_2^2.$$
If $X^T X \in \mathbb{R}^{p \times p}$ has rank $p$, then the solution is $\beta = (X^T X)^{-1} X^T y$. This is the standard procedure when $n > p$ (more measurements than parameters).
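The closed-form solution above can be checked numerically. A minimal numpy sketch on synthetic data (sizes and coefficients are illustrative, not from the talk):

```python
import numpy as np

# Toy regression problem: n = 50 samples, p = 3 features, n > p.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.01 * rng.standard_normal(n)

# Least-squares solution beta = (X^T X)^{-1} X^T y.
# X^T X has rank p here, so the linear system is well posed.
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)
```

With low noise and $n > p$, `beta_ls` should closely match the generating coefficients.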
13 Ridge Regression
If $p > n$, then $X^T X$ is singular. Ridge regression is
$$\min_\beta \|y - X\beta\|_2^2 \text{ s.t. } \|\beta\|_2 \leq a,$$
or, in Lagrangian formulation,
$$J_{\mathrm{ridge}} = \min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2,$$
where $\lambda$ is the Lagrangian parameter.
14 Ridge Regression 2
The Hessian of $J_{\mathrm{ridge}}$ is $X^T X + \lambda I_p$, which is always positive definite; so the optimal solution always exists and is unique (that is a good thing). In fact
$$\beta_{\mathrm{ridge}} = (X^T X + \lambda I_p)^{-1} X^T y.$$
In general, all components of $\beta_{\mathrm{ridge}}$ will be nonzero, which is not a good thing. Why does this happen? Because the penalty term $\lambda \|\beta\|_2^2$ penalizes the norm of the coefficient vector $\beta$, not the number of nonzero components of $\beta$.
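Both properties of ridge regression noted above (a unique solution even when $p > n$, but a fully dense one) can be seen directly. A sketch with illustrative sizes:

```python
import numpy as np

# p > n case: X^T X is singular, but X^T X + lam*I is positive definite,
# so the ridge solution exists and is unique.
rng = np.random.default_rng(1)
n, p, lam = 10, 40, 0.5
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# As the slide notes, essentially every component comes out nonzero.
num_nonzero = np.count_nonzero(np.abs(beta_ridge) > 1e-12)
```

The density of `beta_ridge` is exactly the drawback the sparsity-seeking methods below are meant to fix.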
15 An NP-Hard Problem
Given $\beta \in \mathbb{R}^p$, define its $\ell_0$-norm $\|\beta\|_0$ as the number of nonzero components of $\beta$. Given a fixed integer $k$, the problem is
$$\min_\beta \|y - X\beta\|_2^2 \text{ s.t. } \|\beta\|_0 \leq k,$$
or, in words: find the best fit in the $\ell_2$-norm that uses $k$ or fewer nonzero components.
Bad news! This problem is NP-hard! So we need to try some other approach.
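To see why the $\ell_0$ problem is intractable at scale, one can write the exhaustive solver: try every support of size at most $k$ and keep the best restricted least-squares fit. The $\binom{p}{k}$ blow-up in the inner loop is the combinatorial obstruction. A sketch on a deliberately tiny synthetic problem:

```python
import itertools
import numpy as np

def best_subset(X, y, k):
    """Exhaustive l0-constrained least squares: O(sum_j C(p, j)) subproblems."""
    n, p = X.shape
    best_err, best_beta = np.inf, np.zeros(p)
    for size in range(1, k + 1):
        for support in itertools.combinations(range(p), size):
            cols = list(support)
            b, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            err = np.sum((y - X[:, cols] @ b) ** 2)
            if err < best_err:
                best_err = err
                best_beta = np.zeros(p)
                best_beta[cols] = b
    return best_beta

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 8))
beta_true = np.zeros(8)
beta_true[[1, 5]] = [3.0, -2.0]
y = X @ beta_true + 0.01 * rng.standard_normal(30)
beta_hat = best_subset(X, y, k=2)
```

With $p = 8$ this enumerates 36 subsets; with the $p \gg n$ sizes typical of biological data it is hopeless, which is what motivates the convex relaxations that follow.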
16 LASSO Regression
LASSO (or lasso) = least absolute shrinkage and selection operator. The problem formulation is (Tibshirani (1996))
$$\min_\beta \|y - X\beta\|_2^2 \text{ s.t. } \|\beta\|_1 \leq a,$$
or, in Lagrangian formulation,
$$J_{\mathrm{lasso}} = \min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_1,$$
where $\lambda$ is the Lagrangian parameter.
17 LASSO Regression 2
Lasso regression is a quadratic programming problem, and so very easy to solve. Unlike in ridge regression, in lasso the optimal $\beta_{\mathrm{lasso}}$ has at most $n$ nonzero components, where $n$ is the number of measurements. General approach: start with some choice of the Lagrange multiplier $\lambda$, compute $\beta_\lambda$, and then let $\lambda$ approach infinity. This is called the trajectory.
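One standard way to solve the Lagrangian lasso (not the method named on the slide, just a common choice) is iterative soft-thresholding (ISTA): a gradient step on the quadratic term followed by the proximal map of the $\ell_1$ penalty. A self-contained sketch on synthetic sparse data:

```python
import numpy as np

def soft_threshold(u, t):
    """Entrywise soft-thresholding: the proximal map of t * ||.||_1."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=5000):
    """Minimize ||y - X b||_2^2 + lam * ||b||_1 by proximal gradient descent."""
    beta = np.zeros(X.shape[1])
    L = 2.0 * np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ beta - y)
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[0, 5, 20]] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.01 * rng.standard_normal(n)

beta_hat = lasso_ista(X, y, lam=0.5)
n_nonzero = np.count_nonzero(np.abs(beta_hat) > 1e-8)
```

The soft-thresholding step produces exact zeros, so the sparsity claimed on the slide (at most $n$ nonzero components) shows up directly in `n_nonzero`.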
18 Elastic Net Regression
The elastic net formulation combines ridge and lasso regression. The problem formulation is (Zou and Hastie (2005))
$$\min_\beta \|y - X\beta\|_2^2 \text{ s.t. } \alpha \|\beta\|_2^2 + (1 - \alpha) \|\beta\|_1 \leq a,$$
or, in Lagrangian formulation,
$$J_{\mathrm{en}} = \min_\beta \|y - X\beta\|_2^2 + \lambda [\alpha \|\beta\|_2^2 + (1 - \alpha) \|\beta\|_1],$$
where $\lambda$ is the Lagrangian parameter and $\alpha \in (0, 1)$ is another parameter. The elastic net solution also has at most $n$ nonzero entries, and convergence to the optimum is supposedly faster.
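The same proximal-gradient idea extends to the elastic net: the proximal map of the combined penalty is a soft-threshold followed by a multiplicative shrink. This sketch assumes the standard squared-$\ell_2$ form of the penalty; the data and parameter values are illustrative.

```python
import numpy as np

def elastic_net(X, y, lam, alpha, n_iter=5000):
    """Minimize ||y - X b||_2^2 + lam*(alpha*||b||_2^2 + (1-alpha)*||b||_1)."""
    beta = np.zeros(X.shape[1])
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        z = beta - step * 2.0 * X.T @ (X @ beta - y)  # gradient step
        t = step * lam
        # Prox of t*(alpha*||.||^2 + (1-alpha)*||.||_1): threshold, then shrink.
        beta = (np.sign(z) * np.maximum(np.abs(z) - t * (1 - alpha), 0.0)
                / (1.0 + 2.0 * t * alpha))
    return beta

rng = np.random.default_rng(8)
n, p = 20, 40
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[0, 1, 2]] = [2.0, -2.0, 1.5]
y = X @ beta_true + 0.01 * rng.standard_normal(n)
beta_en = elastic_net(X, y, lam=0.5, alpha=0.3)
```

The $\ell_1$ part of the prox keeps the solution sparse, while the $\ell_2$ part stabilizes correlated features, which is the combination the slide describes.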
20 Support Vector Machines
The data is said to be linearly separable if there exist a weight vector $w \in \mathbb{R}^p$ and a bias $b$ such that
$$x_i^T w - b \begin{cases} > 0 & \text{if } c_i = 1, \\ < 0 & \text{if } c_i = -1. \end{cases}$$
The data might not be linearly separable. If the data is linearly separable, then there exist infinitely many choices of $w, b$. Support Vector Machines (SVMs), due to Cortes and Vapnik (1995), give an elegant solution.
21 Linear SVM: Illustration
SVM (support vector machine): maximize the minimum distance from the data points to the separating hyperplane. Equivalent to: minimize $\|w\|$ while achieving a margin of 1.
22 Linear SVM: Some Properties
Quadratic programming problem: very easy to solve, even for huge data sets. The optimal weight $w$ is supported on just a few samples (the support vectors), so adding more vectors $x_i$ often does not change the optimal choice of $w$. In general, however, all components of the optimal $w$ are nonzero. This is OK if $n \gg p$, but not if $p \gg n$. Linear separability is not an issue if $p \geq n - 1$.
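A soft-margin linear SVM can be trained without a QP solver by subgradient descent on the hinge-loss objective; this is a sketch of the optimization (not the dual QP usually solved), on synthetic two-class data:

```python
import numpy as np

# Full-batch subgradient descent on
#   (lam/2)*||w||^2 + (1/n) * sum_i max(0, 1 - c_i (w . x_i - b)).
rng = np.random.default_rng(3)
n = 100
X = np.vstack([rng.normal(+2.0, 1.0, (n // 2, 2)),
               rng.normal(-2.0, 1.0, (n // 2, 2))])
c = np.hstack([np.ones(n // 2), -np.ones(n // 2)])

w, b = np.zeros(2), 0.0
lam, eta = 0.01, 0.1
for _ in range(1000):
    margins = c * (X @ w - b)
    active = margins < 1  # samples violating the margin contribute a subgradient
    grad_w = lam * w - (c[active, None] * X[active]).sum(axis=0) / n
    grad_b = c[active].sum() / n
    w -= eta * grad_w
    b -= eta * grad_b

accuracy = np.mean(np.sign(X @ w - b) == c)
```

On well-separated data like this, the learned hyperplane classifies essentially all training points correctly, and only the margin-violating samples ever contribute to the updates, which mirrors the "supported on a few samples" property above.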
23 Generalized SVMs
Due to Bradley and Mangasarian (1998). Measure distances in feature space using some norm; distances in weight space are then measured using the dual norm
$$\|w\|_d = \max_{\|x\| \leq 1} x^T w.$$
In the traditional SVM, the $\ell_2$-norm is used, which is its own dual. In this case, in general all components of the optimal $w$ are nonzero. If we use the $\ell_1$-norm on $x$, $\|x\|_1 = \sum_i |x_i|$, and hence the $\ell_\infty$-norm on $w$, $\|w\|_\infty = \max_i |w_i|$, then the optimal $w$ has at most $m$ nonzero entries ($m$ = number of samples).
What if the data is not linearly separable? Settle for misclassifying as few samples as possible.
24 Modified $\ell_1$-Norm SVM
To trade off penalties for false positives and false negatives, incorporate ideas from Veropoulos et al. (1999). Choose a constant $\lambda$ close to zero and $\alpha \in (0, 1)$. Let $M_1, M_2$ denote the two classes of samples, with $m_1 = |M_1|$, $m_2 = |M_2|$, and let $e$ denote a column vector of all ones. Then solve
$$\min_{w, b, y, z} (1 - \lambda)[\alpha e_{m_1}^T y + (1 - \alpha) e_{m_2}^T z] + \lambda \|w\|_1$$
$$\text{s.t. } x_i^T w \geq b + 1 - y_i, \; i \in M_1,$$
$$x_i^T w \leq b - 1 + z_i, \; i \in M_2,$$
$$y \geq 0_{m_1}, \; z \geq 0_{m_2}.$$
If $\alpha > 0.5$, more emphasis is placed on correctly classifying the samples in $M_1$; the opposite holds if $\alpha < 0.5$.
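Because both the objective and the constraints are linear (after splitting $w = w^+ - w^-$ so that $\|w\|_1 = \sum_j (w^+_j + w^-_j)$), this modified SVM is a linear program. A sketch of that formulation with `scipy.optimize.linprog`, on synthetic data; the variable ordering and sizes are my own choices for illustration:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
n1, n2, p = 20, 20, 5
X1 = rng.normal(+1.0, 1.0, (n1, p))   # class  1 samples (M_1)
X2 = rng.normal(-1.0, 1.0, (n2, p))   # class -1 samples (M_2)

lam, alpha = 0.1, 0.5
# Variable order: [w+ (p), w- (p), b (1), y (n1), z (n2)].
cost = np.concatenate([
    lam * np.ones(2 * p),                   # lam * ||w||_1
    [0.0],                                  # b is unpenalized
    (1 - lam) * alpha * np.ones(n1),        # slacks y for class 1
    (1 - lam) * (1 - alpha) * np.ones(n2),  # slacks z for class 2
])

# Class 1:  x.w >= b + 1 - y_i  ->  -x.w+ + x.w- + b - y_i <= -1
A1 = np.hstack([-X1, X1, np.ones((n1, 1)), -np.eye(n1), np.zeros((n1, n2))])
# Class 2:  x.w <= b - 1 + z_i  ->   x.w+ - x.w- - b - z_i <= -1
A2 = np.hstack([X2, -X2, -np.ones((n2, 1)), np.zeros((n2, n1)), -np.eye(n2)])
A_ub = np.vstack([A1, A2])
b_ub = -np.ones(n1 + n2)

bounds = [(0, None)] * (2 * p) + [(None, None)] + [(0, None)] * (n1 + n2)
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
w = res.x[:p] - res.x[p:2 * p]
b_hat = res.x[2 * p]
train_acc = (np.sum(X1 @ w - b_hat > 0) + np.sum(X2 @ w - b_hat < 0)) / (n1 + n2)
```

The slacks make the LP always feasible, so the solver succeeds even when the classes overlap; the $\alpha$ weights then control which kind of slack is more expensive.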
27 S-Sparse Vectors
Model: As before, we have a predictor matrix $X \in \mathbb{R}^{n \times p}$. The measurement vector $y \in \mathbb{R}^n$ is given by $y = X\beta + z$, where $\beta \in \mathbb{R}^p$ is the parameter of interest and $z_1, \dots, z_n$ are i.i.d. $N(0, \sigma^2)$. The vector $\beta$ is said to be $S$-sparse, where $S$ is a specified integer, if $\beta_j = 0$ for all except at most $S$ entries.
Premise: The data is generated by a true but unknown $S$-sparse parameter vector $\beta$, and the measurements are corrupted by i.i.d. Gaussian noise.
28 The Dantzig Selector
Define $r := y - X\beta$ to be the residual. The problem (Candes and Tao (2007)) is
$$\min_\beta \|\beta\|_1 \text{ s.t. } \|X^T r\|_\infty \leq c,$$
where $c$ is some constant. This is equivalent to
$$\min_\beta \|X^T r\|_\infty \text{ s.t. } \|\beta\|_1 \leq c,$$
where $c$ is some (possibly different) constant. Clearly this is a linear programming problem.
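The LP structure is easy to make concrete: split $\beta = \beta^+ - \beta^-$ with $\beta^\pm \geq 0$ so that $\|\beta\|_1 = \sum_j (\beta^+_j + \beta^-_j)$, and write the $\ell_\infty$ constraint as two sets of linear inequalities. A sketch with `scipy.optimize.linprog` on synthetic sparse data (the choice of $c$ here is illustrative, not the theoretically prescribed value):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)
n, p, c = 30, 60, 2.0
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[3, 17, 40]] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.05 * rng.standard_normal(n)

G = X.T @ X
h = X.T @ y
# ||X^T (y - X beta)||_inf <= c  becomes  -c <= h - G beta <= c,
# i.e.  G beta <= h + c  and  -G beta <= c - h, with beta = bp - bm.
A_ub = np.vstack([np.hstack([ G, -G]),
                  np.hstack([-G,  G])])
b_ub = np.concatenate([h + c, c - h])
cost = np.ones(2 * p)  # minimizes sum(bp + bm) = ||beta||_1
res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None)] * (2 * p), method="highs")
beta_hat = res.x[:p] - res.x[p:]
```

With $n = 30$ noisy measurements of a 3-sparse vector in $\mathbb{R}^{60}$, the selector recovers the support with mild shrinkage, which is the behavior the next two slides quantify.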
29 Behavior of Dantzig Algorithm: Noiseless Case
Suppose the matrix $X$ satisfies the uniform uncertainty principle (UUP), basically a statement about the maximum correlation between the columns of $X$, and that $z = 0$. Then
$$\min_\beta \|\beta\|_1 \text{ s.t. } X\beta = y$$
recovers the unknown $\beta$ exactly.
30 Behavior of Dantzig Algorithm: Noisy Case
In the noisy case, we need to introduce a probability measure $P$ on the set of all $S$-sparse vectors. All conclusions are with respect to $P$.
[Figure: the red lines along the $\beta_1$ and $\beta_2$ axes depict the set of 1-sparse vectors $\beta$ in $\mathbb{R}^2$; $P$ is a probability measure on the set of $S$-sparse vectors in $\mathbb{R}^p$.]
31 Behavior of Dantzig Algorithm: Noisy Case 2
There exist strictly increasing functions $f, g : \mathbb{R}_+ \to \mathbb{R}_+$ with $f(0) = g(0) = 0$ such that, for all $S$-sparse $\beta$ except those belonging to a set of volume $f(\sigma)$, the output $\hat\beta$ of the Dantzig selector satisfies $\|\hat\beta - \beta\|_2 \leq g(\sigma)$. Roughly speaking, for almost all true but unknown $\beta$, the output $\hat\beta$ of the Dantzig algorithm almost equals the true $\beta$. Explicit formulas are available for the functions $f(\cdot)$ and $g(\cdot)$.
33 Problem Formulation
We focus on two-class classification: $c_i \in \{-1, 1\}$. Without loss of generality, suppose $c_i = 1$ for $1 \leq i \leq n_1$ and $c_i = -1$ for $n_1 + 1 \leq i \leq n$, and define $n_2 = n - n_1$.
Objective: Given $x_i \in \mathbb{R}^p$, $i = 1, \dots, n$, and the class labels $c_i$, find a function $f : \mathbb{R}^p \to \mathbb{R}$ such that
$$f(x_i) \begin{cases} > 0 & \text{if } c_i = 1, \\ < 0 & \text{if } c_i = -1. \end{cases}$$
Or, if this is not possible, misclassify relatively few samples.
34 The Lone Star Algorithm 1
Preliminary step: Define $N_1 = \{1, \dots, n_1\}$, $N_2 = \{n_1 + 1, \dots, n\}$. For all $j = 1, \dots, p$, compute the class means
$$\mu_{j,l} = \frac{1}{n_l} \sum_{i \in N_l} x_{ij}, \quad l = 1, 2.$$
Eliminate all $j$ for which the difference $\mu_{j,1} - \mu_{j,2}$ is not statistically significant according to the Student t-test. Let $p$ again denote the reduced number of features. Choose integers $k_1, k_2$, the number of training samples for each run, such that $k_1 \leq n_1/2$, $k_2 \leq n_2/2$, $k_1 \approx k_2$.
35 The Lone Star Algorithm 2
1. Choose $k_1$ samples from $N_1$ and $k_2$ samples from $N_2$ at random. Run the $\ell_1$-norm SVM to compute the optimal $w$ on the reduced feature set. Repeat many times.
2. Each optimal $w$ has at most $k_1 + k_2$ nonzero entries, but in different locations. Let $k$ denote the number of nonzero entries for each run. Average all the optimal weights, and retain only the features with the $k$ highest weights.
3. Repeat.
This is similar to RFE = Recursive Feature Elimination (Guyon et al., 2002). Algorithm: $\ell_1$-norm SVM with t-test and RFE, where SVM = Support Vector Machine and RFE = Recursive Feature Elimination. Acronym: $\ell_1$-StaR, pronounced "lone star".
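The mechanics of the two stages above (t-test prefilter, then repeated subsampling with weight averaging and top-$k$ retention) can be sketched as follows. Note the inner solver here is a stand-in (a simple difference of class means), NOT the $\ell_1$-norm SVM the algorithm actually uses; it only serves to make the feature-elimination loop runnable on synthetic data.

```python
import numpy as np
from scipy import stats

def fit_weights(X1, X2):
    """Stand-in for the l1-norm SVM weight vector (difference of class means)."""
    return X1.mean(axis=0) - X2.mean(axis=0)

def lone_star_sketch(X1, X2, k1, k2, k_keep, n_runs=50, seed=0):
    rng = np.random.default_rng(seed)
    features = np.arange(X1.shape[1])
    # Preliminary step: drop features whose class means do not differ
    # significantly according to a two-sample t-test.
    _, pvals = stats.ttest_ind(X1, X2, axis=0)
    features = features[pvals < 0.05]
    # Repeated runs on random subsamples; average |weights|, keep top k_keep.
    w_sum = np.zeros(features.size)
    for _ in range(n_runs):
        i1 = rng.choice(X1.shape[0], k1, replace=False)
        i2 = rng.choice(X2.shape[0], k2, replace=False)
        w_sum += np.abs(fit_weights(X1[i1][:, features], X2[i2][:, features]))
    keep = np.argsort(-w_sum)[:k_keep]
    return features[keep]

rng = np.random.default_rng(6)
X1 = rng.normal(0.0, 1.0, (40, 30))
X1[:, [2, 7]] += 2.0  # two informative features
X2 = rng.normal(0.0, 1.0, (40, 30))
selected = lone_star_sketch(X1, X2, k1=20, k2=20, k_keep=2)
```

Even with the crude stand-in solver, the averaging over random subsamples suppresses features that look discriminative only by chance in a single run.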
36 The Lone Star Algorithm 3
The lone star algorithm has been applied to several data sets from various forms of cancer. Since it is based on an $\ell_1$-norm SVM, the optimal weight vector is guaranteed to have no more nonzero entries than the number of training samples. In all examples run thus far, the actual number of features used is many times smaller than the number of training samples! The question is: is this a general property, or just a fluke?
39 Regression and Classification with Mixed Measurements
Given data $\{(x_{ij}, y_i)\}$, suppose the data is of mixed type, so that for a fixed sample $i$, some $x_{ij} \in \mathbb{R}$ while other $x_{ij} \in \{0, 1\}$. How can we carry out regression and classification in this case?
40 Behavior of the Lone Star Algorithm
In all examples of cancer data on which the lone star algorithm has been tried (about 20), the number of final features selected is much less than the number of training samples. Can this be proved, or can situations be identified in which it can be expected to happen?
What is the behavior of the Dantzig selector if there is no true but unknown parameter vector $\beta$? The current proof collapses irretrievably; does something else take its place?
42 Student t Test: Objective
Given two sets of samples $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_m)$, let
$$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i, \quad \bar{y} = \frac{1}{m} \sum_{j=1}^m y_j$$
denote the means of the two sets of samples. The Student t test can be used to determine whether the difference between the means is statistically significant.
43 Combining Features to Improve the t Statistic: Depiction
Given multiple measurements on two classes, we wish to combine them to increase the value of the t statistic (and thus enhance discriminative ability).
[Depiction: an $n \times k$ array of class-1 measurements $x_{11}, \dots, x_{nk}$, an $m \times k$ array of class-2 measurements $y_{11}, \dots, y_{mk}$, and a weight vector $\lambda_1, \dots, \lambda_k$ combining the $k$ columns.]
44 Combining Features to Improve the t Statistic
Questions: Given $k$ different sets of (real-valued) measurements $x_{1j}, \dots, x_{nj}$ and $y_{1j}, \dots, y_{mj}$, $j = 1, \dots, k$, is it possible to form linear combinations
$$u_i = \sum_{j=1}^k x_{ij} \lambda_j, \quad v_i = \sum_{j=1}^k y_{ij} \lambda_j,$$
such that, for the right choice of the weights $\lambda$, we have
$$t(u; v) > \max_j t(x_j; y_j)?$$
(That is, the combined statistic has greater discrimination than any individual statistic.) What is the optimal choice of the weight vector $\lambda$?
45 Student t Test: Details
Quick recap: Define the means
$$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i, \quad \bar{y} = \frac{1}{m} \sum_{j=1}^m y_j,$$
and the pooled standard deviation
$$S_P = \left[ \frac{1}{n + m - 2} \left( \sum_{i=1}^n (x_i - \bar{x})^2 + \sum_{i=1}^m (y_i - \bar{y})^2 \right) \right]^{1/2}.$$
Then the quantity
$$t(x; y) = \left( \frac{1}{n} + \frac{1}{m} \right)^{-1/2} \frac{\bar{x} - \bar{y}}{S_P}$$
satisfies the t distribution with $n + m - 2$ degrees of freedom.
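The pooled two-sample statistic above is straightforward to implement, and can be checked against scipy's equal-variance t-test; the data here are synthetic:

```python
import numpy as np
from scipy import stats

def t_stat(x, y):
    """Pooled two-sample t statistic with n + m - 2 degrees of freedom."""
    n, m = len(x), len(y)
    sp = np.sqrt((np.sum((x - x.mean()) ** 2) + np.sum((y - y.mean()) ** 2))
                 / (n + m - 2))
    return (x.mean() - y.mean()) / (sp * np.sqrt(1.0 / n + 1.0 / m))

rng = np.random.default_rng(7)
x = rng.normal(1.0, 1.0, 25)
y = rng.normal(0.0, 1.0, 30)
t_ours = t_stat(x, y)
t_scipy = stats.ttest_ind(x, y, equal_var=True).statistic
```

The two values agree to machine precision, confirming the formula as stated on the slide.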
46 Optimal Choice of Weights
Define $X = [x_{ij}] \in \mathbb{R}^{n \times k}$ and $Y = [y_{ij}] \in \mathbb{R}^{m \times k}$. For each index $j$, compute the associated means
$$\bar{x}_j = \frac{1}{n} \sum_{i=1}^n x_{ij}, \quad \bar{y}_j = \frac{1}{m} \sum_{i=1}^m y_{ij},$$
and the vectors and matrices
$$\bar{x} = [\bar{x}_j] \in \mathbb{R}^k, \quad \bar{y} = [\bar{y}_j] \in \mathbb{R}^k, \quad c = \bar{x} - \bar{y} \in \mathbb{R}^k, \quad C = \mathrm{Diag}(c_1, \dots, c_k) \in \mathbb{R}^{k \times k}.$$
47 Optimal Choice of Weights 2
Define the constants
$$a = \left( \frac{1}{n} + \frac{1}{m} \right)^{1/2} \frac{n - 1}{n + m - 2}, \quad b = \left( \frac{1}{n} + \frac{1}{m} \right)^{1/2} \frac{m - 1}{n + m - 2},$$
the covariance matrices
$$\Sigma_X = (X - e_n \bar{x}^T)^T (X - e_n \bar{x}^T), \quad \Sigma_Y = (Y - e_m \bar{y}^T)^T (Y - e_m \bar{y}^T),$$
where $e_n$ is a column vector of $n$ ones, and finally
$$M = C^{-1} (a \Sigma_X + b \Sigma_Y) C^{-1}.$$
Then the optimal choice of $\lambda$ is
$$\lambda = \frac{e_k^T M^{-1}}{e_k^T M^{-1} e_k}.$$
48 Application
Each measurement indicates some difference between the two classes. By combining measurements so as to optimize the t statistic, we can maximize the discriminatory ability, and also obtain a posterior probability of belonging to the two classes.
Question: This work could have been done 70 or 80 years ago. Was it?
49 And Finally: A Confession
SIAM has asked me to write a book, and I have chosen a tentative title: Computational Cancer Biology: A Machine Learning Approach. What you have seen is an extended abstract of what I hope to cover. Feedback? Comments? Questions?
More informationFast Regularization Paths via Coordinate Descent
August 2008 Trevor Hastie, Stanford Statistics 1 Fast Regularization Paths via Coordinate Descent Trevor Hastie Stanford University joint work with Jerry Friedman and Rob Tibshirani. August 2008 Trevor
More informationA Modern Look at Classical Multivariate Techniques
A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationLecture Support Vector Machine (SVM) Classifiers
Introduction to Machine Learning Lecturer: Amir Globerson Lecture 6 Fall Semester Scribe: Yishay Mansour 6.1 Support Vector Machine (SVM) Classifiers Classification is one of the most important tasks in
More informationy(x) = x w + ε(x), (1)
Linear regression We are ready to consider our first machine-learning problem: linear regression. Suppose that e are interested in the values of a function y(x): R d R, here x is a d-dimensional vector-valued
More informationRegularization: Ridge Regression and the LASSO
Agenda Wednesday, November 29, 2006 Agenda Agenda 1 The Bias-Variance Tradeoff 2 Ridge Regression Solution to the l 2 problem Data Augmentation Approach Bayesian Interpretation The SVD and Ridge Regression
More informationMax Margin-Classifier
Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization
More informationApplied Machine Learning Annalisa Marsico
Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 29 April, SoSe 2015 Support Vector Machines (SVMs) 1. One of
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More information18.9 SUPPORT VECTOR MACHINES
744 Chapter 8. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight the examples whose kernels overlap the
More informationSTAT 462-Computational Data Analysis
STAT 462-Computational Data Analysis Chapter 5- Part 2 Nasser Sadeghkhani a.sadeghkhani@queensu.ca October 2017 1 / 27 Outline Shrinkage Methods 1. Ridge Regression 2. Lasso Dimension Reduction Methods
More informationCPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017
CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class
More informationChris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010
Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,
More informationSome models of genomic selection
Munich, December 2013 What is the talk about? Barley! Steptoe x Morex barley mapping population Steptoe x Morex barley mapping population genotyping from Close at al., 2009 and phenotyping from cite http://wheat.pw.usda.gov/ggpages/sxm/
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationNonlinear Support Vector Machines through Iterative Majorization and I-Splines
Nonlinear Support Vector Machines through Iterative Majorization and I-Splines P.J.F. Groenen G. Nalbantov J.C. Bioch July 9, 26 Econometric Institute Report EI 26-25 Abstract To minimize the primal support
More informationTractable Upper Bounds on the Restricted Isometry Constant
Tractable Upper Bounds on the Restricted Isometry Constant Alex d Aspremont, Francis Bach, Laurent El Ghaoui Princeton University, École Normale Supérieure, U.C. Berkeley. Support from NSF, DHS and Google.
More informationMidterm. Introduction to Machine Learning. CS 189 Spring You have 1 hour 20 minutes for the exam.
CS 189 Spring 2013 Introduction to Machine Learning Midterm You have 1 hour 20 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. Please use non-programmable calculators
More informationExam: high-dimensional data analysis January 20, 2014
Exam: high-dimensional data analysis January 20, 204 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question not the subquestions on a separate piece of paper. - Finish
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationMIT 9.520/6.860, Fall 2018 Statistical Learning Theory and Applications. Class 08: Sparsity Based Regularization. Lorenzo Rosasco
MIT 9.520/6.860, Fall 2018 Statistical Learning Theory and Applications Class 08: Sparsity Based Regularization Lorenzo Rosasco Learning algorithms so far ERM + explicit l 2 penalty 1 min w R d n n l(y
More informationThe prediction of house price
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationSupport Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2
More informationMidterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so.
CS 89 Spring 07 Introduction to Machine Learning Midterm Please do not open the exam before you are instructed to do so. The exam is closed book, closed notes except your one-page cheat sheet. Electronic
More information6.867 Machine learning
6.867 Machine learning Mid-term eam October 8, 6 ( points) Your name and MIT ID: .5.5 y.5 y.5 a).5.5 b).5.5.5.5 y.5 y.5 c).5.5 d).5.5 Figure : Plots of linear regression results with different types of
More informationConvex Optimization and Support Vector Machine
Convex Optimization and Support Vector Machine Problem 0. Consider a two-class classification problem. The training data is L n = {(x 1, t 1 ),..., (x n, t n )}, where each t i { 1, 1} and x i R p. We
More informationMaster 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique
Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some
More informationLinear Regression (continued)
Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression
More informationConvex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs)
ORF 523 Lecture 8 Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Any typos should be emailed to a a a@princeton.edu. 1 Outline Convexity-preserving operations Convex envelopes, cardinality
More informationPolyhedral Computation. Linear Classifiers & the SVM
Polyhedral Computation Linear Classifiers & the SVM mcuturi@i.kyoto-u.ac.jp Nov 26 2010 1 Statistical Inference Statistical: useful to study random systems... Mutations, environmental changes etc. life
More informationMSA220/MVE440 Statistical Learning for Big Data
MSA220/MVE440 Statistical Learning for Big Data Lecture 7/8 - High-dimensional modeling part 1 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Classification
More informationA Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression
A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent
More informationCOMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017
COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University UNDERDETERMINED LINEAR EQUATIONS We
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationSupport Vector Machines: Maximum Margin Classifiers
Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More informationSupport Vector Machines.
Support Vector Machines www.cs.wisc.edu/~dpage 1 Goals for the lecture you should understand the following concepts the margin slack variables the linear support vector machine nonlinear SVMs the kernel
More informationLinear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights
Linear Discriminant Functions and Support Vector Machines Linear, threshold units CSE19, Winter 11 Biometrics CSE 19 Lecture 11 1 X i : inputs W i : weights θ : threshold 3 4 5 1 6 7 Courtesy of University
More informationCSC 411 Lecture 17: Support Vector Machine
CSC 411 Lecture 17: Support Vector Machine Ethan Fetaya, James Lucas and Emad Andrews University of Toronto CSC411 Lec17 1 / 1 Today Max-margin classification SVM Hard SVM Duality Soft SVM CSC411 Lec17
More information