Machine Learning Lecture 7


Course Outline
- Fundamentals (2 weeks): Bayes Decision Theory, Probability Density Estimation, Statistical Learning Theory
- Discriminative Approaches (5 weeks): Linear Discriminant Functions, Statistical Learning Theory & SVMs, Ensemble Methods & Boosting, Randomized Trees, Forests & Ferns
- Generative Models (4 weeks): Bayesian Networks, Markov Random Fields

Bastian Leibe, RWTH Aachen, leibe@vision.rwth-aachen.de
Many slides adapted from B. Schiele

Topics of This Lecture
- Recap: Linear Discriminant Functions
- Recap: Generalized Linear Discriminants
- Logistic Regression: probabilistic discriminative models, logistic sigmoid (logit function), cross-entropy error, gradient descent, Iteratively Reweighted Least Squares, note on error functions
- Statistical Learning Theory: generalization and overfitting, empirical and actual risk, VC dimension, Empirical Risk Minimization, Structural Risk Minimization

Recap: Linear Discriminant Functions
- Basic idea: directly encode the decision boundary and minimize the misclassification probability directly.
- Linear discriminant functions: y(x) = w^T x + w_0, with weight vector w and bias w_0 (= threshold).
- w and w_0 define a hyperplane in R^D: y(x) = 0 on the boundary, y(x) > 0 on one side, y(x) < 0 on the other.
- If a data set can be perfectly classified by a linear discriminant, we call it linearly separable.

Recap: Extension to Nonlinear Basis Functions
- Generalization: transform the vector x with M nonlinear basis functions \phi_j(x). We then consider models of the form
  y_k(x) = \sum_{j=1}^{M} w_{kj} \phi_j(x) + w_{k0}
  where the \phi_j(x) are known as basis functions. In the simplest case, we use linear basis functions: \phi_d(x) = x_d. Other popular choices are polynomial, Gaussian, and sigmoid basis functions.
- Advantages: the transformation allows non-linear decision boundaries, and by choosing the right \phi_j, every continuous function can (in principle) be approximated with arbitrary accuracy.
- Disadvantage: the error function can in general no longer be minimized in closed form; minimization is done with gradient descent (see the sketch below).
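As a purely illustrative sketch (not from the lecture), the following snippet evaluates such a generalized linear discriminant y_k(x) = \sum_j w_{kj} \phi_j(x) + w_{k0} with Gaussian basis functions. The centers, width, and weight values are arbitrary assumptions made up for the demo.

```python
import numpy as np

def gaussian_basis(X, mu, s):
    """phi_j(x) = exp(-||x - mu_j||^2 / (2 s^2)) for each center mu_j."""
    # X: (N, D), mu: (M, D)  ->  (N, M)
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * s ** 2))

def discriminant(X, W, w0, mu, s):
    """y_k(x) = sum_j W[k, j] * phi_j(x) + w0[k], returned for all K classes."""
    Phi = gaussian_basis(X, mu, s)        # (N, M)
    return Phi @ W.T + w0                 # (N, K)

# toy usage with arbitrary numbers
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))               # 5 points in R^2
mu = rng.normal(size=(3, 2))              # M = 3 basis-function centers
W = rng.normal(size=(2, 3))               # K = 2 classes
w0 = np.zeros(2)
print(discriminant(X, W, w0, mu, s=1.0))  # class scores; argmax over axis 1 gives the predicted class
```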

Recap: Gradient Descent
- Iterative minimization: start with an initial guess for the parameter values w^{(0)} and move towards a (local) minimum by following the gradient.
- Basic strategies: batch updating uses the gradient of the full error E(w), sequential updating uses the gradient of the per-example error E_n(w), where E(w) = \sum_n E_n(w).
- Example: quadratic error function E(w) = \sum_n (y(x_n; w) - t_n)^2.
- Sequential updating leads to the delta rule (= LMS rule):
  w_{kj}^{(\tau+1)} = w_{kj}^{(\tau)} - \eta (y_k(x_n; w) - t_{kn}) \phi_j(x_n) = w_{kj}^{(\tau)} - \eta \delta_{kn} \phi_j(x_n)
  with \delta_{kn} = y_k(x_n; w) - t_{kn}.
- Simply feed back the input data point, weighted by the classification error.

Recap: Gradient Descent (nonlinear activation)
- Case with a differentiable, non-linear activation function g:
  y_k(x) = g(a_k) = g( \sum_{j=0}^{M} w_{kj} \phi_j(x) )
- Gradient descent (again with quadratic error) gives, via the chain rule, the same form of update
  w_{kj}^{(\tau+1)} = w_{kj}^{(\tau)} - \eta \delta_{kn} \phi_j(x_n)
  where \delta_{kn} now also contains the derivative of the activation: \delta_{kn} = g'(a_k) (y_k(x_n; w) - t_{kn}).

Recap: Classification as Dimensionality Reduction
- Interpret linear classification as a projection onto a lower-dimensional space: y = w^T x.
- Learning problem: try to find the projection vector w that maximizes class separation.
- (Figure: the same two-class data projected onto two different directions, one giving bad separation, one giving good separation. Image source: C.M. Bishop, 2006)

Recap: Fisher's Linear Discriminant Analysis
- Maximize the distance between classes while minimizing the distance within each class.
- Criterion: J(w) = (w^T S_B w) / (w^T S_W w), with S_B the between-class scatter matrix and S_W the within-class scatter matrix.
- The optimal solution for w can be obtained as w \propto S_W^{-1} (m_2 - m_1).
- Classification function: y(x) = w^T x + w_0 with w_0 = -w^T m. (Slide adapted from Ales Leonardis; a small numerical sketch follows after the agenda below.)

Topics of This Lecture
- Recap: Generalized Linear Discriminants
- Logistic Regression: probabilistic discriminative models, logistic sigmoid (logit function), cross-entropy error, gradient descent, Iteratively Reweighted Least Squares, note on error functions
- Statistical Learning Theory: generalization and overfitting, empirical and actual risk, VC dimension, Empirical Risk Minimization, Structural Risk Minimization
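As an illustrative sketch of Fisher's criterion above, the following snippet computes the projection direction w \propto S_W^{-1} (m_2 - m_1) for two toy Gaussian classes. The data, function names, and normalization are my own choices, not part of the lecture.

```python
import numpy as np

def fisher_lda(X1, X2):
    """Return the Fisher projection vector w proportional to S_W^{-1} (m2 - m1)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # within-class scatter: sum of the per-class scatter matrices
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m2 - m1)
    return w / np.linalg.norm(w)

# toy data: two elongated Gaussian clouds
rng = np.random.default_rng(1)
X1 = rng.normal(loc=[0, 0], scale=[1.0, 3.0], size=(100, 2))
X2 = rng.normal(loc=[3, 0], scale=[1.0, 3.0], size=(100, 2))
w = fisher_lda(X1, X2)
print("projection direction:", w)   # projections y = X @ w separate the two classes well
```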

Probabilistic Discriminative Models
- We have seen that we can write the posterior as
  p(C_1|x) = \sigma(a) = 1 / (1 + exp(-a))
  and obtain the familiar probabilistic model by setting
  a = ln [ p(x|C_1) p(C_1) / (p(x|C_2) p(C_2)) ].
- Or we can use generalized linear discriminant models: a = w^T x or a = w^T \phi(x), with \sigma the logistic sigmoid function.
- In the following, we will consider models of the form
  p(C_1|\phi) = y(\phi) = \sigma(w^T \phi),   p(C_2|\phi) = 1 - p(C_1|\phi).
  This model is called logistic regression.
- Why should we do this? What advantage does such a model have compared to modeling the probabilities via
  p(C_1|\phi) = p(\phi|C_1) p(C_1) / [ p(\phi|C_1) p(C_1) + p(\phi|C_2) p(C_2) ]?
  Any ideas?

Comparison
- Let's look at the number of parameters. Assume we have an M-dimensional feature space \phi, and assume we represent p(\phi|C_k) and p(C_k) by Gaussians. How many parameters do we need? For the means: 2M; for the (shared) covariance: M(M+1)/2. Together with the class prior, this gives M(M+5)/2 + 1 parameters!
- How many parameters do we need for logistic regression, p(C_1|\phi) = y(\phi) = \sigma(w^T \phi)? Just the values of w, i.e. M parameters.
- For large M, logistic regression has clear advantages!

Logistic Sigmoid
- Definition: \sigma(a) = 1 / (1 + exp(-a))
- Inverse (the "logit" function): a = ln( \sigma / (1 - \sigma) )
- Symmetry property: \sigma(-a) = 1 - \sigma(a)
- Derivative: d\sigma/da = \sigma (1 - \sigma)

Logistic Regression
- Let's consider a data set {\phi_n, t_n} with n = 1, ..., N, where \phi_n = \phi(x_n), t_n \in {0, 1}, and t = (t_1, ..., t_N)^T.
- With y_n = p(C_1|\phi_n) = \sigma(w^T \phi_n), we can write the likelihood as
  p(t|w) = \prod_{n=1}^{N} y_n^{t_n} (1 - y_n)^{1 - t_n}
- Define the error function as the negative log-likelihood:
  E(w) = -ln p(t|w) = -\sum_{n=1}^{N} { t_n ln y_n + (1 - t_n) ln(1 - y_n) }
  This is the so-called cross-entropy error function.

Gradient of the Error Function
- Using dy_n/dw = y_n (1 - y_n) \phi_n, the gradient of the cross-entropy error is
  \nabla E(w) = -\sum_n { t_n / y_n - (1 - t_n) / (1 - y_n) } y_n (1 - y_n) \phi_n
             = -\sum_n { t_n (1 - y_n) - (1 - t_n) y_n } \phi_n
             = \sum_n (y_n - t_n) \phi_n
  (A toy gradient-descent sketch for this error function follows below.)
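A minimal sketch, assuming made-up toy data and an arbitrary learning rate: it implements exactly the cross-entropy error and its gradient \sum_n (y_n - t_n) \phi_n from above, with a plain batch gradient-descent loop.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cross_entropy(w, Phi, t, eps=1e-12):
    """E(w) = -sum_n [t_n ln y_n + (1 - t_n) ln(1 - y_n)], with a small eps for stability."""
    y = sigmoid(Phi @ w)
    return -np.sum(t * np.log(y + eps) + (1 - t) * np.log(1 - y + eps))

def gradient(w, Phi, t):
    """grad E(w) = sum_n (y_n - t_n) phi_n."""
    y = sigmoid(Phi @ w)
    return Phi.T @ (y - t)

# toy data: Phi already contains a constant basis function phi_0(x) = 1 for the bias
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
t = (X[:, 0] + X[:, 1] > 0).astype(float)
Phi = np.hstack([np.ones((200, 1)), X])

w = np.zeros(3)
eta = 0.01                            # learning rate (arbitrary choice)
for _ in range(500):
    w -= eta * gradient(w, Phi, t)
print("final cross-entropy:", cross_entropy(w, Phi, t))
```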

Gradient of the Error Function
- Gradient for logistic regression:
  \nabla E(w) = \sum_{n=1}^{N} (y_n - t_n) \phi_n
- Does this look familiar to you? This is the same result as for the delta (= LMS) rule:
  \Delta w_{kj} = -\eta (y_k(x_n; w) - t_{kn}) \phi_j(x_n)
- We can use this to derive a sequential estimation algorithm. However, this will be quite slow.

A More Efficient Iterative Method
- Second-order Newton-Raphson gradient descent scheme:
  w^{(\tau+1)} = w^{(\tau)} - H^{-1} \nabla E(w)
  where H = \nabla\nabla E(w) is the Hessian matrix, i.e. the matrix of second derivatives.
- Properties: local quadratic approximation to the log-likelihood; faster convergence.

Newton-Raphson for Least-Squares Estimation
- Let's first apply Newton-Raphson to the least-squares error function:
  E(w) = (1/2) \sum_n (w^T \phi_n - t_n)^2
  \nabla E(w) = \sum_n (w^T \phi_n - t_n) \phi_n = \Phi^T \Phi w - \Phi^T t
  H = \nabla\nabla E(w) = \sum_n \phi_n \phi_n^T = \Phi^T \Phi
  where \Phi is the design matrix with rows \phi_1^T, ..., \phi_N^T.
- Resulting update scheme:
  w^{(\tau+1)} = w^{(\tau)} - (\Phi^T \Phi)^{-1} (\Phi^T \Phi w^{(\tau)} - \Phi^T t) = (\Phi^T \Phi)^{-1} \Phi^T t
  Closed-form solution!

Newton-Raphson for Logistic Regression
- Now, let's try Newton-Raphson on the cross-entropy error function:
  E(w) = -\sum_n { t_n ln y_n + (1 - t_n) ln(1 - y_n) }
  \nabla E(w) = \sum_n (y_n - t_n) \phi_n = \Phi^T (y - t)        (using dy_n/dw = y_n (1 - y_n) \phi_n)
  H = \nabla\nabla E(w) = \sum_n y_n (1 - y_n) \phi_n \phi_n^T = \Phi^T R \Phi
  where R is an N x N diagonal matrix with R_nn = y_n (1 - y_n).
- The Hessian is no longer constant, but depends on w through the weighting matrix R.

Iteratively Reweighted Least Squares
- Update equations:
  w^{(\tau+1)} = w^{(\tau)} - (\Phi^T R \Phi)^{-1} \Phi^T (y - t)
             = (\Phi^T R \Phi)^{-1} { \Phi^T R \Phi w^{(\tau)} - \Phi^T (y - t) }
             = (\Phi^T R \Phi)^{-1} \Phi^T R z,   with z = \Phi w^{(\tau)} - R^{-1} (y - t)
- Again a very similar form (normal equations), but now with a non-constant weighting matrix R (which depends on w).
- We need to apply the normal equations iteratively: Iteratively Reweighted Least Squares (IRLS). (An illustrative IRLS sketch follows after the summary below.)

Summary: Logistic Regression
- Properties: directly represents the posterior distribution p(C_k|\phi); requires fewer parameters than modeling the likelihood + prior; very often used in statistics.
- It can be shown that the cross-entropy error function is convex, so optimization leads to a unique minimum. But no closed-form solution exists, so we use iterative optimization (IRLS); both online and batch optimizations exist.
- There is a multi-class version described in Bishop, Ch. 4.3.4.
- Caveat: logistic regression tends to systematically overestimate odds ratios when the sample size is small.
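A minimal IRLS sketch following the update w^{(\tau+1)} = (\Phi^T R \Phi)^{-1} \Phi^T R z above. The small ridge term, the clamping of R_nn, the iteration count, and the toy data are my own additions to keep the demo numerically stable; they are not part of the lecture's derivation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls(Phi, t, n_iter=10, ridge=1e-8):
    """Iteratively Reweighted Least Squares for logistic regression (toy version)."""
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(n_iter):
        y = sigmoid(Phi @ w)
        r = y * (1.0 - y)                                  # diagonal of R
        z = Phi @ w - (y - t) / np.maximum(r, 1e-12)       # working targets
        A = Phi.T @ (Phi * r[:, None]) + ridge * np.eye(M) # Phi^T R Phi (slightly regularized)
        w = np.linalg.solve(A, Phi.T @ (r * z))            # solve against Phi^T R z
    return w

# toy usage on roughly separable data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
t = (X @ np.array([2.0, -1.0]) > 0).astype(float)
Phi = np.hstack([np.ones((200, 1)), X])
print("learned weights:", irls(Phi, t))
```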

Topics of This Lecture (continued): a note on error functions, then Statistical Learning Theory (generalization and overfitting, empirical and actual risk, VC dimension, Empirical Risk Minimization, Structural Risk Minimization).

A Note on Error Functions
- The error functions are compared as functions of z_n = t_n y(x_n) (image source: Bishop, 2006).
- Ideal misclassification error (black curve): this is what we want to approximate. Unfortunately, it is not differentiable, and its gradient is zero for misclassified points, so we cannot minimize it by gradient descent.
- Squared error, as used in least-squares classification: very popular, since it leads to closed-form solutions. However, it is sensitive to outliers due to the squared penalty, and it penalizes "too correct" data points (points far on the correct side of the boundary). Generally it does not lead to good classifiers.
- Squared error with a sigmoid (tanh) activation function: fixes the sensitivity to outliers and the penalty on "too correct" data points. But it has zero gradient for confidently misclassified data points. It will give better performance than the original squared error, but still does not fix all problems.
- Cross-entropy error: the minimizer of this error is given by the posterior class probabilities. Convex error function, so a unique minimum exists. Robust to outliers, since the error increases only roughly linearly for large negative z_n. But there is no closed-form solution, so it requires iterative estimation. (A small numerical comparison of these error functions follows below.)
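A small sketch comparing the loss shapes discussed above as functions of z = t * y(x) with t in {-1, +1}. The exact functional forms and scalings are my own schematic choices (not taken from the lecture figures); they only reproduce the qualitative behavior described in the text.

```python
import numpy as np

z = np.linspace(-2, 2, 9)

misclassification = (z <= 0).astype(float)      # ideal 0/1 error
squared = (1.0 - z) ** 2                        # squared error against targets +/-1
squared_tanh = (1.0 - np.tanh(z)) ** 2          # squared error after a tanh activation
cross_entropy = np.log1p(np.exp(-z))            # logistic / cross-entropy loss (up to scaling)

for name, loss in [("0/1", misclassification), ("squared", squared),
                   ("squared+tanh", squared_tanh), ("cross-entropy", cross_entropy)]:
    print(f"{name:>14}: " + " ".join(f"{v:6.2f}" for v in loss))
# The printout shows the squared error growing again for large positive z ("too correct" points),
# the tanh variant flattening out (zero gradient) for very negative z, and the cross-entropy loss
# decaying towards zero for positive z while growing only roughly linearly for negative z.
```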

Generalization and Overfitting
- Goal: predict the class labels of new observations. We train the classification model on a limited training set.
- The further we optimize the model parameters, the more the training error will decrease. However, at some point the test error will go up again: overfitting to the training set!
- (Figure: training error keeps decreasing with continued optimization, while the test error eventually rises again. Image source: B. Schiele)

Example: Linearly Separable Data
- Overfitting is often a problem with linearly separable data. Which of the many possible decision boundaries is correct?
- All of them have zero error on the training set. However, they will most likely result in different predictions on novel test data, i.e. different generalization performance.
- How do we select the classifier with the best generalization performance?

A Broader View on Statistical Learning
- Formal treatment: Statistical Learning Theory.
- Supervised learning involves three components:
  - Environment: assumed stationary, i.e. the data x have an unknown but fixed probability density p_X(x).
  - Teacher: specifies for each data point x the desired classification y (where y may be subject to noise): y = g(x; \nu) with noise \nu.
  - Learning machine: represented by a class of functions which produce for each x an output y: y = f(x; \alpha), with parameters \alpha.

Statistical Learning Theory
- Supervised learning (from the learning machine's view):
  - Given: training examples {(x_i, y_i)}, i = 1, ..., N.
  - Goal: select a specific function f(x; \alpha) such that the desired response y is approximated optimally.
- Measuring the optimality: a loss function L(y, f(x; \alpha)). Example: quadratic loss L(y, f(x; \alpha)) = (y - f(x; \alpha))^2.

Risk
- Measure the optimality by the risk (= expected loss). Difficulty: how should the risk be estimated?
- What we are really interested in is the actual risk (= expected risk):
  R(\alpha) = \int L(y, f(x; \alpha)) dP_{X,Y}(x, y)
  where P_{X,Y}(x, y) is the probability distribution of (x, y). It is fixed, but typically unknown, so in general we cannot compute the actual risk directly!
- The expected risk is the expectation of the error on all data, i.e. it is the expected value of the generalization error.
- Practical way: the empirical risk, measured on the training/validation set:
  R_emp(\alpha) = (1/N) \sum_{i=1}^{N} L(y_i, f(x_i; \alpha))
  Example, with the quadratic loss function:
  R_emp(\alpha) = (1/N) \sum_{i=1}^{N} (y_i - f(x_i; \alpha))^2
  (See the illustrative sketch below.)
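A sketch contrasting the empirical risk R_emp (average loss on a finite training set) with the actual risk R (expected loss under the true distribution), here approximated by a very large fresh sample. The teacher function g, the noise level, the polynomial learning machine, and the sample sizes are all invented for this demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):                        # "teacher": true target function
    return np.sin(2 * np.pi * x)

def f(x, alpha):                 # learning machine: polynomial with parameters alpha
    return np.polyval(alpha, x)

def empirical_risk(alpha, x, y):
    """R_emp(alpha) = (1/N) sum_i (y_i - f(x_i; alpha))^2 (quadratic loss)."""
    return np.mean((y - f(x, alpha)) ** 2)

# small training set, fit by least squares with a deliberately high polynomial degree (overfitting)
x_train = rng.uniform(0, 1, size=10)
y_train = g(x_train) + 0.1 * rng.normal(size=10)
alpha = np.polyfit(x_train, y_train, deg=9)

# a huge fresh sample stands in for the (unknown) distribution P_{X,Y}
x_test = rng.uniform(0, 1, size=100_000)
y_test = g(x_test) + 0.1 * rng.normal(size=100_000)

print("R_emp (training set):   ", empirical_risk(alpha, x_train, y_train))  # near zero
print("R (approx. actual risk):", empirical_risk(alpha, x_test, y_test))    # much larger: overfitting
```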

Summary: Risk
- Actual risk. Advantage: a measure of the generalization ability. Disadvantage: in general, we do not know P_{X,Y}(x, y).
- Empirical risk. Advantage: does not depend on P_{X,Y}(x, y), and we typically know learning algorithms which minimize the empirical risk. Disadvantage: no direct measure of the generalization ability.
- Hence there is strong interest in the connection between both types of risk.

Statistical Learning Theory
- Idea: compute an upper bound on the actual risk based on the empirical risk:
  R(\alpha) <= R_emp(\alpha) + \epsilon(N, p^*, h)
  where N is the number of training examples, p^* is the probability that the bound is correct, and h is the capacity of the learning machine ("VC dimension").
- Side note: this idea of specifying a bound that only holds with a certain probability is explored in a branch of learning theory called Probably Approximately Correct (PAC) learning.

VC Dimension
- The Vapnik-Chervonenkis dimension is a measure for the capacity of a learning machine.
- Formal definition: if a given set of \ell points can be labeled in all possible 2^\ell ways, and for each labeling a member of the function set {f(\alpha)} can be found which correctly assigns those labels, we say that the set of points is shattered by the set of functions. The VC dimension of {f(\alpha)} is the maximum number of training points that can be shattered by {f(\alpha)}.
- Interpretation as a two-player game:
  - Opponent's turn: he says a number N.
  - Our turn: we specify a set of N points {x_1, ..., x_N}.
  - Opponent's turn: he gives us a labeling (y_1, ..., y_N) in {0, 1}^N.
  - Our turn: we specify a function f(\alpha) which correctly classifies all N points.
  - If we can do that for all 2^N possible labelings, then the VC dimension is at least N.

VC Dimension: Example
- The VC dimension of all oriented lines in R^2 is 3:
  1. Three points (in general position) can be shattered with an oriented line (image source: C. Burges, 1998).
  2. More difficult to show: it is not possible to shatter 4 points (the XOR configuration).
- More generally, the VC dimension of all hyperplanes in R^n is n + 1.

VC Dimension: Counterexample
- Intuitive feeling (unfortunately wrong): the VC dimension has a direct connection with the number of parameters.
- Counterexample: f(x; \alpha) = g(sin(\alpha x)) with g(x) = 1 for x > 0 and g(x) = -1 for x <= 0. Just a single parameter, yet infinite VC dimension.
- Proof sketch: choose x_i = 10^{-i}, i = 1, ..., \ell. For any labeling y_1, ..., y_\ell in {-1, +1}, the choice
  \alpha = \pi ( 1 + \sum_{i=1}^{\ell} (1 - y_i) 10^i / 2 )
  makes f(x_i; \alpha) = y_i for all i, so any number of points can be shattered. (Verified numerically in the sketch below.)
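A numerical check of the counterexample above: a single-parameter family f(x; \alpha) = sign(sin(\alpha x)) shatters the points x_i = 10^{-i}. The closed-form choice of \alpha follows the construction sketched above (as in Burges' tutorial); the loop merely verifies every possible labeling for a small \ell of my choosing.

```python
import numpy as np
from itertools import product

def f(x, alpha):
    """f(x; alpha) = +1 if sin(alpha * x) > 0, else -1."""
    return np.where(np.sin(alpha * x) > 0, 1, -1)

ell = 5
x = 10.0 ** -np.arange(1, ell + 1)                 # x_i = 10^{-i}

for labels in product([-1, 1], repeat=ell):        # all 2^ell labelings
    y = np.array(labels)
    alpha = np.pi * (1.0 + np.sum((1 - y) * 10.0 ** np.arange(1, ell + 1)) / 2.0)
    assert np.all(f(x, alpha) == y)                # the single parameter realizes this labeling

print(f"all {2**ell} labelings of {ell} points realized -> VC dimension >= {ell}")
```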

Upper Bound on the Risk
- Important result (Vapnik 1979, 1995). With probability (1 - \eta), the following bound holds:
  R(\alpha) <= R_emp(\alpha) + sqrt( [ h (log(2N/h) + 1) - log(\eta/4) ] / N )
  where the second term is called the VC confidence. This bound is independent of P_{X,Y}(x, y)!
- Typically, we cannot compute the left-hand side (the actual risk). If we know h (the VC dimension), we can however easily compute the risk bound
  R(\alpha) <= R_emp(\alpha) + \epsilon(N, p^*, h).

Structural Risk Minimization
- How can we implement the bound R(\alpha) <= R_emp(\alpha) + \epsilon(N, p^*, h)?
- Classic approach: keep \epsilon(N, p^*, h) constant and minimize R_emp(\alpha). \epsilon(N, p^*, h) can be kept constant by controlling the model parameters.
- Support Vector Machines (SVMs): keep R_emp(\alpha) constant and minimize \epsilon(N, p^*, h). In fact, R_emp(\alpha) = 0 for separable data. Control \epsilon(N, p^*, h) by adapting the VC dimension, i.e. by controlling the capacity of the classifier. (A small numeric illustration of this trade-off follows after the references below.)

References and Further Reading
- More information on SVMs can be found in Chapter 7.1 of Bishop's book: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
- Additional information about Statistical Learning Theory and a more in-depth introduction to SVMs are available in the following tutorial: C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", Data Mining and Knowledge Discovery, Vol. 2(2), pp. 121-167, 1998.
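A small sketch evaluating the VC confidence term from the bound above, epsilon = sqrt( (h (log(2N/h) + 1) - log(eta/4)) / N ), for a few capacities h. The empirical-risk values in the table are made-up placeholders, chosen only to illustrate the trade-off that structural risk minimization exploits.

```python
import numpy as np

def vc_confidence(N, h, eta=0.05):
    """VC confidence term; eta is the (small) probability with which the bound may fail."""
    return np.sqrt((h * (np.log(2 * N / h) + 1) - np.log(eta / 4)) / N)

N = 10_000
for h, r_emp in [(10, 0.20), (100, 0.10), (1000, 0.05)]:   # assumed, illustrative R_emp values
    eps = vc_confidence(N, h)
    print(f"h={h:5d}  R_emp={r_emp:.2f}  VC confidence={eps:.3f}  bound={r_emp + eps:.3f}")
# Larger capacity h lowers the (assumed) empirical risk but raises the VC confidence term;
# structural risk minimization picks the model class that minimizes the sum, i.e. the bound.
```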
