LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
1 LINEAR CLASSIFIERS
2 Classification: Problem Statement. In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification, the input variable x may still be continuous, but the target variable is discrete. In the simplest case, t can have only 2 values: e.g., let t = +1 for x assigned to C_1 and t = −1 for x assigned to C_2.
3 Example Problem 3 Animal or Vegetable?
4 Linear Models for Classification. Linear models for classification separate input vectors into classes using linear (hyperplane) decision boundaries. Example: 2D input vector x, two discrete classes C_1 and C_2 (see figure).
5 Two-Class Discriminant Function. y(x) = w^T x + w_0. y(x) ≥ 0 ⇒ x assigned to C_1; y(x) < 0 ⇒ x assigned to C_2. Thus y(x) = 0 defines the decision boundary.
6 Two-Class Discriminant Function. y(x) = w^T x + w_0, with y(x) ≥ 0 ⇒ x assigned to C_1 and y(x) < 0 ⇒ x assigned to C_2. For convenience, augment the vectors: let w̃ = (w_0, w_1, …, w_M)^T and x̃ = (1, x_1, …, x_M)^T, so we can express y(x) = w̃^T x̃.
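To make the decision rule concrete, here is a minimal Python sketch (my own illustration, not part of the lecture) of the two-class linear discriminant using the augmented vectors w̃ and x̃; all variable names are assumptions.

    import numpy as np

    def augment(X):
        """Prepend a constant 1 to each input so the bias w0 folds into the weight vector."""
        return np.hstack([np.ones((X.shape[0], 1)), X])

    def classify(w_tilde, X):
        """Return +1 for C1 (y(x) >= 0) and -1 for C2 (y(x) < 0), with y(x) = w~^T x~."""
        y = augment(X) @ w_tilde
        return np.where(y >= 0, 1, -1)

    # Example: w~ = (w0, w1, w2) defines the boundary w1*x1 + w2*x2 + w0 = 0.
    w_tilde = np.array([-1.0, 1.0, 1.0])
    X = np.array([[2.0, 0.5], [0.1, 0.2]])
    print(classify(w_tilde, X))   # [ 1 -1]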
7 Generalized Linear Models. For classification problems, we want y to be a predictor of t. In other words, we wish to map the input vector into one of a number of discrete classes, or to posterior probabilities that lie between 0 and 1. For this purpose, it is useful to elaborate the linear model by introducing a nonlinear activation function f, which typically will constrain y to lie between −1 and 1 or between 0 and 1: y(x) = f(w^T x + w_0).
8 The Perceptron. y(x) = f(w^T x + w_0), with y(x) ≥ 0 ⇒ x assigned to C_1 and y(x) < 0 ⇒ x assigned to C_2. A classifier based upon this simple generalized linear model is called a (single-layer) perceptron. It can also be identified with an abstracted model of a neuron called the McCulloch-Pitts model.
9 Parameter Learning 9 How do we learn the parameters of a perceptron?
10 Outline: The Perceptron Algorithm, Least-Squares Classifiers, Fisher's Linear Discriminant, Logistic Classifiers, Support Vector Machines.
11 Case 1: Linearly Separable Inputs. For starters, let's assume that the training data is in fact perfectly linearly separable. In other words, there exists at least one hyperplane (one set of weights) that yields 0 classification error. We seek an algorithm that can automatically find such a hyperplane.
12 The Perceptron Algorithm. The perceptron algorithm was invented by Frank Rosenblatt (1962). The algorithm is iterative. The strategy is to start with a random guess at the weights w, and to then iteratively change the weights to move the hyperplane in a direction that lowers the classification error.
13 The Perceptron Algorithm. Note that as we change the weights continuously, the classification error changes in a discontinuous, piecewise-constant fashion. Thus we cannot use the classification error per se as our objective function to minimize. What would be a better objective function?
14 The Perceptron Criterion. Note that we seek w such that w^T x ≥ 0 when t = +1 and w^T x < 0 when t = −1. In other words, we would like w^T x_n t_n ≥ 0 ∀ n. Thus we seek to minimize E_P(w) = −Σ_{n∈M} w^T x_n t_n, where M is the set of misclassified inputs.
15 The Perceptron Criterion. E_P(w) = −Σ_{n∈M} w^T x_n t_n, where M is the set of misclassified inputs. Observations: E_P(w) is always non-negative; E_P(w) is continuous and piecewise linear, and thus easier to minimize.
16 The Perceptron Algorithm. E_P(w) = −Σ_{n∈M} w^T x_n t_n, where M is the set of misclassified inputs. dE_P(w)/dw = −Σ_{n∈M} x_n t_n, where the derivative exists. Gradient descent: w^(τ+1) = w^(τ) − η ∇E_P(w) = w^(τ) + η Σ_{n∈M} x_n t_n.
17 The Perceptron Algorithm. w^(τ+1) = w^(τ) − η ∇E_P(w) = w^(τ) + η Σ_{n∈M} x_n t_n. Why does this make sense? If an input from C_1 (t = +1) is misclassified, we need to make its projection on w more positive. If an input from C_2 (t = −1) is misclassified, we need to make its projection on w more negative.
18 The Perceptron Algorithm. The algorithm can be implemented sequentially. Repeat until convergence: for each input (x_n, t_n), if it is correctly classified, do nothing; if it is misclassified, update the weight vector to be w^(τ+1) = w^(τ) + η x_n t_n. Note that this will lower the contribution of input n to the objective function: −(w^(τ+1))^T x_n t_n = −(w^(τ))^T x_n t_n − η (x_n t_n)^T x_n t_n < −(w^(τ))^T x_n t_n.
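As a concrete illustration of the sequential update just described, here is a short Python sketch (mine, not the lecture's code); it assumes inputs have already been augmented with a leading 1 and that targets are coded as ±1.

    import numpy as np

    def perceptron_train(X, t, eta=1.0, max_epochs=100):
        """Sequential perceptron algorithm on augmented inputs X (N x D) and targets t in {-1, +1}."""
        w = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            n_errors = 0
            for x_n, t_n in zip(X, t):
                if t_n * (w @ x_n) <= 0:      # misclassified (or on the boundary)
                    w = w + eta * x_n * t_n   # w <- w + eta * x_n * t_n
                    n_errors += 1
            if n_errors == 0:                 # separable data: converged
                break
        return w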
19 Not Monotonic 19 While updating with respect to a misclassified input n will lower the error for that input, the error for other misclassified inputs may increase. Also, new inputs that had been classified correctly may now be misclassified. The result is that the perceptron algorithm is not guaranteed to reduce the total error monotonically at each stage.
20 The Perceptron Convergence Theorem 20 Despite this non-monotonicity, if in fact the data are linearly separable, then the algorithm is guaranteed to find an exact solution in a finite number of steps (Rosenblatt, 1962).
21 Example
22 The First Learning Machine. Mark 1 Perceptron hardware (c. 1960): visual inputs; a patch board allowing configuration of inputs; a rack of adaptive weights w (motor-driven potentiometers).
23 Practical Limitations. The Perceptron Convergence Theorem is an important result. However, there are practical limitations: convergence may be slow; if the data are not separable, the algorithm will not converge; we will only know that the data are separable once the algorithm converges; and the solution is in general not unique, and will depend upon initialization, the scheduling of input vectors, and the learning rate.
24 Generalization to Inputs That Are Not Linearly Separable. The single-layer perceptron can be generalized to yield good linear solutions to problems that are not linearly separable. Example: the Pocket Algorithm (Gallant, 1990). Idea: run the perceptron algorithm while keeping track of the weight vector w* that has produced the best classification error achieved so far. It can be shown that w* will converge to an optimal solution with probability 1.
25 Generalization to Multiclass Problems 25 How can we use perceptrons, or linear classifiers in general, to classify inputs when there are K > 2 classes?
26 K>2 Classes. Idea #1: Just use K−1 discriminant functions, each of which separates one class C_k from the rest (one-versus-the-rest classifier). Problem: ambiguous regions (see figure).
27 K>2 Classes. Idea #2: Use K(K−1)/2 discriminant functions, each of which separates two classes C_j, C_k from each other (one-versus-one classifier). Each point is classified by majority vote. Problem: ambiguous regions (see figure).
28 K>2 Classes. Idea #3: Use K discriminant functions y_k(x) and use the magnitude of y_k(x), not just the sign: y_k(x) = w_k^T x, with x assigned to C_k if y_k(x) > y_j(x) ∀ j ≠ k. The decision boundary between classes k and j is y_k(x) = y_j(x), i.e., (w_k − w_j)^T x + (w_{k0} − w_{j0}) = 0. This results in decision regions that are simply connected and convex.
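A minimal Python sketch of Idea #3 (my own, with assumed variable names): stack the K weight vectors as columns of a matrix and assign each input to the class with the largest discriminant value.

    import numpy as np

    def multiclass_predict(W, X):
        """W is (D+1) x K (one augmented weight vector per class), X is N x (D+1).
        Each row of X is assigned to the class k maximizing y_k(x) = w_k^T x."""
        Y = X @ W                      # N x K matrix of discriminant values
        return np.argmax(Y, axis=1)    # index of the winning class for each input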
29 Example: Kesler's Construction. The perceptron algorithm can be generalized to K-class classification problems. Example: Kesler's construction allows use of the perceptron algorithm to simultaneously learn K separate weight vectors w_i. Inputs are then classified in class i if and only if w_i^T x > w_j^T x ∀ j ≠ i. The algorithm will converge to an optimal solution if a solution exists, i.e., if all training vectors can be correctly classified according to this rule.
30 1-of-K Coding Scheme. When there are K>2 classes, target variables can be coded using the 1-of-K coding scheme: an input from class C_i is coded as t = (0, …, 0, 1, 0, …, 0)^T, with the 1 in element i.
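A small helper (illustrative only, not from the lecture) that builds 1-of-K target vectors from integer class labels 0, …, K−1:

    import numpy as np

    def one_of_k(labels, K):
        """Encode integer labels as 1-of-K rows: a 1 in element i for class C_i, zeros elsewhere."""
        T = np.zeros((len(labels), K))
        T[np.arange(len(labels)), labels] = 1.0
        return T

    print(one_of_k([0, 2, 1], 3))
    # [[1. 0. 0.]
    #  [0. 0. 1.]
    #  [0. 1. 0.]]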
31 Computational Limitations of Perceptrons. Initially, the perceptron was thought to be a potentially powerful learning machine that could model human neural processing. However, Minsky & Papert (1969) showed that the single-layer perceptron could not learn a simple XOR function. This is just one example of a non-linearly separable pattern that cannot be learned by a single-layer perceptron.
32 Multi-Layer Perceptrons. Minsky & Papert's book was widely misinterpreted as showing that artificial neural networks were inherently limited. This contributed to a decline in the reputation of neural network research through the 70s and 80s. However, their findings apply only to single-layer perceptrons. Multilayer perceptrons are capable of learning highly nonlinear functions, and are used in many practical applications.
33 End of Lecture 11
34 Outline: The Perceptron Algorithm, Least-Squares Classifiers, Fisher's Linear Discriminant, Logistic Classifiers, Support Vector Machines.
35 Dealing with Non-Linearly Separable Inputs. The perceptron algorithm fails when the training data are not perfectly linearly separable. Let's now turn to methods for learning the parameter vector w of a perceptron (linear classifier) even when the training data are not linearly separable.
36 The Least Squares Method. In the least squares method, we simply fit the (x, t) observations with a hyperplane y(x). Note that this is kind of a weird idea, since the t values are binary (when K=2), e.g., 0 or 1. However, it can work pretty well.
37 Least Squares: Learning the Parameters. Assume D-dimensional input vectors x. For each class k ∈ {1, …, K}: y_k(x) = w_k^T x + w_{k0}, i.e., y(x) = W̃^T x̃, where x̃ = (1, x^T)^T and W̃ is a (D+1) × K matrix whose k-th column is w̃_k = (w_{k0}, w_k^T)^T.
38 Learning the Parameters. Method #2: Least Squares. y(x) = W̃^T x̃. Training dataset (x_n, t_n), n = 1, …, N, where we use the 1-of-K coding scheme for t_n. Let T be the N × K matrix whose n-th row is t_n^T. Let X̃ be the N × (D+1) matrix whose n-th row is x̃_n^T. Let R_D(W̃) = X̃ W̃ − T. Then we define the error as E_D(W̃) = ½ Σ_{i,j} R_{ij}² = ½ Tr{ R_D(W̃)^T R_D(W̃) }. Setting the derivative with respect to W̃ to 0 yields W̃ = (X̃^T X̃)^{-1} X̃^T T.
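The closed-form solution above translates directly into code. A minimal sketch (mine, not the course's): given the augmented design matrix X̃ and 1-of-K targets T, the pseudo-inverse gives W̃, and prediction takes the largest output.

    import numpy as np

    def least_squares_fit(X_aug, T):
        """W~ = (X~^T X~)^(-1) X~^T T, computed via the Moore-Penrose pseudo-inverse."""
        return np.linalg.pinv(X_aug) @ T

    def least_squares_predict(W, X_aug):
        """Assign each input to the class whose linear output is largest."""
        return np.argmax(X_aug @ W, axis=1)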
39 Outline: The Perceptron Algorithm, Least-Squares Classifiers, Fisher's Linear Discriminant, Logistic Classifiers, Support Vector Machines.
40 Fisher's Linear Discriminant. Another way to view linear discriminants: find the 1D subspace that maximizes the separation between the two classes. Let m_1 = (1/N_1) Σ_{n∈C_1} x_n and m_2 = (1/N_2) Σ_{n∈C_2} x_n. For example, one might choose w to maximize w^T(m_2 − m_1), subject to ||w|| = 1. This leads to w ∝ m_2 − m_1. However, if the class-conditional distributions are not isotropic, this is typically not optimal.
41 Fisher's Linear Discriminant. Let m_1 = w^T m_1 and m_2 = w^T m_2 be the projected class means on the 1D subspace. Let s_k² = Σ_{n∈C_k} (y_n − m_k)² be the within-class variance on the subspace for class C_k. The Fisher criterion is then J(w) = (m_2 − m_1)² / (s_1² + s_2²). This can be rewritten as J(w) = (w^T S_B w) / (w^T S_W w), where S_B = (m_2 − m_1)(m_2 − m_1)^T is the between-class covariance and S_W = Σ_{n∈C_1} (x_n − m_1)(x_n − m_1)^T + Σ_{n∈C_2} (x_n − m_2)(x_n − m_2)^T is the within-class covariance. J(w) is maximized for w ∝ S_W^{-1}(m_2 − m_1).
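A compact sketch of the two-class Fisher direction w ∝ S_W^{-1}(m_2 − m_1) (my own illustration; X1 and X2 are assumed to hold the training vectors of C_1 and C_2 as rows):

    import numpy as np

    def fisher_direction(X1, X2):
        """Return the (unit-norm) Fisher discriminant direction for two classes."""
        m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
        S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)   # within-class scatter
        w = np.linalg.solve(S_W, m2 - m1)                          # w ~ S_W^-1 (m2 - m1)
        return w / np.linalg.norm(w)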
42 Connection Between Least-Squares and FLD. Change the coding scheme used in the least-squares method to t_n = N/N_1 for C_1 and t_n = −N/N_2 for C_2. Then one can show that the least-squares solution for w satisfies w ∝ S_W^{-1}(m_2 − m_1).
43 Least Squares Classifier. Problem #1: sensitivity to outliers (see figure).
44 Least Squares Classifier. Problem #2: a linear activation function is not a good fit to binary data. This can lead to problems (see figure).
45 Outline: The Perceptron Algorithm, Least-Squares Classifiers, Fisher's Linear Discriminant, Logistic Classifiers, Support Vector Machines.
46 Logistic Regression (K = 2). p(C_1|φ) = y(φ) = σ(w^T φ) and p(C_2|φ) = 1 − p(C_1|φ), where σ(a) = 1 / (1 + exp(−a)) is the logistic sigmoid.
47 Logistic Regression. p(C_1|φ) = y(φ) = σ(w^T φ), p(C_2|φ) = 1 − p(C_1|φ), where σ(a) = 1 / (1 + exp(−a)). Number of parameters: logistic regression uses M; a Gaussian generative model uses 2M + 2M(M+1)/2 + 1 = M² + 3M + 1.
48 ML for Logistic Regression. p(t|w) = Π_{n=1}^N y_n^{t_n} (1 − y_n)^{1−t_n}, where t = (t_1, …, t_N)^T and y_n = p(C_1|φ_n). We define the error function to be E(w) = −log p(t|w). Given y_n = σ(a_n) and a_n = w^T φ_n, one can show that ∇E(w) = Σ_{n=1}^N (y_n − t_n) φ_n. Unfortunately, there is no closed-form solution for w.
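Even without a closed-form solution, the gradient above can be used directly. A minimal batch gradient-descent sketch (mine, with assumed names; Phi is the N × M design matrix and t holds 0/1 targets):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def logistic_fit_gd(Phi, t, eta=0.1, n_iters=1000):
        """Minimize E(w) = -log p(t|w) using the gradient  sum_n (y_n - t_n) phi_n."""
        w = np.zeros(Phi.shape[1])
        for _ in range(n_iters):
            y = sigmoid(Phi @ w)
            w = w - eta * Phi.T @ (y - t)
        return w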
49 End of Lecture 12
50 ML for Logistic Regression: Iterative Reweighted Least Squares. Although there is no closed-form solution for the ML estimate of w, fortunately, the error function is convex. Thus an appropriate iterative method is guaranteed to find the exact solution. A good method is to use a local quadratic approximation to the log-likelihood function (Newton-Raphson update): w^(new) = w^(old) − H^{-1} ∇E(w), where H is the Hessian matrix of E(w).
51 ML for Logistic Regression. w^(new) = w^(old) − H^{-1} ∇E(w), where H is the Hessian matrix of E(w): H = Φ^T R Φ, where R is the N × N diagonal weight matrix with R_nn = y_n(1 − y_n). (Note that, since R_nn ≥ 0, R is positive semi-definite, and hence H is positive semi-definite; thus E(w) is convex.) Thus w^(new) = w^(old) − (Φ^T R Φ)^{-1} Φ^T (y − t).
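The Newton-Raphson update above is easy to sketch in code (my own illustration; in practice a small ridge term is often added to Φ^T R Φ to keep it invertible):

    import numpy as np

    def logistic_fit_irls(Phi, t, n_iters=10):
        """IRLS: w_new = w_old - (Phi^T R Phi)^(-1) Phi^T (y - t), with R_nn = y_n (1 - y_n)."""
        w = np.zeros(Phi.shape[1])
        for _ in range(n_iters):
            y = 1.0 / (1.0 + np.exp(-Phi @ w))
            R = np.diag(y * (1.0 - y))
            H = Phi.T @ R @ Phi                        # Hessian of E(w)
            w = w - np.linalg.solve(H, Phi.T @ (y - t))
        return w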
52 ML for Logistic Regression: Iterative Reweighted Least Squares. p(C_1|φ) = y(φ) = σ(w^T φ) (see figure).
53 Logistic Regression. For K>2, we can generalize the activation function by modeling the posterior probabilities as p(C_k|φ) = y_k(φ) = exp(a_k) / Σ_j exp(a_j), where the activations a_k are given by a_k = w_k^T φ.
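A short sketch of the softmax posteriors (illustrative only; W is assumed to hold one weight vector w_k per column):

    import numpy as np

    def softmax_posteriors(W, phi):
        """p(C_k|phi) = exp(a_k) / sum_j exp(a_j), with activations a_k = w_k^T phi."""
        a = W.T @ phi
        a = a - a.max()           # subtract the max for numerical stability
        e = np.exp(a)
        return e / e.sum()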
54 Example: least-squares vs. logistic classifier decision boundaries (figure).
55 Outline: The Perceptron Algorithm, Least-Squares Classifiers, Fisher's Linear Discriminant, Logistic Classifiers, Support Vector Machines.
56 Motivation. The perceptron algorithm is guaranteed to provide a linear decision surface that separates the training data, if one exists. However, if the data are linearly separable, there are in general an infinite number of solutions, and the solution returned by the perceptron algorithm depends in a complex way on the initial conditions, the learning rate, and the order in which training data are processed. While all solutions achieve a perfect score on the training data, they won't all necessarily generalize as well to new inputs.
57 Which solution would you choose? 57
58 The Large Margin Classifier 58 Unlike the Perceptron Algorithm, Support Vector Machines solve a problem that has a unique solution: they return the linear classifier with the maximum margin, that is, the hyperplane that separates the data and is farthest from any of the training vectors. Why is this good?
59 Support Vector Machines. SVMs are based on the linear model y(x) = w^T φ(x) + b. Assume training data x_1, …, x_N with corresponding target values t_1, …, t_N, t_n ∈ {−1, 1}. x is classified according to the sign of y(x). Assume for the moment that the training data are linearly separable in feature space. Then ∃ w, b : t_n y(x_n) > 0 ∀ n ∈ [1, N].
60 Maximum Margin Classifiers. When the training data are linearly separable, there are generally an infinite number of solutions for (w, b) that separate the classes exactly. The margin of such a classifier is defined as the orthogonal distance in feature space between the decision boundary and the closest training vector. SVMs are an example of a maximum margin classifier, which finds the linear classifier that maximizes the margin.
61 Probabilistic Motivation. The maximum margin classifier has a probabilistic motivation. If we model the class-conditional densities with a KDE using Gaussian kernels with variance σ², then in the limit σ → 0 the optimal linear decision boundary approaches the maximum margin linear classifier.
62 Two-Class Discriminant Function. Recall: y(x) = w^T x + w_0, with y(x) ≥ 0 ⇒ x assigned to C_1 and y(x) < 0 ⇒ x assigned to C_2. Thus y(x) = 0 defines the decision boundary.
63 Maximum Margin Classifiers. The distance of point x_n from the decision surface is given by t_n y(x_n) / ||w|| = t_n (w^T φ(x_n) + b) / ||w||. Thus we seek argmax_{w,b} { (1/||w||) min_n [ t_n (w^T φ(x_n) + b) ] }.
64 Maximum Margin Classifiers. The distance of point x_n from the decision surface is given by t_n y(x_n) / ||w|| = t_n (w^T φ(x_n) + b) / ||w||. Note that rescaling w and b by the same factor leaves the distance to the decision surface unchanged. Thus, without loss of generality, we consider only solutions that satisfy t_n (w^T φ(x_n) + b) = 1 for the point x_n that is closest to the decision surface.
65 Quadratic Programming Problem. Then all points x_n satisfy t_n (w^T φ(x_n) + b) ≥ 1. Points for which equality holds are said to be active; all other points are inactive. Now the argmax above is equivalent to argmin_w ½||w||², subject to t_n (w^T φ(x_n) + b) ≥ 1 ∀ x_n. This is a quadratic programming problem. Solving this problem will involve Lagrange multipliers.
66 Lagrange Multipliers
67 Lagrange Multipliers (Appendix C.4 in textbook). Used to find stationary points of a function subject to one or more constraints. Example (equality constraint): maximize f(x) subject to g(x) = 0. Observations: 1. At any point on the constraint surface, ∇g(x) must be orthogonal to the surface. 2. Let x* be a point on the constraint surface where f(x) is maximized; then ∇f(x) is also orthogonal to the constraint surface at x*. 3. Therefore ∃ λ ≠ 0 such that ∇f(x) + λ∇g(x) = 0 at x*. λ is called a Lagrange multiplier.
68 Lagrange Multipliers (Appendix C.4 in textbook). ∃ λ ≠ 0 such that ∇f(x) + λ∇g(x) = 0 at x*. Defining the Lagrangian function as L(x, λ) = f(x) + λ g(x), we then have ∇_x L(x, λ) = 0 and ∂L(x, λ)/∂λ = g(x) = 0.
69 Example. L(x, λ) = f(x) + λ g(x). Find the stationary point of f(x_1, x_2) = 1 − x_1² − x_2² subject to g(x_1, x_2) = x_1 + x_2 − 1 = 0.
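Working this example through (my own derivation, following the slide's setup):

    L(\mathbf{x}, \lambda) = 1 - x_1^2 - x_2^2 + \lambda (x_1 + x_2 - 1)
    \partial L / \partial x_1 = -2 x_1 + \lambda = 0 \;\Rightarrow\; x_1 = \lambda / 2
    \partial L / \partial x_2 = -2 x_2 + \lambda = 0 \;\Rightarrow\; x_2 = \lambda / 2
    \partial L / \partial \lambda = x_1 + x_2 - 1 = 0 \;\Rightarrow\; \lambda = 1

so the stationary point is (x_1*, x_2*) = (1/2, 1/2), with Lagrange multiplier λ = 1.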
70 End of Lecture 13
71 Inequality Constraints. Maximize f(x) subject to g(x) ≥ 0. There are 2 cases: 1. x* in the interior (e.g., x_B): here g(x) > 0 and the stationary condition is simply ∇f(x) = 0; this corresponds to a stationary point of the Lagrangian L(x, λ) = f(x) + λ g(x) where λ = 0. 2. x* on the boundary (e.g., x_A): here g(x) = 0 and the stationary condition is ∇f(x) = −λ∇g(x), λ > 0; this corresponds to a stationary point of the Lagrangian where λ > 0. Thus the general problem can be expressed as maximizing the Lagrangian subject to the Karush-Kuhn-Tucker (KKT) conditions: 1. g(x) ≥ 0, 2. λ ≥ 0, 3. λ g(x) = 0.
72 Minimizing vs. Maximizing. If we want to minimize f(x) subject to g(x) ≥ 0, then the Lagrangian becomes L(x, λ) = f(x) − λ g(x), with λ ≥ 0.
73 Extension to Multiple Constraints. Suppose we wish to maximize f(x) subject to g_j(x) = 0 for j = 1, …, J and h_k(x) ≥ 0 for k = 1, …, K. We then find the stationary points of L(x, λ, µ) = f(x) + Σ_{j=1}^J λ_j g_j(x) + Σ_{k=1}^K µ_k h_k(x), subject to h_k(x) ≥ 0, µ_k ≥ 0, and µ_k h_k(x) = 0.
74 Back to our quadratic programming problem
75 Quadratic Programming Problem. argmin_w ½||w||², subject to t_n (w^T φ(x_n) + b) ≥ 1 ∀ x_n. Solve using Lagrange multipliers a_n: L(w, b, a) = ½||w||² − Σ_{n=1}^N a_n { t_n (w^T φ(x_n) + b) − 1 }.
76 Dual Representation. Solve using Lagrange multipliers a_n: L(w, b, a) = ½||w||² − Σ_{n=1}^N a_n { t_n (w^T φ(x_n) + b) − 1 }. Setting derivatives with respect to w and b to 0, we get: w = Σ_{n=1}^N a_n t_n φ(x_n) and Σ_{n=1}^N a_n t_n = 0.
77 Dual Representation. L(w, b, a) = ½||w||² − Σ_{n=1}^N a_n { t_n (w^T φ(x_n) + b) − 1 }, with w = Σ_{n=1}^N a_n t_n φ(x_n) and Σ_{n=1}^N a_n t_n = 0. Substituting leads to the dual representation of the maximum margin problem, in which we maximize L̃(a) = Σ_{n=1}^N a_n − ½ Σ_{n=1}^N Σ_{m=1}^N a_n a_m t_n t_m k(x_n, x_m) with respect to a, subject to a_n ≥ 0 ∀ n and Σ_{n=1}^N a_n t_n = 0, and where k(x, x′) = φ(x)^T φ(x′).
78 Dual Representation. Using w = Σ_{n=1}^N a_n t_n φ(x_n), a new point x is classified by computing y(x) = Σ_{n=1}^N a_n t_n k(x, x_n) + b. The Karush-Kuhn-Tucker (KKT) conditions for this constrained optimization problem are: a_n ≥ 0, t_n y(x_n) − 1 ≥ 0, and a_n { t_n y(x_n) − 1 } = 0. Thus for every data point, either a_n = 0 or t_n y(x_n) = 1; the points with t_n y(x_n) = 1 are the support vectors.
79 Solving for the Bias. Once the optimal a is determined, the bias b can be computed by noting that any support vector x_n satisfies t_n y(x_n) = 1. Using y(x) = Σ_{m=1}^N a_m t_m k(x, x_m) + b, we have t_n ( Σ_{m=1}^N a_m t_m k(x_n, x_m) + b ) = 1, and so b = t_n − Σ_{m∈S} a_m t_m k(x_n, x_m). A more numerically accurate solution can be obtained by averaging over all support vectors: b = (1/N_S) Σ_{n∈S} ( t_n − Σ_{m∈S} a_m t_m k(x_n, x_m) ), where S is the index set of support vectors and N_S is the number of support vectors.
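The dual decision function and the averaged bias translate directly into code. A minimal sketch (mine; a and t are the length-N multiplier and target arrays, K is the precomputed N × N kernel matrix, and S is the list of support-vector indices):

    import numpy as np

    def svm_bias(a, t, K, S):
        """b = (1/N_S) * sum_{n in S} ( t_n - sum_{m in S} a_m t_m k(x_n, x_m) )."""
        return np.mean([t[n] - np.sum(a[S] * t[S] * K[n, S]) for n in S])

    def svm_decision(a, t, K_new, b):
        """y(x) = sum_n a_n t_n k(x, x_n) + b, where K_new[i, n] = k(x_new_i, x_n)."""
        return K_new @ (a * t) + b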
80 Example (Gaussian kernel): decision boundary in the input space (figure).
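For reference, a Gaussian (RBF) kernel matrix like the one used in this example can be computed as follows (a sketch under the assumption k(x, x′) = exp(−||x − x′||² / 2σ²); the width σ is a free parameter):

    import numpy as np

    def gaussian_kernel_matrix(X1, X2, sigma=1.0):
        """Return the N1 x N2 matrix with entries exp(-||x - x'||^2 / (2 sigma^2))."""
        sq_dists = (np.sum(X1**2, axis=1)[:, None]
                    + np.sum(X2**2, axis=1)[None, :]
                    - 2.0 * X1 @ X2.T)
        return np.exp(-sq_dists / (2.0 * sigma**2))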
81 Overlapping Class Distributions. The SVM for non-overlapping class distributions is determined by solving argmin_w ½||w||², subject to t_n (w^T φ(x_n) + b) ≥ 1 ∀ x_n. Alternatively, this can be expressed as the minimization of Σ_{n=1}^N E_∞( y(x_n) t_n − 1 ) + λ||w||², where E_∞(z) is 0 if z ≥ 0, and ∞ otherwise. This forces all points to lie on or outside the margins, on the correct side for their class. To allow for misclassified points, we have to relax this E_∞ term.
82 Slack Variables. To this end, we introduce N slack variables ξ_n ≥ 0, n = 1, …, N: ξ_n = 0 for points on or on the correct side of the margin boundary for their class, and ξ_n = |t_n − y(x_n)| for all other points. Thus ξ_n < 1 for points that are correctly classified and ξ_n > 1 for points that are incorrectly classified. We now minimize C Σ_{n=1}^N ξ_n + ½||w||², where C > 0, subject to t_n y(x_n) ≥ 1 − ξ_n and ξ_n ≥ 0, n = 1, …, N.
83 Dual Representation. This leads to a dual representation, where we maximize L̃(a) = Σ_{n=1}^N a_n − ½ Σ_{n=1}^N Σ_{m=1}^N a_n a_m t_n t_m k(x_n, x_m), with constraints 0 ≤ a_n ≤ C and Σ_{n=1}^N a_n t_n = 0.
84 Support Vectors. Again, a new point x is classified by computing y(x) = Σ_{n=1}^N a_n t_n k(x, x_n) + b. For points that are on the correct side of the margin, a_n = 0. Thus support vectors consist of points between their margin and the decision boundary, as well as misclassified points. In other words, all points that are not on the right side of their margin are support vectors.
85 Bias. Again, a new point x is classified by computing y(x) = Σ_{n=1}^N a_n t_n k(x, x_n) + b. Once the optimal a is determined, the bias b can be computed from b = (1/N_M) Σ_{n∈M} ( t_n − Σ_{m∈S} a_m t_m k(x_n, x_m) ), where S is the index set of support vectors, N_S is the number of support vectors, M is the index set of points on the margins, and N_M is the number of points on the margins.
86 Solving the Quadratic Programming Problem. Maximize L̃(a) = Σ_{n=1}^N a_n − ½ Σ_{n=1}^N Σ_{m=1}^N a_n a_m t_n t_m k(x_n, x_m), subject to 0 ≤ a_n ≤ C and Σ_{n=1}^N a_n t_n = 0. The problem is convex. Standard solutions are generally O(N³). Traditional quadratic programming techniques are often infeasible due to computation and memory requirements. Instead, methods such as sequential minimal optimization can be used, which in practice are found to scale as O(N) to O(N²).
87 Chunking. Maximize L̃(a) = Σ_{n=1}^N a_n − ½ Σ_{n=1}^N Σ_{m=1}^N a_n a_m t_n t_m k(x_n, x_m), subject to 0 ≤ a_n ≤ C and Σ_{n=1}^N a_n t_n = 0. A conventional quadratic programming solution requires that matrices with N² elements be maintained in memory: K ~ O(N²), where K_nm = k(x_n, x_m); T ~ O(N²), where T_nm = t_n t_m; A ~ O(N²), where A_nm = a_n a_m. This becomes infeasible when N exceeds ~10,000.
88 Chunking. Chunking (Vapnik, 1982) exploits the fact that the value of the Lagrangian is unchanged if we remove the rows and columns of the kernel matrix where a_n = 0 or a_m = 0.
89 Chunking. Recall that we minimize C Σ_{n=1}^N ξ_n + ½||w||², where C > 0, with ξ_n = 0 for points on or on the correct side of the margin boundary for their class and ξ_n = |t_n − y(x_n)| for all other points. Chunking (Vapnik, 1982): 1. Select a small number (a "chunk") of training vectors. 2. Solve the QP problem for this subset. 3. Retain only the support vectors. 4. Consider another chunk of the training data. 5. Ignore the subset of vectors in all chunks considered so far that lie on the correct side of the margin, since these do not contribute to the cost function. 6. Add the remainder to the current set of support vectors and solve the new QP problem. 7. Return to Step 4. 8. Repeat until the set of support vectors does not change. This method reduces memory requirements to O(N_S²), where N_S is the number of support vectors. This may still be big!
90 Decomposition Methods. It can be shown that the global QP problem is solved when all training vectors satisfy the following optimality conditions: a_i = 0 ⇒ t_i y(x_i) ≥ 1; 0 < a_i < C ⇒ t_i y(x_i) = 1; a_i = C ⇒ t_i y(x_i) ≤ 1. Decomposition methods decompose this large QP problem into a series of smaller subproblems. Decomposition (Osuna et al., 1997): partition the training data into a small working subset B and a fixed subset N; minimize the global objective function by adjusting the coefficients in B; swap 1 or more vectors in B for an equal number in N that fail to satisfy the optimality conditions; re-solve the global QP problem for B. Each step is O(|B|²) in memory. Osuna et al. (1997) proved that the objective function decreases on each step and will converge in a finite number of iterations.
91 Sequential Minimal Optimization. Sequential Minimal Optimization (Platt, 1998) takes decomposition to the limit: on each iteration, the working set consists of just two vectors. The advantage is that in this case the QP problem can be solved analytically. Memory requirements are O(N). Compute time is typically O(N) to O(N²).
92 LIBSVM. LIBSVM is a widely used library for SVMs developed by Chang & Lin (2001). It can be downloaded online, has a MATLAB interface, and uses SMO. We will use it for Assignment 2.
93 End of Lecture 14
94 LIBSVM Example: Face Detection. Face and non-face training images are each preprocessed (subsampled and normalized to µ = 0, σ² = 1) and then passed to svmtrain (diagram).
95 LIBSVM Example: MATLAB Interface. The option '-t 0' selects a linear SVM:

    model = svmtrain(traint, trainx, '-t 0');
    [predicted_label, accuracy, decision_values] = svmpredict(testt, testx, model);

Reported accuracy: 661/944 correct (classification).
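For readers working outside MATLAB, roughly the same experiment can be run in Python with scikit-learn, whose SVC class wraps LIBSVM; this is only an analogous sketch (trainx, traint, testx, testt are assumed to be the same arrays as above), not part of the course materials.

    from sklearn.svm import SVC

    model = SVC(kernel='linear')          # '-t 0' in LIBSVM corresponds to a linear kernel
    model.fit(trainx, traint)             # analogous to svmtrain
    predicted = model.predict(testx)      # analogous to svmpredict
    accuracy = (predicted == testt).mean()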
96 Example: decision boundary in the input space (figure).
97 Relation to Logistic Regression. The objective function for the soft-margin SVM can be written as Σ_{n=1}^N E_SV(y_n t_n) + λ||w||², where E_SV(z) = [1 − z]_+ is the hinge error function and [z]_+ = z if z ≥ 0, 0 otherwise. For t ∈ {−1, 1}, the objective function for a regularized version of logistic regression can be written as Σ_{n=1}^N E_LR(y_n t_n) + λ||w||², where E_LR(z) = log(1 + exp(−z)).
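The two error functions are simple to compare numerically. A small sketch (mine, not from the lecture):

    import numpy as np

    def hinge_loss(z):
        """E_SV(z) = [1 - z]_+ , the hinge error function."""
        return np.maximum(0.0, 1.0 - z)

    def logistic_loss(z):
        """E_LR(z) = log(1 + exp(-z))."""
        return np.log1p(np.exp(-z))

    z = np.linspace(-2.0, 2.0, 5)
    print(hinge_loss(z))                      # [3. 2. 1. 0. 0.]
    print(np.round(logistic_loss(z), 3))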
Last updated: Oct 22, 2012