Machine Learning 1: Linear Classifiers
Marius Kloft, Humboldt University of Berlin, Summer Term 2014
Recap

Past lectures:

- Lecture 1: Examples of machine learning applications. Formalization: a learning machine (learning algorithm) learns, from inputs $x_1, \dots, x_n$ and labels $y_1, \dots, y_n$, a function (classifier, predictor) that predicts the unknown label $y$ of a new input $x$.
- Lecture 2: Bayesian decision theory. Playing god: what would be the optimal decision if we knew everything? The Bayes classifier is the theoretically optimal classifier: given input $x$, predict $f(x) := \arg\max_y P(Y = y \mid X = x)$. By Bayes' rule, this is equivalent to predicting $f(x) := \arg\max_y P(X = x \mid Y = y)\, P(Y = y)$.
- Lecture 3: Gaussian model: the data comes from two Gaussians,
  $P(X = x \mid Y = +1) = \mathcal{N}(\mu_+, \Sigma_+)$ and $P(X = x \mid Y = -1) = \mathcal{N}(\mu_-, \Sigma_-)$.
From the theoretical Bayes classifier to practical classifiers...

In practice:

- Replace $P(Y = +1)$ by its estimate $n_+/n$, where $n_+ := |\{i : y_i = +1\}|$.
- Replace the parameters of the Gaussian distributions by their estimates $\hat\mu_+, \hat\mu_-, \hat\Sigma_+$ and $\hat\Sigma_-$:
  $\hat\mu_+ := \frac{1}{n_+} \sum_{i : y_i = +1} x_i, \qquad \hat\Sigma_+ := \frac{1}{n_+} \sum_{i : y_i = +1} (x_i - \hat\mu_+)(x_i - \hat\mu_+)^\top$
  (and analogously for the negative class).
- Classify according to $\hat f(x) := \arg\max_{y \in \{-,+\}} p_{\hat\mu_y, \hat\Sigma_y}(x)\, \frac{n_y}{n}$, where $p_{\hat\mu_y, \hat\Sigma_y}$ is the multivariate Gaussian pdf,
  $p_{\hat\mu_+, \hat\Sigma_+}(x) := \frac{1}{\sqrt{(2\pi)^d \det(\hat\Sigma_+)}} \exp\left(-\tfrac{1}{2}(x - \hat\mu_+)^\top \hat\Sigma_+^{-1} (x - \hat\mu_+)\right).$
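This plug-in rule translates directly into code. Below is a minimal NumPy sketch (not from the slides; all function names are illustrative) that estimates the priors, means, and covariances and then classifies by the highest weighted density:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate Gaussian density p_{mu,Sigma}(x)."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

def fit_plugin_gaussian(X, y):
    """Estimate class prior n_c/n, mean, and covariance per class."""
    params = {}
    for c in (+1, -1):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),        # prior estimate n_c / n
                     Xc.mean(axis=0),          # mu_c estimate
                     np.cov(Xc.T, bias=True))  # Sigma_c estimate (1/n_c)
    return params

def predict_plugin_gaussian(x, params):
    """Predict argmax_y p_{mu_y,Sigma_y}(x) * n_y/n."""
    scores = {c: prior * gaussian_pdf(x, mu, Sigma)
              for c, (prior, mu, Sigma) in params.items()}
    return max(scores, key=scores.get)
```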
This yields three different kinds of classifiers, differing in their assumptions on the covariance!

Assumptions:

- Both Gaussians are isotropic (no covariance, same variance); formally: $\hat\Sigma_+ = \sigma^2 I$ and $\hat\Sigma_- = \sigma^2 I$, where $I$ is the $d \times d$ identity matrix.
- Both Gaussians have equal covariance: $\hat\Sigma_+ = \hat\Sigma_-$.

Classifier \ Assumptions                 isotropic    equal covariance
Nearest centroid classifier (NCC)        yes          yes
Linear discriminant analysis (LDA)       --           yes
Quadratic discriminant analysis (QDA)    --           --

For simplicity, consider the case where $n_+ = n_-$ (the general case is a trivial extension).
NCC: Nearest centroid classifier (formerly known as the simple no-name classifier)

Derivation in a nutshell:

- The decision surface is given by $p_{\hat\mu_+, \hat\Sigma_+}(x)\, \frac{n_+}{n} = p_{\hat\mu_-, \hat\Sigma_-}(x)\, \frac{n_-}{n}$.
- Insert the definition of the Gaussian pdf,
  $p_{\hat\mu_+, \hat\Sigma_+}(x) := \frac{1}{\sqrt{(2\pi)^d \det(\hat\Sigma_+)}} \exp\left(-\tfrac{1}{2}(x - \hat\mu_+)^\top \hat\Sigma_+^{-1} (x - \hat\mu_+)\right).$
- This simplifies a lot because $\hat\Sigma_+ = \sigma^2 I = \hat\Sigma_-$.
- Easy calculation: because of the assumptions, a lot of terms cancel out, and the decision surface boils down to $\|x - \hat\mu_+\|^2 = \|x - \hat\mu_-\|^2$, or equivalently:
  $\underbrace{(\hat\mu_+ - \hat\mu_-)^\top}_{=:\, w^\top} x + \underbrace{\tfrac{1}{2}\left(\|\hat\mu_-\|^2 - \|\hat\mu_+\|^2\right)}_{=:\, b} = 0.$
- A classifier of the form $w^\top x + b = 0$ is called a linear classifier.
NCC: Nearest centroid classifier (continued)

Training:

  function TrainNCC(x_1, ..., x_n, y_1, ..., y_n)
      precompute mu_+ and mu_- (see above)
      [extension: also compute n_+ and n_-]
      compute w and b (see the previous slide)
      return w, b
  end function

Prediction:

  function PredictLinear(x, w, b)
      if w^T x + b > 0 then return y = +1
      else return y = -1
      end if
  end function
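As a concrete companion to the pseudocode, here is a minimal NumPy sketch of both functions (not from the slides; names are illustrative), directly implementing the formulas for $w$ and $b$ derived above:

```python
import numpy as np

def train_ncc(X, y):
    """Nearest centroid classifier: w and b from the class means."""
    mu_pos = X[y == +1].mean(axis=0)
    mu_neg = X[y == -1].mean(axis=0)
    w = mu_pos - mu_neg                               # w = mu_+ - mu_-
    b = 0.5 * (mu_neg @ mu_neg - mu_pos @ mu_pos)     # b = (||mu_-||^2 - ||mu_+||^2) / 2
    return w, b

def predict_linear(X, w, b):
    """Generic linear prediction rule: sign(w^T x + b)."""
    return np.where(X @ w + b > 0, +1, -1)
```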
Linear discriminant analysis (LDA)

- Assume (only) equal covariance, $\hat\Sigma_+ = \hat\Sigma_-$, dropping the isotropy assumption.
- The derivation is similar to NCC and yields $w = \hat\Sigma_+^{-1}(\hat\mu_+ - \hat\mu_-)$.
- However, the assumption of equal covariance is often violated in practice.
Fisher's discriminant analysis (FDA)

FDA trick:

- Put $\hat\Sigma := \frac{1}{2}(\hat\Sigma_+ + \hat\Sigma_-)$.
- Predict via $w = \hat\Sigma^{-1}(\hat\mu_+ - \hat\mu_-)$.
- Compute $b$ so that the number of training errors, $\sum_{i=1}^n \mathbb{1}_{y_i \ne \operatorname{sign}(w^\top x_i + b)}$, is minimized.

Training:

  function TrainFDA(x_1, ..., x_n, y_1, ..., y_n)
      precompute mu_+ and mu_-, as well as Sigma = (Sigma_+ + Sigma_-) / 2
      put w = Sigma^{-1} (mu_+ - mu_-)
      compute b minimizing the number of training errors (see above)
      return w, b
  end function

Prediction again via the function PredictLinear (see above).
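A NumPy sketch of the FDA trick (again illustrative, not from the slides). For the bias, note that only thresholds between consecutive sorted projections $w^\top x_i$ can change the error count, so scanning midpoints (plus the two extremes) suffices:

```python
import numpy as np

def train_fda(X, y):
    """Fisher's discriminant: pooled covariance, then error-minimizing bias."""
    Xp, Xn = X[y == +1], X[y == -1]
    mu_p, mu_n = Xp.mean(axis=0), Xn.mean(axis=0)
    Sigma = 0.5 * (np.cov(Xp.T, bias=True) + np.cov(Xn.T, bias=True))
    w = np.linalg.solve(Sigma, mu_p - mu_n)   # w = Sigma^{-1} (mu_+ - mu_-)
    # Candidate thresholds t (with b = -t): extremes plus all midpoints
    # between consecutive sorted projections w^T x_i.
    s = np.sort(X @ w)
    thresholds = np.concatenate(([s[0] - 1.0], (s[:-1] + s[1:]) / 2, [s[-1] + 1.0]))
    errors = [(y != np.sign(X @ w - t)).sum() for t in thresholds]
    return w, -thresholds[np.argmin(errors)]
```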
Linear Classifiers

Generally, classifiers of the form $f(x) = \operatorname{sign}(w^\top x + b)$ are called linear classifiers.

Remark: the trick for computing $b$ (see the previous slide) can be used for all linear classifiers.

What are the advantages and disadvantages of linear classifiers?

Advantages:
+ Easy to understand and interpret
+ In practice: work well surprisingly often
+ Fast

Disadvantages:
- Suboptimal performance if the true decision boundary is non-linear
- This occurs for very complex problems, such as recognition problems and many others
Roadmap

- We will now introduce linear support vector machines (SVMs).
- Coming lecture: non-linear SVMs.
- The SVM is a very successful, state-of-the-art learning algorithm.
Linear Support Vector Machines

- Core idea: separate the data with a large margin.
- How can we formally describe this idea? (Maximize the margin such that all data points lie outside of the margin...)
- Note: from now on, part of the lecture will take place at the board.
Elementary Geometry Recap

Recall from linear algebra the definition of the (scalar) component of a vector $a$ with respect to another vector $b$ [illustrated by board picture]:
$\operatorname{comp}_b a := \frac{a^\top b}{\|b\|}$

This follows from elementary geometry: $\cos(\angle(a, b)) = \frac{\operatorname{comp}_b a}{\|a\|}$, and hence
$\cos(\angle(a, b)) = \frac{a^\top b}{\|a\|\,\|b\|}$
[illustrated by board picture].
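A quick numerical sanity check of these two identities (toy vectors chosen purely for illustration):

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([2.0, 0.0])

comp_b_a = a @ b / np.linalg.norm(b)   # component of a along b: 3.0
cos_ab = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# comp_b a = ||a|| * cos(angle(a, b)) -- both print 3.0
print(comp_b_a, np.linalg.norm(a) * cos_ab)
```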
Linear SVMs (continued)

Formalizing the geometric intuition [see board picture]:

- Denote the margin by $\gamma$.
- Task: Maximize the margin $\gamma$ such that all positive data points lie on one side,
  $\gamma \le \operatorname{comp}_w x_i$ for all $i$ with $y_i = +1$,
  and all negative points on the other,
  $\operatorname{comp}_w x_i \le -\gamma$ for all $i$ with $y_i = -1$.
- The maximization is over the variables $\gamma \in \mathbb{R}$ and $w \in \mathbb{R}^d$.
Linear SVMs (continued)

Note:

- If $y_i = +1$, then $\gamma \le \operatorname{comp}_w x_i$ is (multiplying both sides of the inequality by $y_i$) the same as $\gamma \le y_i \operatorname{comp}_w x_i$.
- If $y_i = -1$, then $\operatorname{comp}_w x_i \le -\gamma$ is (multiplying both sides of the inequality by $y_i$, which flips it) the same as $\gamma \le y_i \operatorname{comp}_w x_i$.
- So in both cases we have $\gamma \le y_i \operatorname{comp}_w x_i$.
- By the definition of the component of a vector with respect to another vector: $\operatorname{comp}_w x_i = \frac{w^\top x_i}{\|w\|}$.

Thus, the problem from the previous slide becomes:

Linear SVM: a first preliminary definition
$\max_{\gamma \in \mathbb{R},\, w \in \mathbb{R}^d} \gamma \quad \text{s.t.} \quad \gamma \le y_i \frac{w^\top x_i}{\|w\|} \ \text{ for all } i = 1, \dots, n$

Remark: we read "s.t." as "subject to the constraints".
Linear SVMs (continued)

More generally, we allow the hyperplane to be positioned off the origin by introducing a so-called bias $b$:

Hard-Margin SVM with bias
$\max_{\gamma, b \in \mathbb{R},\, w \in \mathbb{R}^d} \gamma \quad \text{s.t.} \quad \gamma \le y_i \left( \frac{w^\top x_i}{\|w\|} + b \right) \ \text{ for all } i = 1, \dots, n$

Problem: sometimes the above problem is void because no separating hyperplane exists!
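The constraint reads off directly as code: for a fixed $(w, b)$, the largest feasible $\gamma$ is the smallest value of $y_i \left(\frac{w^\top x_i}{\|w\|} + b\right)$ over the data. A small helper (illustrative, not from the slides) makes this concrete; a negative result means the hyperplane does not separate the data:

```python
import numpy as np

def geometric_margin(X, y, w, b):
    """Largest feasible gamma for (w, b): min_i y_i (w^T x_i / ||w|| + b).
    Negative if (w, b) does not separate the data."""
    return np.min(y * (X @ w / np.linalg.norm(w) + b))
```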
Limitations of Hard-Margin SVMs

- Any three points in the plane ($\mathbb{R}^2$) in general position can be shattered (separated under every labeling) by a hyperplane (= linear classifier).
- But there are configurations of four points which no hyperplane can shatter. (Example: the XOR configuration, with $(0,0)$ and $(1,1)$ labeled $+1$ and $(0,1)$ and $(1,0)$ labeled $-1$, cannot be separated by any line.)
- More generally: any $n + 1$ points in $\mathbb{R}^n$ in general position can be shattered by a hyperplane, but there are configurations of $n + 2$ points which no hyperplane can shatter.
Limitations of Hard-Margin SVMs (continued)

Another problem is that outliers can potentially corrupt the SVM solution.
Remedy: Soft-Margin SVMs

Core idea: Introduce for each input $x_i$ a slack variable $\xi_i \ge 0$ that allows for some (slight) violations of the margin separation:

Linear Soft-Margin SVM (with bias)
$\max_{\gamma, b \in \mathbb{R},\, w \in \mathbb{R}^d,\, \xi_1, \dots, \xi_n \ge 0} \ \gamma - C \sum_{i=1}^n \xi_i \quad \text{s.t.} \quad \forall i: \ \gamma \le y_i \left( \frac{w^\top x_i}{\|w\|} + b \right) + \xi_i$

- We also minimize $\sum_{i=1}^n \xi_i$ to allow only slight violations of the margin separation.
- $C$ is a trade-off parameter (to be set in advance): the higher $C$, the more we penalize violations of the margin separation.
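In practice, this problem is usually solved via the equivalent convex reformulation $\min_{w, b, \xi} \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i$ subject to $y_i(w^\top x_i + b) \ge 1 - \xi_i$; the equivalence is only derived later in the course. Below is a minimal, illustrative subgradient-descent sketch of that reformulation written as a hinge-loss objective; the step size and iteration count are arbitrary choices, not tuned values:

```python
import numpy as np

def train_soft_margin_svm(X, y, C=1.0, lr=1e-3, epochs=1000):
    """Subgradient descent on the convex soft-margin objective
    0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1            # margin violators (xi_i > 0)
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```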
Support Vectors

Denote by $\gamma^*$ and $w^*$ the maximizers of the linear SVM problem from the previous slide, that is:
$(\gamma^*, w^*) := \arg\max_{\gamma, b \in \mathbb{R},\, w \in \mathbb{R}^d,\, \xi_1, \dots, \xi_n \ge 0} \ \gamma - C \sum_{i=1}^n \xi_i \quad \text{s.t.} \quad \gamma \le y_i \left( \frac{w^\top x_i}{\|w\|} + b \right) + \xi_i$

All vectors $x_i$ with $\gamma^* \ge y_i \left( \frac{{w^*}^\top x_i}{\|w^*\|} + b \right)$ are called support vectors.

What does this mean geometrically?
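Translating the definition into code: support vectors are exactly the points whose margin constraint is active or needs slack, i.e. those lying on or inside the margin. A small illustrative helper (not from the slides):

```python
import numpy as np

def support_vector_indices(X, y, w, b, gamma, tol=1e-8):
    """Indices i with y_i (w^T x_i / ||w|| + b) <= gamma, i.e. points
    on or inside the margin (the support vectors)."""
    margins = y * (X @ w / np.linalg.norm(w) + b)
    return np.where(margins <= gamma + tol)[0]
```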
SVM Training

How can we train SVMs, that is, how do we solve the above optimization problem?
Convex Optimization Problems

It is known from decades of research in numerical mathematics that so-called convex optimization problems (to be introduced in detail in the next lecture) can be solved very efficiently.

Convex optimization problem
$\min_{v \in \mathbb{R}^d} f(v) \quad \text{s.t.} \quad f_i(v) \le 0,\ i = 1, \dots, m; \qquad h_i(v) = 0,\ i = 1, \dots, l,$
where $f, f_1, \dots, f_m$ are convex functions (introduced in the next lecture) and $h_1, \dots, h_l$ are linear functions.

- Can we translate the linear SVM maximization problem into a convex minimization problem?
- How do we solve such a problem?
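As a preview of where this is headed, here is how the soft-margin SVM looks once written as a convex program, sketched with the cvxpy modeling library (an assumed dependency; the reformulation itself is derived in the next lecture):

```python
import cvxpy as cp
import numpy as np

def svm_as_convex_program(X, y, C=1.0):
    """Soft-margin SVM in the standard convex form:
    min 0.5 ||w||^2 + C sum(xi)  s.t.  y_i (w^T x_i + b) >= 1 - xi_i."""
    n, d = X.shape
    w, b = cp.Variable(d), cp.Variable()
    xi = cp.Variable(n, nonneg=True)                     # slack variables
    objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
    constraints = [cp.multiply(y, X @ w + b) >= 1 - xi]
    cp.Problem(objective, constraints).solve()
    return w.value, b.value
```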
Conclusion

- Linear classifiers
  - Classifiers motivated by Bayesian decision theory:
    - Nearest Centroid Classifier (NCC)
    - Linear Discriminant Analysis / Fisher's Linear Discriminant
  - Support Vector Machines:
    - Motivated geometrically
    - Maximize the margin between positive and negative inputs
    - Can be described as a numerical optimization problem
- How to optimize? We will show that the SVM can be formulated as a convex optimization problem.