Supervised Learning without Scikit Learn
by Prof. Seungchul Lee, Industrial AI Lab, POSTECH


Table of Contents
I. 1. Optimization
II. 2. Linear Regression
III. 3. Classification (Linear)
    I. 3.1. Distance
    II. 3.2. Illustrative Example
        I. 3.2.1. Optimization Formulation
        II. 3.2.2. Outlier
        III. 3.2.3. Optimization Formulation
    III. 3.3. Maximize Margin (Finally, it is Support Vector Machine)
    IV. 3.4. Logistic Regression

1. Optimization

Optimization is an important tool in 1) engineering problem solving and 2) decision science
- people optimize
- nature optimizes

3 key components
1. objective
2. decision variables (or unknowns)
3. constraints

Procedures
- The process of identifying the objective, variables, and constraints for a given problem is known as "modeling". Once the model has been formulated, an optimization algorithm can be used to find its solution.

In mathematical expression:

    min_x  f(x)
    subject to  g_i(x) ≤ 0,  i = 1, ..., m

Remarks) equivalent forms:

    min_x f(x)      ⟺   max_x -f(x)
    g_i(x) ≤ 0      ⟺   -g_i(x) ≥ 0
    h(x) = 0        ⟺   h(x) ≤ 0 and h(x) ≥ 0

The good news: for many classes of optimization problems, people have already done all the "hard work" of developing numerical algorithms.
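As a quick check of the min/max equivalence above, here is a small sketch (not part of the original notes; the one-variable toy problem and its bound of 3 are arbitrary assumptions) that solves the same problem both ways with CVXPY:

import cvxpy as cvx

# toy one-variable problem (assumed): maximize x subject to 0 <= x <= 3
x = cvx.Variable()
const = [x >= 0, x <= 3]

p1 = cvx.Problem(cvx.Maximize(x), const).solve()    # optimal value  3.0
p2 = cvx.Problem(cvx.Minimize(-x), const).solve()   # optimal value -3.0

print(p1, p2)    # same solution, opposite sign of the optimal value
print(x.value)   # x = 3 in both cases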

Example:

    max_x   x1 + x2
    subject to
            2 x1 + x2 ≤ 29
            x1 + 2 x2 ≤ 25
            x1 ≥ 2
            x2 ≥ 5

In matrix form:

    min_x   -[1 1] [x1; x2]
    subject to
            [2 1; 1 2] [x1; x2] ≤ [29; 25]
            [x1; x2] ≥ [2; 5]

In [1]:

import numpy as np
import matplotlib.pyplot as plt
import cvxpy as cvx

f = np.array([[1, 1]])
A = np.array([[2, 1], [1, 2]])
b = np.array([[29], [25]])
lb = np.array([[2], [5]])

x = cvx.Variable(2, 1)
obj = cvx.Minimize(-f*x)
const = [A*x <= b, lb <= x]
prob = cvx.Problem(obj, const).solve()

print(x.value)

[[ 11.]
 [  7.]]

2. Linear Regression

Begin by considering linear regression (easy to extend to more complex predictions later on).

Given {x_i : inputs, y_i : outputs}, find θ_1 and θ_0 (ŷ_i : predicted output):

    x = [x_1; x_2; ...; x_m],   y = [y_1; y_2; ...; y_m]

    θ = [θ_1; θ_0] : model parameters

    ŷ_i = f(x_i, θ) in general

In many cases, a linear model to predict y_i can be used:

    ŷ_i = θ_1 x_i + θ_0

such that

    min_{θ_0, θ_1}  Σ_{i=1}^{m} (ŷ_i - y_i)^2
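Before the CVXPY version below, note that this least-squares problem also has a closed-form solution via the normal equations, θ = (Φ^T Φ)^(-1) Φ^T y. A minimal NumPy sketch (not part of the original notes; the toy data are assumptions):

import numpy as np

# toy data (assumed)
x = np.array([0.0, 1.0, 2.0, 3.0]).reshape(-1, 1)
y = np.array([0.1, 0.9, 2.2, 2.9]).reshape(-1, 1)

Phi = np.hstack([x, np.ones_like(x)])            # feature matrix [x, 1]
theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)  # normal equations
print(theta)                                     # [theta_1, theta_0]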

In [2]:

import numpy as np
import matplotlib.pyplot as plt

# data points in column vector [input, output]
x = np.array([0.1, 0.4, 0.7, 1.2, 1.3, 1.7, 2.2, 2.8, 3.0, 4.0, 4.3, 4.4, 4.9]).reshape(-1, 1)
y = np.array([0.5, 0.9, 1.1, 1.5, 1.5, 2.0, 2.2, 2.8, 2.7, 3.0, 3.5, 3.7, 3.9]).reshape(-1, 1)

# to plot
plt.figure(figsize=(10, 6))
plt.plot(x, y, 'ko', label="data")
plt.xlabel('x', fontsize=15)
plt.ylabel('y', fontsize=15)
plt.axis('scaled')
plt.grid(alpha=0.3)
plt.xlim([0, 5])
plt.show()

Use CVXPY optimization (least squares)

For convenience, we define a function that maps inputs to feature vectors, φ:

    ŷ_i = [x_i  1] [θ_1; θ_0] = φ(x_i)^T θ,   feature vector φ(x_i) = [x_i; 1]

    Φ = [φ(x_1)^T; φ(x_2)^T; ...; φ(x_m)^T] = [x_1 1; x_2 1; ...; x_m 1]

    ŷ = [ŷ_1; ŷ_2; ...; ŷ_m] = Φθ

Model parameter estimation:

    min_θ ||ŷ - y||_2 = min_θ ||Φθ - y||_2

In [3]:

import cvxpy as cvx

m = y.shape[0]
# A = np.hstack([x, np.ones([m, 1])])
A = np.hstack([x, x**0])
A = np.asmatrix(A)

theta = cvx.Variable(2, 1)
obj = cvx.Minimize(cvx.norm(A*theta - y, 2))
cvx.Problem(obj, []).solve()

print('theta:\n', theta.value)

theta:
 [[ ... ]
 [ ... ]]

In [4]:

# to plot
plt.figure(figsize=(10, 6))
plt.title('$L_2$ Regression', fontsize=15)
plt.xlabel('x', fontsize=15)
plt.ylabel('y', fontsize=15)
plt.plot(x, y, 'ko', label="data")

# to plot a straight line (fitted line)
xp = np.arange(0, 5, 0.01).reshape(-1, 1)
theta = theta.value
yp = theta[0,0]*xp + theta[1,0]

plt.plot(xp, yp, 'r', linewidth=2, label="$L_2$")
plt.legend(fontsize=15)
plt.axis('scaled')
plt.grid(alpha=0.3)
plt.xlim([0, 5])
plt.show()

3. Classification (Linear)

Figure out, autonomously, which category (or class) an unknown item should be categorized into.

Number of categories / classes
- Binary: 2 different classes
- Multiclass: more than 2 classes

Feature
- The measurable parts that make up the unknown item (or the information you have available to categorize)

- Perceptron: makes use of the sign of the data (discussed later)
- Logistic regression is a classification algorithm, don't be confused
- Goal: to find a classification boundary

Sign with respect to a line:

    ω = [ω_1; ω_2],   x = [x_1; x_2]

    g(x) = ω_1 x_1 + ω_2 x_2 + ω_0 = ω^T x + ω_0

or, with the augmented vectors

    ω = [ω_0; ω_1; ω_2],   x = [1; x_1; x_2]

    g(x) = ω_0 + ω_1 x_1 + ω_2 x_2 = ω^T x

Perceptron

Hyperplane
- Separates a D-dimensional space into two half-spaces
- Defined by an outward pointing normal vector ω
- ω is orthogonal to any vector lying on the hyperplane

How to find ω
- All data in class 1: g(x) = ω^T x > 0
- All data in class 0: g(x) = ω^T x < 0

Perceptron Algorithm

The perceptron implements

    h(x) = sign(ω^T x)

Given the training set

    (x_1, y_1), (x_2, y_2), ..., (x_N, y_N)   where y_i ∈ {-1, 1}

1) pick a misclassified point:  sign(ω^T x_n) ≠ y_n
2) update the weight vector:    ω ← ω + y_n x_n

Why do perceptron updates work?

Let's look at a misclassified positive example (y_n = +1)
- the perceptron (wrongly) thinks ω_old^T x_n < 0
- the update would be ω_new = ω_old + y_n x_n = ω_old + x_n
- then ω_new^T x_n = (ω_old + x_n)^T x_n = ω_old^T x_n + x_n^T x_n
- thus ω_new^T x_n is less negative than ω_old^T x_n

In [5]:

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
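As a quick numeric check of the update argument above (not part of the original notes; the weight vector and example are made-up values):

import numpy as np

w_old = np.array([1.0, -2.0, 0.5])   # assumed current weights
x_n = np.array([1.0, 1.5, 1.0])      # misclassified positive example (y_n = +1)
y_n = 1

print(w_old @ x_n)                   # -1.5: negative, so wrongly classified
w_new = w_old + y_n * x_n            # perceptron update
print(w_new @ x_n)                   # increased by x_n @ x_n = 4.25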

In [6]:

# training data generation
m = 100
x1 = 8*np.random.rand(m, 1)
x2 = 7*np.random.rand(m, 1) - 4

g0 = 0.8*x1 + x2 - 3
g1 = g0 - 1
g2 = g0 + 1

In [7]:

C1 = np.where(g1 >= 0)
C2 = np.where(g2 < 0)
print(C1)

(array([ 0,  2,  4,  5, ..., 98], dtype=int64), array([0, 0, 0, ..., 0], dtype=int64))

In [8]:

C1 = np.where(g1 >= 0)[0]
C2 = np.where(g2 < 0)[0]
print(C1.shape)
print(C2.shape)

(30,)
(42,)

In [9]:

plt.figure(figsize=(10, 6))
plt.plot(x1[C1], x2[C1], 'ro', label='C1')
plt.plot(x1[C2], x2[C2], 'bo', label='C2')
plt.title('Linearly Separable Classes', fontsize=15)
plt.legend(loc='upper left', fontsize=15)
plt.xlabel(r'$x_1$', fontsize=20)
plt.ylabel(r'$x_2$', fontsize=20)
plt.show()

In matrix form:

    x^(i) = [1; x1^(i); x2^(i)],   y^(i) ∈ {-1, 1}

    X = [(x^(1))^T; (x^(2))^T; ...; (x^(m))^T] = [1 x1^(1) x2^(1); 1 x1^(2) x2^(2); ...; 1 x1^(m) x2^(m)]

    y = [y^(1); y^(2); ...; y^(m)]

In [10]:

X1 = np.hstack([np.ones([C1.shape[0], 1]), x1[C1], x2[C1]])
X2 = np.hstack([np.ones([C2.shape[0], 1]), x1[C2], x2[C2]])
X = np.vstack([X1, X2])
y = np.vstack([np.ones([C1.shape[0], 1]), -np.ones([C2.shape[0], 1])])

X = np.asmatrix(X)
y = np.asmatrix(y)

Perceptron update, where (x, y) is a misclassified training point:

    ω = [ω_0; ω_1; ω_2],   ω ← ω + y x

In [11]:

w = np.ones([3, 1])
w = np.asmatrix(w)

n_iter = y.shape[0]
for k in range(n_iter):
    for i in range(n_iter):
        if y[i,0] != np.sign(X[i,:]*w)[0,0]:
            w += y[i,0]*X[i,:].T

print(w)

[[ -3. ]
 [ ... ]
 [ ... ]]

Not a unique solution:

    g(x) = ω^T x + ω_0 = ω_1 x_1 + ω_2 x_2 + ω_0 = 0

    x_2 = -(ω_1/ω_2) x_1 - ω_0/ω_2

In [12]:

x1p = np.linspace(0, 8, 100).reshape(-1, 1)
x2p = - w[1,0]/w[2,0]*x1p - w[0,0]/w[2,0]

plt.figure(figsize=(10, 6))
plt.scatter(x1[C1], x2[C1], c='r', s=50, label='C1')
plt.scatter(x1[C2], x2[C2], c='b', s=50, label='C2')
plt.plot(x1p, x2p, c='k', label='perceptron')
plt.xlim([0, 8])
plt.xlabel('$x_1$', fontsize=20)
plt.ylabel('$x_2$', fontsize=20)
plt.legend(loc=4, fontsize=15)
plt.show()

Perceptron

3.1. Distance

    ω = [ω_1; ω_2],   x = [x_1; x_2]

    g(x) = ω^T x + ω_0 = ω_1 x_1 + ω_2 x_2 + ω_0

Find the distance between g(x) = -1 and g(x) = 1.

Suppose g(x_1) = -1 and g(x_2) = 1:

    ω^T x_1 + ω_0 = -1
    ω^T x_2 + ω_0 = 1

    ⟹  ω^T (x_2 - x_1) = 2

The distance s between the two lines, measured along ω/||ω||, is

    s = (x_2 - x_1)^T ω/||ω|| = 2/||ω||
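A small numeric check of this margin formula (not part of the original notes; the line parameters are arbitrary assumptions):

import numpy as np

w, w0 = np.array([0.8, 1.0]), -3.0        # assumed line g(x) = w^T x + w0

# points on g(x) = +1 and g(x) = -1 along the normal direction w
x_plus = (1 - w0) * w / (w @ w)
x_minus = (-1 - w0) * w / (w @ w)

print(np.linalg.norm(x_plus - x_minus))   # distance between the two lines
print(2 / np.linalg.norm(w))              # same value: 2 / ||w||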

3.2. Illustrative Example

Binary classification: C1 and C2

Features: the coordinates of the i-th data point

    x = [x_1; x_2]

Is it possible to distinguish between C1 and C2 by their coordinates?

We need to find a separating hyperplane (or a line in 2D):

    ω_1 x_1 + ω_2 x_2 + ω_0 = 0

    [x_1  x_2] [ω_1; ω_2] + ω_0 = 0

    ω^T x + ω_0 = 0

In [13]:

import numpy as np
import matplotlib.pyplot as plt

# training data generation
x1 = 8*np.random.rand(100, 1)
x2 = 7*np.random.rand(100, 1) - 4

g0 = 0.8*x1 + x2 - 3
g1 = g0 - 1
g2 = g0 + 1

C1 = np.where(g1 >= 0)[0]
C2 = np.where(g2 < 0)[0]

In [14]:

xp = np.linspace(0, 8, 100).reshape(-1, 1)
ypt = -0.8*xp + 3

plt.figure(figsize=(10, 6))
plt.plot(x1[C1], x2[C1], 'ro', label='C1')
plt.plot(x1[C2], x2[C2], 'bo', label='C2')
plt.plot(xp, ypt, '--k', label='True')
plt.title('Linearly and Strictly Separable Classes', fontweight='bold', fontsize=15)
plt.xlabel('$x_1$', fontsize=20)
plt.ylabel('$x_2$', fontsize=20)
plt.legend(loc=4)
plt.xlim([0, 8])
plt.ylim([-4, 3])
plt.show()

Given:
- a hyperplane defined by ω and ω_0
- a data point's coordinates (or features) x

Decision making:

    ω^T x + ω_0 > 0  ⟹  x belongs to C1
    ω^T x + ω_0 < 0  ⟹  x belongs to C2

Find ω and ω_0 such that, given the training data,

    x ∈ C1:  ω^T x + ω_0 > 0
    x ∈ C2:  ω^T x + ω_0 < 0

or find ω and ω_0 such that

    x ∈ C1:  ω^T x + ω_0 > b
    x ∈ C2:  ω^T x + ω_0 < -b

This is the same problem if the classes are strictly separable: since (ω, ω_0) can be scaled by any positive constant, we can always normalize b to 1, i.e.

    x ∈ C1:  ω^T x + ω_0 > 1
    x ∈ C2:  ω^T x + ω_0 < -1
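A small numeric check of the rescaling argument above (not part of the original notes; the hyperplane and query point are arbitrary assumptions): multiplying (ω, ω_0) by any positive constant does not change the decision, which is why the threshold b can always be normalized to 1.

import numpy as np

w, w0 = np.array([0.8, 1.0]), -3.0   # assumed hyperplane
x = np.array([5.0, 2.0])             # assumed query point
c = 10.0                             # any positive scaling

print(np.sign(w @ x + w0))           # decision with (w, w0)
print(np.sign(c*w @ x + c*w0))       # same decision with (c*w, c*w0)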

In [15]:

# see how data are generated
xp = np.linspace(0, 8, 100).reshape(-1, 1)
ypt = -0.8*xp + 3

plt.figure(figsize=(10, 6))
plt.plot(x1[C1], x2[C1], 'ro', label='C1')
plt.plot(x1[C2], x2[C2], 'bo', label='C2')
plt.plot(xp, ypt, '--k', label='True')
plt.plot(xp, ypt-1, '-k')
plt.plot(xp, ypt+1, '-k')
plt.title('Linearly and Strictly Separable Classes', fontweight='bold', fontsize=15)
plt.xlabel('$x_1$', fontsize=20)
plt.ylabel('$x_2$', fontsize=20)
plt.legend(loc=4)
plt.xlim([0, 8])
plt.ylim([-4, 3])
plt.show()

3.2.1. Optimization Formulation

    n (= 2): number of features
    m = N + M: number of data points in the training set

    x^(i) = [x1^(i); x2^(i)] with ω = [ω_1; ω_2],
    or augmented x^(i) = [1; x1^(i); x2^(i)] with ω = [ω_0; ω_1; ω_2]

    N data points belong to C1 in the training set
    M data points belong to C2 in the training set

    ω and ω_0 are the unknown variables

minimize   something
subject to
    ω^T x^(1) + ω_0 ≥ 1
    ω^T x^(2) + ω_0 ≥ 1
    ...
    ω^T x^(N) + ω_0 ≥ 1
    ω^T x^(N+1) + ω_0 ≤ -1
    ω^T x^(N+2) + ω_0 ≤ -1
    ...
    ω^T x^(N+M) + ω_0 ≤ -1

or, with the augmented ω,

minimize   something
subject to
    ω^T x^(i) ≥ 1,    i = 1, ..., N
    ω^T x^(i) ≤ -1,   i = N+1, ..., N+M

Code (CVXPY)

    X1 = [(x^(1))^T; (x^(2))^T; ...; (x^(N))^T]         (rows [1, x1^(i), x2^(i)])
    X2 = [(x^(N+1))^T; (x^(N+2))^T; ...; (x^(N+M))^T]

minimize   something
subject to
    X1 ω ≥ 1
    X2 ω ≤ -1

In [16]:

# CVXPY using simple classification
import cvxpy as cvx

N = C1.shape[0]
M = C2.shape[0]

X1 = np.hstack([np.ones([N, 1]), x1[C1], x2[C1]])
X2 = np.hstack([np.ones([M, 1]), x1[C2], x2[C2]])

X1 = np.asmatrix(X1)
X2 = np.asmatrix(X2)

In [17]:

w = cvx.Variable(3, 1)
obj = cvx.Minimize(1)
const = [X1*w >= 1, X2*w <= -1]
prob = cvx.Problem(obj, const).solve()

w = w.value

In [18]:

xp = np.linspace(0, 8, 100).reshape(-1, 1)
yp = - w[1,0]/w[2,0]*xp - w[0,0]/w[2,0]

plt.figure(figsize=(10, 6))
plt.plot(X1[:,1], X1[:,2], 'ro', label='C1')
plt.plot(X2[:,1], X2[:,2], 'bo', label='C2')
plt.plot(xp, yp, 'k', label='SVM')
plt.plot(xp, ypt, '--k', label='True')
plt.xlim([0, 8])
plt.xlabel('$x_1$', fontsize=20)
plt.ylabel('$x_2$', fontsize=20)
plt.legend(loc=4, fontsize=15)
plt.show()

3.2.2. Outlier

Note that in the real world, you may have noise, errors, or outliers that do not accurately represent the actual phenomena.

Non-separable case
- No solution (hyperplane) exists
- We will allow some training examples to be misclassified, but we want their number to be minimized

In [19]:

X1 = np.hstack([np.ones([N, 1]), x1[C1], x2[C1]])
X2 = np.hstack([np.ones([M, 1]), x1[C2], x2[C2]])

outlier = np.array([1, 3, 2]).reshape(-1, 1)
X2 = np.vstack([X2, outlier.T])

X1 = np.asmatrix(X1)
X2 = np.asmatrix(X2)

plt.figure(figsize=(10, 6))
plt.plot(X1[:,1], X1[:,2], 'ro', label='C1')
plt.plot(X2[:,1], X2[:,2], 'bo', label='C2')
plt.xlim([0, 8])
plt.xlabel('$x_1$', fontsize=20)
plt.ylabel('$x_2$', fontsize=20)
plt.legend(loc=4, fontsize=15)
plt.show()

minimize   something
subject to
    X1 ω ≥ 1
    X2 ω ≤ -1

In [20]:

w = cvx.Variable(3, 1)
obj = cvx.Minimize(1)
const = [X1*w >= 1, X2*w <= -1]
prob = cvx.Problem(obj, const).solve()

print(w.value)

None

No solution (hyperplane) exists
- We will allow some training examples to be misclassified, but we want their number to be minimized

3.2.3. Optimization Formulation

    n (= 2): number of features
    m = N + M: number of data points in a training set

    x^(i) = [1; x1^(i); x2^(i)] with ω = [ω_0; ω_1; ω_2]

    N data points belong to C1 in the training set
    M data points belong to C2 in the training set

    ω and ω_0 are the variables (unknown)

Start from the previous formulation:

minimize   something
subject to
    X1 ω ≥ 1
    X2 ω ≤ -1

For the non-separable case, we relax the above constraints. We need slack variables u and v, where all are positive. The optimization problem for the non-separable case:

minimize   Σ_{i=1}^{N} u_i + Σ_{i=1}^{M} v_i
subject to
    ω^T x^(1) ≥ 1 - u_1
    ω^T x^(2) ≥ 1 - u_2
    ...
    ω^T x^(N) ≥ 1 - u_N
    ω^T x^(N+1) ≤ -(1 - v_1)
    ω^T x^(N+2) ≤ -(1 - v_2)
    ...
    ω^T x^(N+M) ≤ -(1 - v_M)
    u ≥ 0
    v ≥ 0

Expressed in a matrix form:

    X1 = [(x^(1))^T; ...; (x^(N))^T],   X2 = [(x^(N+1))^T; ...; (x^(N+M))^T]

    u = [u_1; u_2; ...; u_N],   v = [v_1; v_2; ...; v_M]

minimize   1^T u + 1^T v
subject to
    X1 ω ≥ 1 - u
    X2 ω ≤ -(1 - v)
    u ≥ 0
    v ≥ 0

In [21]:

X1 = np.hstack([np.ones([C1.shape[0], 1]), x1[C1], x2[C1]])
X2 = np.hstack([np.ones([C2.shape[0], 1]), x1[C2], x2[C2]])

outlier = np.array([1, 2, 2]).reshape(-1, 1)
X2 = np.vstack([X2, outlier.T])

X1 = np.asmatrix(X1)
X2 = np.asmatrix(X2)

N = X1.shape[0]
M = X2.shape[0]

w = cvx.Variable(3, 1)
u = cvx.Variable(N, 1)
v = cvx.Variable(M, 1)

obj = cvx.Minimize(np.ones((1, N))*u + np.ones((1, M))*v)
const = [X1*w >= 1-u, X2*w <= -(1-v), u >= 0, v >= 0]
prob = cvx.Problem(obj, const).solve()

w = w.value

In [22]:

xp = np.linspace(0, 8, 100).reshape(-1, 1)
yp = - w[1,0]/w[2,0]*xp - w[0,0]/w[2,0]

plt.figure(figsize=(10, 6))
plt.plot(X1[:,1], X1[:,2], 'ro', label='C1')
plt.plot(X2[:,1], X2[:,2], 'bo', label='C2')
plt.plot(xp, yp, '--k', label='SVM')
plt.plot(xp, yp-1/w[2,0], '-k')
plt.plot(xp, yp+1/w[2,0], '-k')
plt.xlim([0, 8])
plt.xlabel('$x_1$', fontsize=20)
plt.ylabel('$x_2$', fontsize=20)
plt.legend(loc=4, fontsize=15)
plt.show()

Further improvement

- Notice that the hyperplane does not represent the division accurately, due to the outlier
- Can we do better when there are noisy data or outliers?
- Yes, but we need to look beyond LP
- Idea: a large margin leads to good generalization on the test data

3.3. Maximize Margin (Finally, it is Support Vector Machine)

Distance (= margin):

    margin = 2/||ω||

Minimize ||ω|| to maximize the margin (to the closest samples from the decision line):

    maximize {minimum distance}

Use gamma (γ) as a weighting between the following:
- a bigger margin gives robustness to outliers
- a hyperplane that has few (or no) errors

minimize   ||ω||_2 + γ (1^T u + 1^T v)
subject to
    X1 ω ≥ 1 - u
    X2 ω ≤ -(1 - v)
    u ≥ 0
    v ≥ 0

In [23]:

g = 2

w = cvx.Variable(3, 1)
u = cvx.Variable(N, 1)
v = cvx.Variable(M, 1)

obj = cvx.Minimize(cvx.norm(w, 2) + g*(np.ones((1, N))*u + np.ones((1, M))*v))
const = [X1*w >= 1-u, X2*w <= -(1-v), u >= 0, v >= 0]
prob = cvx.Problem(obj, const).solve()

w = w.value

In [24]:

xp = np.linspace(0, 8, 100).reshape(-1, 1)
yp = - w[1,0]/w[2,0]*xp - w[0,0]/w[2,0]

plt.figure(figsize=(10, 6))
plt.plot(X1[:,1], X1[:,2], 'ro', label='C1')
plt.plot(X2[:,1], X2[:,2], 'bo', label='C2')
plt.plot(xp, yp, '--k', label='SVM')
plt.plot(xp, yp-1/w[2,0], '-k')
plt.plot(xp, yp+1/w[2,0], '-k')
plt.xlim([0, 8])
plt.xlabel('$x_1$', fontsize=20)
plt.ylabel('$x_2$', fontsize=20)
plt.legend(loc=4, fontsize=15)
plt.show()

3.4. Logistic Regression

- Logistic regression is a classification algorithm, don't be confused
- Perceptron: makes use of the sign of the data
- SVM: makes use of the margin (minimum distance), i.e. the distance from a single data point
- We want to use the distance information of ALL data points → logistic regression

Using Distances

Basic idea: find the decision boundary (hyperplane)

    g(x) = ω^T x = 0

such that it maximizes Π_i h_i, where h_i = |ω^T x_i| / ||ω|| is the distance from x_i to the hyperplane.

Inequality of arithmetic and geometric means:

    (x_1 + x_2 + ... + x_m)/m ≥ (x_1 x_2 ... x_m)^(1/m)

and equality holds if and only if x_1 = x_2 = ... = x_m.

Roughly speaking, this optimization

    max_ω Π_i h_i

tends to position the hyperplane in the middle of the two classes.

Using all Distances with Outliers

SVM vs. Logistic Regression

Sigmoid function

We link or squeeze (-∞, +∞) to (0, 1) for several reasons. The sigmoid function, or the logistic function:

    σ(z) = 1/(1 + e^(-z)),    σ(ω^T x) = 1/(1 + e^(-ω^T x))

- the logistic function always generates a value between 0 and 1
- it crosses 0.5 at the origin, then flattens out

Benefits of mapping via the logistic function:
- monotonic: same or similar optimization solution
- continuous and differentiable: good for gradient descent optimization
- probability or confidence: the output can be interpreted as a probability

Goal: we need to fit ω to our data

    P(y = +1 | x, ω) = 1/(1 + e^(-ω^T x)) ∈ [0, 1]
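Since the notes otherwise avoid scikit-learn, here is a minimal from-scratch sketch (not part of the original notes) that fits ω by gradient ascent on the log-likelihood of P(y | x, ω); the toy data, step size, and iteration count are assumptions:

import numpy as np

# toy data (assumed): two Gaussian blobs with labels y in {0, 1}
np.random.seed(0)
m = 100
X0 = np.random.multivariate_normal([0, 0], np.eye(2), m)
X1 = np.random.multivariate_normal([10, 10], np.eye(2), m)
X = np.hstack([np.ones([2*m, 1]), np.vstack([X0, X1])])   # augment with 1 for the bias
y = np.hstack([np.zeros(m), np.ones(m)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

w = np.zeros(3)
alpha = 0.05                              # step size (assumed)
for _ in range(5000):
    p = sigmoid(X @ w)                    # P(y = 1 | x, w) for all points
    w += alpha * X.T @ (y - p) / (2*m)    # gradient ascent on the log-likelihood

x_new = np.array([1, 1, 0])               # augmented query point [1, x1, x2]
print(sigmoid(x_new @ w))                 # predicted P(y = 1 | x_new), close to 0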

Classify based on probability.

In [25]:

m = 100

X0 = np.random.multivariate_normal([0, 0], np.eye(2), m)
X1 = np.random.multivariate_normal([10, 10], np.eye(2), m)

X = np.vstack([X0, X1])
y = np.vstack([np.zeros([m, 1]), np.ones([m, 1])])

In [26]:

from sklearn import linear_model

clf = linear_model.LogisticRegression()
clf.fit(X, np.ravel(y))

Out[26]:

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [27]:

X_new = np.array([1, 0]).reshape(1, -1)
pred = clf.predict(X_new)
print(pred)

[ 0.]

In [28]:

pred = clf.predict_proba(X_new)
print(pred)

[[ ... ]]

In [29]:

plt.figure(figsize=(10, 6))
plt.plot(X0[:,0], X0[:,1], '.b', label='Class 0')
plt.plot(X1[:,0], X1[:,1], '.k', label='Class 1')
plt.plot(X_new[0,0], X_new[0,1], 'o', label='New Data', ms=15, mew=5)
plt.title('Logistic Regression', fontsize=15)
plt.legend(loc='lower right', fontsize=15)
plt.xlabel('x1', fontsize=15)
plt.ylabel('x2', fontsize=15)
plt.grid(alpha=0.3)
plt.axis('equal')
plt.show()

In [30]:

%%javascript
$.getScript(' tebook_toc.js')
