Supervised Learning. 1. Optimization. without Scikit Learn. by Prof. Seungchul Lee Industrial AI Lab
|
|
- Clinton Ball
- 6 years ago
- Views:
Transcription
1 Supervised Learning without Scikit Learn by Prof. Seungchul Lee Industrial AI Lab POSECH able of Contents I.. Optimization II.. Linear Regression III. 3. Classification (Linear) I. 3.. Distance II. 3.. Illustrative Example I Optimization Formulation II Outlier III Optimization Formulation III Maximize Margin (Finally, it is Support Vector Machine) IV Logistic Regression. Optimization an important tool in ) engineering problem solving and ) decision science peolple optimize nature optimizes
2 (source: ( 3 key components. objective. decision variable or unknown 3. constraints Procedures. he process of identifying objective, variables, and constraints for a given problem is known as "modeling". Once the model has been formulated, optimization algorithm can be used to find its solutions. In mathematical expression min x subject to f(x) g i (x) 0, i =,, m Remarks) equivalent min f(x) x (x) 0 g i h(x) = 0 max f(x) x g i (x) 0 h(x) 0 and { h(x) 0 he good news: for many classes of optimization problems, people have already done all the "hardwork" of developing numerical algorithms
3 max x subject to x + x x + x 9 x + x 5 x x 5 min x subject to x [ ] [ ] x x [ ] [ 9 ] [ ] x 5 x [ ] [ ] [ ] 5 x In []: import numpy as np import matplotlib.pyplot as plt import cvxpy as cvx f = np.array([[, ]]) A = np.array([[, ], [, ]]) b = np.array([[9], [5]]) lb = np.array([[], [5]]) x = cvx.variable(,) obj = cvx.minimize(-f*x) const = [A*x <= b, lb <= x] prob = cvx.problem(obj, const).solve() print (x.value) [[.] [ 7.]]. Linear Regression Begin by considering linear regression (easy to extend to more comlex predictions later on) Given { x : inputs i : outputs, find θ and θ yi y^i : predicted output x x x =, y = = + x m y y y m y^i θ x i θ θ θ = [ ] : Model parameters θ y^i = f( x i, θ) in general
4 In many cases, a linear model to predict y i can be used m θ θ y^i = θx i + θ such that min ( y^i y i ), i=
5 In []: import numpy as np import matplotlib.pyplot as plt # data points in column vector [input, output] x = np.array([0., 0.4, 0.7,.,.3,.7,.,.8, 3.0, 4.0, 4.3, 4.4, 4. 9]).reshape(-, ) y = np.array([0.5, 0.9,.,.5,.5,.0,.,.8,.7, 3.0, 3.5, 3.7, 3. 9]).reshape(-, ) # to plot plt.figure(figsize=(0, 6)) plt.plot(x, y, 'ko', label="data") plt.xlabel('x', fontsize=5) plt.ylabel('y', fontsize=5) plt.axis('scaled') plt.grid(alpha=0.3) plt.xlim([0, 5]) plt.show()
6 Use CVXPY optimization (least squared) For convenience, we define a function that maps inputs to feature vectors, ϕ x i θ y^i = [ ] [ ] θ x i θ = [ ] [ ] = ϕ ( x i )θ θ, x i feature vector ϕ( x i ) = [ ] x ϕ ( x ) y^ x ϕ ( x ) y^ Φ = = = = Φθ y^ x m ϕ ( ) x m y^m Model parameter estimation min θ y^ y = min Φθ y θ In [3]: import cvxpy as cvx m = y.shape[0] #A = np.hstack([x, np.ones([m, ])]) A = np.hstack([x, x**0]) A = np.asmatrix(a) theta = cvx.variable(, ) obj = cvx.minimize(cvx.norm(a*theta-y, )) cvx.problem(obj,[]).solve() print('theta:\n', theta.value) theta: [[ ] [ ]]
7 In [4]: # to plot plt.figure(figsize=(0, 6)) plt.title('$l_$ Regression', fontsize=5) plt.xlabel('x', fontsize=5) plt.ylabel('y', fontsize=5) plt.plot(x, y, 'ko', label="data") # to plot a straight line (fitted line) xp = np.arange(0, 5, 0.0).reshape(-, ) theta = theta.value yp = theta[0,0]*xp + theta[,0] plt.plot(xp, yp, 'r', linewidth=, label="$l_$") plt.legend(fontsize=5) plt.axis('scaled') plt.grid(alpha=0.3) plt.xlim([0, 5]) plt.show() 3. Classification (Linear) Figure out, autonomously, which category (or class) an unknown item should be categorized into Number of categories / classes Binary: different classes Multiclass: more than classes Feature he measurable parts that make up the unknown item (or the information you have available to categorize)
8 Perceptron: make use of sign of data Discuss it later Logistic regression is a classification algorithm don't be confused o find a classification boundary Sign Sign with respect to a line ω ω = [ x ], x = [ ] g(x) = ω x + ω x + ω 0 = ω x + ω 0 ω x ω 0 ω ω = ω, x = x g(x) = ω 0 + ω x + ω x = ω x x
9 Perceptron Hyperplane Separates a D-dimensional space into two half-spaces Defined by an outward pointing normal vector ω is orthogonal to any vector lying on the hyperplane How to find ω All data in class g( x) > 0 ω All data in class 0 g( ω x) < 0
10 Perceptron Algorithm he perceptron implements h(x) = sign( ω x) Given the training set (, ), (, ),, (, ) where {, } x y x y x N y N y i ) pick a misclassified point sign( ω x n ) y n ) and update the weight vector ω ω + y n x n Why perceptron updates work? Let's look at a misclassified positive example ( ) perceptron (wrongly) thinks ω < 0 old x n y n = + updates would be ω new ω newx n = + = + ω old y n x n ω old x n = ( + = + ω old x n ) x n ω old x n x nx n hus ω newx n is less negative than ω old x n In [5]: import numpy as np import matplotlib.pyplot as plt % matplotlib inline
11 In [6]: #training data gerneration m = 00 x = 8*np.random.rand(m, ) x = 7*np.random.rand(m, ) - 4 g0 = 0.8*x + x - 3 g = g0 - g = g0 + In [7]: C = np.where(g >= 0) C = np.where(g < 0) print(c) (array([ 0,, 4, 5, 6, 7, 0, 7,, 4, 5, 7, 8, 30, 3, 38, 5, 5, 53, 56, 57, 6, 6, 64, 70, 78, 85, 86, 89, 98], dtype=int64), a rray([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64)) In [8]: C = np.where(g >= 0)[0] C = np.where(g < 0)[0] print(c.shape) print(c.shape) (30,) (4,) In [9]: plt.figure(figsize=(0, 6)) plt.plot(x[c], x[c], 'ro', label='c') plt.plot(x[c], x[c], 'bo', label='c') plt.title('linearly seperable classes', fontsize=5) plt.legend(loc='upper left', fontsize=5) plt.xlabel(r'$x_$', fontsize=0) plt.ylabel(r'$x_$', fontsize=0) plt.show()
12 x y = = = ( x () ) ( x () ) ( x (3) ) ( x (m) ) y () y () y (3) y (m) x () x () x (3) x (m) x () x () x (3) x (m) In [0]: X = np.hstack([np.ones([c.shape[0],]), x[c], x[c]]) X = np.hstack([np.ones([c.shape[0],]), x[c], x[c]]) X = np.vstack([x, X]) y = np.vstack([np.ones([c.shape[0],]), -np.ones([c.shape[0],])]) X = np.asmatrix(x) y = np.asmatrix(y) where (x, y) is a misclassified training point ω 3 ω = ω ω ω ω + yx In []: w = np.ones([3,]) w = np.asmatrix(w) n_iter = y.shape[0] for k in range(n_iter): for i in range(n_iter): if y[i,0]!= np.sign(x[i,:]*w)[0,0]: w += y[i,0]*x[i,:]. print(w) [[-3. ] [ ] [ ]] Not a unique solution g(x) = ω x + ω 0 = ω x + ω x + ω 0 = 0 ω x = ω 0 x ω ω
13 In []: xp = np.linspace(0,8,00).reshape(-,) xp = - w[,0]/w[,0]*xp - w[0,0]/w[,0] plt.figure(figsize=(0, 6)) plt.scatter(x[c], x[c], c='r', s=50, label='c') plt.scatter(x[c], x[c], c='b', s=50, label='c') plt.plot(xp, xp, c='k', label='perceptron') plt.xlim([0,8]) plt.xlabel('$x_$', fontsize = 0) plt.ylabel('$x_$', fontsize = 0) plt.legend(loc = 4, fontsize = 5) plt.show() Perceptron 3.. Distance ω = [ ], x = [ ] g(x) = x + = + + ω ω x x ω ω 0 ω x ω x ω 0
14 Find a distance between g(x) = and g(x) = suppose g( x ) =, g( x ) = ω x + ω 0 = ω ( x x ) = ω x + ω 0 = s =, = ( ) = ω ω x x ω ω x x ω
15 3.. Illustrative Example Binary classification: C and C Features: the coordinate of ith data x x = [ ] x Is it possible to distinguish between C and C by its coordinates? We need to find a separating hyperplane (or a line in D) ω x + ω x + ω 0 = 0 x [ ] [ ω ω ] + ω 0 = 0 x ω x + ω 0 = 0 In [3]: import numpy as np import matplotlib.pyplot as plt #training data gerneration x = 8*np.random.rand(00, ) x = 7*np.random.rand(00, ) - 4 g0 = 0.8*x + x - 3 g = g0 - g = g0 + C = np.where(g >= 0)[0] C = np.where(g < 0)[0]
16 In [4]: xp = np.linspace(0,8,00).reshape(-,) ypt = -0.8*xp + 3 plt.figure(figsize=(0, 6)) plt.plot(x[c], x[c], 'ro', label='c') plt.plot(x[c], x[c], 'bo', label='c') plt.plot(xp, ypt, '--k', label='rue') plt.title('linearly and strictly separable classes', fontweight = 'bold', fo ntsize = 5) plt.xlabel('$x_$', fontsize = 0) plt.ylabel('$x_$', fontsize = 0) plt.legend(loc = 4) plt.xlim([0, 8]) plt.ylim([-4, 3]) plt.show()
17 Given: Hyperplane defined by ω and ω 0 Animals coordinates (or features) x Decision making: ω x + ω0 > 0 x belongs to C ω x + ω 0 < 0 x belongs to C Find and such that given ω ω 0 or ω ω 0 x x + = 0 Find and such that given and given ω ω 0 ω ω 0 x C ω x + ω 0 > x C x + < Same problem if strictly separable ω x + ω 0 > b ω ω 0 b ω ω 0 x + > b x + >
18 In [5]: # see how data are generated xp = np.linspace(0,8,00).reshape(-,) ypt = -0.8*xp + 3 plt.figure(figsize=(0, 6)) plt.plot(x[c], x[c], 'ro', label='c') plt.plot(x[c], x[c], 'bo', label='c') plt.plot(xp, ypt, '--k', label='true') plt.plot(xp, ypt-, '-k') plt.plot(xp, ypt+, '-k') plt.title('linearly and strictly separable classes', fontweight = 'bold', fo ntsize = 5) plt.xlabel('$x_$', fontsize = 0) plt.ylabel('$x_$', fontsize = 0) plt.legend(loc = 4) plt.xlim([0, 8]) plt.ylim([-4, 3]) plt.show() 3... Optimization Formulation n (= ) m = N + M features data points in training set x (i) x = [ (i) ω ] with ω = [ ] or = with ω = x (i) x (i) ω x (i) x (i) ω 0 ω ω N belongs to C in training set M belongs to C in training set ω and ω 0 are the unknown variables
19 minimize something minimize something subject to + + ω x () ω 0 ω x () ω 0 ω x (N) ω 0 + or subject to ω x () ω x () ω x (N) ω x (N+) ω 0 ω x (N+) ω 0 ω x (N+M) ω ω x (N+) ω x (N+) ω x (N+M) Code (CVXPY) X ( x () ) ( x () ) = = ( x (N) ) x () x () x (N) x () x () x (N) X = = ( x (N+) ) ( x (N+) ) ( x (N+M) ) x (N+) x (N+) x (N+M) x (N+) x (N+) x (N+M) minimize subject to something X ω X ω minimize subject to something X ω X ω In [6]: # CVXPY using simple classification import cvxpy as cvx N = C.shape[0] M = C.shape[0] X = np.hstack([np.ones([n,]), x[c], x[c]]) X = np.hstack([np.ones([m,]), x[c], x[c]]) X = np.asmatrix(x) X = np.asmatrix(x)
20 In [7]: w = cvx.variable(3,) obj = cvx.minimize() const = [X*w >=, X*w <= -] prob = cvx.problem(obj, const).solve() w = w.value In [8]: xp = np.linspace(0,8,00).reshape(-,) yp = - w[,0]/w[,0]*xp - w[0,0]/w[,0] plt.figure(figsize=(0, 6)) plt.plot(x[:,], X[:,], 'ro', label='c') plt.plot(x[:,], X[:,], 'bo', label='c') plt.plot(xp, yp, 'k', label='svm') plt.plot(xp, ypt, '--k', label='true') plt.xlim([0,8]) plt.xlabel('$x_$', fontsize = 0) plt.ylabel('$x_$', fontsize = 0) plt.legend(loc = 4, fontsize = 5) plt.show() 3... Outlier Note that in the real world, you may have noise, errors, or outliers that do not accurately represent the actual phenomena Non-separable case No solutions (hyperplane) exist We will allow some training examples to be misclassified! but we want their number to be minimized
21 In [9]: X = np.hstack([np.ones([n,]), x[c], x[c]]) X = np.hstack([np.ones([m,]), x[c], x[c]]) outlier = np.array([, 3, ]).reshape(-,) X = np.vstack([x, outlier.]) X = np.asmatrix(x) X = np.asmatrix(x) plt.figure(figsize=(0, 6)) plt.plot(x[:,], X[:,], 'ro', label='c') plt.plot(x[:,], X[:,], 'bo', label='c') plt.xlim([0,8]) plt.xlabel('$x_$', fontsize = 0) plt.ylabel('$x_$', fontsize = 0) plt.legend(loc = 4, fontsize = 5) plt.show() minimize subject to something X ω X ω In [0]: w = cvx.variable(3,) obj = cvx.minimize() const = [X*w >=, X*w <= -] prob = cvx.problem(obj, const).solve() print(w.value) None No solutions (hyperplane) exist We will allow some training examples to be misclassified! but we want their number to be minimized
22 3..3. Optimization Formulation n (= ) m = N + M features data points in a training set x i = x (i) with ω = x (i) N belongs to C in training set M belongs to C in training set ω and ω 0 are the variables (unknown) ω 0 ω ω minimize subject to something X ω X ω For the non-separable case, we relex the above constraints Need slack variables u and υ where all are positive he optimization problem for the non-separable case minimize subject to N u i i= + M υ i i= ω x () u ω x () u ω x (N) u N ( ) ( ) ω x (N+) υ ω x (N+) υ ω x (N+M) { u 0 v 0 ( ) υ M
23 Expressed in a matrix form X ( x () ) ( x () ) = = ( x (N) ) x () x () x (N) x () x () x (N) X = = ( x (N+) ) ( x (N+) ) ( x (N+M) ) x (N+) x (N+) x (N+M) x (N+) x (N+) x (N+M) u υ = = u u N υ υ M minimize subject to u + υ X ω u X ω ( υ) u 0 υ 0
24 In []: X = np.hstack([np.ones([c.shape[0],]), x[c], x[c]]) X = np.hstack([np.ones([c.shape[0],]), x[c], x[c]]) outlier = np.array([,, ]).reshape(-,) X = np.vstack([x, outlier.]) X = np.asmatrix(x) X = np.asmatrix(x) N = X.shape[0] M = X.shape[0] w = cvx.variable(3,) u = cvx.variable(n,) v = cvx.variable(m,) obj = cvx.minimize(np.ones((,n))*u + np.ones((,m))*v) const = [X*w >= -u, X*w <= -(-v), u >= 0, v >= 0 ] prob = cvx.problem(obj, const).solve() w = w.value In []: xp = np.linspace(0,8,00).reshape(-,) yp = - w[,0]/w[,0]*xp - w[0,0]/w[,0] plt.figure(figsize=(0, 6)) plt.plot(x[:,], X[:,], 'ro', label='c') plt.plot(x[:,], X[:,], 'bo', label='c') plt.plot(xp, yp, '--k', label='svm') plt.plot(xp, yp-/w[,0], '-k') plt.plot(xp, yp+/w[,0], '-k') plt.xlim([0,8]) plt.xlabel('$x_$', fontsize = 0) plt.ylabel('$x_$', fontsize = 0) plt.legend(loc = 4, fontsize = 5) plt.show()
25 Further improvement Notice that hyperplane is not as accurately represent the division due to the outlier Can we do better when there are noise data or outliers? Yes, but we need to look beyond LP Idea: large margin leads to good generalization on the test data 3.3. Maximize Margin (Finally, it is Support Vector Machine) Distance (= margin) margin = ω Minimize ω to maximize the margin (closest samples from the decision line) maximize {minimum distance} Use gamma ( γ) as a weighting betwwen the followings: Bigger margin given robustness to outliers Hyperplane that has few (or no) errors minimize subject to ω + γ( u + υ) X ω + ω 0 u X ω + ω 0 ( υ) u 0 υ 0
26 In [3]: g = w = cvx.variable(3,) u = cvx.variable(n,) v = cvx.variable(m,) obj = cvx.minimize(cvx.norm(w,) + g*(np.ones((,n))*u + np.ones((,m))*v)) const = [X*w >= -u, X*w <= -(-v), u >= 0, v >= 0 ] prob = cvx.problem(obj, const).solve() w = w.value In [4]: xp = np.linspace(0,8,00).reshape(-,) yp = - w[,0]/w[,0]*xp - w[0,0]/w[,0] plt.figure(figsize=(0, 6)) plt.plot(x[:,], X[:,], 'ro', label='c') plt.plot(x[:,], X[:,], 'bo', label='c') plt.plot(xp, yp, '--k', label='svm') plt.plot(xp, yp-/w[,0], '-k') plt.plot(xp, yp+/w[,0], '-k') plt.xlim([0,8]) plt.xlabel('$x_$', fontsize = 0) plt.ylabel('$x_$', fontsize = 0) plt.legend(loc = 4, fontsize = 5) plt.show()
27 3.4. Logistic Regression Logistic regression is a classification algorithm don't be confused Perceptron: make use of sign of data SVM: make use of margin (minimum distance) Distance from a single data point We want to use distance information of ALL data points logistic regression Using Distances basic idea: to find the decision boundary (hyperplane) of optimization i h i g(x) = ω x = 0 such that maximizes Inequality of arithmetic and geometric means x + x + + x m x x x m m m and that equality holds if and only if x = x = = x m Roughly speaking, this optimization of two classes i max h i g(x) ω h = = x ω x ω ω tends to position a hyperplane in the middle of
28 Using all Distances with Outliers SVM vs. Logistic Regression
29 Sigmoid function (, + ) (0, ) We link or squeeze to for several reasons: If σ(z) is the sigmoid function, or the logistic function σ(z) = σ( ω x) = + e z + e ω x logistic function always generates a value between 0 and Crosses 0.5 at the origin, then flattens out Benefit of mapping via the logistic function monotonic: same or similar optimziation solution continuous and differentiable: good for gradient descent optimization probability or confidence: can be considered as probability Goal: we need to fit ω to our data P (y = + x, ω) = [0, ] + e ω x i max h i
30 Classified based on probability In [5]: m = 00 X0 = np.random.multivariate_normal([0, 0], np.eye(), m) X = np.random.multivariate_normal([0, 0], np.eye(), m) X = np.vstack([x0, X]) y = np.vstack([np.zeros([m,]), np.ones([m,])]) In [6]: from sklearn import linear_model clf = linear_model.logisticregression() clf.fit(x, np.ravel(y)) Out[6]: LogisticRegression(C=.0, class_weight=none, dual=false, fit_intercept=ru e, intercept_scaling=, max_iter=00, multi_class='ovr', n_jobs=, penalty='l', random_state=none, solver='liblinear', tol=0.000, verbose=0, warm_start=false) In [7]: X_new = np.array([, 0]).reshape(, -) pred = clf.predict(x_new) print(pred) [0.] In [8]: pred = clf.predict_proba(x_new) print(pred) [[ ]]
31 In [9]: plt.figure(figsize=(0, 6)) plt.plot(x0[:,0], X0[:,], '.b', label='class 0') plt.plot(x[:,0], X[:,], '.k', label='class ') plt.plot(x_new[0,0], X_new[0,], 'o', label='new Data', ms=5, mew=5) plt.title('logistic Regression', fontsize=5) plt.legend(loc='lower right', fontsize=5) plt.xlabel('x', fontsize=5) plt.ylabel('x', fontsize=5) plt.grid(alpha=0.3) plt.axis('equal') plt.show() In [30]: %%javascript $.getscript(' tebook_toc.js')
Support Vector Machine
Support Vector Machine by Prof. Seungchul Lee isystems Design Lab http://isystems.unist.ac.kr/ UNIS able of Contents I.. Classification (Linear) II.. Distance from a Line III. 3. Illustrative Example I.
More informationPerceptron. by Prof. Seungchul Lee Industrial AI Lab POSTECH. Table of Contents
Perceptron by Prof. Seungchul Lee Industrial AI Lab http://isystems.unist.ac.kr/ POSTECH Table of Contents I.. Supervised Learning II.. Classification III. 3. Perceptron I. 3.. Linear Classifier II. 3..
More informationLinear Classification
Linear Classification by Prof. Seungchul Lee isystems Design Lab http://isystems.unist.ac.kr/ UNIS able of Contents I.. Supervised Learning II.. Classification III. 3. Perceptron I. 3.. Linear Classifier
More informationLogistic Regression. by Prof. Seungchul Lee isystems Design Lab UNIST. Table of Contents
Logistic Regression by Prof. Seungchul Lee isystes Design Lab http://isystes.unist.ac.kr/ UNIST Table of Contents I.. Linear Classification: Logistic Regression I... Using all Distances II..2. Probabilistic
More informationSupport Vector Machine. Industrial AI Lab. Prof. Seungchul Lee
Support Vector Machine Industrial AI Lab. Prof. Seungchul Lee Classification (Linear) Autonomously figure out which category (or class) an unknown item should be categorized into Number of categories /
More informationSupport Vector Machine. Industrial AI Lab.
Support Vector Machine Industrial AI Lab. Classification (Linear) Autonomously figure out which category (or class) an unknown item should be categorized into Number of categories / classes Binary: 2 different
More information(Artificial) Neural Networks in TensorFlow
(Artificial) Neural Networks in TensorFlow By Prof. Seungchul Lee Industrial AI Lab http://isystems.unist.ac.kr/ POSTECH Table of Contents I. 1. Recall Supervised Learning Setup II. 2. Artificial Neural
More information(Artificial) Neural Networks in TensorFlow
(Artificial) Neural Networks in TensorFlow By Prof. Seungchul Lee Industrial AI Lab http://isystems.unist.ac.kr/ POSTECH Table of Contents I. 1. Recall Supervised Learning Setup II. 2. Artificial Neural
More informationConjugate-Gradient. Learn about the Conjugate-Gradient Algorithm and its Uses. Descent Algorithms and the Conjugate-Gradient Method. Qx = b.
Lab 1 Conjugate-Gradient Lab Objective: Learn about the Conjugate-Gradient Algorithm and its Uses Descent Algorithms and the Conjugate-Gradient Method There are many possibilities for solving a linear
More informationLecture 9: Large Margin Classifiers. Linear Support Vector Machines
Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Perceptrons Definition Perceptron learning rule Convergence Margin & max margin classifiers (Linear) support vector machines Formulation
More informationCOMP 652: Machine Learning. Lecture 12. COMP Lecture 12 1 / 37
COMP 652: Machine Learning Lecture 12 COMP 652 Lecture 12 1 / 37 Today Perceptrons Definition Perceptron learning rule Convergence (Linear) support vector machines Margin & max margin classifier Formulation
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationESS2222. Lecture 4 Linear model
ESS2222 Lecture 4 Linear model Hosein Shahnas University of Toronto, Department of Earth Sciences, 1 Outline Logistic Regression Predicting Continuous Target Variables Support Vector Machine (Some Details)
More informationData Analysis and Machine Learning: Logistic Regression
Data Analysis and Machine Learning: Logistic Regression Morten Hjorth-Jensen 1,2 1 Department of Physics, University of Oslo 2 Department of Physics and Astronomy and National Superconducting Cyclotron
More informationGradient Descent Methods
Lab 18 Gradient Descent Methods Lab Objective: Many optimization methods fall under the umbrella of descent algorithms. The idea is to choose an initial guess, identify a direction from this point along
More informationCh 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationDM534 - Introduction to Computer Science
Department of Mathematics and Computer Science University of Southern Denmark, Odense October 21, 2016 Marco Chiarandini DM534 - Introduction to Computer Science Training Session, Week 41-43, Autumn 2016
More informationSupport Vector Machines and Kernel Methods
2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University
More informationFrom Binary to Multiclass Classification. CS 6961: Structured Prediction Spring 2018
From Binary to Multiclass Classification CS 6961: Structured Prediction Spring 2018 1 So far: Binary Classification We have seen linear models Learning algorithms Perceptron SVM Logistic Regression Prediction
More informationLINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning
LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES Supervised Learning Linear vs non linear classifiers In K-NN we saw an example of a non-linear classifier: the decision boundary
More informationLinear and Logistic Regression. Dr. Xiaowei Huang
Linear and Logistic Regression Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Two Classical Machine Learning Algorithms Decision tree learning K-nearest neighbor Model Evaluation Metrics
More informationESS2222. Lecture 5 Support Vector Machine
ESS2222 Lecture 5 Support Vector Machine Hosein Shahnas University of Toronto, Department of Earth Sciences, 1 Outline Support Vector Machine (SVM) Soft Margin SVM Multiclass Problems Image Recognition
More informationTraining the linear classifier
215, Training the linear classifier A natural way to train the classifier is to minimize the number of classification errors on the training data, i.e. choosing w so that the training error is minimized.
More informationUnsupervised Learning: Dimension Reduction
Unsupervised Learning: Diension Reduction by Prof. Seungchul Lee isystes Design Lab http://isystes.unist.ac.kr/ UNIST Table of Contents I.. Principal Coponent Analysis (PCA) II. 2. PCA Algorith I. 2..
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Support Vector Machine (SVM) Hamid R. Rabiee Hadi Asheri, Jafar Muhammadi, Nima Pourdamghani Spring 2013 http://ce.sharif.edu/courses/91-92/2/ce725-1/ Agenda Introduction
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationMulticlass Classification-1
CS 446 Machine Learning Fall 2016 Oct 27, 2016 Multiclass Classification Professor: Dan Roth Scribe: C. Cheng Overview Binary to multiclass Multiclass SVM Constraint classification 1 Introduction Multiclass
More informationApplied Machine Learning Lecture 5: Linear classifiers, continued. Richard Johansson
Applied Machine Learning Lecture 5: Linear classifiers, continued Richard Johansson overview preliminaries logistic regression training a logistic regression classifier side note: multiclass linear classifiers
More informationMachine Learning Basics Lecture 4: SVM I. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 4: SVM I Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d. from distribution
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationBinary Classification / Perceptron
Binary Classification / Perceptron Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Supervised Learning Input: x 1, y 1,, (x n, y n ) x i is the i th data
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More information10/05/2016. Computational Methods for Data Analysis. Massimo Poesio SUPPORT VECTOR MACHINES. Support Vector Machines Linear classifiers
Computational Methods for Data Analysis Massimo Poesio SUPPORT VECTOR MACHINES Support Vector Machines Linear classifiers 1 Linear Classifiers denotes +1 denotes -1 w x + b>0 f(x,w,b) = sign(w x + b) How
More informationCSC 411: Lecture 04: Logistic Regression
CSC 411: Lecture 04: Logistic Regression Raquel Urtasun & Rich Zemel University of Toronto Sep 23, 2015 Urtasun & Zemel (UofT) CSC 411: 04-Prob Classif Sep 23, 2015 1 / 16 Today Key Concepts: Logistic
More informationLecture 16: Modern Classification (I) - Separating Hyperplanes
Lecture 16: Modern Classification (I) - Separating Hyperplanes Outline 1 2 Separating Hyperplane Binary SVM for Separable Case Bayes Rule for Binary Problems Consider the simplest case: two classes are
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More informationINTRODUCTION TO DATA SCIENCE
INTRODUCTION TO DATA SCIENCE JOHN P DICKERSON Lecture #13 3/9/2017 CMSC320 Tuesdays & Thursdays 3:30pm 4:45pm ANNOUNCEMENTS Mini-Project #1 is due Saturday night (3/11): Seems like people are able to do
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationProbabilistic Machine Learning. Industrial AI Lab.
Probabilistic Machine Learning Industrial AI Lab. Probabilistic Linear Regression Outline Probabilistic Classification Probabilistic Clustering Probabilistic Dimension Reduction 2 Probabilistic Linear
More informationCSE 151 Machine Learning. Instructor: Kamalika Chaudhuri
CSE 151 Machine Learning Instructor: Kamalika Chaudhuri Linear Classification Given labeled data: (xi, feature vector yi) label i=1,..,n where y is 1 or 1, find a hyperplane to separate from Linear Classification
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More informationDeep Learning for Computer Vision
Deep Learning for Computer Vision Lecture 4: Curse of Dimensionality, High Dimensional Feature Spaces, Linear Classifiers, Linear Regression, Python, and Jupyter Notebooks Peter Belhumeur Computer Science
More informationMachine Learning Support Vector Machines. Prof. Matteo Matteucci
Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way
More informationMachine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.
More informationMachine Learning: Chenhao Tan University of Colorado Boulder LECTURE 9
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 9 Slides adapted from Jordan Boyd-Graber Machine Learning: Chenhao Tan Boulder 1 of 39 Recap Supervised learning Previously: KNN, naïve
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationCS229 Supplemental Lecture notes
CS229 Supplemental Lecture notes John Duchi Binary classification In binary classification problems, the target y can take on at only two values. In this set of notes, we show how to model this problem
More informationLogistic Regression Trained with Different Loss Functions. Discussion
Logistic Regression Trained with Different Loss Functions Discussion CS640 Notations We restrict our discussions to the binary case. g(z) = g (z) = g(z) z h w (x) = g(wx) = + e z = g(z)( g(z)) + e wx =
More informationMIRA, SVM, k-nn. Lirong Xia
MIRA, SVM, k-nn Lirong Xia Linear Classifiers (perceptrons) Inputs are feature values Each feature has a weight Sum is the activation activation w If the activation is: Positive: output +1 Negative, output
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction
More informationMAS212 Scientific Computing and Simulation
MAS212 Scientific Computing and Simulation Dr. Sam Dolan School of Mathematics and Statistics, University of Sheffield Autumn 2017 http://sam-dolan.staff.shef.ac.uk/mas212/ G18 Hicks Building s.dolan@sheffield.ac.uk
More information18.9 SUPPORT VECTOR MACHINES
744 Chapter 8. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight the examples whose kernels overlap the
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationThe Perceptron Algorithm
The Perceptron Algorithm Greg Grudic Greg Grudic Machine Learning Questions? Greg Grudic Machine Learning 2 Binary Classification A binary classifier is a mapping from a set of d inputs to a single output
More informationMachine Learning A Geometric Approach
Machine Learning A Geometric Approach CIML book Chap 7.7 Linear Classification: Support Vector Machines (SVM) Professor Liang Huang some slides from Alex Smola (CMU) Linear Separator Ham Spam From Perceptron
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationComments. x > w = w > x. Clarification: this course is about getting you to be able to think as a machine learning expert
Logistic regression Comments Mini-review and feedback These are equivalent: x > w = w > x Clarification: this course is about getting you to be able to think as a machine learning expert There has to be
More informationLinear discriminant functions
Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative
More informationChapter 14 Combining Models
Chapter 14 Combining Models T-61.62 Special Course II: Pattern Recognition and Machine Learning Spring 27 Laboratory of Computer and Information Science TKK April 3th 27 Outline Independent Mixing Coefficients
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationSupport Vector Machines
Two SVM tutorials linked in class website (please, read both): High-level presentation with applications (Hearst 1998) Detailed tutorial (Burges 1998) Support Vector Machines Machine Learning 10701/15781
More informationLinear models: the perceptron and closest centroid algorithms. D = {(x i,y i )} n i=1. x i 2 R d 9/3/13. Preliminaries. Chapter 1, 7.
Preliminaries Linear models: the perceptron and closest centroid algorithms Chapter 1, 7 Definition: The Euclidean dot product beteen to vectors is the expression d T x = i x i The dot product is also
More informationLinear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x))
Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard and Mitch Marcus (and lots original slides by
More informationLecture 11. Linear Soft Margin Support Vector Machines
CS142: Machine Learning Spring 2017 Lecture 11 Instructor: Pedro Felzenszwalb Scribes: Dan Xiang, Tyler Dae Devlin Linear Soft Margin Support Vector Machines We continue our discussion of linear soft margin
More informationJeff Howbert Introduction to Machine Learning Winter
Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable
More informationCSC242: Intro to AI. Lecture 21
CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2-wide by 4-high landscape pages
More informationLecture Notes on Support Vector Machine
Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is
More informationOptimization with Scipy (2)
Optimization with Scipy (2) Unconstrained Optimization Cont d & 1D optimization Harry Lee February 5, 2018 CEE 696 Table of contents 1. Unconstrained Optimization 2. 1D Optimization 3. Multi-dimensional
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationMachine Learning and Data Mining. Linear classification. Kalev Kask
Machine Learning and Data Mining Linear classification Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ = f(x ; q) Parameters q Program ( Learner ) Learning algorithm Change q
More informationLogistic Regression. COMP 527 Danushka Bollegala
Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will
More informationAN INTRODUCTION TO NEURAL NETWORKS. Scott Kuindersma November 12, 2009
AN INTRODUCTION TO NEURAL NETWORKS Scott Kuindersma November 12, 2009 SUPERVISED LEARNING We are given some training data: We must learn a function If y is discrete, we call it classification If it is
More informationExercise_set7_programming_exercises
Exercise_set7_programming_exercises May 13, 2018 1 Part 1(d) We start by defining some function that will be useful: The functions defining the system, and the Hamiltonian and angular momentum. In [1]:
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target
More informationRegression and Classification" with Linear Models" CMPSCI 383 Nov 15, 2011!
Regression and Classification" with Linear Models" CMPSCI 383 Nov 15, 2011! 1 Todayʼs topics" Learning from Examples: brief review! Univariate Linear Regression! Batch gradient descent! Stochastic gradient
More informationSupport Vector Machines. CAP 5610: Machine Learning Instructor: Guo-Jun QI
Support Vector Machines CAP 5610: Machine Learning Instructor: Guo-Jun QI 1 Linear Classifier Naive Bayes Assume each attribute is drawn from Gaussian distribution with the same variance Generative model:
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationConvex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs)
ORF 523 Lecture 8 Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Any typos should be emailed to a a a@princeton.edu. 1 Outline Convexity-preserving operations Convex envelopes, cardinality
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers
Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationx k+1 = x k + α k p k (13.1)
13 Gradient Descent Methods Lab Objective: Iterative optimization methods choose a search direction and a step size at each iteration One simple choice for the search direction is the negative gradient,
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)
More informationData Analysis and Machine Learning: Linear Regression and more Advanced Regression Analysis
Data Analysis and Machine Learning: Linear Regression and more Advanced Regression Analysis Morten Hjorth-Jensen 1,2 1 Department of Physics, University of Oslo 2 Department of Physics and Astronomy and
More informationConstrained Optimization and Support Vector Machines
Constrained Optimization and Support Vector Machines Man-Wai MAK Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University enmwmak@polyu.edu.hk http://www.eie.polyu.edu.hk/
More informationKernelized Perceptron Support Vector Machines
Kernelized Perceptron Support Vector Machines Emily Fox University of Washington February 13, 2017 What is the perceptron optimizing? 1 The perceptron algorithm [Rosenblatt 58, 62] Classification setting:
More informationKernel Logistic Regression and the Import Vector Machine
Kernel Logistic Regression and the Import Vector Machine Ji Zhu and Trevor Hastie Journal of Computational and Graphical Statistics, 2005 Presented by Mingtao Ding Duke University December 8, 2011 Mingtao
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationSupport Vector Machines, Kernel SVM
Support Vector Machines, Kernel SVM Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 27, 2017 1 / 40 Outline 1 Administration 2 Review of last lecture 3 SVM
More informationMachine Learning Basics
Security and Fairness of Deep Learning Machine Learning Basics Anupam Datta CMU Spring 2019 Image Classification Image Classification Image classification pipeline Input: A training set of N images, each
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationMachine Learning. Support Vector Machines. Manfred Huber
Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data
More informationMachine Learning. Lecture 3: Logistic Regression. Feng Li.
Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification
More informationLinear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)
Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training
More informationEvaluation requires to define performance measures to be optimized
Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation
More informationLinear, Binary SVM Classifiers
Linear, Binary SVM Classifiers COMPSCI 37D Machine Learning COMPSCI 37D Machine Learning Linear, Binary SVM Classifiers / 6 Outline What Linear, Binary SVM Classifiers Do 2 Margin I 3 Loss and Regularized
More information