Logistic Regression
by Prof. Seungchul Lee, iSystems Design Lab, UNIST



Table of Contents

1. Linear Classification: Logistic Regression
    1.1. Using all Distances
    1.2. Probabilistic Approach (or MLE)
    1.3. CVXPY
2. Multiclass Classification
3. Non-linear Classification

1. Linear Classification: Logistic Regression

- Logistic regression is a classification algorithm (don't be confused by the name).

1.1. Using all Distances

- Perceptron: makes use of the sign of the data
- SVM: makes use of the margin (minimum distance)
- We want to use the distance information of all data points → logistic regression

Basic idea: find the decision boundary (hyperplane) $g(x) = \omega^T x = 0$ that maximizes $\prod_i |h_i|$, where $h_i$ is the distance of data point $x_i$ from the hyperplane:

$$h = \frac{g(x)}{\lVert \omega \rVert} = \frac{\omega^T x}{\lVert \omega \rVert}$$

Inequality of arithmetic and geometric means:

$$\frac{|h_1| + |h_2|}{2} \geq \sqrt{|h_1|\,|h_2|}$$

and equality holds if and only if $|h_1| = |h_2|$.

Roughly speaking, the optimization $\max \prod_i |h_i|$ tends to position the hyperplane in the middle of the two classes.

We link, or squeeze, $(-\infty, +\infty)$ to $(0, 1)$ for several reasons, given after the logistic function is defined.
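The arithmetic-geometric mean argument can be checked on a toy case. The sketch below is only an illustration under stated assumptions: a one-dimensional problem with one point per class at the hypothetical positions 0 and 4, and a hypothetical helper `distance_product`. The sum of the two distances is the same for every boundary between the points, but the product is largest at the midpoint.

```python
# Toy 1-D setting (assumed for illustration): one point per class,
# at positions 0 and 4, and a boundary placed at b between them.
def distance_product(b, left=0.0, right=4.0):
    """Product of the distances from the boundary b to both points."""
    return abs(b - left) * abs(right - b)


# Candidate boundaries 0.01, 0.02, ..., 3.99
candidates = [i / 100 for i in range(1, 400)]
best_b = max(candidates, key=distance_product)

print("best boundary:", best_b)
print("product at best boundary:", distance_product(best_b))
print("product at b = 1.0:", distance_product(1.0))
```

The boundary that maximizes the product sits halfway between the two points, which is the one-dimensional version of "a hyperplane in the middle of the two classes".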

If $\sigma(z)$ is the sigmoid function, or the logistic function,

$$\sigma(z) = \frac{1}{1 + e^{-z}} \implies \sigma(\omega^T x) = \frac{1}{1 + e^{-\omega^T x}}$$

- The logistic function always generates a value between 0 and 1.
- It crosses 0.5 at the origin, then flattens out.

In [1]:
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline

In [2]:
    z = np.linspace(-10, 10, 100)
    s = 1/(1 + np.exp(-z))

    plt.figure(figsize=(10, 6))
    plt.plot(z, s)
    plt.xlim([-10, 10])
    plt.ylim([-0.1, 1.1])
    plt.show()

Benefits of mapping via the logistic function:

- monotonic: same or similar optimization solution
- continuous and differentiable: good for gradient descent optimization
- probability or confidence: the output can be considered as a probability

$$P(y = +1 \mid x, \omega) = \frac{1}{1 + e^{-\omega^T x}} \in [0, 1]$$

Often we do not care about predicting the label $y$. Rather, we want to predict the label probabilities $P(y \mid x, \omega)$:

- the probability that the label is $+1$: $P(y = +1 \mid x, \omega)$
- the probability that the label is $0$: $P(y = 0 \mid x, \omega) = 1 - P(y = +1 \mid x, \omega)$

Goal: we need to fit $\omega$ to our data.
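The properties listed above are easy to confirm numerically. The following is a minimal sketch using only the standard library; the function name `sigmoid` and the sample values of $z = \omega^T x$ are assumptions made for illustration. The branch on the sign of `z` is one common way to avoid overflow in `exp`, not something the notes prescribe.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), arranged so exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    p_pos = sigmoid(z)     # P(y = +1 | x, w), with z = w^T x
    p_zero = 1.0 - p_pos   # P(y = 0 | x, w)
    print(f"z = {z:6.1f}   P(y=+1) = {p_pos:.4f}   P(y=0) = {p_zero:.4f}")
```

The printed probabilities increase with `z`, stay inside $[0, 1]$, and the two class probabilities in each row sum to 1.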

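1.2. Probabilistic Approach (or MLE)

Consider a random variable $y \in \{0, 1\}$ with

$$P(y = +1) = p, \qquad P(y = 0) = 1 - p$$

where $p \in [0, 1]$ is assumed to depend on a vector of explanatory variables $x \in \mathbb{R}^n$.

Then, the logistic model has the form

$$p = \frac{1}{1 + e^{-\omega^T x}} = \frac{e^{\omega^T x}}{e^{\omega^T x} + 1}, \qquad 1 - p = \frac{1}{e^{\omega^T x} + 1}$$

We can re-order the training data so that for $x_1, \dots, x_q$ the outcome is $y = +1$, and for $x_{q+1}, \dots, x_m$ the outcome is $y = 0$.

The likelihood function ($\sim \prod_i |h_i|$):

$$L = \prod_{i=1}^{q} p_i \prod_{i=q+1}^{m} (1 - p_i)$$

The log-likelihood function:

$$\begin{aligned}
l(\omega) = \log L &= \sum_{i=1}^{q} \log p_i + \sum_{i=q+1}^{m} \log (1 - p_i) \\
&= \sum_{i=1}^{q} \log \frac{\exp(\omega^T x_i)}{1 + \exp(\omega^T x_i)} + \sum_{i=q+1}^{m} \log \frac{1}{1 + \exp(\omega^T x_i)} \\
&= \sum_{i=1}^{q} \omega^T x_i - \sum_{i=1}^{m} \log\left(1 + \exp(\omega^T x_i)\right)
\end{aligned}$$

Since $l$ is a concave function of $\omega$, the logistic regression problem can be solved as a convex optimization problem:

$$\hat{\omega} = \arg\max_{\omega} \; l(\omega)$$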
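Because the log-likelihood is concave, even plain gradient ascent climbs toward the maximizer. The sketch below is not the CVXPY approach used in these notes; it is a standard-library illustration under several assumptions: the gradient $\sum_i (y_i - \sigma(\omega^T x_i))\, x_i$ is the standard result for this objective, the generating weights `w_true`, the step size, and the iteration count are hypothetical, and the labels are sampled from the model so that the two classes overlap.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log1pexp(z):
    # log(1 + exp(z)) without overflow for large z.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_likelihood(w, X, y):
    # l(w) = sum_i [ y_i * w^T x_i - log(1 + exp(w^T x_i)) ], with y_i in {0, 1}
    return sum(y_i * dot(w, x_i) - log1pexp(dot(w, x_i)) for x_i, y_i in zip(X, y))


def gradient(w, X, y):
    # Standard result: dl/dw = sum_i (y_i - sigmoid(w^T x_i)) * x_i
    g = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        residual = y_i - sigmoid(dot(w, x_i))
        for j, x_ij in enumerate(x_i):
            g[j] += residual * x_ij
    return g


random.seed(0)
w_true = [-4.0, 2.0, 1.0]  # hypothetical generating weights
X = [[1.0, 2 * random.random(), 4 * random.random()] for _ in range(100)]
# Assumption: labels are sampled from the model, so the classes overlap.
y = [1 if random.random() < sigmoid(dot(w_true, x_i)) else 0 for x_i in X]

w = [0.0, 0.0, 0.0]
step = 0.001  # small fixed step size (assumed, not tuned)
history = [log_likelihood(w, X, y)]
for _ in range(2000):
    g = gradient(w, X, y)
    w = [w_j + step * g_j for w_j, g_j in zip(w, g)]
    history.append(log_likelihood(w, X, y))

print("log-likelihood:", history[0], "->", history[-1])
print("estimated w:", w)
```

With $\omega = 0$ every point has probability 0.5, so the starting value is $-100 \log 2$; each small step then increases the log-likelihood. A convex solver reaches the maximizer far more efficiently than this fixed-step loop.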

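1.3. CVXPY

$$\omega = \begin{bmatrix} \omega_1 \\ \omega_2 \\ \omega_3 \end{bmatrix}, \qquad
x = \begin{bmatrix} 1 \\ x_1 \\ x_2 \end{bmatrix}$$

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ (x^{(3)})^T \\ \vdots \end{bmatrix}
= \begin{bmatrix} 1 & x_1^{(1)} & x_2^{(1)} \\ 1 & x_1^{(2)} & x_2^{(2)} \\ 1 & x_1^{(3)} & x_2^{(3)} \\ \vdots & \vdots & \vdots \end{bmatrix}$$

Source: Section 7.1.1 from (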

In [3]:
    # generate m points and label them with a known weight vector
    m = 100
    w = np.array([[-4], [2], [1]])
    X = np.hstack([np.ones([m,1]), 2*np.random.rand(m,1), 4*np.random.rand(m,1)])

    w = np.asmatrix(w)
    X = np.asmatrix(X)

    # label is True where the logistic output exceeds 0.5
    y = (np.exp(X*w)/(1+np.exp(X*w))) > 0.5

    C1 = np.where(y == True)[0]
    C2 = np.where(y == False)[0]

    # encode the labels as 1 and 0
    y = np.empty([m,1])
    y[C1] = 1
    y[C2] = 0
    y = np.asmatrix(y)

    plt.figure(figsize = (10,6))
    plt.plot(X[C1,1], X[C1,2], 'ro', label='C1')
    plt.plot(X[C2,1], X[C2,2], 'bo', label='C2')
    plt.legend()
    plt.show()

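Recall the log-likelihood:

$$l(\omega) = \log L = \sum_{i=1}^{q} \log p_i + \sum_{i=q+1}^{m} \log (1 - p_i)
= \sum_{i=1}^{q} \omega^T x_i - \sum_{i=1}^{m} \log\left(1 + \exp(\omega^T x_i)\right)$$

With $y \in \{0, 1\}$, the first sum equals $y^T X \omega$, which is how the objective is written in the code.

Refer to the cvx functions:

- scalar function: `cvx.sum_entries(x)` $= \sum_{ij} x_{ij}$
- elementwise function: `cvx.logistic(x)` $= \log(1 + e^{x})$

In [4]:
    import cvxpy as cvx

    # Written against the CVXPY 0.4 API; in CVXPY 1.0+ the equivalents are
    # cvx.Variable((3, 1)) and cvx.sum in place of cvx.sum_entries.
    w = cvx.Variable(3, 1)

    obj = cvx.Maximize(y.T*X*w - cvx.sum_entries(cvx.logistic(X*w)))
    prob = cvx.Problem(obj).solve()

    w = w.value

    # decision boundary: w0 + w1*x1 + w2*x2 = 0
    xp = np.linspace(0,2,100).reshape(-1,1)
    yp = - w[1,0]/w[2,0]*xp - w[0,0]/w[2,0]

    plt.figure(figsize = (10,6))
    plt.plot(X[C1,1], X[C1,2], 'ro', label='C1')
    plt.plot(X[C2,1], X[C2,2], 'bo', label='C2')
    plt.plot(xp, yp, 'k', label='Logistic Regression')
    plt.legend()
    plt.show()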
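The plotted line comes from solving $\omega^T x = 0$ for the second feature. As a minimal sketch, assume the weights $(-4, 2, 1)$ used to generate the data rather than fitted values; the helper name `boundary_x2` is hypothetical. The boundary is then $x_2 = -2 x_1 + 4$.

```python
def boundary_x2(w, x1):
    """Solve w0 + w1*x1 + w2*x2 = 0 for x2 (requires w2 != 0)."""
    w0, w1, w2 = w
    return -(w1 / w2) * x1 - w0 / w2


w_gen = [-4.0, 2.0, 1.0]  # assumed: the generating weights, not fitted values

for x1 in (0.0, 1.0, 2.0):
    x2 = boundary_x2(w_gen, x1)
    g = w_gen[0] + w_gen[1] * x1 + w_gen[2] * x2  # should be 0 on the boundary
    print(f"x1 = {x1:.1f}  ->  x2 = {x2:.1f}   g(x) = {g:.1f}")
```

Because scaling $\omega$ by any positive constant leaves the boundary unchanged, a fitted $\omega$ can differ in magnitude from the generating one and still draw the same line.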

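In a more compact form

Change $y \in \{0, +1\} \rightarrow y \in \{-1, +1\}$ for computational convenience.

Consider the following function:

$$P(y = +1) = p = \sigma(\omega^T x), \qquad P(y = -1) = 1 - p = 1 - \sigma(\omega^T x) = \sigma(-\omega^T x)$$

$$P(y \mid x, \omega) = \sigma(y\,\omega^T x) = \frac{1}{1 + \exp(-y\,\omega^T x)} \in [0, 1]$$

Log-likelihood:

$$\begin{aligned}
l(\omega) = \log L &= \log \prod_{n=1}^{m} P(y_n \mid x_n, \omega) = \sum_{n=1}^{m} \log P(y_n \mid x_n, \omega) \\
&= \sum_{n=1}^{m} \log \frac{1}{1 + \exp(-y_n \omega^T x_n)} = -\sum_{n=1}^{m} \log\left(1 + \exp(-y_n \omega^T x_n)\right)
\end{aligned}$$

MLE solution:

$$\hat{\omega} = \arg\max_{\omega} \; -\sum_{n=1}^{m} \log\left(1 + \exp(-y_n \omega^T x_n)\right)
= \arg\min_{\omega} \; \sum_{n=1}^{m} \log\left(1 + \exp(-y_n \omega^T x_n)\right)$$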
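The compact form is the same log-likelihood as the $\{0, 1\}$ version, only with relabeled classes. The sketch below checks this numerically with the standard library; the four data points, the weight vector, and the helper names are assumptions chosen for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log1pexp(z):
    # log(1 + exp(z)) without overflow for large z.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def loglik_01(w, X, y01):
    # Labels in {0, 1}: sum_i [ y_i * w^T x_i - log(1 + exp(w^T x_i)) ]
    return sum(y * dot(w, x) - log1pexp(dot(w, x)) for x, y in zip(X, y01))


def loglik_pm1(w, X, ypm):
    # Labels in {-1, +1}: -sum_n log(1 + exp(-y_n * w^T x_n))
    return -sum(log1pexp(-y * dot(w, x)) for x, y in zip(X, ypm))


# Hypothetical data: each row is [1, x1, x2]
X = [[1.0, 0.5, 1.0], [1.0, 1.5, 3.0], [1.0, 0.2, 0.3], [1.0, 1.8, 2.5]]
y01 = [0, 1, 0, 1]
ypm = [2 * y - 1 for y in y01]  # map 0 -> -1 and 1 -> +1
w = [-4.0, 2.0, 1.0]

a = loglik_01(w, X, y01)
b = loglik_pm1(w, X, ypm)
print(a, b)
```

Both functions return the same value up to rounding, so maximizing either one yields the same $\hat{\omega}$.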

In [5]:
    # re-encode the labels as +1 and -1
    y = np.empty([m,1])
    y[C1] = 1
    y[C2] = -1
    y = np.asmatrix(y)

    # CVXPY 0.4 API (sum_entries, mul_elemwise)
    w = cvx.Variable(3, 1)

    obj = cvx.Minimize(cvx.sum_entries(cvx.logistic(-cvx.mul_elemwise(y,X*w))))
    prob = cvx.Problem(obj).solve()

    w = w.value

    xp = np.linspace(0,2,100).reshape(-1,1)
    yp = - w[1,0]/w[2,0]*xp - w[0,0]/w[2,0]

    plt.figure(figsize = (10,6))
    plt.plot(X[C1,1], X[C1,2], 'ro', label='C1')
    plt.plot(X[C2,1], X[C2,2], 'bo', label='C2')
    plt.plot(xp, yp, 'k', label='Logistic Regression')
    plt.legend()
    plt.show()

2. Multiclass Classification

Generalization to more than 2 classes is straightforward:

- one vs. all (one vs. rest)
- one vs. one

Using the soft-max function instead of the logistic function (refer to the UFLDL Tutorial), see the outputs as probabilities:

$$P(y = k \mid x, \omega) = \frac{\exp(\omega_k^T x)}{\sum_{j} \exp(\omega_j^T x)} \in [0, 1]$$

We maintain a separate weight vector $\omega_k$ for each class $k$.

3. Non-linear Classification

Same idea as for linear regression: non-linear features, either explicit or implicit (kernels).
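The soft-max formula can be sketched in a few lines. This example assumes a hypothetical two-class weight list `W` in which the first class has all-zero weights; under that assumption the soft-max probability of the second class equals the logistic function of $\omega^T x$, which shows how the binary model is a special case. Subtracting the largest score before exponentiating is a common numerical safeguard, not part of the formula itself.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    # Subtract the max score so exp never overflows; the ratios are unchanged.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(W, x):
    # One weight vector per class: P(y = k | x) is proportional to exp(w_k^T x)
    return softmax([dot(w_k, x) for w_k in W])


# Hypothetical weights: class 0 is fixed at zero, class 1 uses (-4, 2, 1)
W = [[0.0, 0.0, 0.0], [-4.0, 2.0, 1.0]]
x = [1.0, 1.5, 2.0]  # [1, x1, x2]

probs = class_probabilities(W, x)
logistic = 1.0 / (1.0 + math.exp(-dot(W[1], x)))

print("soft-max probabilities:", probs)
print("logistic P(y = 1):", logistic)
```

With more classes, `W` simply gains one row per class and the probabilities still sum to 1.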

In [6]:
    X1 = np.array([[-1.1, 0], [-0.3, 0.1], [-0.9, 1], [0.8, 0.4], [0.4, 0.9], [0.3, -0.6],
                   [-0.5, 0.3], [-0.8, 0.6], [-0.5, -0.5]])

    X2 = np.array([[-1, -1.3], [-1.6, 2.2], [0.9, -0.7], [1.6, 0.5], [1.8, -1.1], [1.6, 1.6], [-1.6, -1.7],
                   [-1.4, 1.8], [1.6, -0.9], [0, -1.6], [0.3, 1.7], [-1.6, 0], [-2.1, 0.2]])

    X1 = np.asmatrix(X1)
    X2 = np.asmatrix(X2)

    plt.figure(figsize=(10, 6))
    plt.plot(X1[:,0], X1[:,1], 'ro', label='C1')
    plt.plot(X2[:,0], X2[:,1], 'bo', label='C2')
    plt.axis([-3,3,-3,3])
    plt.legend(loc = 4, fontsize = 15)
    plt.show()

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \implies
z = \phi(x) = \begin{bmatrix} 1 \\ \sqrt{2}\,x_1 \\ \sqrt{2}\,x_2 \\ x_1^2 \\ \sqrt{2}\,x_1 x_2 \\ x_2^2 \end{bmatrix}$$
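The $\sqrt{2}$ factors in $\phi$ are what link the explicit features to an implicit kernel: the inner product $\phi(x)^T \phi(x')$ equals the degree-2 polynomial kernel $(1 + x^T x')^2$. The sketch below verifies the identity for two sample points; the function names `phi` and `poly_kernel` and the chosen points are assumptions for illustration.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D input."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 ** 2, r2 * x1 * x2, x2 ** 2]


def poly_kernel(x, xp):
    """Degree-2 polynomial kernel (1 + x^T x')^2, with no explicit features."""
    return (1.0 + x[0] * xp[0] + x[1] * xp[1]) ** 2


a = (-1.1, 0.0)
b = (1.6, 0.5)

explicit = sum(u * v for u, v in zip(phi(a), phi(b)))
implicit = poly_kernel(a, b)
print(explicit, implicit)
```

Both values agree up to rounding, so a method that only needs inner products can use the kernel and skip building the six features.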

In [7]:
    N = X1.shape[0]
    M = X2.shape[0]

    # stack both classes; labels are +1 for C1 and -1 for C2
    X = np.vstack([X1, X2])
    y = np.vstack([np.ones([N,1]), -np.ones([M,1])])

    X = np.asmatrix(X)
    y = np.asmatrix(y)

    m = N + M
    # explicit feature map z = phi(x)
    Z = np.hstack([np.ones([m,1]), np.sqrt(2)*X[:,0], np.sqrt(2)*X[:,1], np.square(X[:,0]), \
                   np.sqrt(2)*np.multiply(X[:,0],X[:,1]), np.square(X[:,1])])

    # CVXPY 0.4 API (sum_entries, mul_elemwise)
    w = cvx.Variable(6, 1)
    obj = cvx.Minimize(cvx.sum_entries(cvx.logistic(-cvx.mul_elemwise(y,Z*w))))
    prob = cvx.Problem(obj).solve()

    w = w.value

In [8]:
    # to plot
    [X1gr, X2gr] = np.meshgrid(np.arange(-3,3,0.1), np.arange(-3,3,0.1))

    test_X = np.hstack([X1gr.reshape(-1,1), X2gr.reshape(-1,1)])
    test_X = np.asmatrix(test_X)

    m = test_X.shape[0]
    test_Z = np.hstack([np.ones([m,1]), np.sqrt(2)*test_X[:,0], np.sqrt(2)*test_X[:,1], np.square(test_X[:,0]), \
                        np.sqrt(2)*np.multiply(test_X[:,0],test_X[:,1]), np.square(test_X[:,1])])

    q = test_Z*w

    # keep the grid points classified as C1 (positive score)
    B = []
    for i in range(m):
        if q[i,0] > 0:
            B.append(test_X[i,:])

    B = np.vstack(B)

    plt.figure(figsize=(10, 6))
    plt.plot(X1[:,0], X1[:,1], 'ro', label='C1')
    plt.plot(X2[:,0], X2[:,1], 'bo', label='C2')
    plt.plot(B[:,0], B[:,1], 'k.', label='Logistic Regression')
    plt.legend()
    plt.show()
