Aprendizagem Automática. Logistic Regression. Ludwig Krippahl

1 Aprendizagem Automática Logistic Regression Ludwig Krippahl

2 Logistic Regression Summary
Classification, introduction
Linear separability
Logistic regression, and playing in higher dimensions

3 Logistic Regression Separability 2

4 Separability Classes are linearly separable if they can be separated by some linear combination of feature values (a hyperplane).

5 Separability This frontier is a linear discriminant.

6 Separability Otherwise, the classes are not linearly separable.

7 Separability Classification
In regression, we defined a straight line with two parameters: $y = \theta_1 x + \theta_2$
But in classification we need to know which side of the line (more generally, of a hyperplane in N dimensions) is which.
This we can do if we define the hyperplane with a perpendicular vector: $\mathbf{w}^T \mathbf{x} + w_0 = 0$
Note: $\mathbf{x}$ is an N-dimensional vector.

8 Separability $\mathbf{w}^T \mathbf{x} + w_0 = 0$

9 Separability Hyperplane
The frontier is given by the function $y(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$, which is positive on one side of the hyperplane and negative on the other.
So one class is the positive one, the other the negative one.
How do we find the discriminant?
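As a small illustration of the sign rule above, here is a sketch that evaluates $y(\mathbf{x})$ for a few points and assigns the class from its sign. The hyperplane coefficients and the points are made up for the example, and sending points that lie exactly on the hyperplane to the negative class is an arbitrary choice.

```python
import numpy as np

def discriminant(X, w, w0):
    """Return y(x) = w^T x + w0 for each row of X."""
    return X @ w + w0

# Hypothetical hyperplane in 2D (a line): x1 + x2 - 1 = 0
w = np.array([1.0, 1.0])
w0 = -1.0

points = np.array([[2.0, 2.0],   # on the positive side
                   [0.0, 0.0],   # on the negative side
                   [0.5, 0.5]])  # exactly on the line

values = discriminant(points, w, w0)
# Positive values get class 1; zero and negative values get class -1
labels = np.where(values > 0, 1, -1)
```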

10 Logistic Regression Wrong Answer 9

11 Wrong Answer Fitting with LMS
We want to find the best $\mathbf{w}$ and $w_0$ for $y(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$
This function is negative on one side and positive on the other.
So we can try minimizing the squared error, considering classes 1 and -1:
$E = \sum_{j=1}^{N} \left( y(\mathbf{x}_j) - t_j \right)^2$

12 Wrong Answer Fitting with LMS
Data on gene expression (Uri Alon et al., PNAS, 96(12), 1999):
Carbonic anhydrase IV gene (M83670)
Guanylate cyclase activator 2A gene (M97496)
Class: Tumour (1) or Normal (0)

13 Wrong Answer Preprocessing the data
Need to rescale the values.
Normalization: $x_{new} = \frac{x - \min(x)}{\max(x) - \min(x)}$
Standardization: $x_{new} = \frac{x - \mu(x)}{\sigma(x)}$
Important: store these parameters and apply them to all new points.

    import numpy as np

    # last column holds the class, the others the features
    mat = np.loadtxt('gene_data.txt', delimiter='\t')
    Ys = mat[:, [-1]]
    Xs = mat[:, :-1]
    # standardize each feature (column)
    means = np.mean(Xs, 0)
    stdevs = np.std(Xs, 0)
    Xs = (Xs - means) / stdevs
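To make the "store these parameters" advice concrete, here is a minimal sketch in which the means and standard deviations are computed once on the training set and then reused for a new point. The data and the helper name `standardize` are made up for the example.

```python
import numpy as np

# Made-up training data: 4 examples, 2 features on very different scales
X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0],
                    [4.0, 40.0]])

# Compute the parameters on the training set only, and keep them
means = np.mean(X_train, axis=0)
stdevs = np.std(X_train, axis=0)
X_train_std = (X_train - means) / stdevs

def standardize(X_new, means, stdevs):
    """Apply the stored training-set parameters to new points."""
    return (X_new - means) / stdevs

# A new point is rescaled with the stored values, not with its own statistics
X_new = np.array([[2.5, 25.0]])
X_new_std = standardize(X_new, means, stdevs)
```

Recomputing the mean and deviation on the new points instead would place them on a different scale from the one the classifier was trained on.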

14 Wrong Answer Simplifying the expression
Instead of $y(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$
Let's merge the parameters and add a 1 to the feature vectors:
$\tilde{\mathbf{w}} = (\mathbf{w}, w_0)$, $\tilde{\mathbf{x}} = (\mathbf{x}, 1)$, $y(\mathbf{x}) = \tilde{\mathbf{w}}^T \tilde{\mathbf{x}}$
Add the 1s:

    def expand_features(X):
        """append a column of 1s"""
        X_exp = np.ones((X.shape[0], X.shape[1] + 1))
        X_exp[:, :-1] = X
        return X_exp

15 Wrong Answer Fitting with LMS
Measure the error.
Note that the class values are 0 and 1, so change them to -1 and 1.

    def quad_cost(theta, X, y):
        """return error value comparing signed distance with y"""
        # theta as a column vector, so the product gives one value per example
        coefs = np.zeros((len(theta), 1))
        coefs[:, 0] = theta
        vals = np.dot(X, coefs)
        # 2*y-1 maps classes 0, 1 to -1, 1
        return np.mean((vals - (2 * y - 1)) ** 2)

16 Wrong Answer Fitting with LMS
Minimize the error:

    from scipy.optimize import minimize
    import matplotlib.pyplot as plt

    X_exp = expand_features(Xs)
    coefs = np.ones(X_exp.shape[1])
    opt = minimize(quad_cost, coefs, (X_exp, Ys), tol= )
    coefs = opt.x
    # plot the chart

17 Wrong Answer Not a good result... 16

18 Wrong Answer Fitting with LMS
Not a good result...
Minimizing the squared error makes points far from the discriminant weigh more.
This is good in regression but bad in classification, as it pulls the discriminant towards distant points.
Regression: fit the data as closely as possible to predict continuous values.
Classification: find the discriminant between discrete classes.
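The pull from distant points can be seen in a one-dimensional sketch. The data below are made up, and the fit uses `np.linalg.lstsq` rather than the `minimize` call from the slides, which should reach the same least-squares solution. The classes are separable at x = 0, but adding one distant point of the positive class moves the fitted frontier past the positive point at x = 1.

```python
import numpy as np

def lms_boundary(x, t):
    """Fit y = a*x + b by least squares and return the x where y = 0."""
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, t, rcond=None)
    return -b / a

# Two negative points and two positive points, symmetric around 0
x = np.array([-2.0, -1.0, 1.0, 2.0])
t = np.array([-1.0, -1.0, 1.0, 1.0])
boundary_near = lms_boundary(x, t)

# Same data plus one correctly labelled but very distant positive point
x_far = np.append(x, 50.0)
t_far = np.append(t, 1.0)
boundary_far = lms_boundary(x_far, t_far)
```

With the distant point included, the positive example at x = 1 ends up on the negative side, even though a frontier at x = 0 would still classify every point correctly.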

19 Logistic Regression Logistic Regression 18

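20 Logistic Regression Logistic Regression
Assume there is a function: $g(\mathbf{x}, \mathbf{w}) = P(C_1 \mid \mathbf{x})$
We want our hyperplane to be at $P(C_1 \mid \mathbf{x}) = P(C_0 \mid \mathbf{x}) = 1 - P(C_1 \mid \mathbf{x})$
Regression on probabilities, but we'll use it as a classifier.
Also note that $\mathbf{x}$ is an N-dimensional vector.
Assumed: $g(\mathbf{x}, \mathbf{w}) = P(C_1 \mid \mathbf{x})$
Plane: $P(C_1 \mid \mathbf{x}) = P(C_0 \mid \mathbf{x}) = 1 - P(C_1 \mid \mathbf{x})$
$\ln \frac{P(C_1 \mid \mathbf{x})}{1 - P(C_1 \mid \mathbf{x})} = 0 = \ln \frac{g(\mathbf{x}, \mathbf{w})}{1 - g(\mathbf{x}, \mathbf{w})}$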

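21 Logistic Regression Logistic Regression
Assumed: $g(\mathbf{x}, \mathbf{w}) = P(C_1 \mid \mathbf{x})$
Plane: $P(C_1 \mid \mathbf{x}) = P(C_0 \mid \mathbf{x}) = 1 - P(C_1 \mid \mathbf{x})$
$\ln \frac{g(\mathbf{x}, \mathbf{w})}{1 - g(\mathbf{x}, \mathbf{w})} = \mathbf{w}^T \mathbf{x} + w_0$
$g(\mathbf{x}, \mathbf{w}) = \frac{1}{1 + e^{-(\mathbf{w}^T \mathbf{x} + w_0)}}$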
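The step from the log-odds to the logistic form can be written out as follows. The shorthand $a = \mathbf{w}^T \mathbf{x} + w_0$ and the abbreviation $g$ for $g(\mathbf{x}, \mathbf{w})$ are introduced here only to keep the lines short.

```latex
\begin{align*}
\ln \frac{g}{1 - g} &= a \\
\frac{g}{1 - g} &= e^{a} \\
g &= e^{a} - g\, e^{a} \\
g \left( 1 + e^{a} \right) &= e^{a} \\
g &= \frac{e^{a}}{1 + e^{a}} = \frac{1}{1 + e^{-a}}
\end{align*}
```

On the hyperplane $a = 0$, which gives $g = 1/2$, matching the condition that both classes are equally probable there.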

22 Logistic Regression Logistic Function: $f(x) = \frac{1}{1 + e^{-k(x - x_0)}}$

23 Logistic Regression Logistic Function
Unlike the quadratic curve, the logistic function levels off with distance:
$g(\mathbf{x}, \mathbf{w}) = \frac{1}{1 + e^{-(\mathbf{w}^T \mathbf{x} + w_0)}}$
This solves the problem of points farther away pulling the discriminant.
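A quick numerical sketch of this difference, for points of the positive class at increasing signed distance from the frontier. The distances are made up, the squared error is measured against a target of +1 as in the LMS slides, and the logistic penalty is taken as the negative log of the predicted probability.

```python
import numpy as np

def logistic(a):
    """Logistic function 1 / (1 + e^-a), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-a))

# Signed distances y(x) of three correctly classified positive points
distances = np.array([1.0, 5.0, 20.0])

# Squared error against the target +1 grows with distance
squared_err = (distances - 1.0) ** 2

# Logistic output saturates at 1, so the penalty shrinks towards 0
log_loss = -np.log(logistic(distances))
```

The farther a correctly classified point is, the more the squared error penalizes it, while its logistic penalty becomes negligible.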

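24 Logistic Regression
Find $\mathbf{w}$ by maximum likelihood.
Given: $g(\mathbf{x}, \mathbf{w}) = P(t_n = 1 \mid \mathbf{x})$ and $t_n \in \{0, 1\}$
$L(\mathbf{w} \mid X) = \prod_{n=1}^{N} \left[ g_n^{t_n} (1 - g_n)^{1 - t_n} \right]$
$l(\mathbf{w} \mid X) = \sum_{n=1}^{N} \left[ t_n \ln g_n + (1 - t_n) \ln(1 - g_n) \right]$
Maximizing the likelihood is equivalent to minimizing the error in predicting probabilities (logistic loss or cross entropy):
$E(\mathbf{w}) = -\frac{1}{N} \sum_{n=1}^{N} \left[ t_n \ln g_n + (1 - t_n) \ln(1 - g_n) \right]$
$g_n = \frac{1}{1 + e^{-(\mathbf{w}^T \mathbf{x}_n + w_0)}}$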
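The relation between the likelihood, the log-likelihood and the cross entropy can be checked on a tiny example. The probabilities and classes below are made up.

```python
import numpy as np

# Made-up predicted probabilities P(t_n = 1 | x_n) and true classes
g = np.array([0.9, 0.2, 0.7])
t = np.array([1, 0, 1])

# Likelihood: product of the probability given to each observed class
likelihood = np.prod(g ** t * (1 - g) ** (1 - t))  # 0.9 * 0.8 * 0.7

# Log-likelihood: the same quantity as a sum of logarithms
log_likelihood = np.sum(t * np.log(g) + (1 - t) * np.log(1 - g))

# Cross entropy: negative log-likelihood averaged over the examples
cross_entropy = -log_likelihood / len(t)
```

Since the logarithm is increasing and the factor $-1/N$ only flips the sign and rescales, the $\mathbf{w}$ that maximizes the likelihood is the one that minimizes the cross entropy.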

25 Logistic Regression Example 24

26 Example
Load and standardize the data (as before).
Write the logistic function (Note: you won't be doing this):

    def logistic(X):
        """return logistic function of vector X"""
        den = 1.0 + np.e ** (-1.0 * X)
        return 1.0 / den

And the logistic cost function to minimize:

    def log_cost(theta, X, y):
        """return logistic error value
        X is matrix, one example per row and 1 in last column
        y is a vector of classes 0 or 1
        """
        coefs = np.zeros((len(theta), 1))
        coefs[:, 0] = theta
        sig_vals = logistic(np.dot(X, coefs))
        log_1 = np.log(sig_vals) * y
        log_0 = np.log(1 - sig_vals) * (1 - y)
        return -np.mean(log_0 + log_1)

27 Example Minimizing this function, we get a better result: 26
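The minimization itself is not shown on the slides. As a self-contained sketch, the block below minimizes the logistic cost with plain gradient descent instead of the `minimize` call used earlier; this is a swapped-in technique, and the data, step size and iteration count are all made up for the example. The gradient used is $\frac{1}{N} X^T (\mathbf{g} - \mathbf{t})$.

```python
import numpy as np

def logistic(a):
    """Logistic function 1 / (1 + e^-a), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-a))

def log_cost(theta, X, t):
    """Cross entropy; X has one example per row and a 1 in the last column."""
    g = logistic(X @ theta)
    return -np.mean(t * np.log(g) + (1 - t) * np.log(1 - g))

def log_cost_grad(theta, X, t):
    """Gradient of log_cost with respect to theta."""
    return X.T @ (logistic(X @ theta) - t) / len(t)

# Made-up 1D data with overlapping classes, so the optimum is finite
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
t = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
X = np.column_stack([x, np.ones_like(x)])  # append the column of 1s

theta = np.zeros(2)
cost_start = log_cost(theta, X, t)
for _ in range(2000):
    # Fixed step size, chosen by hand for this small problem
    theta = theta - 0.5 * log_cost_grad(theta, X, t)
cost_end = log_cost(theta, X, t)

# Classify as 1 when the predicted probability exceeds 0.5
predicted = (logistic(X @ theta) > 0.5).astype(int)
```

Because the two middle points overlap, they stay misclassified; the frontier settles between the two groups instead of being dragged towards the outer points.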

28 Logistic Regression Nonlinear Separability 27

29 Nonlinear Separability This set is not linearly separable actin-binding protein X53416, smooth muscle cell Ca binding protein U

30 Nonlinear Separability This set is not linearly separable but we can expand the features. 29

31 Nonlinear Separability
First, we add a term $x_1 x_2$:

    def poly_3features(X):
        """append a column with the product of the two first features"""
        X_exp = np.zeros((X.shape[0], X.shape[1] + 1))
        X_exp[:, :-1] = X
        X_exp[:, -1] = X[:, 0] * X[:, 1]
        return X_exp

Now we do a logistic regression in 3D:

    from sklearn.linear_model import LogisticRegression

    # load and standardize data
    X_exp = poly_3features(Xs)
    # very large C means practically no regularization
    reg = LogisticRegression(C=1e12, tol=1e-10)
    reg.fit(X_exp, Ys[:, 0])

32 Nonlinear Separability And we get a plane instead of a line 31

33 Nonlinear Separability Project back into the original plane 32
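A toy version of this idea, with made-up points rather than the gene data: four points in an XOR-like arrangement cannot be split by a line in the original plane, but after adding the product $x_1 x_2$ as a third feature a plane separates them. The plane coefficients below are picked by hand, not fitted.

```python
import numpy as np

# XOR-like data: same-sign corners are class 1, mixed-sign corners class 0
X = np.array([[1.0, 1.0],
              [-1.0, -1.0],
              [1.0, -1.0],
              [-1.0, 1.0]])
t = np.array([1, 1, 0, 0])

# Expand with the product of the two features
X_exp = np.column_stack([X, X[:, 0] * X[:, 1]])

# Hand-picked plane in 3D: only the third coordinate matters
w = np.array([0.0, 0.0, 1.0])
w0 = 0.0
pred = (X_exp @ w + w0 > 0).astype(int)
```

Projected back into the original plane, this frontier is the set where $x_1 x_2 = 0$, which is no longer a straight line.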

34 Nonlinear Separability Expand more: $x_1$, $x_2$, $x_1 x_2$, $x_1^2$, $x_2^2$

35 Nonlinear Separability Expand more: $x_1$, $x_2$, $x_1 x_2$, $x_1^2$, $x_2^2$, $x_1^3$, $x_2^3$

36 Nonlinear Separability Expand more: $x_1$, $x_2$, $x_1 x_2$, $x_1^2$, $x_2^2$, $x_1^3$, $x_2^3$, $x_1^2 x_2$, $x_1 x$
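One way to generate such expansions for two features is to enumerate all products $x_1^i x_2^j$ up to a chosen total degree. The function name `poly_expand` is made up for this sketch, and the column order differs from the order on the slides.

```python
import numpy as np

def poly_expand(X, degree):
    """Return columns x1^i * x2^j for all i, j with 1 <= i + j <= degree."""
    cols = []
    for total in range(1, degree + 1):
        for i in range(total, -1, -1):
            j = total - i
            cols.append(X[:, 0] ** i * X[:, 1] ** j)
    return np.column_stack(cols)

# One made-up point with x1 = 2 and x2 = 3, expanded to degree 3
X = np.array([[2.0, 3.0]])
X_exp = poly_expand(X, 3)
```

The number of columns grows quickly with the degree (2, 5, then 9 for degrees 1 to 3), which is what makes overfitting a concern as the expansion continues.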

37 Nonlinear Separability Is this too much? Overfitting? 36

38 Logistic Regression Summary 37

39 Logistic Regression Summary
Linear separability
Linear discriminant (hyperplane)
Fitting the discriminant with LMS
Logistic Regression
Linear separability in higher dimensions
Next lecture: overfitting in classification
Further reading: Bishop, Sections 4.1.1, and
