Perceptron
by Prof. Seungchul Lee, Industrial AI Lab, POSTECH


Table of Contents

I. 1. Supervised Learning
II. 2. Classification
III. 3. Perceptron
    I. 3.1. Linear Classifier
    II. 3.2. Perceptron Algorithm
    III. 3.3. Iterations of Perceptron
    IV. 3.4. The best hyperplane separator?
    V. 3.5. Python Example
    VI. 3.6. XOR Problem

1. Supervised Learning

2. Classification

- where $y$ is a discrete value
  - develop the classification algorithm to determine which class a new input should fall into
- start with binary class problems
  - later look at the multiclass classification problem, although this is just an extension of binary classification
- We could use linear regression
  - then threshold the classifier output (i.e. anything over some value is yes, else no)
  - linear regression with thresholding seems to work

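3. Perceptron

For input $x = \begin{bmatrix} x_1 \\ \vdots \\ x_d \end{bmatrix}$ 'attributes of a customer' and weights $\omega = \begin{bmatrix} \omega_1 \\ \vdots \\ \omega_d \end{bmatrix}$:

$$\begin{align*} \text{Approve credit if} \quad & \sum_{i=1}^{d} \omega_i x_i > \text{threshold}, \\ \text{Deny credit if} \quad & \sum_{i=1}^{d} \omega_i x_i < \text{threshold}. \end{align*}$$

$$h(x) = \text{sign}\left(\left(\sum_{i=1}^{d} \omega_i x_i\right) - \text{threshold}\right) = \text{sign}\left(\left(\sum_{i=1}^{d} \omega_i x_i\right) + \omega_0\right)$$

Introduce an artificial coordinate $x_0 = 1$:

$$h(x) = \text{sign}\left(\sum_{i=0}^{d} \omega_i x_i\right)$$

In vector form, the perceptron implements

$$h(x) = \text{sign}\left(\omega^T x\right)$$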
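The decision rule above can be sketched in a few lines of NumPy. This is only an illustration: the function name `perceptron_predict`, the two attribute weights and the threshold are hypothetical values chosen for the example, not part of the lecture.

```python
import numpy as np

def perceptron_predict(w, x):
    """Return sign(w^T [1, x]) for a single input x (1-D array of d features).

    w has d + 1 entries; w[0] plays the role of omega_0 = -threshold.
    """
    x_aug = np.concatenate(([1.0], x))   # prepend the artificial coordinate x_0 = 1
    return int(np.sign(w @ x_aug))

# Hypothetical numbers: two customer attributes and a threshold of 1.0
weights = np.array([0.5, 0.25])
threshold = 1.0
w = np.concatenate(([-threshold], weights))   # omega_0 = -threshold

approve = perceptron_predict(w, np.array([2.0, 4.0]))   # 0.5*2 + 0.25*4 = 2.0 > 1.0
deny = perceptron_predict(w, np.array([1.0, 1.0]))      # 0.5*1 + 0.25*1 = 0.75 < 1.0
print(approve, deny)
```

Folding the threshold into $\omega_0$ is what lets the whole rule be written as a single inner product.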

Hyperplane

- Separates a D-dimensional space into two half-spaces
- Defined by an outward pointing normal vector $\omega$
- $\omega$ is orthogonal to any vector lying on the hyperplane
- assume the hyperplane passes through origin, $\omega^T x = 0$ with $x_0 = 1$
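A small numerical check of these properties, as a sketch only. The normal vector and the points below are hypothetical values picked so that the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical normal vector of a hyperplane through the origin in 3-D
omega = np.array([1.0, -2.0, 0.5])

# Two points chosen to satisfy omega^T x = 0, so both lie on the hyperplane
p = np.array([2.0, 1.0, 0.0])    # 1*2 - 2*1 + 0.5*0 = 0
q = np.array([0.0, 1.0, 4.0])    # 1*0 - 2*1 + 0.5*4 = 0

# The difference p - q is a vector lying in the hyperplane
along_plane = omega @ (p - q)

# Points off the hyperplane fall in one of the two half-spaces
side_pos = np.sign(omega @ np.array([1.0, 0.0, 0.0]))   # omega^T x = 1 > 0
side_neg = np.sign(omega @ np.array([0.0, 1.0, 0.0]))   # omega^T x = -2 < 0
print(along_plane, side_pos, side_neg)
```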

3.1. Linear Classifier

- represent the decision boundary by a hyperplane $\omega$
- The linear classifier is a way of combining expert opinion. In this case, each opinion is made by a binary "expert"
- Goal: to learn the hyperplane $\omega$ using the training data

3.2. Perceptron Algorithm

The perceptron implements

$$h(x) = \text{sign}\left(\omega^T x\right)$$

Given the training set

$$(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N) \quad \text{where } y_i \in \{-1, 1\}$$

1) pick a misclassified point

$$\text{sign}\left(\omega^T x_n\right) \neq y_n$$

2) and update the weight vector

$$\omega \leftarrow \omega + y_n x_n$$
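The two steps can be sketched as a single function. The name `pla_step` and the toy data are hypothetical; the sketch also assumes that a point with $\omega^T x_n = 0$ counts as misclassified, which the notes do not specify.

```python
import numpy as np

def pla_step(w, X, y):
    """Run one perceptron update on the first misclassified row of X, if any.

    X: (N, d+1) array whose first column is all ones; y: (N,) array of -1/+1.
    Returns the (possibly updated) weights and whether an update happened.
    """
    predictions = np.sign(X @ w)
    wrong = np.flatnonzero(predictions != y)   # sign(0) = 0 also counts as wrong
    if wrong.size == 0:
        return w, False
    n = wrong[0]
    return w + y[n] * X[n], True

# Hypothetical toy data: one positive and one negative point
X = np.array([[1.0, 2.0, 1.0],
              [1.0, -1.0, -2.0]])
y = np.array([1.0, -1.0])

w0 = np.zeros(3)
w1, updated = pla_step(w0, X, y)        # first point is picked, w1 = w0 + x_1
w2, updated_again = pla_step(w1, X, y)  # both points now correct, no update
print(w1, updated, updated_again)
```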

Why perceptron updates work?

- Let's look at a misclassified positive example ($y_n = +1$)
  - perceptron (wrongly) thinks $\omega_{old}^T x_n < 0$
- updates would be

$$\begin{align*} \omega_{new} &= \omega_{old} + y_n x_n = \omega_{old} + x_n \\ \omega_{new}^T x_n &= \left(\omega_{old} + x_n\right)^T x_n = \omega_{old}^T x_n + x_n^T x_n \end{align*}$$

- Thus $\omega_{new}^T x_n$ is less negative than $\omega_{old}^T x_n$
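The same argument holds for a misclassified negative example, and it can be checked numerically. In the sketch below the random vectors and the seed are arbitrary assumptions; the quantity tracked is $y_n \omega^T x_n$, which is negative for a misclassified point and grows by $x_n^T x_n$ after the update.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for repeatability only

for y_n in (+1, -1):
    w_old = rng.normal(size=3)
    x_n = rng.normal(size=3)
    # Flip x_n if needed so that the point is misclassified: y_n * w_old^T x_n < 0
    if y_n * (w_old @ x_n) > 0:
        x_n = -x_n

    w_new = w_old + y_n * x_n            # perceptron update
    before = y_n * (w_old @ x_n)
    after = y_n * (w_new @ x_n)

    # The update raises y_n * w^T x_n by exactly ||x_n||^2
    assert np.isclose(after - before, x_n @ x_n)
    print(y_n, before, after)
```

Note that one update moves the score in the right direction but does not guarantee that the point ends up correctly classified; that can take several iterations.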

3.3. Iterations of Perceptron

1. Randomly assign $\omega$
2. One iteration of the PLA (perceptron learning algorithm)
   $$\omega \leftarrow \omega + yx$$
   where $(x, y)$ is a misclassified training point
3. At iteration $t = 1, 2, 3, \cdots,$ pick a misclassified point from
   $$(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)$$
4. and run a PLA iteration on it
5. That's it!

3.4. The best hyperplane separator?

- Perceptron finds one of the many possible hyperplanes separating the data if one exists
- Of the many possible choices, which one is the best?
- Utilize distance information as well
- Intuitively we want the hyperplane having the maximum margin
- Large margin leads to good generalization on the test data
  - we will see this formally when we cover Support Vector Machine
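To make "distance information" concrete, the sketch below compares the margin of two hyperplanes that both separate the same toy data. The function name `geometric_margin`, the data and the two weight vectors are hypothetical; the margin is taken here as the smallest distance from a training point to the hyperplane, which is an assumption about the definition the later Support Vector Machine lecture would use.

```python
import numpy as np

def geometric_margin(w, X, y):
    """Smallest signed distance from the training points to the hyperplane.

    w: (d+1,) weights with w[0] the bias; X: (N, d+1) with a leading column
    of ones; y: (N,) labels in {-1, +1}. A positive value means every point
    is on the correct side.
    """
    return np.min(y * (X @ w)) / np.linalg.norm(w[1:])

# Hypothetical 2-D data, separable by many different lines
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [1.0, 3.0, 1.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

w_a = np.array([-0.5, 1.0, 0.0])   # vertical line x_1 = 0.5, close to the negative class
w_b = np.array([-1.5, 1.0, 0.0])   # vertical line x_1 = 1.5, midway between the classes

margin_a = geometric_margin(w_a, X, y)
margin_b = geometric_margin(w_b, X, y)
print(margin_a, margin_b)
```

Both lines classify every training point correctly, but the second one leaves more room on each side. The perceptron has no preference between them.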

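3.5. Python Example

$$\begin{align*} \omega &= \begin{bmatrix} \omega_0 \\ \omega_1 \\ \omega_2 \end{bmatrix} \\ X &= \begin{bmatrix} \left(x^{(1)}\right)^T \\ \left(x^{(2)}\right)^T \\ \left(x^{(3)}\right)^T \\ \vdots \\ \left(x^{(m)}\right)^T \end{bmatrix} = \begin{bmatrix} 1 & x_1^{(1)} & x_2^{(1)} \\ 1 & x_1^{(2)} & x_2^{(2)} \\ 1 & x_1^{(3)} & x_2^{(3)} \\ \vdots & \vdots & \vdots \\ 1 & x_1^{(m)} & x_2^{(m)} \end{bmatrix} \\ y &= \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ y^{(3)} \\ \vdots \\ y^{(m)} \end{bmatrix} \end{align*}$$

In [1]:
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline

In [2]:
    # training data generation
    m = 100
    x1 = 8*np.random.rand(m, 1)
    x2 = 7*np.random.rand(m, 1) - 4

    g0 = 0.8*x1 + x2 - 3
    g1 = g0 - 1    # class C1 lies on or above the line g1 = 0
    g2 = g0 + 1    # class C2 lies below the line g2 = 0

In [3]:
    C1 = np.where(g1 >= 0)
    C2 = np.where(g2 < 0)
    print(C1)

Output: a tuple of two index arrays, `(array([...]), array([0, 0, ..., 0]))`. Because `g1` has shape `(m, 1)`, `np.where` returns the row indices together with a second array of column indices that are all 0.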

In [4]:
    # keep only the row indices
    C1 = np.where(g1 >= 0)[0]
    C2 = np.where(g2 < 0)[0]
    print(C1.shape)
    print(C2.shape)

Output: the shapes of `C1` and `C2`, i.e. the number of points in each class.

In [5]:
    plt.figure(figsize=(10, 6))
    plt.plot(x1[C1], x2[C1], 'ro', label='C1')
    plt.plot(x1[C2], x2[C2], 'bo', label='C2')
    plt.title('Linearly separable classes', fontsize=15)
    plt.legend(loc='upper left', fontsize=15)
    plt.xlabel(r'$x_1$', fontsize=20)
    plt.ylabel(r'$x_2$', fontsize=20)
    plt.show()

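$$\begin{align*} X &= \begin{bmatrix} \left(x^{(1)}\right)^T \\ \left(x^{(2)}\right)^T \\ \left(x^{(3)}\right)^T \\ \vdots \\ \left(x^{(m)}\right)^T \end{bmatrix} = \begin{bmatrix} 1 & x_1^{(1)} & x_2^{(1)} \\ 1 & x_1^{(2)} & x_2^{(2)} \\ 1 & x_1^{(3)} & x_2^{(3)} \\ \vdots & \vdots & \vdots \\ 1 & x_1^{(m)} & x_2^{(m)} \end{bmatrix} \\ y &= \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ y^{(3)} \\ \vdots \\ y^{(m)} \end{bmatrix} \end{align*}$$

In [6]:
    # stack [1, x1, x2] rows for each class, then combine the two classes
    X1 = np.hstack([np.ones([C1.shape[0],1]), x1[C1], x2[C1]])
    X2 = np.hstack([np.ones([C2.shape[0],1]), x1[C2], x2[C2]])
    X = np.vstack([X1, X2])

    # labels: +1 for C1, -1 for C2
    y = np.vstack([np.ones([C1.shape[0],1]), -np.ones([C2.shape[0],1])])

    X = np.asmatrix(X)
    y = np.asmatrix(y)

$$\omega = \begin{bmatrix} \omega_0 \\ \omega_1 \\ \omega_2 \end{bmatrix}$$

$$\omega \leftarrow \omega + yx$$

where $(x, y)$ is a misclassified training point

In [7]:
    w = np.ones([3,1])
    w = np.asmatrix(w)

    n_iter = y.shape[0]
    for k in range(n_iter):
        for i in range(n_iter):
            # update on every misclassified point
            if y[i,0] != np.sign(X[i,:]*w)[0,0]:
                w += y[i,0]*X[i,:].T

    print(w)

Output: the learned 3 × 1 weight matrix `w` (its entries depend on the random training data).

$$\begin{align*} g(x) &= \omega^T x + \omega_0 = \omega_1 x_1 + \omega_2 x_2 + \omega_0 = 0 \\ \implies x_2 &= -\frac{\omega_1}{\omega_2} x_1 - \frac{\omega_0}{\omega_2} \end{align*}$$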

In [8]:
    # decision boundary: x2 = -(w1/w2)*x1 - w0/w2
    x1p = np.linspace(0,8,100).reshape(-1,1)
    x2p = - w[1,0]/w[2,0]*x1p - w[0,0]/w[2,0]

    plt.figure(figsize=(10, 6))
    plt.scatter(x1[C1], x2[C1], c='r', s=50, label='C1')
    plt.scatter(x1[C2], x2[C2], c='b', s=50, label='C2')
    plt.plot(x1p, x2p, c='k', label='perceptron')
    plt.xlim([0,8])
    plt.xlabel('$x_1$', fontsize = 20)
    plt.ylabel('$x_2$', fontsize = 20)
    plt.legend(loc = 4, fontsize = 15)
    plt.show()
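The training loop above runs a fixed number of passes and relies on `np.matrix`. A possible variant with plain arrays is sketched below; it is not the notebook's code. The function name `train_perceptron`, the `max_epochs` cap and the random seed are assumptions made for this sketch, and the loop stops as soon as a full pass over the data makes no update.

```python
import numpy as np

def train_perceptron(X, y, max_epochs=1000):
    """Perceptron learning on (N, d+1) inputs X (leading ones) and -1/+1 labels y.

    Returns the weights and the number of passes over the data that were run.
    Stops as soon as a full pass makes no update.
    """
    w = np.ones(X.shape[1])
    for epoch in range(1, max_epochs + 1):
        n_updates = 0
        for x_i, y_i in zip(X, y):
            if np.sign(x_i @ w) != y_i:      # misclassified point
                w = w + y_i * x_i
                n_updates += 1
        if n_updates == 0:
            return w, epoch
    return w, max_epochs

# Same data recipe as the example, with an arbitrary seed for repeatability
rng = np.random.default_rng(0)
x1 = 8 * rng.random(100)
x2 = 7 * rng.random(100) - 4
g0 = 0.8 * x1 + x2 - 3
keep = (g0 >= 1) | (g0 < -1)                 # drop points inside the gap
X = np.column_stack([np.ones(keep.sum()), x1[keep], x2[keep]])
y = np.where(g0[keep] >= 1, 1.0, -1.0)

w, epochs = train_perceptron(X, y)
accuracy = np.mean(np.sign(X @ w) == y)      # fraction of training points classified correctly
print(w, epochs, accuracy)
```

Since the two classes are linearly separable by construction, the loop ends with every training point correctly classified.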

3.6. XOR Problem

- Minsky-Papert Controversy on XOR
  - not linearly separable
  - limitation of perceptron

    x_1    x_2    x_1 XOR x_2
    0      0      0
    0      1      1
    1      0      1
    1      1      0
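The limitation can be observed directly by running the perceptron update on the four XOR points. This is a sketch under assumptions: the labels are mapped to -1/+1, the weights start at ones as in the example above, and the cap of 100 passes is arbitrary.

```python
import numpy as np

# XOR truth table with an added column of ones; labels mapped to -1/+1
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, -1.0])

w = np.ones(3)
errors_per_epoch = []
for epoch in range(100):                 # 100 passes is an arbitrary cap
    n_errors = 0
    for x_i, y_i in zip(X, y):
        if np.sign(x_i @ w) != y_i:
            w = w + y_i * x_i            # perceptron update
            n_errors += 1
    errors_per_epoch.append(n_errors)

print(min(errors_per_epoch))             # never reaches 0 for XOR
```

A pass with zero updates would mean the current weights classify all four points correctly, which no hyperplane can do for XOR, so the loop keeps updating no matter how long it runs.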
