More about the Perceptron

CMSC 422, Marine Carpuat
Credit: figures by Piyush Rai and Hal Daumé III

Recap: Perceptron for binary classification
- Classifier = hyperplane that separates positive from negative examples: $y = \mathrm{sign}(w^T x + b)$
- Perceptron training finds such a hyperplane
- Training is online and error-driven
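The prediction rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `predict`, the example parameters, and the choice to treat an activation of exactly zero as positive are all assumptions.

```python
def predict(w, b, x):
    """Return +1 or -1 for feature vector x under the hyperplane (w, b)."""
    # Activation is the signed score w^T x + b.
    activation = sum(w_d * x_d for w_d, x_d in zip(w, x)) + b
    # Assumption: an activation of exactly 0 is assigned to the positive class.
    return 1 if activation >= 0 else -1


# Hypothetical parameters, chosen only for illustration.
w, b = [1.0, -2.0], 0.5
label_a = predict(w, b, [3.0, 1.0])  # activation = 3 - 2 + 0.5 = 1.5
label_b = predict(w, b, [0.0, 1.0])  # activation = -2 + 0.5 = -1.5
print(label_a, label_b)
```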

Recap: Perceptron updates
Update for a misclassified positive example ($y = +1$): $w \leftarrow w + x$, $b \leftarrow b + 1$

Update for a misclassified negative example ($y = -1$): $w \leftarrow w - x$, $b \leftarrow b - 1$
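Both update cases can be written as the single rule $w \leftarrow w + y\,x$, $b \leftarrow b + y$. The sketch below assumes that standard form of the update; the function name `perceptron_train`, the `epochs` parameter, and the toy data are illustrative and do not come from the slides.

```python
def perceptron_train(data, epochs=10):
    """Train on a list of (x, y) pairs, with x a list of floats and y in {+1, -1}."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:  # online: one example at a time
            activation = sum(w_d * x_d for w_d, x_d in zip(w, x)) + b
            if y * activation <= 0:  # error-driven: update only on a mistake
                w = [w_d + y * x_d for w_d, x_d in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: converged
            break
    return w, b


# Hypothetical, linearly separable toy data.
toy_data = [([2.0, 2.0], 1), ([1.0, 3.0], 1), ([-1.0, -1.0], -1), ([-2.0, 0.0], -1)]
w, b = perceptron_train(toy_data)
print(w, b)
```

On this toy set a single update on the first example is enough, after which every example is classified correctly.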

Today
- Example of perceptron + averaged perceptron training
- Perceptron convergence proof
- Fundamental machine learning concepts: linear separability and margin of a data set

Standard Perceptron: predict based on final parameters

Averaged Perceptron: predict based on average of intermediate parameters
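The difference between the two variants can be sketched as follows. This is a naive version that assumes the average is taken over the parameters held after every training example (not only after updates); the function name and toy data are hypothetical, and a practical implementation would usually avoid re-summing the full weight vector at each step.

```python
def averaged_perceptron_train(data, epochs=2):
    """Return the average of the intermediate (w, b) seen during training."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    w_sum, b_sum = [0.0] * dim, 0.0
    steps = 0
    for _ in range(epochs):
        for x, y in data:
            activation = sum(w_d * x_d for w_d, x_d in zip(w, x)) + b
            if y * activation <= 0:  # same update as the standard perceptron
                w = [w_d + y * x_d for w_d, x_d in zip(w, x)]
                b += y
            # Accumulate the current parameters after every example.
            w_sum = [s + w_d for s, w_d in zip(w_sum, w)]
            b_sum += b
            steps += 1
    return [s / steps for s in w_sum], b_sum / steps


# Hypothetical two-example data set.
toy_data = [([1.0, 0.0], 1), ([0.0, 1.0], -1)]
w_avg, b_avg = averaged_perceptron_train(toy_data, epochs=2)
print(w_avg, b_avg)
```

Here the final parameters of the standard perceptron would be `w = [1.0, -1.0]`, `b = 0.0`, while the averaged parameters also reflect the earlier intermediate values.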

Convergence of Perceptron
The perceptron has converged if it classifies every training example correctly, i.e. if it has found a hyperplane that correctly separates positive and negative examples.
Under which conditions does the perceptron converge, and how long does it take?

Convergence of Perceptron
Theorem (Block & Novikoff, 1962): If the training data $D = \{(x_1, y_1), \ldots, (x_N, y_N)\}$ is linearly separable with margin $\gamma$ by a unit-norm hyperplane $w$ ($\|w\| = 1$) with $b = 0$, then perceptron training converges after at most $R^2 / \gamma^2$ errors during training (assuming $\|x\| < R$ for all $x$).
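The usual argument behind this bound can be sketched as below. The notation is an assumption made for this sketch: $w^*$ denotes the unit-norm separating hyperplane of the theorem, $w^{(k)}$ the learned weights after the $k$-th update, and $w^{(0)} = 0$. Separability with margin $\gamma$ is taken to mean $y\,(w^* \cdot x) \ge \gamma$ for every $(x, y) \in D$.

```latex
% Each mistake on (x, y) triggers the update w^{(k)} = w^{(k-1)} + y x.

% Step 1: the projection onto w^* grows by at least \gamma per update.
w^* \cdot w^{(k)} = w^* \cdot w^{(k-1)} + y \, (w^* \cdot x)
                  \ge w^* \cdot w^{(k-1)} + \gamma
\quad\Longrightarrow\quad w^* \cdot w^{(k)} \ge k \gamma

% Step 2: the squared norm grows by at most R^2 per update,
% because y \, (w^{(k-1)} \cdot x) \le 0 on a mistake.
\| w^{(k)} \|^2 = \| w^{(k-1)} \|^2 + 2 y \, (w^{(k-1)} \cdot x) + \| x \|^2
                \le \| w^{(k-1)} \|^2 + R^2
\quad\Longrightarrow\quad \| w^{(k)} \|^2 \le k R^2

% Step 3: combine both with Cauchy-Schwarz and \| w^* \| = 1.
k \gamma \le w^* \cdot w^{(k)} \le \| w^* \| \, \| w^{(k)} \| \le \sqrt{k} \, R
\quad\Longrightarrow\quad k \le \frac{R^2}{\gamma^2}
```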

Margin of a data set D
- Margin of a hyperplane (w, b) on D: distance between the hyperplane and the nearest point in D
- Margin of D: the largest attainable margin on D, over all hyperplanes
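The first of these quantities, the margin of a given hyperplane on D, is easy to compute directly. The sketch below rests on two assumptions: the weight vector is normalized so the result is a geometric distance, and the margin is reported as negative infinity when the hyperplane does not separate D. The function name and data are illustrative.

```python
import math


def margin(data, w, b):
    """Distance from the hyperplane (w, b) to the nearest point in data.

    Returns -inf if (w, b) does not strictly separate the two classes.
    """
    norm = math.sqrt(sum(w_d * w_d for w_d in w))
    # Signed distance of each point; positive means correctly classified.
    distances = [
        y * (sum(w_d * x_d for w_d, x_d in zip(w, x)) + b) / norm
        for x, y in data
    ]
    smallest = min(distances)
    return smallest if smallest > 0 else float("-inf")


# Hypothetical, linearly separable toy data.
toy_data = [([2.0, 2.0], 1), ([1.0, 3.0], 1), ([-1.0, -1.0], -1), ([-2.0, 0.0], -1)]
separating = margin(toy_data, [0.6, 0.8], 0.0)       # nearest point is [-2, 0]
non_separating = margin(toy_data, [-1.0, 0.0], 0.0)  # misclassifies every point
print(separating, non_separating)
```

Computing the margin of D itself would require maximizing this value over all hyperplanes, which this sketch does not attempt.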

What does this theorem mean?
- The perceptron converges quickly when the margin is large, slowly when it is small.
- The bound doesn't depend on the number of examples N, nor on their dimension d!
- Important note: the proof guarantees that the perceptron converges, but it doesn't necessarily converge to the max-margin separator!
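The bound can be checked empirically on a small example. The sketch below assumes the setting of the theorem ($b = 0$, so only $w$ is updated), uses a hand-picked unit-norm separator `w_star` to obtain a margin $\gamma$, and takes $R$ as the largest example norm. All names and data are hypothetical.

```python
import math


def dot(u, v):
    return sum(u_d * v_d for u_d, v_d in zip(u, v))


def count_mistakes(data, max_epochs=1000):
    """Run the bias-free perceptron and return (w, total number of updates)."""
    w = [0.0] * len(data[0][0])
    total = 0
    for _ in range(max_epochs):
        errors = 0
        for x, y in data:
            if y * dot(w, x) <= 0:
                w = [w_d + y * x_d for w_d, x_d in zip(w, x)]
                errors += 1
        total += errors
        if errors == 0:  # converged
            break
    return w, total


def mistake_bound(data, w_star):
    """R^2 / gamma^2, with gamma measured for the given separator w_star."""
    norm = math.sqrt(dot(w_star, w_star))
    gamma = min(y * dot(w_star, x) / norm for x, y in data)
    radius = max(math.sqrt(dot(x, x)) for x, _ in data)
    return (radius / gamma) ** 2


toy_data = [([2.0, 2.0], 1), ([1.0, 3.0], 1), ([-1.0, -1.0], -1), ([-2.0, 0.0], -1)]
w, mistakes = count_mistakes(toy_data)
bound = mistake_bound(toy_data, [0.6, 0.8])
print(mistakes, bound)
```

Note that the learned `w` need not equal `w_star`, and nothing here makes it the max-margin separator; the check only confirms that the number of updates stays below the bound.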

What you should know
- Perceptron training and prediction algorithms (standard, voting, averaged)
- Convergence theorem and what practical guarantees it gives us
- Draw/describe the decision boundary of a perceptron classifier
- Fundamental ML concepts:
  - Determine whether a data set is linearly separable, and define its margin
  - Determine whether a training algorithm is error-driven or not, online or not
