More about the Perceptron

More about the Perceptron. CMSC 422, Marine Carpuat (marine@cs.umd.edu). Credit: figures by Piyush Rai and Hal Daume III.

Recap: Perceptron for binary classification. The classifier is a hyperplane that separates positive from negative examples: y = sign(w^T x + b). Perceptron training finds such a hyperplane, and it is online and error-driven.
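To make the decision rule concrete, here is a minimal sketch in Python/NumPy (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def predict(w, b, x):
    """Perceptron decision rule y = sign(w^T x + b); a tie at 0 is broken toward +1 here."""
    return 1 if np.dot(w, x) + b >= 0 else -1
```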

Recap: Perceptron updates. Update for a misclassified positive example (y = +1): w ← w + x, b ← b + 1.

Recap: Perceptron updates. Update for a misclassified negative example (y = -1): w ← w - x, b ← b - 1.
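Both updates can be written compactly as w ← w + y x and b ← b + y whenever an example (x, y) is misclassified. A minimal sketch of one online training pass, with illustrative names (an assumption standing in for the missing figures, not a transcription of them):

```python
import numpy as np

def perceptron_epoch(X, y, w, b):
    """One online pass: update (w, b) only on examples the current hyperplane gets wrong.

    X: (n, d) array of feature vectors; y: length-n array of labels in {-1, +1}.
    """
    for x_i, y_i in zip(X, y):
        if y_i * (np.dot(w, x_i) + b) <= 0:  # misclassified (or exactly on the boundary)
            w = w + y_i * x_i                # positive example: w <- w + x; negative: w <- w - x
            b = b + y_i                      # and b moves by +1 or -1 accordingly
    return w, b
```

Repeating this pass until no example triggers an update is the standard error-driven, online training loop.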

Today: an example of perceptron and averaged perceptron training; the perceptron convergence proof; fundamental machine learning concepts (linear separability and the margin of a data set).

Standard perceptron: predict based on the final parameters.

Averaged perceptron: predict based on the average of the intermediate parameters.
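A sketch contrasting the two prediction strategies: the standard perceptron keeps only the final (w, b), while the averaged perceptron predicts with the average of all intermediate parameter vectors seen during training. The function name and the epochs parameter are illustrative assumptions:

```python
import numpy as np

def train_averaged_perceptron(X, y, epochs=10):
    """Return both the final parameters and their average over every training step."""
    d = X.shape[1]
    w, b = np.zeros(d), 0.0
    w_sum, b_sum, steps = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * (np.dot(w, x_i) + b) <= 0:
                w = w + y_i * x_i
                b = b + y_i
            w_sum += w   # accumulate the intermediate parameters after every example
            b_sum += b
            steps += 1
    return (w, b), (w_sum / steps, b_sum / steps)  # (standard, averaged)
```

Predicting with the averaged parameters usually makes the classifier less sensitive to whichever examples happened to arrive last.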

Convergence of the perceptron. The perceptron has converged if it can classify every training example correctly, i.e. if it has found a hyperplane that correctly separates the positive and negative examples. Under which conditions does the perceptron converge, and how long does it take?

Convergence of the perceptron. Theorem (Block & Novikoff, 1962): if the training data D = {(x_1, y_1), ..., (x_N, y_N)} is linearly separable with margin γ by a unit-norm hyperplane w (‖w‖ = 1, with b = 0), then perceptron training converges after at most R^2/γ^2 errors during training (assuming ‖x‖ < R for all x).

Margin of a data set D. The margin of a hyperplane (w, b) with respect to D is the distance between the hyperplane and the nearest point in D; the margin of D itself is the largest margin attainable by any hyperplane that separates D.
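One standard way to write the two notions on this slide (using ‖w‖ = 1 so that the activation is a geometric distance):

\[
\mathrm{margin}(D, \mathbf{w}, b) =
\begin{cases}
\min_{(\mathbf{x}, y) \in D} \; y\,(\mathbf{w}^\top \mathbf{x} + b) & \text{if } (\mathbf{w}, b) \text{ separates } D,\\
-\infty & \text{otherwise,}
\end{cases}
\qquad
\mathrm{margin}(D) = \sup_{\|\mathbf{w}\| = 1,\, b} \; \mathrm{margin}(D, \mathbf{w}, b).
\]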

What does the convergence theorem mean? The perceptron converges quickly when the margin is large, and slowly when it is small. The bound does not depend on the number of examples N, nor on their dimension d. Important note: the proof guarantees that the perceptron converges, but it does not necessarily converge to the max-margin separator!
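For reference, the heart of the convergence argument (a standard sketch of the Block & Novikoff proof, not a transcription of the slides) tracks the weight vector after the k-th mistake, starting from \(\mathbf{w}_0 = 0\) with \(b = 0\) and writing \(\mathbf{w}^*\) for the unit-norm separator with margin \(\gamma\):

\begin{align*}
\mathbf{w}_k \cdot \mathbf{w}^* &= \mathbf{w}_{k-1} \cdot \mathbf{w}^* + y\,(\mathbf{x} \cdot \mathbf{w}^*) \;\ge\; \mathbf{w}_{k-1} \cdot \mathbf{w}^* + \gamma
&&\Rightarrow\;\; \mathbf{w}_k \cdot \mathbf{w}^* \ge k\gamma, \\
\|\mathbf{w}_k\|^2 &= \|\mathbf{w}_{k-1}\|^2 + 2\,y\,(\mathbf{w}_{k-1} \cdot \mathbf{x}) + \|\mathbf{x}\|^2 \;\le\; \|\mathbf{w}_{k-1}\|^2 + R^2
&&\Rightarrow\;\; \|\mathbf{w}_k\|^2 \le kR^2.
\end{align*}

The first step uses the margin assumption \(y\,(\mathbf{x} \cdot \mathbf{w}^*) \ge \gamma\); the second uses that on a mistake the cross term \(2y\,(\mathbf{w}_{k-1} \cdot \mathbf{x})\) is non-positive and that \(\|\mathbf{x}\| < R\). By Cauchy-Schwarz, \(\mathbf{w}_k \cdot \mathbf{w}^* \le \|\mathbf{w}_k\|\,\|\mathbf{w}^*\| = \|\mathbf{w}_k\|\), so \(k\gamma \le \sqrt{k}\,R\) and therefore \(k \le R^2/\gamma^2\): the number of mistakes is bounded, and training converges.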

What you should know: the perceptron training and prediction algorithms (standard, voting, averaged); the convergence theorem and what practical guarantees it gives us; how to draw/describe the decision boundary of a perceptron classifier; fundamental ML concepts: how to determine whether a data set is linearly separable and define its margin, and how to determine whether a training algorithm is error-driven or not, online or not.