Linear Classifiers
CS 188: Artificial Intelligence — Perceptrons and Logistic Regression
Pieter Abbeel & Dan Klein, University of California, Berkeley

Feature Vectors
An input such as an email ("Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just ...") is represented as a vector of feature values: word counts for spam filtering, or pixel indicators (e.g. PIXEL-7,3) and shape features like NUM_LOOPS for digit recognition. Classification maps this feature vector to a label such as SPAM or HAM.

Some (Simplified) Biology
Linear classifiers take very loose inspiration from human neurons: inputs arrive along weighted connections, are summed, and the total activation determines the output.

Linear Classifiers
In the binary case, we compare the features to a weight vector; learning means figuring out the weight vector from examples. Inputs are feature values f_i(x), each feature has a weight w_i, and the sum is the activation:

    activation_w(x) = Σ_i w_i · f_i(x) = w · f(x)

If the activation is positive, output class +1; if negative, output class -1. A positive dot product means the positive class.
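The binary decision rule above can be sketched in a few lines. This is a minimal illustration, not code from the lecture; the feature names and weight values are hypothetical.

```python
# Binary linear classifier: compare features to a weight vector.
# Features and weights are sparse dictionaries keyed by feature name.

def activation(weights, features):
    """Dot product w . f(x) over a sparse feature dictionary."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def classify(weights, features):
    """Output +1 if the activation is positive, else -1."""
    return 1 if activation(weights, features) > 0 else -1

# Hypothetical spam-filter weights and feature counts for one email.
weights = {"free": 4.0, "money": 2.0, "bias": -3.0}
email = {"free": 1.0, "money": 1.0, "bias": 1.0}
print(classify(weights, email))  # activation = 4 + 2 - 3 = 3 > 0, so +1 (SPAM)
```

Keeping features sparse (a dictionary rather than a dense vector) matches how text features are usually stored: most words never occur in a given email.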

Decision Rules

Binary Decision Rule
In the space of feature vectors, examples are points, and any weight vector defines a hyperplane: one side corresponds to Y = +1 (e.g. SPAM), the other to Y = -1 (e.g. HAM). For a spam filter with features like "free" and "money", the weights on those features (plus a bias weight) determine where the separating hyperplane lies.

Weight Updates — Learning: Binary Perceptron
Start with all weights = 0. For each training instance (f(x), y*):
  - Classify with the current weights: y = +1 if w · f(x) ≥ 0, else y = -1.
  - If correct (i.e., y = y*), no change!
  - If wrong, adjust the weight vector by adding or subtracting the feature vector: subtract f(x) if y* = -1, add f(x) if y* = +1. Equivalently, w ← w + y* · f(x).

Examples: Perceptron, Separable Case
On separable data, each mistake tilts the hyperplane toward classifying the misclassified point correctly.
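The update rule above can be sketched directly. This is a minimal illustration assuming dense feature vectors as plain Python lists; the toy data are hypothetical, not from the lecture.

```python
# Binary perceptron: start at zero, add or subtract f(x) on each mistake.

def predict(w, f):
    """+1 if w . f(x) >= 0, else -1."""
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) >= 0 else -1

def train_perceptron(data, n_features, passes=10):
    """data: list of (feature_vector, label) pairs with label in {+1, -1}."""
    w = [0.0] * n_features                      # start with all weights = 0
    for _ in range(passes):
        for f, y_star in data:
            if predict(w, f) != y_star:         # mistake: adjust weights
                for i in range(n_features):
                    w[i] += y_star * f[i]       # add f(x) if y* = +1, subtract if -1
    return w

# Separable toy data (last feature acts as a bias term).
data = [([2.0, 0.0, 1.0], 1), ([0.0, 2.0, 1.0], -1),
        ([3.0, 1.0, 1.0], 1), ([1.0, 3.0, 1.0], -1)]
w = train_perceptron(data, n_features=3)
print(all(predict(w, f) == y for f, y in data))  # True once converged
```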

Multiclass Decision Rule
If we have multiple classes, keep a weight vector w_y for each class. The score (activation) of class y is w_y · f(x), and the prediction is the class with the highest score:

    y = argmax_y  w_y · f(x)

Binary = multiclass where the negative class has weight vector zero.

Learning: Multiclass Perceptron
Start with all weights = 0. Pick up training examples one by one and predict with the current weights. If correct, no change! If wrong, lower the score of the wrong answer and raise the score of the right answer:

    w_y  ← w_y  − f(x)    (predicted, wrong class)
    w_y* ← w_y* + f(x)    (correct class)

Example: Multiclass Perceptron — e.g. classifying short phrases such as "win the election" into topic classes.

Properties of Perceptrons
  - Separability: true if some parameters get the training set perfectly correct.
  - Convergence: if the training data are separable, the perceptron will eventually converge (binary case).
  - Mistake bound: the maximum number of mistakes (binary case) is related to the margin, i.e. the degree of separability.

Problems with the Perceptron
  - Noise: if the data aren't separable, the weights might thrash. Averaging the weight vectors over time can help (averaged perceptron).
  - Mediocre generalization: it finds a barely separating solution.
  - Overtraining: test / held-out accuracy usually rises, then falls. Overtraining is a kind of overfitting.

Improving the Perceptron
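The multiclass update above can be sketched the same way as the binary one. This is a minimal illustration with hypothetical toy data (bag-of-words vectors over a three-word vocabulary); it is not code from the lecture.

```python
# Multiclass perceptron: one weight vector per class; on a mistake,
# lower the score of the wrong answer and raise the score of the right one.

def score(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

def predict_class(weights, f):
    """argmax_y  w_y . f(x) over a dict {class: weight_vector}."""
    return max(weights, key=lambda y: score(weights[y], f))

def train_multiclass(data, classes, n_features, passes=10):
    weights = {y: [0.0] * n_features for y in classes}   # all weights = 0
    for _ in range(passes):
        for f, y_star in data:
            y = predict_class(weights, f)
            if y != y_star:                              # mistake
                for i in range(n_features):
                    weights[y][i] -= f[i]        # lower the wrong answer
                    weights[y_star][i] += f[i]   # raise the right answer
    return weights

# Toy vocabulary ["win", "election", "game"]; labels are topic classes.
data = [([1, 1, 0], "politics"), ([1, 0, 1], "sports"),
        ([0, 1, 0], "politics"), ([0, 0, 1], "sports")]
weights = train_multiclass(data, ["politics", "sports"], n_features=3)
print(all(predict_class(weights, f) == y for f, y in data))  # True
```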

Non-Separable Case: Deterministic vs. Probabilistic Decisions
Even the best linear boundary makes at least one mistake on non-separable data. Rather than a hard deterministic decision, we would like probabilistic decisions: points far from the boundary get probabilities near 0 or 1, points near the boundary get probabilities near 0.5.

How to get probabilistic decisions? The perceptron gives a scoring z = w · f(x). If z is very positive, we want the probability of the positive class to go to 1; if very negative, to go to 0. The sigmoid function does exactly this:

    φ(z) = 1 / (1 + e^(−z))

With z = w · f(x), this is Logistic Regression:

    P(y = +1 | x; w) = 1 / (1 + e^(−w · f(x)))
    P(y = −1 | x; w) = 1 − P(y = +1 | x; w)

Best w? Maximum likelihood estimation: choose w to maximize Σ_i log P(y_i* | x_i; w).

Separable Case: Deterministic vs. Probabilistic Decision
In the separable case there are many equally good separating boundaries under the deterministic rule (many options); under the probabilistic criterion there is a clear preference for boundaries that assign confident, correct probabilities to the training points.
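The sigmoid and the resulting class probability can be sketched as follows. This is a minimal illustration of the formulas above; the weights are hypothetical.

```python
# Sigmoid: squashes an activation z = w . f(x) into a probability in (0, 1).
import math

def sigmoid(z):
    """phi(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def p_positive(w, f):
    """P(y = +1 | x; w) = sigmoid(w . f(x))."""
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)))

# Very positive activation -> probability near 1; very negative -> near 0;
# a point exactly on the decision boundary -> 0.5.
print(round(sigmoid(10), 4))   # 1.0 (approximately)
print(round(sigmoid(-10), 4))  # 0.0 (approximately)
print(sigmoid(0))              # 0.5
```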

Multi-Class Logistic Regression
Recall the multiclass perceptron: a weight vector w_y for each class, score (activation) w_y · f(x) for class y, prediction is the highest-scoring class. How to make scores into probabilities? Exponentiate the original activations and normalize — the softmax:

    P(y | x; w) = e^(w_y · f(x)) / Σ_{y'} e^(w_{y'} · f(x))

Best w? Maximum likelihood estimation again: maximize Σ_i log P(y_i* | x_i; w).

Next Lecture
Optimization — i.e., how do we solve

    max_w Σ_i log P(y_i* | x_i; w)
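The exponentiate-and-normalize step above can be sketched directly. This is a minimal illustration; the class names and activation values are hypothetical. Subtracting the maximum score before exponentiating is a standard numerical-stability trick that leaves the probabilities unchanged.

```python
# Softmax: turn per-class activations {y: w_y . f(x)} into probabilities.
import math

def softmax(scores):
    """{y: activation} -> {y: P(y | x; w)} by exponentiating and normalizing."""
    m = max(scores.values())                        # subtract max for stability
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

probs = softmax({"politics": 2.0, "sports": 1.0, "tech": 0.0})
print(probs["politics"] > probs["sports"] > probs["tech"])  # True
print(round(sum(probs.values()), 6))                        # 1.0
```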