Linear Classifiers: Expressiveness


Linear Classifiers: Expressiveness. Machine Learning, Spring 2018. The slides are mainly from Vivek Srikumar.

Lecture outline
- Linear classifiers: Introduction
- What functions do linear classifiers express?
- Least Squares Method for Regression

Where are we?
- Linear classifiers: Introduction
- What functions do linear classifiers express?
  - Conjunctions and disjunctions
  - m-of-n functions
  - Not all functions are linearly separable
  - Feature space transformations
  - Exercises
- Least Squares Method for Regression

Which Boolean functions can linear classifiers represent? Linear classifiers are an expressive hypothesis class: many Boolean functions are linearly separable, though not all. Recall that decision trees can represent any Boolean function.

Conjunctions and disjunctions

The conjunction y = x1 ∧ x2 ∧ x3 has the truth table:

x1 x2 x3 | y
 0  0  0 | 0
 0  0  1 | 0
 0  1  0 | 0
 0  1  1 | 0
 1  0  0 | 0
 1  0  1 | 0
 1  1  0 | 0
 1  1  1 | 1

y = x1 ∧ x2 ∧ x3 is equivalent to the linear threshold unit y = 1 if x1 + x2 + x3 ≥ 3, as the extended table shows:

x1 x2 x3 | y | x1 + x2 + x3 - 3 | sign
 0  0  0 | 0 |        -3        |  0
 0  0  1 | 0 |        -2        |  0
 0  1  0 | 0 |        -2        |  0
 0  1  1 | 0 |        -1        |  0
 1  0  0 | 0 |        -2        |  0
 1  0  1 | 0 |        -1        |  0
 1  1  0 | 0 |        -1        |  0
 1  1  1 | 1 |         0        |  1

How about negations? Negations are okay too. In general, use (1 - x) in the linear threshold unit if x is negated. For example, y = x1 ∧ ¬x2 ∧ x3 is equivalent to y = 1 if x1 + (1 - x2) + x3 ≥ 3, i.e. if x1 - x2 + x3 ≥ 2.

Exercise: What would the linear threshold function be if the conjunctions here were replaced with disjunctions?

Questions?
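As a quick sanity check of the two equivalences above, here is a minimal Python sketch (my own addition, not part of the original slides) that enumerates all eight Boolean assignments:

```python
from itertools import product

# Enumerate all assignments of (x1, x2, x3) over {0, 1}.
for x1, x2, x3 in product([0, 1], repeat=3):
    # Conjunction x1 AND x2 AND x3  vs.  threshold unit x1 + x2 + x3 >= 3.
    assert bool(x1 and x2 and x3) == (x1 + x2 + x3 >= 3)
    # Conjunction x1 AND (NOT x2) AND x3  vs.  threshold unit x1 - x2 + x3 >= 2.
    assert bool(x1 and not x2 and x3) == (x1 - x2 + x3 >= 2)

print("Both threshold units agree with their conjunctions on all 8 inputs.")
```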

m-of-n functions

An m-of-n rule is defined over a fixed set of n variables: y = true if, and only if, at least m of them are true. All other variables are ignored.

Suppose there are five Boolean variables: x1, x2, x3, x4, x5. What is a linear threshold unit that is equivalent to the classification rule "at least 2 of {x1, x2, x3}"?

Answer: x1 + x2 + x3 ≥ 2.

Questions?
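A small hedged sketch of this example (the helper function and its name are my own): it checks the threshold unit against the 2-of-{x1, x2, x3} rule over all 32 assignments of the five variables.

```python
from itertools import product

def m_of_n_rule(x, relevant, m):
    """True iff at least m of the variables indexed by `relevant` are 1."""
    return sum(x[i] for i in relevant) >= m

# "At least 2 of {x1, x2, x3}" over five Boolean variables x1..x5
# (indices 0, 1, 2 in the tuple below); x4 and x5 are ignored.
for x in product([0, 1], repeat=5):
    threshold_unit = (x[0] + x[1] + x[2] >= 2)  # weight 1 on x1, x2, x3; threshold 2
    assert m_of_n_rule(x, relevant=(0, 1, 2), m=2) == threshold_unit

print("x1 + x2 + x3 >= 2 matches the 2-of-{x1, x2, x3} rule on all 32 inputs.")
```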

Not all functions are linearly separable. Parity is not linearly separable: you can't draw a line to separate the two classes. (Figure: the two classes plotted in the x1-x2 plane.) Questions?

Not all functions are linearly separable. XOR is not linear: y = (x1 ∧ ¬x2) ∨ (¬x1 ∧ x2). Any linear threshold unit sgn(w1x1 + w2x2 + b) would need w1 + b ≥ 0 and w2 + b ≥ 0 for the positive examples, but b < 0 and w1 + w2 + b < 0 for the negative ones; adding each pair gives both w1 + w2 + 2b ≥ 0 and w1 + w2 + 2b < 0, a contradiction. More generally, parity cannot be represented as a linear classifier: f(x) = 1 if the number of 1's is even. Many non-trivial Boolean functions are also not linear, e.g. y = (x1 ∧ x2) ∨ (x3 ∧ x4) is not linear in the four variables.
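As an empirical illustration (a coarse grid search, not a proof, and entirely my own addition), no weight vector on a small grid separates the four XOR points:

```python
from itertools import product

# XOR over {0, 1}^2: label 1 iff exactly one input is 1.
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def separates(w1, w2, b):
    """Does sgn(w1*x1 + w2*x2 + b) reproduce the XOR labels?"""
    return all((w1 * x1 + w2 * x2 + b >= 0) == bool(y) for (x1, x2), y in points)

# Coarse grid over weights and bias; no combination separates XOR.
grid = [i / 4 for i in range(-20, 21)]
found = any(separates(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
print("Separating weights found on the grid:", found)  # False
```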

Even these functions can be made linear. These points are not separable in one dimension by a line. (What is a one-dimensional line, by the way?) The trick: change the representation.

The blown-up feature space. The trick: use feature conjunctions. Transform points: represent each point x in 2 dimensions by (x, x²). For example, the point x = 2 maps to (2, 4). Now the data is linearly separable in this space!
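A minimal sketch of this idea, assuming a made-up one-dimensional data set (the points and the threshold below are my own, not from the slides): positives sandwiched between negatives on the line are not separable by a single threshold on x, but after mapping x to (x, x²) a linear rule on the new coordinates separates them.

```python
# One-dimensional points: positives sit between the negatives,
# so no single threshold on x separates the two classes.
xs     = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [  -1,   -1,   +1,  +1,  +1,  -1,  -1]

# Blow up the feature space: represent each x as (x, x**2).
phi = [(x, x ** 2) for x in xs]

# In the new space, the linear rule sign(2 - second_coordinate) separates the data:
# it predicts +1 exactly when x**2 < 2, i.e. when |x| < sqrt(2).
predictions = [1 if (2.0 - z2) > 0 else -1 for (_, z2) in phi]

assert predictions == labels
print("Linearly separable after the (x, x**2) transformation.")
```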

Exercise: How would you use the feature transformation idea to make XOR in two dimensions linearly separable in a new space? To answer this question, you need to think about a function that maps examples from the two-dimensional space to a higher-dimensional space.

Almost linearly separable data. The classifier is sgn(b + w1x1 + w2x2), with decision boundary b + w1x1 + w2x2 = 0. Training data is almost separable, except for some noise. How much noise do we allow for? (Figure: points of both classes in the x1-x2 plane, with a few noisy examples on the wrong side of the line b + w1x1 + w2x2 = 0.)

Linear classifiers: an expressive hypothesis class. Many functions are linear, and a linear classifier is often a good guess for a hypothesis space. Some functions are not linear, such as the XOR function and other non-trivial Boolean functions, but there are ways of making them linear in a higher-dimensional feature space.

Why is the bias term needed? The decision boundary is b + w1x1 + w2x2 = 0. If b is zero, the boundary becomes w1x1 + w2x2 = 0, so we are restricting the learner only to hyperplanes that go through the origin, which may not be expressive enough. (Figure: the same data in the x1-x2 plane, with a separating line that uses a bias term versus one forced through the origin.)
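To make the point concrete, a small hedged sketch (the data set is my own invention): a one-dimensional threshold problem that a classifier with a bias solves easily, but that no hyperplane through the origin can.

```python
# One-dimensional data: label +1 iff x >= 3.  All points are positive numbers.
xs     = [1.0, 2.0, 3.0, 4.0]
labels = [ -1,  -1,  +1,  +1]

def accuracy(w, b):
    preds = [1 if w * x + b >= 0 else -1 for x in xs]
    return sum(p == y for p, y in zip(preds, labels)) / len(xs)

# Without a bias (b = 0), every hyperplane through the origin gives all of these
# (positive-valued) points the same label, so it cannot get them all right.
print(max(accuracy(w, 0.0) for w in [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]))  # 0.5

# With a bias, the threshold can sit between 2 and 3.
print(accuracy(1.0, -2.5))  # 1.0
```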