Vote

Vote on timing for the night section:

Option 1 (what we have now): Lecture, 6:10-7:50; 25-minute dinner break; Tutorial, 8:15-9
Option 2: Lecture, 6:10-7; 10-minute break; Lecture, 7:10-8; 10-minute break; Tutorial, 8:10-9

CSC321 Lecture 3: Linear Classifiers, or What good is a single neuron?

Roger Grosse

Overview

Classification: predicting a discrete-valued target. In this lecture, we focus on binary classification: predicting a binary-valued target.

Examples:
- predict whether a patient has a disease, given the presence or absence of various symptoms
- classify e-mails as spam or non-spam
- predict whether a financial transaction is fraudulent

Overview

Design choices so far:
- Task: regression, classification
- Model/Architecture: linear
- Loss function: squared error
- Optimization algorithm: direct solution, gradient descent, perceptron

Overview

Binary linear classification:
- classification: predict a discrete-valued target
- binary: predict a binary target t ∈ {0, 1}. Training examples with t = 1 are called positive examples, and training examples with t = 0 are called negative examples. Sorry.
- linear: the model is a linear function of x, followed by a threshold:
  z = w^T x + b
  y = 1 if z ≥ r, and y = 0 if z < r

Some simplifications

Eliminating the threshold: we can assume WLOG that the threshold is r = 0, since
  w^T x + b ≥ r  ⟺  w^T x + (b - r) ≥ 0,
so the threshold can be folded into the bias by taking b' = b - r.

Eliminating the bias: add a dummy feature x_0 which always takes the value 1. The weight w_0 is then equivalent to a bias.

Simplified model:
  z = w^T x
  y = 1 if z ≥ 0, and y = 0 if z < 0
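
A minimal NumPy sketch of this simplified model (the names add_dummy and predict, and the use of NumPy, are illustrative choices, not part of the lecture):

    import numpy as np

    def add_dummy(X):
        """Prepend the dummy feature x_0 = 1 to each row, so w_0 plays the role of the bias."""
        return np.hstack([np.ones((X.shape[0], 1)), X])

    def predict(w, X):
        """Binary linear classifier: z = w^T x, y = 1 if z >= 0 else 0."""
        z = add_dummy(X) @ w
        return (z >= 0).astype(int)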

As a neuron

This is basically a special case of the neuron-like processing unit from Lecture 1:
  y = g(b + Σ_i x_i w_i)
[Figure: a single unit with inputs x_1, x_2, x_3, weights w_1, w_2, w_3, bias b, a nonlinearity g, and output y.]

Today's question: what can we do with a single unit?

Examples: NOT

x_0  x_1 | t
 1    0  | 1
 1    1  | 0

One solution: b = 1, w = -2.

Examples: AND

x_0  x_1  x_2 | t
 1    0    0  | 0
 1    0    1  | 0
 1    1    0  | 0
 1    1    1  | 1

One solution: b = -1.5, w_1 = 1, w_2 = 1.
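
These hand-picked weights can be checked directly against the truth tables; here is a small sketch (the helper classify and the test arrays are illustrative, and the minus signs are the ones inferred above from the truth tables):

    import numpy as np

    def classify(w, X):
        """y = 1 if w^T x >= 0 else 0, where each x already includes the dummy feature x_0 = 1."""
        return (X @ w >= 0).astype(int)

    # NOT: inputs (x_0, x_1)
    X_not = np.array([[1, 0], [1, 1]])
    t_not = np.array([1, 0])
    w_not = np.array([1.0, -2.0])        # b = 1, w = -2
    assert np.array_equal(classify(w_not, X_not), t_not)

    # AND: inputs (x_0, x_1, x_2)
    X_and = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
    t_and = np.array([0, 0, 0, 1])
    w_and = np.array([-1.5, 1.0, 1.0])   # b = -1.5, w_1 = w_2 = 1
    assert np.array_equal(classify(w_and, X_and), t_and)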

The Geometric Picture: Input Space

Here we're visualizing the NOT example.
- Training examples are points.
- Hypotheses are half-spaces whose boundaries pass through the origin.
- The boundary is the decision boundary. In 2-D it's a line, but think of it as a hyperplane.
- If the training examples can be separated by a linear decision rule, they are linearly separable.

The Geometric Picture: Weight Space

- Hypotheses are points.
- Training examples are half-spaces whose boundaries pass through the origin.
- The region satisfying all the constraints is the feasible region; if this region is nonempty, the problem is feasible.

The Geometric Picture

The AND example requires three dimensions, including the dummy one. To visualize data space and weight space for a 3-D example, we can look at a 2-D slice. The visualizations are similar, except that the decision boundaries and the constraints need not pass through the origin.

The Geometric Picture

Visualizations of the AND example.
[Figure: Data Space (slice for x_0 = 1) and Weight Space (slice for w_0 = 1).]
What happened to the fourth constraint?

The Geometric Picture

Some datasets are not linearly separable, e.g. XOR.

The Perceptron Learning Rule

Let's mention a classic classification algorithm from the 1950s: the perceptron.
[Photo: Frank Rosenblatt with the image sensor (left) of the Mark I Perceptron.]

The Perceptron Learning Rule

The idea:
- If t = 1 and z = w^T x > 0, then y = 1, so there is no need to change anything.
- If t = 1 and z < 0, then y = 0, so we want to make z larger. Update: w ← w + x.
  Justification: w'^T x = (w + x)^T x = w^T x + x^T x = w^T x + ||x||^2 > w^T x.
- (The case t = 0 with z > 0 is symmetric: update w ← w - x to make z smaller.)

The Perceptron Learning Rule

For convenience, let the targets be {-1, 1} instead of our usual {0, 1}.

Perceptron Learning Rule: for each training case (x^(i), t^(i)):
  z^(i) ← w^T x^(i)
  If z^(i) t^(i) ≤ 0, then w ← w + t^(i) x^(i)
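
A short sketch of this rule as a training loop (the function name perceptron, the epoch cap, and the AND data with remapped targets are illustrative, not from the lecture):

    import numpy as np

    def perceptron(X, t, max_epochs=100):
        """Perceptron learning rule with targets in {-1, +1}.
        Each row of X already includes the dummy feature x_0 = 1."""
        w = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            mistakes = 0
            for x_i, t_i in zip(X, t):
                if (w @ x_i) * t_i <= 0:     # misclassified (or on the boundary)
                    w = w + t_i * x_i        # push z = w^T x toward the sign of t
                    mistakes += 1
            if mistakes == 0:                # every training case is classified correctly
                break
        return w

    # Example: AND with targets remapped to {-1, +1}
    X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
    t = np.array([-1, -1, -1, 1])
    w = perceptron(X, t)
    print(np.sign(X @ w))                    # matches t once a feasible w is found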

The Perceptron Learning Rule

Compare:
- SGD for linear regression: w ← w - α (y - t) x
- Perceptron: z ← w^T x; if z t ≤ 0, then w ← w + t x
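
For contrast, the two single-example updates written as code (a sketch; the learning rate alpha and the function names are illustrative):

    import numpy as np

    def sgd_linreg_update(w, x, t, alpha=0.1):
        """One SGD step for linear regression with squared-error loss."""
        y = w @ x                        # linear prediction, no threshold
        return w - alpha * (y - t) * x   # gradient step on (1/2)(y - t)^2

    def perceptron_update(w, x, t):
        """One perceptron step; t is in {-1, +1}, and we update only on a mistake."""
        return w + t * x if (w @ x) * t <= 0 else w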

The Perceptron Learning Rule

Under certain conditions, if the problem is feasible, the perceptron rule is guaranteed to find a feasible solution after a finite number of steps. If the problem is infeasible, all bets are off. Stay tuned...

The perceptron algorithm caused lots of hype in the 1950s; then people got disillusioned and gave up on neural nets. They were discouraged by fundamental limitations of linear classifiers.

Limits of Linear Classification

Visually, it's obvious that XOR is not linearly separable. But how to show this?

Limits of Linear Classification: Convex Sets

A set S is convex if any line segment connecting points in S lies entirely within S. Mathematically,
  x_1, x_2 ∈ S  ⟹  λ x_1 + (1 - λ) x_2 ∈ S  for 0 ≤ λ ≤ 1.

A simple inductive argument shows that for x_1, ..., x_N ∈ S, weighted averages, or convex combinations, lie within the set:
  λ_1 x_1 + ⋯ + λ_N x_N ∈ S  for λ_i > 0, λ_1 + ⋯ + λ_N = 1.
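
Half-spaces are the hypotheses here, and the key fact used below is that they are convex: if w^T x_i > 0 for every x_i, then w^T (Σ_i λ_i x_i) = Σ_i λ_i (w^T x_i) > 0. A small numeric sanity check of this (all names and the random seed are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=3)                  # a random hypothesis: the half-space w^T x > 0

    pts = rng.normal(size=(1000, 3))
    pos = pts[pts @ w > 0][:10]             # a few points inside the half-space

    lam = rng.random(len(pos))
    lam /= lam.sum()                        # convex weights: lambda_i >= 0, summing to 1
    combo = lam @ pos                       # their convex combination
    print(w @ combo > 0)                    # True: the combination stays in the half-space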

Limits of Linear Classification

Showing that XOR is not linearly separable:
- Half-spaces are obviously convex.
- Suppose there were some feasible hypothesis. If the positive examples are in the positive half-space, then the green line segment connecting them must be as well.
- Similarly, the red line segment connecting the negative examples must lie within the negative half-space.
- But the two segments intersect, and their intersection point can't lie in both half-spaces. Contradiction!
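
The same contradiction can be seen numerically: the midpoint of the two positive XOR examples coincides with the midpoint of the two negative ones, so no half-space can contain one segment and exclude the other. A small sketch (not from the lecture):

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    t = np.array([0, 1, 1, 0])                # XOR targets

    pos_mid = X[t == 1].mean(axis=0)          # midpoint of the positive examples
    neg_mid = X[t == 0].mean(axis=0)          # midpoint of the negative examples
    print(pos_mid, neg_mid)                   # both are [0.5, 0.5]

    # If w^T x + b > 0 on the positives and < 0 on the negatives, convexity would force
    # w^T pos_mid + b > 0 and w^T neg_mid + b < 0 -- impossible when the midpoints coincide.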

Limits of Linear Classification

A more troubling example:
[Figure: several translated copies of pattern A and of pattern B.]
- We want to distinguish patterns A and B in all possible translations (with wrap-around). Translation invariance is commonly desired in vision!
- Suppose there's a feasible solution. The average of all translations of A is the vector (0.25, 0.25, ..., 0.25). Therefore, by convexity, this point must be classified as A.
- Similarly, the average of all translations of B is also (0.25, 0.25, ..., 0.25). Therefore, it must be classified as B. Contradiction!

Credit: Geoffrey Hinton
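
The averaging argument is easy to reproduce with made-up patterns: the specific patterns below are hypothetical (the lecture does not give them), chosen only so that each has 4 of 16 pixels on, which makes every class average come out to 0.25 per pixel as the slide states:

    import numpy as np

    def all_translations(pattern):
        """All cyclic shifts (translations with wrap-around) of a 1-D binary pattern."""
        return np.array([np.roll(pattern, k) for k in range(len(pattern))])

    n = 16
    A = np.zeros(n); A[[0, 1, 2, 3]] = 1      # hypothetical pattern A: 4 adjacent pixels on
    B = np.zeros(n); B[[0, 2, 4, 6]] = 1      # hypothetical pattern B: 4 spread-out pixels on

    avg_A = all_translations(A).mean(axis=0)
    avg_B = all_translations(B).mean(axis=0)
    print(avg_A[:4])                          # every entry is 4/16 = 0.25
    print(np.allclose(avg_A, avg_B))          # True: the two class averages coincide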

Limits of Linear Classification

Sometimes we can overcome this limitation using feature maps, just like for linear regression. E.g., for XOR:

  φ(x) = [x_1, x_2, x_1 x_2]^T

x_1  x_2 | φ_1(x)  φ_2(x)  φ_3(x) | t
 0    0  |   0       0       0    | 0
 0    1  |   0       1       0    | 1
 1    0  |   1       0       0    | 1
 1    1  |   1       1       1    | 0

This is linearly separable. (Try it!)

Not a general solution: it can be hard to pick good basis functions. Instead, we'll use neural nets to learn nonlinear hypotheses directly.
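
To try it: one possible separating hypothesis in the feature space is w = (1, 1, -2) with bias b = -0.5 (these particular values are an illustrative choice, not given in the lecture):

    import numpy as np

    def phi(x1, x2):
        """Feature map for XOR: (x_1, x_2, x_1 * x_2)."""
        return np.array([x1, x2, x1 * x2])

    w = np.array([1.0, 1.0, -2.0])
    b = -0.5

    for (x1, x2), target in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]:
        y = int(w @ phi(x1, x2) + b >= 0)
        print((x1, x2), y, y == target)       # every prediction matches the XOR target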