Linear Classification: Perceptron


Yufei Tao, Department of Computer Science and Engineering, Chinese University of Hong Kong

In this lecture, we will consider an important special version of the classification task called linear classification, defined as follows.

Definition 1. Let $\mathbb{R}^d$ denote the $d$-dimensional space where the domain of each dimension is the set $\mathbb{R}$ of real values. Let $P$ be a set of points in $\mathbb{R}^d$, each of which is colored either red or blue. The goal of the linear classification problem is to determine whether there is a $d$-dimensional plane

$x_1 c_1 + x_2 c_2 + \cdots + x_d c_d = \lambda$

which separates the red points from the blue points in $P$. In other words, all the red points must fall on the same side of the plane, while all the blue points must fall on the other side. No point is allowed to fall on the plane. If such a plane exists, then $P$ is said to be linearly separable. Otherwise, $P$ is linearly non-separable.

Example 2. [Figure: a linearly separable point set and a linearly non-separable point set.]

Think: Why do we call the linear classification problem a special case of the classification task? In particular, what is the model, and how do we perform classification on an unknown object?

Some reasons why linear classification is important:

- The datasets in many real-world classification tasks are linearly separable. This is especially true when the objects of the same class form a cluster, such that the yes and no clusters are far apart.
- Linear classification is a scientific problem that has a unique correct answer (i.e., whether the input set is linearly separable).

We will first look at a slightly different version of the problem:

Definition 3. Let $\mathbb{R}^d$ denote the $d$-dimensional space where the domain of each dimension is the set $\mathbb{R}$ of real values. Let $P$ be a set of points in $\mathbb{R}^d$, each of which is colored either red or blue. The goal is to determine whether there is a $d$-dimensional plane

$x_1 c_1 + x_2 c_2 + \cdots + x_d c_d = 0$

such that all the red points fall on the same side of the plane, while all the blue points fall on the other side. No point is allowed to fall on the plane.

Note that we have restricted the target plane to pass through the origin $(0, 0, \ldots, 0)$ of $\mathbb{R}^d$. It turns out that the original problem in Definition 1 can be converted into the problem in Definition 3. We will come back to this issue later.

We assume that no point in $P$ is at the origin. We further require that if a point $p = (x_1, x_2, \ldots, x_d)$ is red, then it must hold that

$x_1 c_1 + x_2 c_2 + \cdots + x_d c_d > 0.$

Otherwise (i.e., $p$ is blue), it must hold that

$x_1 c_1 + x_2 c_2 + \cdots + x_d c_d < 0.$

Think: Why are we allowed to make these requirements without worrying about which color should be assigned to the $> 0$ case?

Next, we will discuss an algorithm called perceptron. Strictly speaking, this algorithm does not fully settle the linear classification problem, because when the input set $P$ is linearly non-separable, the algorithm runs forever and never terminates. However, this shortcoming does not prevent perceptron from being a classical method in machine learning. As we will see, the algorithm is surprisingly simple, and it is guaranteed to find a separation plane on a linearly separable dataset.

To facilitate our presentation, let us introduce some (standard) definitions and notations on vectors:

- Define a vector $v$ to be a sequence of $d$ real values $(v_1, v_2, \ldots, v_d)$.
- Given two vectors $v_1 = (a_1, a_2, \ldots, a_d)$ and $v_2 = (b_1, b_2, \ldots, b_d)$, define the dot product $v_1 \cdot v_2$ as the real value $\sum_{i=1}^{d} a_i b_i$; $v_1 + v_2$ as the vector $(a_1 + b_1, a_2 + b_2, \ldots, a_d + b_d)$; and $v_1 - v_2$ as the vector $(a_1 - b_1, a_2 - b_2, \ldots, a_d - b_d)$.
- Each point $p = (x_1, \ldots, x_d)$ corresponds to a vector $p = (x_1, \ldots, x_d)$.
- Denote $c = (c_1, \ldots, c_d)$, where $c_1, \ldots, c_d$ are the coefficients of the plane in Definition 3. Hence, if $p$ is a red point, we require $p \cdot c > 0$; otherwise, we require $p \cdot c < 0$.
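As a quick illustration of these operations (not part of the original notes), here is a minimal sketch in Python assuming NumPy is available; the vectors v1 and v2 are arbitrary example values.

```python
import numpy as np

v1 = np.array([1.0, 2.0, 3.0])   # a_1, ..., a_d
v2 = np.array([0.0, -1.0, 4.0])  # b_1, ..., b_d

dot = np.dot(v1, v2)                 # v1 . v2 = sum_i a_i * b_i
vsum = v1 + v2                       # component-wise addition
vdiff = v1 - v2                      # component-wise subtraction
length = np.sqrt(np.dot(v1, v1))     # length of v1, used later in the convergence proof

print(dot, vsum, vdiff, length)
```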

Perceptron

The algorithm starts with $c = (0, 0, \ldots, 0)$, and then runs in iterations. In each iteration, it simply checks whether any point $p \in P$ violates our requirement according to the current $c$. If so, the algorithm adjusts $c$ as follows:

- If $p$ is red, then $c \leftarrow c + p$.
- If $p$ is blue, then $c \leftarrow c - p$.

As soon as $c$ has been adjusted, the current iteration finishes, and a new iteration starts. The algorithm finishes when an iteration finds all points of $P$ on the right side of the plane.
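The description above translates almost directly into code. Below is a minimal sketch in Python with NumPy, not part of the original notes; the function name `perceptron` and the `max_iters` safeguard are my own additions, the latter only so the sketch does not loop forever on a non-separable input.

```python
import numpy as np

def perceptron(points, colors, max_iters=100000):
    """points: list of d-dimensional tuples; colors: parallel list of 'red'/'blue'.

    Returns a coefficient vector c with p.c > 0 for every red point and
    p.c < 0 for every blue point, or None if max_iters is exhausted.
    """
    P = [np.array(p, dtype=float) for p in points]
    c = np.zeros(len(P[0]))                      # start with c = (0, ..., 0)
    for _ in range(max_iters):
        violator = None
        for p, color in zip(P, colors):
            s = np.dot(p, c)
            if (color == 'red' and s <= 0) or (color == 'blue' and s >= 0):
                violator, vcolor = p, color      # this point violates the requirement
                break
        if violator is None:                     # all points are on the right side
            return c
        c = c + violator if vcolor == 'red' else c - violator
    return None                                  # gave up (possibly non-separable input)
```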

Example 4. Suppose that $P$ has four 2d points: $p_1 = (1, 0)$, $p_2 = (0, 1)$, $p_3 = (0, -1)$, and $p_4 = (-1, 0)$. Points $p_1$ and $p_3$ are red, and the other two are blue.

The algorithm starts with $c = (0, 0)$.

- Iteration 1: $p_1 \cdot c = 0$, violating our requirement $p_1 \cdot c > 0$. Hence, we update $c$ to $c + p_1 = (1, 0)$.
- Iteration 2: $p_2 \cdot c = 0$, violating our requirement $p_2 \cdot c < 0$. Hence, we update $c$ to $c - p_2 = (1, 0) - (0, 1) = (1, -1)$.
- Iteration 3: all points satisfy our requirements. Thus, the algorithm finishes with $c = (1, -1)$.
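Running the perceptron sketch from above on this input reproduces the trace (the variable names below are mine):

```python
points = [(1, 0), (0, 1), (0, -1), (-1, 0)]
colors = ['red', 'blue', 'red', 'blue']
c = perceptron(points, colors)
print(c)  # expected output: [ 1. -1.]
```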

Next, we will prove an important theorem, for which purpose we introduce one more notion on a vector $v = (v_1, \ldots, v_d)$: the length of $v$, denoted as $\|v\|$, is defined to be $\sqrt{v \cdot v} = \sqrt{\sum_{i=1}^{d} v_i^2}$.

It is well known that the dot product has the following property (the Cauchy-Schwarz inequality): $v_1 \cdot v_2 \le \|v_1\| \cdot \|v_2\|$ for any $v_1$ and $v_2$.

Theorem 5. Perceptron always terminates on a linearly separable dataset $P$.

Proof. Since $P$ is linearly separable, there is a $c$ such that for all the red points $p_r \in P$, we have $p_r \cdot c > 0$, while for all the blue points $p_b \in P$, we have $p_b \cdot c < 0$. Furthermore, we can assume that $\|c\| = 1$ (think: why?). Next, we will use $u$ to denote this vector $c$ (because we will use the notation $c$ for other purposes).

Proof (cont.). Define

$\gamma = \min_{p \in P} |p \cdot u|.$

Note that $\gamma > 0$. Also define

$R = \max_{p \in P} \|p\|.$

Recall that the perceptron algorithm adjusts $c$ in each iteration. Let $c_i$ ($i \ge 1$) be the $c$ after the $i$-th iteration. Also, let $c_0 = (0, \ldots, 0)$ be the initial $c$ before the first iteration, and let $k$ be the total number of iterations.

Proof (cont.). We will first prove that, for any $i \ge 0$, $c_{i+1} \cdot u \ge c_i \cdot u + \gamma$. Recall that $c_{i+1}$ was adjusted from $c_i$ in one of the following cases:

- A red point $p_r$ violates our requirement, namely, $p_r \cdot c_i \le 0$. In this case, $c_{i+1} = c_i + p_r$; hence, $c_{i+1} \cdot u = c_i \cdot u + p_r \cdot u$. From the definition of $\gamma$, we know that $p_r \cdot u \ge \gamma$. Therefore, $c_{i+1} \cdot u \ge c_i \cdot u + \gamma$.
- A blue point $p_b$ violates our requirement, namely, $p_b \cdot c_i \ge 0$. In this case, $c_{i+1} = c_i - p_b$; hence, $c_{i+1} \cdot u = c_i \cdot u - p_b \cdot u$. From the definition of $\gamma$, we know that $p_b \cdot u \le -\gamma$. Therefore, $c_{i+1} \cdot u \ge c_i \cdot u + \gamma$.

It follows that

$c_k \cdot u \ge c_{k-1} \cdot u + \gamma \ge c_{k-2} \cdot u + 2\gamma \ge \cdots \ge c_0 \cdot u + k\gamma = k\gamma. \qquad (1)$

Proof (cont.). We will now prove that, for any $i \ge 0$, $\|c_{i+1}\|^2 \le \|c_i\|^2 + R^2$. Recall that $c_{i+1}$ was adjusted from $c_i$ in one of the following cases:

- A red point $p_r$ violates our requirement, namely, $p_r \cdot c_i \le 0$. In this case, $c_{i+1} = c_i + p_r$. Thus:

  $\|c_{i+1}\|^2 = c_{i+1} \cdot c_{i+1} = (c_i + p_r) \cdot (c_i + p_r) = c_i \cdot c_i + 2\, c_i \cdot p_r + p_r \cdot p_r \le c_i \cdot c_i + \|p_r\|^2 \le \|c_i\|^2 + R^2,$

  where the first inequality used the fact that $p_r \cdot c_i \le 0$, and the second used the definition of $R$.
- A blue point $p_b$ violates our requirement, namely, $p_b \cdot c_i \ge 0$. The proof is similar, and omitted (a good exercise for you).

It follows that

$\|c_k\|^2 \le \|c_{k-1}\|^2 + R^2 \le \|c_{k-2}\|^2 + 2R^2 \le \cdots \le \|c_0\|^2 + kR^2 = kR^2. \qquad (2)$

Proof (cont.). Now we combine (1) and (2) to obtain an upper bound on $k$. From (1) and the Cauchy-Schwarz inequality, we know that

$\|c_k\| = \|c_k\| \cdot \|u\| \ge c_k \cdot u \ge k\gamma.$

Therefore, $\|c_k\|^2 \ge k^2 \gamma^2$. Comparing this to (2) gives:

$kR^2 \ge k^2 \gamma^2 \;\Rightarrow\; k \le \frac{R^2}{\gamma^2}.$
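As a sanity check (not in the original notes), the snippet below evaluates $\gamma$, $R$, and the bound $R^2/\gamma^2$ for the dataset of Example 4, assuming the unit separator $u = (1, -1)/\sqrt{2}$, which is one valid choice for that input.

```python
import numpy as np

points = np.array([(1, 0), (0, 1), (0, -1), (-1, 0)], dtype=float)
u = np.array([1.0, -1.0]) / np.sqrt(2.0)        # a unit-length separating direction

gamma = min(abs(np.dot(p, u)) for p in points)  # margin: min |p . u|
R = max(np.linalg.norm(p) for p in points)      # max length of a point

print(gamma, R, R**2 / gamma**2)  # 0.707..., 1.0, 2.0 -> at most 2 adjusting iterations
```

Indeed, the run in Example 4 adjusts $c$ exactly twice, matching the bound.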

There is only one issue remaining. Recall that our original goal was to solve the problem in Definition 1. We instead turned to solving the problem in Definition 3, and claimed that this is fine. Next, we will establish this claim.

Let $P$ be a $d$-dimensional dataset which is the input to Definition 1. We create another dataset $P'$ of dimensionality $d + 1$ for Definition 3 as follows. For each point $p = (x_1, \ldots, x_d)$ in $P$, create a point $p' = (x_1, \ldots, x_d, 1)$ in $P'$; namely, add one more dimension on which the coordinates of all points of $P'$ are fixed to 1. The color of $p'$ is the same as that of $p$.
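Concretely, the conversion just appends a constant coordinate of 1 to every point. A minimal sketch (my own, not from the notes; it reuses the `perceptron` function sketched earlier, and the example points are hypothetical):

```python
def lift(points):
    """Map each d-dimensional point p to the (d+1)-dimensional point p' = (p, 1)."""
    return [tuple(p) + (1,) for p in points]

# The red points (2,3), (0,5) and the blue points (-1,-1), (4,0) are separable by a
# line that does not pass through the origin, so we lift them and reuse the sketch.
P = [(2, 3), (0, 5), (-1, -1), (4, 0)]
colors = ['red', 'red', 'blue', 'blue']
c = perceptron(lift(P), colors)  # coefficients (c_1, c_2, c_{d+1}) of a plane through the origin in R^3
```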

Now we prove that the problem of Definition 1 can be converted to that of Definition 3.

Lemma 6. $P$ is linearly separable if and only if $P'$ is linearly separable.

Proof. Direction "If". Suppose that $x_1 c_1 + x_2 c_2 + \cdots + x_d c_d + x_{d+1} c_{d+1} = 0$ is a separation plane in Definition 3. This means that for every red point $p' = (x_1, x_2, \ldots, x_d, 1)$ in $P'$, it holds that $x_1 c_1 + x_2 c_2 + \cdots + x_d c_d + 1 \cdot c_{d+1} > 0$. Also, for every blue point $p' = (x_1, x_2, \ldots, x_d, 1)$ in $P'$, it holds that $x_1 c_1 + x_2 c_2 + \cdots + x_d c_d + 1 \cdot c_{d+1} < 0$. This means that $x_1 c_1 + x_2 c_2 + \cdots + x_d c_d + c_{d+1} = 0$ is a separation plane in Definition 1.

Proof (cont.). Direction "Only-If". Suppose that $x_1 c_1 + x_2 c_2 + \cdots + x_d c_d + c_{d+1} = 0$ is a separation plane in Definition 1. This means that for every red point $p = (x_1, x_2, \ldots, x_d)$ in $P$, it holds that $x_1 c_1 + x_2 c_2 + \cdots + x_d c_d + c_{d+1} > 0$. Also, for every blue point $p = (x_1, x_2, \ldots, x_d)$ in $P$, it holds that $x_1 c_1 + x_2 c_2 + \cdots + x_d c_d + c_{d+1} < 0$. This means that $x_1 c_1 + x_2 c_2 + \cdots + x_d c_d + x_{d+1} c_{d+1} = 0$ is a separation plane in Definition 3 (recall that every point of $P'$ has $x_{d+1} = 1$).
