Linear Classification: Perceptron
Yufei Tao
Department of Computer Science and Engineering, Chinese University of Hong Kong
In this lecture, we will consider an important special version of the classification task called linear classification, defined as follows.

Definition 1. Let $\mathbb{R}^d$ denote the $d$-dimensional space where the domain of each dimension is the set $\mathbb{R}$ of real values. Let $P$ be a set of points in $\mathbb{R}^d$, each of which is colored either red or blue. The goal of the linear classification problem is to determine whether there is a $d$-dimensional plane
$$x_1 c_1 + x_2 c_2 + \dots + x_d c_d = \lambda$$
which separates the red points from the blue points in $P$. In other words, all the red points must fall on the same side of the plane, while all the blue points must fall on the other side. No point is allowed to fall on the plane. If such a plane exists, then $P$ is said to be linearly separable; otherwise, $P$ is linearly non-separable.
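To make Definition 1 concrete, here is a minimal sketch of a checker that tests whether a given plane $x \cdot c = \lambda$ separates a colored point set. The function name `separates` and the encoding of colors as the strings `"red"` and `"blue"` are illustrative assumptions, not part of the lecture.

```python
from typing import Sequence


def separates(points: Sequence[Sequence[float]], colors: Sequence[str],
              c: Sequence[float], lam: float) -> bool:
    """Check whether x_1*c_1 + ... + x_d*c_d = lam separates red from blue."""
    red_sides, blue_sides = set(), set()
    for p, color in zip(points, colors):
        value = sum(x * ci for x, ci in zip(p, c)) - lam
        if value == 0:                 # no point may fall on the plane
            return False
        side = value > 0               # True = positive side, False = negative
        (red_sides if color == "red" else blue_sides).add(side)
    # Each color occupies at most one side, and the two colors never share one.
    return (len(red_sides) <= 1 and len(blue_sides) <= 1
            and red_sides.isdisjoint(blue_sides))


points = [(0, 0), (1, 0), (3, 3), (4, 3)]
colors = ["red", "red", "blue", "blue"]
print(separates(points, colors, c=(1, 1), lam=3))    # True:  x1 + x2 = 3
print(separates(points, colors, c=(0, 1), lam=0))    # False: x2 = 0 passes through (0, 0)
```

Note that Definition 1 does not say which side the red points must be on; the sketch only asks that the two colors occupy opposite sides.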
Example 2. [Figure: a linearly separable point set and a linearly non-separable point set.]
Think: Why do we call the linear classification problem a special case of the classification task? In particular, what is the model, and how do we perform classification on an unknown object?

Some reasons why linear classification is important:
- In many real-world classification tasks, the dataset is linearly separable. This is especially true when the objects of each class form a cluster, and the "yes" and "no" clusters are far apart.
- Linear classification is a scientific problem that has a unique correct answer (i.e., whether or not the input set is linearly separable).
We will first look at a slightly different version of the problem:

Definition 3. Let $\mathbb{R}^d$ denote the $d$-dimensional space where the domain of each dimension is the set $\mathbb{R}$ of real values. Let $P$ be a set of points in $\mathbb{R}^d$, each of which is colored either red or blue. The goal is to determine whether there is a $d$-dimensional plane
$$x_1 c_1 + x_2 c_2 + \dots + x_d c_d = 0$$
such that all the red points fall on the same side of the plane, while all the blue points fall on the other side. No point is allowed to fall on the plane.

Note that we have restricted the target plane to pass through the origin $(0, 0, \dots, 0)$ of $\mathbb{R}^d$. It turns out that the original problem in Definition 1 can be converted into the problem in Definition 3. We will come back to this issue later.
We assume that no point in $P$ is at the origin. We further require that if a point $p = (x_1, x_2, \dots, x_d)$ is red, then it must hold that
$$x_1 c_1 + x_2 c_2 + \dots + x_d c_d > 0.$$
Otherwise (i.e., $p$ is blue), it must hold that
$$x_1 c_1 + x_2 c_2 + \dots + x_d c_d < 0.$$
Think: Why are we allowed to make these requirements without worrying about which color should be assigned to the $> 0$ case?
Next, we will discuss an algorithm called perceptron. Strictly speaking, this algorithm does not fully settle the linear classification problem: when the input set $P$ is linearly non-separable, the algorithm runs forever and never terminates. However, this shortcoming does not prevent perceptron from being a classical method in machine learning. As we will see, the algorithm is surprisingly simple, and it is guaranteed to find a separating plane on a linearly separable dataset.
To facilitate our presentation, let us introduce some standard definitions and notations on vectors. Define a vector $v$ to be a sequence of $d$ real values $(v_1, v_2, \dots, v_d)$. Given two vectors $v_1 = (a_1, a_2, \dots, a_d)$ and $v_2 = (b_1, b_2, \dots, b_d)$, define:
- $v_1 \cdot v_2$ as the real value $\sum_{i=1}^{d} a_i b_i$;
- $v_1 + v_2$ as the vector $(a_1 + b_1, a_2 + b_2, \dots, a_d + b_d)$;
- $v_1 - v_2$ as the vector $(a_1 - b_1, a_2 - b_2, \dots, a_d - b_d)$.

Each point $p(x_1, \dots, x_d)$ corresponds to a vector $p = (x_1, \dots, x_d)$. Denote $c = (c_1, \dots, c_d)$, where $c_1, \dots, c_d$ are the coefficients of the plane in Definition 3. Hence, if $p$ is a red point, we require $p \cdot c > 0$; otherwise, we require $p \cdot c < 0$.
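The vector operations and the requirement on $c$ translate directly into code. This is a minimal sketch using plain tuples; the helper names (`dot`, `add`, `sub`, `satisfies`) and the `"red"`/`"blue"` color strings are illustrative assumptions.

```python
from typing import Sequence, Tuple


def dot(v1: Sequence[float], v2: Sequence[float]) -> float:
    """v1 . v2 = a_1*b_1 + ... + a_d*b_d."""
    return sum(a * b for a, b in zip(v1, v2))


def add(v1: Sequence[float], v2: Sequence[float]) -> Tuple[float, ...]:
    """v1 + v2 = (a_1 + b_1, ..., a_d + b_d)."""
    return tuple(a + b for a, b in zip(v1, v2))


def sub(v1: Sequence[float], v2: Sequence[float]) -> Tuple[float, ...]:
    """v1 - v2 = (a_1 - b_1, ..., a_d - b_d)."""
    return tuple(a - b for a, b in zip(v1, v2))


def satisfies(p: Sequence[float], color: str, c: Sequence[float]) -> bool:
    """Requirement of Definition 3: p . c > 0 if p is red, p . c < 0 if blue."""
    value = dot(p, c)
    return value > 0 if color == "red" else value < 0


print(dot((1, 2, 3), (4, 5, 6)))          # 32
print(sub((1, 0), (0, 1)))                # (1, -1)
print(satisfies((1, 0), "red", (1, -1)))  # True
```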
Perceptron. The algorithm starts with $c = (0, 0, \dots, 0)$, and then runs in iterations. In each iteration, it checks whether any point $p \in P$ violates our requirement with respect to the current $c$ (i.e., a red point with $p \cdot c \le 0$, or a blue point with $p \cdot c \ge 0$). If so, the algorithm adjusts $c$ as follows:
- If $p$ is red, then $c \leftarrow c + p$.
- If $p$ is blue, then $c \leftarrow c - p$.
As soon as $c$ has been adjusted, the current iteration finishes, and a new iteration starts. The algorithm finishes when an iteration finds all the points of $P$ on the correct side of the plane.
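Below is a minimal sketch of the algorithm as just described: each iteration scans $P$ in a fixed order and adjusts $c$ on the first violating point it meets. The optional `max_updates` cap is an added assumption (the algorithm in the lecture has none); without it, the loop runs forever on a non-separable input, exactly as stated earlier.

```python
from typing import Optional, Sequence, Tuple


def perceptron(points: Sequence[Sequence[float]], colors: Sequence[str],
               max_updates: Optional[int] = None) -> Tuple[Tuple[float, ...], int]:
    """Perceptron for Definition 3 (plane through the origin).

    Returns (c, k): the final coefficient vector and the number of adjustments.
    max_updates is an added safety cap, not part of the lecture's algorithm.
    """
    c = [0.0] * len(points[0])                 # start with c = (0, ..., 0)
    updates = 0
    while True:
        violator = None
        for p, color in zip(points, colors):   # look for a violating point
            value = sum(x * ci for x, ci in zip(p, c))
            if (color == "red" and value <= 0) or (color == "blue" and value >= 0):
                violator = (p, color)
                break
        if violator is None:                   # all points on the correct side
            return tuple(c), updates
        if max_updates is not None and updates >= max_updates:
            raise RuntimeError("no separating plane found within max_updates")
        p, color = violator
        sign = 1 if color == "red" else -1
        c = [ci + sign * x for ci, x in zip(c, p)]   # c <- c + p  or  c <- c - p
        updates += 1


c, k = perceptron([(2, 1), (1, 3), (-1, -2), (-3, -1)],
                  ["red", "red", "blue", "blue"])
print(c, k)   # (2.0, 1.0) 1
```

Scanning in a fixed order is one possible choice; the lecture only requires that some violating point be used in each iteration.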
Example 4. Suppose that $P$ has four 2d points: $p_1 = (1, 0)$, $p_2 = (0, 1)$, $p_3 = (0, -1)$, and $p_4 = (-1, 0)$. Points $p_1$ and $p_3$ are red, and the others are blue. The algorithm starts with $c = (0, 0)$.
- Iteration 1: $p_1 \cdot c = 0$, violating our requirement $p_1 \cdot c > 0$. Hence, we update $c$ to $c + p_1 = (1, 0)$.
- Iteration 2: $p_2 \cdot c = 0$, violating our requirement $p_2 \cdot c < 0$. Hence, we update $c$ to $c - p_2 = (1, 0) - (0, 1) = (1, -1)$.
- Iteration 3: all points satisfy our requirements. Thus, the algorithm finishes with $c = (1, -1)$.
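The run in Example 4 can be reproduced with a small tracing variant that records $c$ after every adjustment. This sketch assumes the points are scanned in the order $p_1, p_2, p_3, p_4$ in every iteration; the function name `perceptron_trace` is hypothetical.

```python
def perceptron_trace(points, colors):
    """Run the perceptron, returning the list of c values (initial c first)."""
    c = (0,) * len(points[0])
    history = [c]
    while True:
        for p, color in zip(points, colors):
            value = sum(x * ci for x, ci in zip(p, c))
            if color == "red" and value <= 0:
                c = tuple(ci + x for ci, x in zip(c, p))    # c <- c + p
                break
            if color == "blue" and value >= 0:
                c = tuple(ci - x for ci, x in zip(c, p))    # c <- c - p
                break
        else:                # no violating point in this iteration: finished
            return history
        history.append(c)


points = [(1, 0), (0, 1), (0, -1), (-1, 0)]     # p1, p2, p3, p4
colors = ["red", "blue", "red", "blue"]
print(perceptron_trace(points, colors))          # [(0, 0), (1, 0), (1, -1)]
```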
Next, we will prove an important theorem, for which purpose we introduce one more notion on a vector $v = (v_1, \dots, v_d)$: the length of $v$, denoted as $\|v\|$, is defined to be
$$\|v\| = \sqrt{v \cdot v} = \sqrt{\sum_{i=1}^{d} v_i^2}.$$
It is well known that the dot product has the following property (the Cauchy-Schwarz inequality): $v_1 \cdot v_2 \le \|v_1\| \, \|v_2\|$ for any $v_1$ and $v_2$.

Theorem 5. Perceptron always terminates on a linearly separable dataset $P$.

Proof. Since $P$ is linearly separable, there is a $c$ such that $p_r \cdot c > 0$ for every red point $p_r \in P$, while $p_b \cdot c < 0$ for every blue point $p_b \in P$. Furthermore, we can assume that $\|c\| = 1$ (think: why?). Next, we will use $u$ to denote this vector $c$ (because we will use the notation $c$ for other purposes).
Proof (cont.). Define:
$$\gamma = \min_{p \in P} |p \cdot u|.$$
Note that $\gamma > 0$, because no point of $P$ lies on the plane $p \cdot u = 0$. Also define:
$$R = \max_{p \in P} \|p\|.$$
Recall that the perceptron algorithm adjusts $c$ in each iteration. Let $c_i$ ($i \ge 1$) be the $c$ after the $i$-th iteration, and let $c_0 = (0, \dots, 0)$ be the initial $c$ before the first iteration. Also, let $k$ be the total number of iterations in which $c$ is adjusted.
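For intuition, here is a small sketch that computes $\gamma$ and $R$ for the data of Example 4. We assume $u = (1, -1)/\sqrt{2}$, i.e., the normalized $c$ found in Example 4; the proof works with any separating unit vector, so this particular $u$ is our choice. The helper name `margin_and_radius` is hypothetical.

```python
import math


def margin_and_radius(points, u):
    """gamma = min_p |p . u| and R = max_p ||p||, for a unit vector u."""
    gamma = min(abs(sum(x * ui for x, ui in zip(p, u))) for p in points)
    radius = max(math.sqrt(sum(x * x for x in p)) for p in points)
    return gamma, radius


points = [(1, 0), (0, 1), (0, -1), (-1, 0)]      # Example 4
u = (1 / math.sqrt(2), -1 / math.sqrt(2))        # unit vector along (1, -1)
gamma, R = margin_and_radius(points, u)
print(round(gamma, 4), R, round(R**2 / gamma**2, 4))   # 0.7071 1.0 2.0
```

Here $R^2/\gamma^2 \approx 2$, which matches the two adjustments the algorithm made in Example 4.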
Proof (cont.). We will first prove that, for any $0 \le i < k$, $c_{i+1} \cdot u \ge c_i \cdot u + \gamma$. Recall that $c_{i+1}$ was adjusted from $c_i$ in one of the following cases:
- A red point $p_r$ violates our requirement, namely, $p_r \cdot c_i \le 0$. In this case, $c_{i+1} = c_i + p_r$, and hence $c_{i+1} \cdot u = c_i \cdot u + p_r \cdot u$. From the definition of $\gamma$, we know that $p_r \cdot u \ge \gamma$. Therefore, $c_{i+1} \cdot u \ge c_i \cdot u + \gamma$.
- A blue point $p_b$ violates our requirement, namely, $p_b \cdot c_i \ge 0$. In this case, $c_{i+1} = c_i - p_b$, and hence $c_{i+1} \cdot u = c_i \cdot u - p_b \cdot u$. From the definition of $\gamma$, we know that $p_b \cdot u \le -\gamma$. Therefore, $c_{i+1} \cdot u \ge c_i \cdot u + \gamma$.
It follows that
$$c_k \cdot u \ge c_{k-1} \cdot u + \gamma \ge c_{k-2} \cdot u + 2\gamma \ge \dots \ge c_0 \cdot u + k\gamma = k\gamma. \quad (1)$$
Proof (cont.). We will now prove that, for any $0 \le i < k$, $\|c_{i+1}\|^2 \le \|c_i\|^2 + R^2$. Recall that $c_{i+1}$ was adjusted from $c_i$ in one of the following cases:
- A red point $p_r$ violates our requirement, namely, $p_r \cdot c_i \le 0$. In this case, $c_{i+1} = c_i + p_r$. Thus:
$$\|c_{i+1}\|^2 = c_{i+1} \cdot c_{i+1} = (c_i + p_r) \cdot (c_i + p_r) = c_i \cdot c_i + 2\, c_i \cdot p_r + p_r \cdot p_r \le c_i \cdot c_i + 2\, c_i \cdot p_r + R^2 \le \|c_i\|^2 + R^2,$$
where the first inequality follows from the definition of $R$, and the last step used the fact that $p_r \cdot c_i \le 0$.
- A blue point $p_b$ violates our requirement, namely, $p_b \cdot c_i \ge 0$. The proof is similar, and omitted (a good exercise for you).
It follows that
$$\|c_k\|^2 \le \|c_{k-1}\|^2 + R^2 \le \|c_{k-2}\|^2 + 2R^2 \le \dots \le \|c_0\|^2 + kR^2 = kR^2. \quad (2)$$
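The two per-step inequalities behind (1) and (2) can be checked numerically. This sketch runs the perceptron and asserts both inequalities at every adjustment, including the blue case left as an exercise; the tolerance `eps` for floating-point rounding and the function name `check_invariants` are our assumptions, and `u` must be a separating unit vector supplied by the caller.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def check_invariants(points, colors, u):
    """Run the perceptron and, at every adjustment c_i -> c_{i+1}, assert
        c_{i+1} . u    >= c_i . u    + gamma   (the step behind (1))
        ||c_{i+1}||^2  <= ||c_i||^2  + R^2     (the step behind (2)).
    u must be a unit vector with p . u > 0 for red and < 0 for blue points.
    Returns k, the number of adjustments."""
    gamma = min(abs(_dot(p, u)) for p in points)
    r_squared = max(_dot(p, p) for p in points)
    eps = 1e-9                                   # slack for floating-point error
    c = [0.0] * len(points[0])
    k = 0
    while True:
        for p, color in zip(points, colors):
            sign = 1 if color == "red" else -1
            if sign * _dot(p, c) <= 0:           # p violates the requirement
                new_c = [ci + sign * x for ci, x in zip(c, p)]
                assert _dot(new_c, u) >= _dot(c, u) + gamma - eps
                assert _dot(new_c, new_c) <= _dot(c, c) + r_squared + eps
                c, k = new_c, k + 1
                break
        else:
            return k


s = 1 / math.sqrt(2)
print(check_invariants([(1, 0), (0, 1), (0, -1), (-1, 0)],
                       ["red", "blue", "red", "blue"], u=(s, -s)))   # 2
```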
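Proof (cont.). Now we combine (1) and (2) to obtain an upper bound on $k$. From (1) and the Cauchy-Schwarz property (recall that $\|u\| = 1$), we know that
$$\|c_k\| = \|c_k\| \, \|u\| \ge c_k \cdot u \ge k\gamma.$$
Therefore, $\|c_k\|^2 \ge k^2 \gamma^2$. Comparing this to (2) gives (for $k \ge 1$; the bound holds trivially when $k = 0$):
$$kR^2 \ge k^2 \gamma^2 \implies k \le \frac{R^2}{\gamma^2}.$$
Hence the algorithm adjusts $c$ at most $R^2/\gamma^2$ times, and therefore terminates.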
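As an empirical sanity check (not a substitute for the proof), the following sketch generates random linearly separable datasets and confirms that the number of adjustments never exceeds $R^2/\gamma^2$. The data generator, its parameters (`n`, `d`, `min_margin`), and the fixed seed are all illustrative assumptions: points are drawn uniformly from $[-1, 1]^d$ and colored by the sign of $p \cdot u$ for a random unit vector $u$.

```python
import math
import random


def perceptron_updates(points, colors):
    """Number of adjustments the perceptron makes before it stops."""
    c = [0.0] * len(points[0])
    k = 0
    while True:
        for p, color in zip(points, colors):
            sign = 1 if color == "red" else -1
            if sign * sum(x * ci for x, ci in zip(p, c)) <= 0:
                c = [ci + sign * x for ci, x in zip(c, p)]   # c <- c +/- p
                k += 1
                break
        else:
            return k


def random_separable(n, d, min_margin, rng):
    """n points in [-1, 1]^d colored by the sign of p . u (u random unit vector),
    keeping only points with |p . u| >= min_margin."""
    u = [rng.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in u))
    u = [x / norm for x in u]
    points, colors = [], []
    while len(points) < n:
        p = [rng.uniform(-1, 1) for _ in range(d)]
        s = sum(x * ui for x, ui in zip(p, u))
        if abs(s) >= min_margin:
            points.append(p)
            colors.append("red" if s > 0 else "blue")
    return points, colors, u


rng = random.Random(0)
for trial in range(20):
    points, colors, u = random_separable(n=50, d=3, min_margin=0.1, rng=rng)
    gamma = min(abs(sum(x * ui for x, ui in zip(p, u))) for p in points)
    R = max(math.sqrt(sum(x * x for x in p)) for p in points)
    k = perceptron_updates(points, colors)
    assert k <= R**2 / gamma**2, (k, R**2 / gamma**2)
print("k <= R^2 / gamma^2 held in all trials")
```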
There is only one issue remaining. Recall that our original goal was to solve the problem in Definition 1. We instead turned to solving the problem in Definition 3, and claimed that this is fine. Next, we will establish this claim.

Let $P$ be a $d$-dimensional dataset which is the input to Definition 1. We create another dataset $P'$ of dimensionality $d + 1$ for Definition 3 as follows. For each point $p(x_1, \dots, x_d)$ in $P$, create a point $p'(x_1, \dots, x_d, 1)$ in $P'$; namely, add one more dimension on which the coordinates of all points of $P'$ are fixed to 1. The color of $p'$ is the same as that of $p$.
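The construction of $P'$ is a one-line transformation. The sketch below (the function name `lift` is hypothetical) also shows why it matters on a tiny 1-d dataset that we made up for illustration: the points are not separable by a plane through the origin in $\mathbb{R}^1$, but their lifted versions are separable by a plane through the origin in $\mathbb{R}^2$.

```python
def lift(points):
    """Map each point (x_1, ..., x_d) of P to (x_1, ..., x_d, 1) in P'."""
    return [tuple(p) + (1,) for p in points]


# 1-d example: red points at 3 and 4, blue points at 1 and 2.
points = [(3,), (4,), (1,), (2,)]
colors = ["red", "red", "blue", "blue"]
lifted = lift(points)
print(lifted)        # [(3, 1), (4, 1), (1, 1), (2, 1)]

# In R^1 the only plane through the origin is x = 0, and all four points lie on
# its positive side, so Definition 3 fails for P. After lifting, c = (1, -2.5)
# works: x * 1 + 1 * (-2.5) is positive for the red points, negative for the blue.
c = (1, -2.5)
print([sum(x * ci for x, ci in zip(p, c)) > 0 for p in lifted])   # [True, True, False, False]
```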
Now we prove that the problem of Definition 1 can be converted to that of Definition 3.

Lemma 6. $P$ is linearly separable if and only if $P'$ is linearly separable.

Proof. Direction If. Suppose that $x_1 c_1 + x_2 c_2 + \dots + x_d c_d + x_{d+1} c_{d+1} = 0$ is a separating plane for $P'$ in Definition 3. This means that for every red point $p'(x_1, x_2, \dots, x_d, 1)$ in $P'$, it holds that $x_1 c_1 + x_2 c_2 + \dots + x_d c_d + 1 \cdot c_{d+1} > 0$. Also, for every blue point $p'(x_1, x_2, \dots, x_d, 1)$ in $P'$, it holds that $x_1 c_1 + x_2 c_2 + \dots + x_d c_d + 1 \cdot c_{d+1} < 0$. This means that $x_1 c_1 + x_2 c_2 + \dots + x_d c_d + c_{d+1} = 0$ (i.e., the plane of Definition 1 with $\lambda = -c_{d+1}$) is a separating plane for $P$ in Definition 1.
Proof (cont.). Direction Only-If. Suppose that $x_1 c_1 + x_2 c_2 + \dots + x_d c_d + c_{d+1} = 0$ is a separating plane for $P$ in Definition 1 (i.e., $c_{d+1} = -\lambda$). Negating all the coefficients if necessary, we may assume that the red points fall on the positive side. This means that for every red point $p(x_1, x_2, \dots, x_d)$ in $P$, it holds that $x_1 c_1 + x_2 c_2 + \dots + x_d c_d + c_{d+1} > 0$. Also, for every blue point $p(x_1, x_2, \dots, x_d)$ in $P$, it holds that $x_1 c_1 + x_2 c_2 + \dots + x_d c_d + c_{d+1} < 0$. Since every point of $P'$ has $x_{d+1} = 1$, this means that $x_1 c_1 + x_2 c_2 + \dots + x_d c_d + x_{d+1} c_{d+1} = 0$ is a separating plane for $P'$ in Definition 3.
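Putting the pieces together, here is a sketch of how Lemma 6 lets perceptron solve Definition 1: lift the points, run perceptron on $P'$, and read off $\lambda = -c_{d+1}$. The `max_iterations` cap and the function names are our own assumptions; the cap only exists so that the non-separable XOR example returns `None` instead of looping forever.

```python
from typing import List, Optional, Sequence, Tuple


def perceptron(points: Sequence[Sequence[float]], colors: Sequence[str],
               max_iterations: int) -> Optional[List[float]]:
    """Perceptron for Definition 3; None if it has not stopped after
    max_iterations iterations (an added cap, not part of the lecture)."""
    c = [0.0] * len(points[0])
    for _ in range(max_iterations):
        for p, color in zip(points, colors):
            sign = 1 if color == "red" else -1
            if sign * sum(x * ci for x, ci in zip(p, c)) <= 0:   # violation
                c = [ci + sign * x for ci, x in zip(c, p)]          # c <- c +/- p
                break
        else:
            return c            # an iteration found no violating point
    return None


def solve_definition1(points: Sequence[Sequence[float]], colors: Sequence[str],
                      max_iterations: int = 10_000
                      ) -> Optional[Tuple[List[float], float]]:
    """Find (c, lam) such that red points have p . c > lam and blue points
    p . c < lam, by running the perceptron on the lifted dataset P'."""
    lifted = [tuple(p) + (1,) for p in points]        # p -> p' = (p, 1)
    c_lifted = perceptron(lifted, colors, max_iterations)
    if c_lifted is None:
        return None
    # x . c + c_{d+1} = 0  is the same plane as  x . c = -c_{d+1}  (Lemma 6).
    return c_lifted[:-1], -c_lifted[-1]


print(solve_definition1([(3,), (4,), (1,), (2,)], ["red", "red", "blue", "blue"]))
print(solve_definition1([(0, 0), (1, 1), (1, 0), (0, 1)],
                        ["red", "red", "blue", "blue"], max_iterations=1000))  # None (XOR)
```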
HOMEWORK 4: SVMS AND KERNELS CMU 060: MACHINE LEARNING (FALL 206) OUT: Sep. 26, 206 DUE: 5:30 pm, Oct. 05, 206 TAs: Simon Shaolei Du, Tianshu Ren, Hsiao-Yu Fish Tung Instructions Homework Submission: Submit
More informationConsequences of the Completeness Property
Consequences of the Completeness Property Philippe B. Laval KSU Today Philippe B. Laval (KSU) Consequences of the Completeness Property Today 1 / 10 Introduction In this section, we use the fact that R
More informationThis last statement about dimension is only one part of a more fundamental fact.
Chapter 4 Isomorphism and Coordinates Recall that a vector space isomorphism is a linear map that is both one-to-one and onto. Such a map preserves every aspect of the vector space structure. In other
More informationActive Learning: Disagreement Coefficient
Advanced Course in Machine Learning Spring 2010 Active Learning: Disagreement Coefficient Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz In previous lectures we saw examples in which
More informationMachine Learning. Linear Models. Fabio Vandin October 10, 2017
Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More information4 Limit and Continuity of Functions
Module 2 : Limits and Continuity of Functions Lecture 4 : Limit at a point Objectives In this section you will learn the following The sequential concept of limit of a function The definition of the limit
More informationLecture #5. Dependencies along the genome
Markov Chains Lecture #5 Background Readings: Durbin et. al. Section 3., Polanski&Kimmel Section 2.8. Prepared by Shlomo Moran, based on Danny Geiger s and Nir Friedman s. Dependencies along the genome
More informationPattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore
Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal
More informationClustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26
Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar
More information2. Prime and Maximal Ideals
18 Andreas Gathmann 2. Prime and Maximal Ideals There are two special kinds of ideals that are of particular importance, both algebraically and geometrically: the so-called prime and maximal ideals. Let
More informationFIXED POINT ITERATION
FIXED POINT ITERATION The idea of the fixed point iteration methods is to first reformulate a equation to an equivalent fixed point problem: f (x) = 0 x = g(x) and then to use the iteration: with an initial
More informationA Course in Machine Learning
A Course in Machine Learning Hal Daumé III 3 THE PERCEPTRON Learning Objectives: Describe the biological motivation behind the perceptron. Classify learning algorithms based on whether they are error-driven
More information