
Linear Classification: Linear Programming
Yufei Tao
Department of Computer Science and Engineering
Chinese University of Hong Kong

Recall the definition of linear classification.

Definition 1. Let R^d denote the d-dimensional space where the domain of each dimension is the set R of real values. Let P be a set of points in R^d, each of which is colored either red or blue. The goal of the linear classification problem is to determine whether there is a plane

  x_1 c_1 + x_2 c_2 + ... + x_d c_d = 0

which separates the red points from the blue points in P. In other words, all the red points must fall on one side of the plane, while all the blue points must fall on the other side. If such a plane exists, then P is said to be linearly separable. Otherwise, P is linearly non-separable.
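To make the definition concrete, the following is a minimal sketch in Python (the function names and the input format are our own, not part of the slides) that checks whether a given plane through the origin separates two colored point sets:

```python
# A sketch of Definition 1 (illustrative code, not from the slides).
# A plane through the origin is given by its coefficients c = (c_1, ..., c_d).
# It separates P if every red point p has <p, c> > 0 and every blue point
# has <p, c> < 0, or vice versa (flipping the sign of c swaps the sides).

def dot(p, c):
    return sum(x * y for x, y in zip(p, c))

def separates(red_points, blue_points, c):
    """True if the plane x_1 c_1 + ... + x_d c_d = 0 separates the colors."""
    reds = [dot(p, c) for p in red_points]
    blues = [dot(p, c) for p in blue_points]
    return (all(v > 0 for v in reds) and all(v < 0 for v in blues)) or \
           (all(v < 0 for v in reds) and all(v > 0 for v in blues))

# Example in R^2: the plane x - y = 0, i.e., c = (1, -1).
print(separates([(2, 1), (3, 0)], [(0, 2), (-1, 1)], (1, -1)))  # True
```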

In this lecture, we will give an algorithm that (i) detects whether P is linearly separable, and (ii) if it is, returns a separation plane. Our weapon is to convert the problem into another classic problem called linear programming.

Definition 2. A half-plane in R^d is the set of all points (x_1, x_2, ..., x_d) in R^d satisfying the following inequality:

  x_1 c_1 + x_2 c_2 + ... + x_d c_d ≤ c_{d+1}

where c_1, c_2, ..., c_{d+1} are real-valued constants.

Example 3. [Figure: (a) a half-plane in R, given by 3x ≤ 6; (b) a half-plane in R^2, given by 2x + y ≤ 2.]

Definition 4 (Linear Programming (LP)). Let S be a set of n half-planes H_1, H_2, ..., H_n in R^d. Let A = H_1 ∩ H_2 ∩ ... ∩ H_n. The goal of the linear programming problem is to decide (i) whether A is empty, and (ii) if A is not empty, return a point in A whose coordinate on the first dimension is the smallest.

Example 5 (1d LP).

  H_1: x ≥ -10, H_2: x ≥ 0, H_3: x ≥ 1, H_4: x ≤ 3, H_5: x ≤ 10.
  A = [1, 3]; answer: x = 1.

  H_1: x ≥ -10, H_2: x ≥ 0, H_3: x ≥ 4, H_4: x ≤ 3, H_5: x ≤ 10.
  A = ∅; answer: no solution.

Example 6 (2d LP). [Figure: (left) A = the shaded area, answer: the point p; (right) A = ∅, answer: no solution.]

The 1d LP problem can be easily solved in O(n) time (recall that n is the number of half-planes).

Think: How?
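One way to answer the Think box, as a minimal sketch in Python (the function name and the input encoding are our own): every half-plane in R is either a lower bound x ≥ c or an upper bound x ≤ c, so a single scan suffices.

```python
# A sketch of the 1d LP algorithm (illustrative code, not from the slides).
# Each half-plane is encoded as ('>=', c) for x >= c, or ('<=', c) for x <= c.

def lp_1d(half_lines):
    """Return the smallest feasible x, or None if A is empty.
    If no lower bound exists, -inf is returned (the problem is unbounded;
    this cannot happen once the two special half-planes are in place)."""
    lower = float('-inf')  # largest lower bound seen so far
    upper = float('inf')   # smallest upper bound seen so far
    for op, c in half_lines:
        if op == '>=':
            lower = max(lower, c)
        else:
            upper = min(upper, c)
    if lower > upper:
        return None        # A is empty
    return lower           # smallest x in A = [lower, upper]

# The two instances of Example 5:
print(lp_1d([('>=', -10), ('>=', 0), ('>=', 1), ('<=', 3), ('<=', 10)]))  # 1
print(lp_1d([('>=', -10), ('>=', 0), ('>=', 4), ('<=', 3), ('<=', 10)]))  # None
```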

We now turn our attention to the 2d LP problem. To simplify our discussion, we assume:

- The planes are in general position. Namely, (i) there do not exist 3 half-planes whose boundary lines cross the same point, and (ii) no boundary line is perpendicular to the x-axis.
- The optimal solution point is unique. We can ensure this by adding two special half-planes to S; we will assume that these are H_1 and H_2 in the discussion below.

Now, we give a randomized algorithm to solve the 2d LP problem.

Step 1. Randomly permute H_3, H_4, ..., H_n (we will give a permutation algorithm with O(n) running time in the appendix). Note that the two special half-planes H_1, H_2 are not permuted. Without loss of generality, let us assume that (H_1, ..., H_n) is the sequence of half-planes after the permutation, and that l_1, ..., l_n are their respective boundary lines.

Step 2. The algorithm then processes the half-planes in the order H_1, H_2, ..., H_n. The following invariant will be maintained: after having processed H_1, ..., H_i, the algorithm holds a point p satisfying:

  If A_i = H_1 ∩ H_2 ∩ ... ∩ H_i is not empty, then p is a point with the smallest x-coordinate in A_i. Otherwise, p is nil.

The point p becomes the final answer when the algorithm terminates at i = n. To fulfill the requirement for i = 2, we simply set p to the intersection of l_1 and l_2.

Step 3. We process each H_i (i ≥ 3) by checking whether the current p falls in H_i. If so, then the processing of H_i is done.

Think: In this case, p still has the smallest x-coordinate in H_1 ∩ H_2 ∩ ... ∩ H_i. Why?

We will first prove a lemma before discussing what to do in the case where p ∉ H_i.

Lemma 7. If p ∉ H_i and A_i = H_1 ∩ ... ∩ H_i is not empty, then there must be a point on l_i that has the smallest x-coordinate in A_i.

Proof. Suppose that A_i is not empty. Let q be a point with the smallest x-coordinate in A_i. If q is on l_i, then we are done; next, we consider the case where it is not. Let pq be the line segment connecting p and q. Define A_{i-1} = H_1 ∩ ... ∩ H_{i-1}. A_{i-1} is a convex region that contains both p and q. It thus follows that A_{i-1} contains the entire segment pq.

Proof (cont.). Since p and q lie on different sides of l_i, the segment pq must intersect l_i at a point p′. As p′ lies on pq ⊆ A_{i-1} and on l_i (the boundary of H_i), p′ falls in all of H_1, ..., H_i, namely, p′ ∈ A_i.

[Figure: the segment pq crossing the line l_i at the point p′.]

By definition, p has the smallest x-coordinate in A_{i-1}, which contains q; hence the x-coordinate of p is less than or equal to that of q. Since p′ lies on the segment pq, the x-coordinate of p′ is also less than or equal to that of q. As q has the smallest x-coordinate in A_i and p′ ∈ A_i, the point p′, which is on l_i, also has the smallest x-coordinate in A_i. This completes the proof.

Lemma 7 shows that if p ∉ H_i, then we can focus on the following problem:

  Find the point p on l_i with the smallest x-coordinate that falls in all of H_1, ..., H_{i-1}.

For each j ∈ [1, i-1], H_j intersects l_i in a half-line. Hence, there are i - 1 half-lines in total. [Figure: the boundary line l_j of H_j crossing l_i, and the half-line of l_i that lies in H_j.]

This is essentially a 1d LP problem defined by these i - 1 half-lines, which we already know can be solved in O(i) time. This completes the description of the algorithm.
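Putting Steps 1-3 together, here is a minimal sketch of the whole algorithm in Python. The representation of half-planes, all function names, and the EPS tolerance are our own; we take for granted that S[0], S[1] are the two special half-planes, that the general position assumption holds, and that no boundary line is vertical, so no degeneracy handling is included.

```python
import random

# A half-plane is a triple (a, b, c) standing for a*x + b*y <= c;
# its boundary line is a*x + b*y = c.

EPS = 1e-9

def inside(h, p):
    a, b, c = h
    x, y = p
    return a * x + b * y <= c + EPS

def line_intersection(h1, h2):
    a1, b1, c1 = h1
    a2, b2, c2 = h2
    det = a1 * b2 - a2 * b1              # nonzero by general position
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

def lp_on_line(h_i, constraints):
    """1d LP on the boundary line of h_i: the point on the line with the
    smallest x-coordinate satisfying all earlier half-planes, or None."""
    a_i, b_i, c_i = h_i                  # b_i != 0: the line is not vertical
    lower, upper = float('-inf'), float('inf')
    for a_j, b_j, c_j in constraints:
        # Substitute y = (c_i - a_i*x) / b_i into a_j*x + b_j*y <= c_j.
        coef = a_j - b_j * a_i / b_i
        rhs = c_j - b_j * c_i / b_i
        if coef > EPS:
            upper = min(upper, rhs / coef)
        elif coef < -EPS:
            lower = max(lower, rhs / coef)
        elif rhs < -EPS:
            return None                  # constraint 0 <= rhs is violated
    if lower > upper:
        return None
    x = lower                            # finite: the special planes bound x
    return (x, (c_i - a_i * x) / b_i)

def lp_2d(S):
    """Smallest-x point of the intersection of all half-planes, or None."""
    S = S[:2] + random.sample(S[2:], len(S) - 2)   # Step 1: permute
    p = line_intersection(S[0], S[1])              # Step 2: base case i = 2
    for i in range(2, len(S)):                     # Step 3
        if not inside(S[i], p):
            p = lp_on_line(S[i], S[:i])            # 1d LP on l_i
            if p is None:
                return None                        # A_i, hence A, is empty
    return p

# Example: minimize x subject to x >= -100, y >= -100, x + y >= 1.
S = [(-1, 0, 100), (0, -1, 100), (-1, -1, -1)]
print(lp_2d(S))   # (-100.0, 101.0): the smallest x in A is -100
```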

Example 8. [Figure: four snapshots of the algorithm, showing the boundary lines h_1, ..., h_4 and the current point p: (a) after the permutation; (b) after processing H_2; (c) after processing H_3; (d) after processing H_4.]

Theorem 9. The algorithm runs in O(n) expected time.

Proof. For each integer i ∈ [1, n], let T_i be the time we spend on H_i after the random permutation of Step 1. Denote by T the total running time. Obviously, T = T_2 + T_3 + ... + T_n. Next, we will prove that E[T_i] = O(1) for all i ∈ [2, n], which implies that E[T] = O(n).

It is clear that T_2 = O(1). Fix any i ∈ [3, n]. Also, fix a subset Z of S such that |Z| = n - i. Let C be the event that H_1, ..., H_i is a permutation of S \ Z. Next, we will prove that E[T_i | C] = O(1). It will then follow immediately from Step 1 (random permutation) that E[T_i] = O(1) (think: why?).

Proof (cont.). Let A_i = H_1 ∩ ... ∩ H_i. We will discuss only the case where A_i is not empty (the other case is left to you). Let p be the point in A_i with the smallest x-coordinate. p must be the intersection of the boundary lines of two half-planes, say H_{j_1} and H_{j_2}. Observe that:

  If i ≠ j_1 and i ≠ j_2, then p was already computed before processing H_i. In this case, T_i = O(1).

  Otherwise, the processing of H_i needs to solve a 1d LP problem, making T_i = O(i).

Due to the random permutation, however, H_i has probability at most 2/i of being one of H_{j_1} and H_{j_2}. Therefore:

  E[T_i | C] ≤ O(1) · (i - 2)/i + O(i) · (2/i) = O(1).

Our algorithm can be extended to any dimensionality d. The only change is in Step 3, where we solve a (d-1)-dimensional LP problem if the current p does not fall in H_i. As long as d is a constant, the expected running time of the algorithm is still O(n) (the hidden constant is roughly d!).

Finally, we mention that LP is often defined in an alternative form:

Definition 10 (Linear Programming (LP)). Let S be a set of n half-planes H_1, H_2, ..., H_n in R^d. Let A = H_1 ∩ H_2 ∩ ... ∩ H_n. Also, we are given a linear objective function f(p) that takes as input a point p = (x_1, ..., x_d) in R^d, and returns a real value:

  f(p) = α_1 x_1 + α_2 x_2 + ... + α_d x_d.

The goal of the linear programming problem is to decide whether A is empty. If A is not empty, we also need to return a point p ∈ A that minimizes f(p).

In the version of Definition 4, f(p) is implicitly defined to be the first coordinate of p. In fact, the above definition, which appears to be more general, is the same as the one in Definition 4. Why?

Reduction from Linear Classification to Linear Programming

Let us now return to Definition 1. Denote the points in P as p_1, p_2, ..., p_n, where n = |P|. We require that each point p_i = (x_1, ..., x_d), for i ∈ [1, n], satisfy:

  x_1 c_1 + x_2 c_2 + ... + x_d c_d ≥ c_{d+1}    if p_i is red
  x_1 c_1 + x_2 c_2 + ... + x_d c_d ≤ -c_{d+1}   if p_i is blue

In this way, we obtain n inequalities with c_1, ..., c_{d+1} being the unknowns. We aim to maximize the value of c_{d+1}. This is an instance of LP. The LP always returns a solution (because at least c_1 = c_2 = ... = c_{d+1} = 0 satisfies all the inequalities).

Let c*_1, ..., c*_{d+1} be the values returned by the LP. We check whether c*_{d+1} = 0. If so, then we declare that P is not linearly separable. Otherwise, x_1 c*_1 + x_2 c*_2 + ... + x_d c*_d = 0 must be a separation plane (the proof is left as an exercise).
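As an illustration, here is a minimal sketch of this reduction in Python using scipy.optimize.linprog as an off-the-shelf LP solver (our choice; the slides' own algorithm could be used instead). Since scaling (c_1, ..., c_{d+1}) by any positive factor preserves the inequalities, the sketch additionally caps c_{d+1} ≤ 1 to keep the maximization bounded; this cap is our assumption, not part of the slides.

```python
import numpy as np
from scipy.optimize import linprog

# A sketch of the reduction (illustrative code, not from the slides).
# Unknowns: (c_1, ..., c_d, c_{d+1}).
# Red p:  p . (c_1..c_d) >= c_{d+1}   <=>  -p . (c_1..c_d) + c_{d+1} <= 0
# Blue p: p . (c_1..c_d) <= -c_{d+1}  <=>   p . (c_1..c_d) + c_{d+1} <= 0
# Objective: maximize c_{d+1}, i.e., minimize -c_{d+1}.
# We cap c_{d+1} <= 1 so that the LP stays bounded (our assumption).

def separating_plane(red, blue):
    """Return (c_1, ..., c_d) of a separation plane through the origin,
    or None if the point set is not linearly separable."""
    red, blue = np.asarray(red, float), np.asarray(blue, float)
    d = red.shape[1]
    A_ub = np.vstack([
        np.hstack([-red,  np.ones((len(red), 1))]),
        np.hstack([ blue, np.ones((len(blue), 1))]),
    ])
    b_ub = np.zeros(len(red) + len(blue))
    obj = np.zeros(d + 1)
    obj[d] = -1.0                            # minimize -c_{d+1}
    bounds = [(None, None)] * d + [(None, 1.0)]
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    if res.status != 0 or res.x[d] <= 1e-9:
        return None                          # c*_{d+1} = 0: not separable
    return res.x[:d]

# The points from the earlier example, separable by x - y = 0:
print(separating_plane([(2, 1), (3, 0)], [(0, 2), (-1, 1)]))
# prints some (c_1, c_2) whose plane puts the red points on the positive side
```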

Appendix: Random Permutation

Problem: Let S be an array of n elements. Produce a random permutation of these elements, and store them still in S.

Algorithm:
  for i = 2 to n
      j = a random number from 1 to i
      swap S[i] with S[j]
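The algorithm above is a Fisher-Yates-style shuffle. A minimal runnable version in Python (0-indexed, unlike the 1-indexed pseudocode; the function name is our own):

```python
import random

def random_permutation(S):
    """In-place shuffle; each of the n! permutations is equally likely.
    Runs in O(n) time, matching the pseudocode above."""
    for i in range(1, len(S)):          # i = 1, ..., n-1 (0-indexed)
        j = random.randint(0, i)        # uniform over {0, ..., i}
        S[i], S[j] = S[j], S[i]

S = [1, 2, 3, 4, 5]
random_permutation(S)
print(S)   # e.g., [3, 1, 5, 2, 4]
```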