Linear Classification: Linear Programming


Yufei Tao, Department of Computer Science and Engineering, Chinese University of Hong Kong

Recall the definition of linear classification. Definition 1. Let R^d denote the d-dimensional space where the domain of each dimension is the set R of real values. Let P be a set of points in R^d, each of which is colored either red or blue. The goal of the linear classification problem is to determine whether there is a plane in R^d, x_1 c_1 + x_2 c_2 + ... + x_d c_d = 0, which separates the red points from the blue points in P. In other words, all the red points must fall on the same side of the plane, while all the blue points must fall on the other side. If the plane exists, then P is said to be linearly separable. Otherwise, P is linearly non-separable.

In this lecture, we will give an algorithm that is able to (i) detect whether P is linearly separable, and (ii) if it is, return a separation plane. Our weapon is to convert the problem to another classic problem called linear programming.

Definition 2. A half-plane in R^d is the set of all points (x_1, x_2, ..., x_d) in R^d satisfying the following inequality: x_1 c_1 + x_2 c_2 + ... + x_d c_d ≤ c_{d+1}, where c_1, c_2, ..., c_{d+1} are real-valued constants. Example 3. (Figure omitted: panel (a) shows a half-plane in R, namely 3x ≤ 6; panel (b) shows a half-plane in R^2, namely 2x + y ≤ 2.)

Definition 4 (Linear Programming (LP)). Let S be a set of n half-planes H_1, H_2, ..., H_n in R^d. Let A = H_1 ∩ H_2 ∩ ... ∩ H_n. The goal of the linear programming problem is to decide (i) whether A is empty, and (ii) if A is not empty, return a point in A whose coordinate on the first dimension is the smallest.

Example 5 (1d LP). H_1: x ≤ 10, H_2: x ≥ 0, H_3: x ≥ 1, H_4: x ≤ 3, H_5: x ≥ -10. Here A = [1, 3]; answer: x = 1. H_1: x ≤ 10, H_2: x ≥ 0, H_3: x ≥ 4, H_4: x ≤ 3, H_5: x ≥ -10. Here A = ∅; answer: no solution. Example 6 (2d LP). (Figure omitted: in the first panel, A is the shaded intersection area and the answer p is its leftmost point; in the second panel, A = ∅ and there is no solution.)

The 1d LP problem can be easily solved in O(n) time (recall that n is the number of half-planes). Think: How?
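For concreteness, here is a minimal sketch of one O(n) solution in Python. The representation of a half-line as a ('>=', b) or ('<=', b) pair and the name solve_1d_lp are conventions chosen here, not notation from these notes: a single scan keeps the tightest lower bound and the tightest upper bound.

    def solve_1d_lp(half_lines):
        # smallest feasible x, or None if the intersection is empty
        lo, hi = float('-inf'), float('inf')
        for op, b in half_lines:
            if op == '>=':
                lo = max(lo, b)   # tightest lower bound so far
            else:
                hi = min(hi, b)   # tightest upper bound so far
        # lo = -inf with lo <= hi would signal an unbounded LP
        return lo if lo <= hi else None

    # Example 5 revisited:
    print(solve_1d_lp([('<=', 10), ('>=', 0), ('>=', 1), ('<=', 3), ('>=', -10)]))  # 1
    print(solve_1d_lp([('<=', 10), ('>=', 0), ('>=', 4), ('<=', 3), ('>=', -10)]))  # None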

We now turn our attention to the 2d LP problem. To simplify our discussion, we assume:

The half-planes are in general position. Namely, (i) there do not exist 3 half-planes whose boundary lines cross the same point, and (ii) no boundary line is perpendicular to the x-axis.

The optimal solution point is unique. We can ensure this by adding two special half-planes to S. (Figure omitted: the boundary lines of the two half-planes form a wedge that opens toward larger x, for example y ≤ x + M and y ≥ -x - M for a large constant M, so that the apex of the wedge is the unique leftmost point of their intersection.)

We will assume that these half-planes are H_1 and H_2 in the discussion below.

Now, we give a randomized algorithm to solve the 2d LP problem.

Step 1. Randomly permute H_3, H_4, ..., H_n (we will give a permutation algorithm running in O(n) time in the appendix). Note that the two special half-planes H_1, H_2 are not permuted. Without loss of generality, let us assume that (H_1, ..., H_n) is the sequence of half-planes after the permutation, and that l_1, ..., l_n are their boundary lines, respectively.

Step 2. The algorithm will then process the half-planes in the order of H_1, H_2, ..., H_n. The following invariant will be maintained: after having processed H_1, ..., H_i, the algorithm will be holding a point p satisfying: if A_i = H_1 ∩ H_2 ∩ ... ∩ H_i is not empty, then p is a point with the smallest x-coordinate in A_i; otherwise, p is nil. The point p will become the final answer when the algorithm terminates at i = n. To fulfill the requirement for i = 2, we simply set p to the intersection of l_1 and l_2.

Step 3. We process each H_i (i ≥ 3) by checking whether the current p falls in H_i. If so, then the processing of H_i is done. Think: In this case, p must have the smallest x-coordinate in H_1 ∩ H_2 ∩ ... ∩ H_i. Why?

We will first prove a lemma before discussing what to do in the case where p ∉ H_i. Lemma 7. If p ∉ H_i and A_i = H_1 ∩ ... ∩ H_i is not empty, there must be a point on l_i that has the smallest x-coordinate in A_i. Proof. Let q be a point in A_i with the smallest x-coordinate in A_i. If q is on l_i, then we are done; so suppose that it is not. Let pq be the line segment connecting p and q. Define A_{i-1} = H_1 ∩ ... ∩ H_{i-1}. A_{i-1} is a convex region that contains both p and q. It thus follows that A_{i-1} contains the entire segment pq.

Proof (cont.) Since p and q lie on different sides of l_i, the segment pq must intersect l_i at a point p'. This implies that p' falls in all of H_1, ..., H_i (it lies on pq ⊆ A_{i-1}, and on l_i ⊆ H_i), namely, p' ∈ A_i. (Figure omitted: the segment pq crossing l_i at p'.) By definition of p, the x-coordinate of p is less than or equal to that of q. Since p' lies on the segment pq, its x-coordinate is between those of p and q; hence the x-coordinate of p' is also less than or equal to that of q. Therefore p' is a point on l_i with the smallest x-coordinate in A_i.

Lemma 7 shows that if p ∉ H_i, then we can focus on the following problem: find the point p' on l_i with the smallest x-coordinate that falls in all of H_1, ..., H_{i-1}. For each j ∈ [1, i-1], the intersection of H_j with l_i is a half-line; hence, there are i - 1 half-lines in total. (Figure omitted: l_j cuts l_i, leaving the part of l_i inside H_j as a half-line.) This is essentially a 1d LP problem defined by these i - 1 half-lines, which we already know can be solved in O(i) time. This completes the algorithm's description.
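Putting Steps 1-3 together, below is a self-contained Python sketch of the whole algorithm, under conventions adopted here for illustration only: a half-plane is a triple (a, b, c) meaning ax + by ≤ c, no boundary line is vertical (as general position requires), the two special half-planes are chosen as the wedge y ≤ x + M and y ≥ -x - M for a large constant M, and EPS is a floating-point tolerance (a robust implementation would need more careful numerics). The names solve_2d_lp and line_intersection are made up here.

    import random

    M, EPS = 1e6, 1e-9

    def line_intersection(h1, h2):
        # intersection of the boundary lines of half-planes h1, h2 (Cramer's rule)
        a1, b1, c1 = h1
        a2, b2, c2 = h2
        det = a1 * b2 - a2 * b1
        return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

    def solve_2d_lp(halfplanes):
        # returns the point of A with the smallest x-coordinate, or None if A is empty
        H = [(-1.0, 1.0, M), (-1.0, -1.0, M)]   # H_1, H_2: a wedge opening rightward
        rest = list(halfplanes)
        random.shuffle(rest)                    # Step 1: random permutation
        H += rest
        p = line_intersection(H[0], H[1])       # apex of the wedge: initial p
        for i in range(2, len(H)):              # Steps 2-3: process H_3, ..., H_n
            a, b, c = H[i]
            if a * p[0] + b * p[1] <= c + EPS:  # p already falls in H_i
                continue
            # By Lemma 7, the new optimum lies on l_i: solve a 1d LP on l_i,
            # parameterized by x via y = (c - a*x)/b (b != 0: no vertical lines).
            lo, hi = float('-inf'), float('inf')
            for aj, bj, cj in H[:i]:
                coef = aj - bj * a / b          # a_j*x + b_j*y <= c_j restricted to l_i
                rhs = cj - bj * c / b
                if coef > EPS:
                    hi = min(hi, rhs / coef)
                elif coef < -EPS:
                    lo = max(lo, rhs / coef)
                elif rhs < -EPS:
                    return None                 # parallel and contradictory constraint
            if lo > hi:
                return None                     # the 1d LP is infeasible, so A_i is empty
            p = (lo, (c - a * lo) / b)          # smallest feasible x on l_i
        return p

    # Usage: y >= 2 - x and y <= x bound a rightward region with leftmost point (1, 1).
    print(solve_2d_lp([(-1.0, -1.0, -2.0), (-1.0, 1.0, 0.0)]))

The sketch returns as soon as some A_i is found empty; this matches the invariant of Step 2, since once p becomes nil it stays nil until the end.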

Example 8. (Figure omitted: four panels showing (a) the half-planes after the permutation, (b) the point p after processing H_3, (c) p after processing H_4, and (d) p after processing H_5.)

Theorem 9. The algorithm runs in O(n) expected time. Proof. As before, let H_1, H_2, ..., H_n be the sequence of half-planes after the permutation. Remember that S = {H_3, H_4, ..., H_n}, while H_1, H_2 are the two half-planes we added manually. The processing of H_1, H_2 clearly takes constant time. For each integer i ∈ [3, n], let T_i be the time we spend on H_i. Denote by T the total running time. Obviously, T = Σ_{i=3}^{n} T_i. Next, we will prove that E[T_i] = O(1) for all i ∈ [3, n], which implies that E[T] = O(n). Fix any i ∈ [3, n]. Also, fix a subset Z of S with size |Z| = n - i. Let C(Z) be the event that H_3, ..., H_i is a permutation of S \ Z. Next, we will prove that E[T_i | C(Z)] = O(1). It will follow immediately from Step 1 (random permutation) that E[T_i] = Σ_Z E[T_i | C(Z)] · Pr[C(Z)] = O(1) (think: why?).

Proof (cont.) Let A_i = H_1 ∩ ... ∩ H_i. We will discuss only the case where A_i is not empty (the other case is left to you). Let p be the point in A_i with the smallest x-coordinate. p must be the intersection of the boundary lines of two half-planes, say H_{j1} and H_{j2}. Observe that: If i ≠ j_1 and i ≠ j_2, then p was already computed before processing H_i. In this case, T_i = O(1). Otherwise, the processing of H_i needs to solve a 1d LP problem, making T_i = O(i). However, due to the random permutation, i has at most 2/(i-2) probability to come from {j_1, j_2} (conditioned on C(Z), position i holds any of the i - 2 half-planes of S \ Z with equal probability, and at most two of them are H_{j1}, H_{j2}). Therefore, E[T_i | C(Z)] ≤ O(1) · (1 - 2/(i-2)) + O(i) · 2/(i-2) = O(1).

Our algorithm can be extended to any dimensionality d. The only change is in Step 3, where we solve a (d - 1)-dimensional LP problem if the current p does not fall in H_i. As long as d is a constant, the expected running time of the algorithm is still O(n) (the hidden constant is roughly d!).

Finally, we mention that LP is often defined in an alternative form: Definition 10 (Linear Programming (LP)). Let S be a set of n half-planes H_1, H_2, ..., H_n in R^d. Let A = H_1 ∩ H_2 ∩ ... ∩ H_n. Also, we are given a linear objective function f(p) that takes as input a point p(x_1, ..., x_d) in R^d, and returns a real value: f(p) = α_1 x_1 + α_2 x_2 + ... + α_d x_d. The goal of the linear programming problem is to decide whether A is empty. If A is not empty, we also need to return a point p ∈ A that minimizes f(p). In the version of Definition 4, f(p) is implicitly defined to be the first coordinate of p. In fact, the above definition, which appears to be more general, is the same as the one in Definition 4. Why?

Reduction from Linear Classification to Linear Programming

Let us now return to Definition 1. Denote the points in P as p_1, p_2, ..., p_n, respectively, where n = |P|. We require that each point p_i(x_1, ..., x_d), i ∈ [1, n], should satisfy:

x_1 c_1 + x_2 c_2 + ... + x_d c_d ≥ c_{d+1}    if p_i is red
x_1 c_1 + x_2 c_2 + ... + x_d c_d ≤ -c_{d+1}   if p_i is blue

In this way, we obtain n inequalities with c_1, ..., c_{d+1} being the unknowns. We aim to maximize the value of c_{d+1} (to keep the LP bounded, one can additionally impose c_{d+1} ≤ 1; by scaling all the c_j, this loses no generality). This is an instance of LP. The LP always returns a solution (because at least c_1 = c_2 = ... = c_{d+1} = 0 satisfies all the inequalities). Let c*_1, ..., c*_{d+1} be the values returned by the LP. We check whether c*_{d+1} = 0. If so, then we declare that P is not linearly separable. Otherwise, x_1 c*_1 + x_2 c*_2 + ... + x_d c*_d = 0 must be a separation plane (the proof is left as an exercise).
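As a concrete illustration of this reduction, here is a sketch that hands the resulting LP to an off-the-shelf solver, scipy.optimize.linprog (assuming SciPy is available; the randomized algorithm above could be used instead). The helper name separate and the cap c_{d+1} ≤ 1 are the conventions adopted here, as discussed above.

    import numpy as np
    from scipy.optimize import linprog

    def separate(red, blue):
        # red, blue: point arrays of shape (n_red, d) and (n_blue, d).
        # Returns (c_1, ..., c_d, c_{d+1}) with c_{d+1} > 0 if separable, else None.
        red, blue = np.asarray(red, float), np.asarray(blue, float)
        d = red.shape[1]
        # red p:  p.c >= c_{d+1}   becomes  -p.c + c_{d+1} <= 0
        # blue p: p.c <= -c_{d+1}  becomes   p.c + c_{d+1} <= 0
        A_ub = np.vstack([np.hstack([-red, np.ones((len(red), 1))]),
                          np.hstack([blue, np.ones((len(blue), 1))])])
        b_ub = np.zeros(len(red) + len(blue))
        obj = np.zeros(d + 1)
        obj[d] = -1.0                                 # maximize c_{d+1}
        bounds = [(None, None)] * d + [(None, 1.0)]   # free variables, cap c_{d+1} <= 1
        res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        if not res.success or res.x[d] < 1e-9:
            return None                               # c*_{d+1} = 0: not separable
        return res.x

    # Red points have positive x_1, blue negative: separable, e.g., by x_1 = 0.
    print(separate(red=[[1, 1], [2, 0.5]], blue=[[-1, -1], [-2, 0.5]]))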

Appendix: Random Permutation

Problem: Let S be an array of n elements. Produce a random permutation of these elements, and store them still in S.

Algorithm (a forward variant of the Fisher-Yates shuffle):

    for i = 2 to n
        j = a uniformly random integer in [1, i]
        swap S[i] with S[j]
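The same algorithm in Python (0-indexed; the name random_permutation is chosen here):

    import random

    def random_permutation(S):
        # in-place; after the call, every permutation of S is equally likely
        for i in range(1, len(S)):        # "i = 2 to n" in 1-indexed terms
            j = random.randint(0, i)      # uniform over positions 0..i
            S[i], S[j] = S[j], S[i]

    S = list(range(10))
    random_permutation(S)
    print(S)

Each iteration takes constant time, so the whole permutation costs O(n), as required by Step 1.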