Linear Classification: Linear Programming
Yufei Tao
Department of Computer Science and Engineering
Chinese University of Hong Kong
1 / 21 Y Tao Linear Classification: Linear Programming
Recall the definition of linear classification.

Definition 1. Let R^d denote the d-dimensional space where the domain of each dimension is the set R of real values. Let P be a set of points in R^d, each of which is colored either red or blue. The goal of the linear classification problem is to determine whether there is a plane

x_1 c_1 + x_2 c_2 + ... + x_d c_d = 0

which separates the red points from the blue points in P. In other words, all the red points must fall on the same side of the plane, while all the blue points fall on the other side. If the plane exists, then P is said to be linearly separable. Otherwise, P is linearly non-separable.
In this lecture, we will give an algorithm that is able to (i) detect whether P is linearly separable, and (ii) if it is, return a separation plane. Our weapon is to convert the problem to another classic problem called linear programming.
Definition 2. A half-plane in R^d is the set of all points (x_1, x_2, ..., x_d) in R^d satisfying the following inequality:

x_1 c_1 + x_2 c_2 + ... + x_d c_d ≤ c_{d+1}

where c_1, c_2, ..., c_{d+1} are real-valued constants.

Example 3. [Figures omitted] (a) A half-plane in R, e.g., 3x ≤ 6. (b) A half-plane in R^2, e.g., 2x + y ≤ 2.
Definition 4 (Linear Programming (LP)). Let S be a set of n half-planes H_1, H_2, ..., H_n in R^d. Let A = H_1 ∩ H_2 ∩ ... ∩ H_n. The goal of the linear programming problem is to decide (i) whether A is empty, and (ii) if A is not empty, return a point in A whose coordinate on the first dimension is the smallest.
Example 5 (1d LP).
H_1: x ≤ 10, H_2: x ≥ 0, H_3: x ≥ 1, H_4: x ≤ 3, H_5: x ≥ -10. A = [1, 3]; answer: x = 1.
H_1: x ≤ 10, H_2: x ≥ 0, H_3: x ≥ 4, H_4: x ≤ 3, H_5: x ≥ -10. A = ∅; answer: no solution.

Example 6 (2d LP). [Figures omitted] (a) A = the shaded area; answer: the point p of A with the smallest x-coordinate. (b) A = ∅; answer: no solution.
The 1d LP problem can be easily solved in O(n) time (recall that n is the number of half-planes). Think: How?
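One way to see the O(n) bound is the following one-pass sketch in Python (the function name and constraint encoding are our own, not from the lecture): in 1d, every half-plane is either x ≥ c or x ≤ c, so it suffices to remember the largest lower bound and the smallest upper bound.

```python
def solve_1d_lp(halflines):
    """Solve a 1d LP.  halflines is a list of (op, c) pairs with op in
    {'>=', '<='}, each encoding the constraint x >= c or x <= c.
    Returns the smallest feasible x, or None if the intersection is empty.
    (If no lower bound exists, the problem is unbounded; in the lecture's
    setting the special half-planes rule this case out.)"""
    lo = float('-inf')   # tightest lower bound seen so far
    hi = float('inf')    # tightest upper bound seen so far
    for op, c in halflines:          # one O(1) step per constraint => O(n)
        if op == '>=':
            lo = max(lo, c)
        else:
            hi = min(hi, c)
    if lo > hi:
        return None                  # A = [lo, hi] is empty
    return lo                        # leftmost point of A

# The two instances of Example 5:
print(solve_1d_lp([('<=', 10), ('>=', 0), ('>=', 1), ('<=', 3), ('>=', -10)]))  # 1
print(solve_1d_lp([('<=', 10), ('>=', 0), ('>=', 4), ('<=', 3), ('>=', -10)]))  # None
```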
We now turn our attention to the 2d LP problem. To simplify our discussion, we assume:
- The half-planes are in general position. Namely, (i) there do not exist 3 half-planes whose boundary lines cross the same point, and (ii) no boundary line is perpendicular to the x-axis.
- The optimal solution point is unique. We can ensure this by adding two special half-planes to S whose boundary lines intersect and which bound the region where a solution may lie. We will assume that these planes are H_1 and H_2 in the discussion below.
Now, we give a randomized algorithm to solve the 2d LP problem.

Step 1. Randomly permute H_3, H_4, ..., H_n (we will give a permutation algorithm of running time O(n) in the appendix). Note that the two special planes H_1, H_2 are not permuted. Without loss of generality, let us assume that (H_1, ..., H_n) is the sequence of half-planes after the permutation, and l_1, ..., l_n are their boundary lines, respectively.
Step 2. The algorithm will then process the half-planes in the order of H_1, H_2, ..., H_n. The following invariant will be maintained: after having processed H_1, ..., H_i, the algorithm will be holding a point p satisfying:
- If A_i = H_1 ∩ H_2 ∩ ... ∩ H_i is not empty, then p is a point with the smallest x-coordinate in A_i.
- Otherwise, p is nil.
The point p will become the final answer when the algorithm terminates at i = n. To fulfill the requirement for i = 2, we simply set p to the intersection of l_1 and l_2.
Step 3. We process each H_i (i ≥ 3) by checking whether the current p falls in H_i. If so, then the processing of H_i is done.

Think: In this case, p still has the smallest x-coordinate in H_1 ∩ H_2 ∩ ... ∩ H_i. Why?
We will first prove a lemma before discussing what to do for the case where p ∉ H_i.

Lemma 7. If p ∉ H_i and A_i = H_1 ∩ ... ∩ H_i is not empty, then there must be a point on l_i that has the smallest x-coordinate in A_i.

Proof. Let q be a point in A_i with the smallest x-coordinate in A_i. If q is on l_i, then we are done; next, consider the case where it is not. Let pq be the line segment connecting p and q. Define A_{i-1} = H_1 ∩ ... ∩ H_{i-1}. A_{i-1} is a convex region that contains both p and q. It thus follows that A_{i-1} contains the entire segment pq.
Proof (cont.) Since p and q lie on different sides of l_i, the segment pq must intersect l_i at a point p'. As p' lies on pq, which is contained in A_{i-1}, and p' is on l_i (hence in H_i), p' falls in all of H_1, ..., H_i, namely, p' ∈ A_i.

By definition of p, the x-coordinate of p is less than or equal to that of q. Since p' lies on the segment pq, the x-coordinate of p' is also less than or equal to that of q. But q has the smallest x-coordinate in A_i and p' ∈ A_i, so the two coordinates must be equal. Hence, p' is a point on l_i with the smallest x-coordinate in A_i.
Lemma 7 shows that if p ∉ H_i, then we can focus on the following problem: find the point p' on l_i with the smallest x-coordinate that falls in all of H_1, ..., H_{i-1}.

For each j ∈ [1, i-1], H_j intersects l_i in a half-line. Hence, there are i-1 half-lines in total. This is essentially a 1d LP problem defined by all these i-1 half-lines, which we already know can be solved in O(i) time. This completes the algorithm's description.
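Putting Steps 1-3 together, here is a sketch of the whole algorithm in Python (the names and the epsilon tolerance are our own; each half-plane (a, b, c) encodes a*x + b*y ≤ c, the two special half-planes are assumed to be supplied first with intersecting boundary lines, and no permuted boundary line is vertical, per the general-position assumption):

```python
import random

EPS = 1e-9

def solve_on_line(a, b, c, constraints):
    """The 1d LP of Step 3: the min-x point on the line a*x + b*y = c that
    falls in every given half-plane (aj, bj, cj), i.e., aj*x + bj*y <= cj.
    Assumes b != 0 (the line is not vertical).  Returns (x, y) or None."""
    lo, hi = float('-inf'), float('inf')
    for aj, bj, cj in constraints:
        # substitute y = (c - a*x)/b into aj*x + bj*y <= cj
        coef = aj - bj * a / b
        rhs = cj - bj * c / b
        if abs(coef) < EPS:
            if rhs < -EPS:
                return None          # constraint never holds on this line
        elif coef > 0:
            hi = min(hi, rhs / coef)  # x <= rhs/coef
        else:
            lo = max(lo, rhs / coef)  # x >= rhs/coef
    if lo > hi + EPS:
        return None
    x = lo                            # leftmost feasible point on the line
    return (x, (c - a * x) / b)

def solve_2d_lp(halfplanes):
    """Randomized incremental 2d LP: minimize the x-coordinate over the
    intersection of the half-planes.  halfplanes[0] and halfplanes[1] play
    the role of the special H_1, H_2 and are not permuted."""
    # Step 1: randomly permute H_3, ..., H_n
    H = halfplanes[:2] + random.sample(halfplanes[2:], len(halfplanes) - 2)
    # Step 2: initialize p to the intersection of l_1 and l_2 (Cramer's rule)
    (a1, b1, c1), (a2, b2, c2) = H[0], H[1]
    det = a1 * b2 - a2 * b1
    p = ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)
    # Step 3: process H_3, ..., H_n one by one, maintaining the invariant
    for i in range(2, len(H)):
        a, b, c = H[i]
        if p is not None and a * p[0] + b * p[1] <= c + EPS:
            continue                  # p already falls in H_i: nothing to do
        if p is None:
            return None               # A_{i-1} empty implies A_i empty
        p = solve_on_line(a, b, c, H[:i])   # Lemma 7: optimum lies on l_i
    return p
```

For example, with specials x ≥ -100 and y ≤ 100, the constraints y ≤ 2x + 10 and y ≥ -2x - 10 form a wedge whose leftmost point is (-5, 0), which is what `solve_2d_lp([(-1, 0, 100), (0, 1, 100), (-2, 1, 10), (-2, -1, 10)])` returns.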
Example 8. [Figures omitted] (a) After permutation. (b) After processing H_2. (c) After processing H_3. (d) After processing H_4.
Theorem 9. The algorithm runs in O(n) expected time.

Proof. For each integer i ∈ [2, n], let T_i be the time we spend on H_i after the random permutation at Step 1. Denote by T the total running time. Obviously, T = T_2 + T_3 + ... + T_n. Next, we will prove that E[T_i] = O(1) for all i ∈ [2, n], which implies that E[T] = O(n).

It is clear that T_2 = O(1). Fix any i ∈ [3, n]. Also, fix a subset Z of S such that |Z| = n - i. Let C be the event that {H_1, ..., H_i} = S \ Z. Next, we will prove that E[T_i | C] = O(1). It will follow immediately from Step 1 (random permutation) that E[T_i] = O(1) (think: why?).
Proof (cont.) Let A_i = H_1 ∩ ... ∩ H_i. We will discuss only the case where A_i is not empty (the other case is left to you). Let p be the point in A_i with the smallest x-coordinate. p must be the intersection of the boundary lines of two half-planes, say H_{j1} and H_{j2}. Observe that:
- If i ≠ j1 and i ≠ j2, then p was already computed before processing H_i. In this case, T_i = O(1).
- Otherwise, the processing of H_i needs to solve a 1d LP problem, necessitating T_i = O(i).
However, due to the random permutation, H_i has at most 2/i probability of coming from {H_{j1}, H_{j2}}. Therefore, E[T_i | C] ≤ O(1) · (1 - 2/i) + O(i) · (2/i) = O(1).
Our algorithm can be extended to any dimensionality d. The only change is in Step 3, where we solve a (d-1)-dimensional LP problem if the current p does not fall in H_i. As long as d is a constant, the expected running time of the algorithm is still O(n) (the hidden constant is roughly d!).
Finally, we mention that LP is often defined in an alternative form:

Definition 10 (Linear Programming (LP)). Let S be a set of n half-planes H_1, H_2, ..., H_n in R^d. Let A = H_1 ∩ H_2 ∩ ... ∩ H_n. Also, we are given a linear objective function f(p) that takes as input a point p(x_1, ..., x_d) in R^d and returns a real value: f(p) = α_1 x_1 + α_2 x_2 + ... + α_d x_d. The goal of the linear programming problem is to decide whether A is empty. If A is not empty, we also need to return a point p ∈ A that minimizes f(p).

In the version of Definition 4, f(p) is implicitly defined to be the first coordinate of p. In fact, the above definition, which appears to be more general, is the same as the one in Definition 4. Why?
Reduction from Linear Classification to Linear Programming

Let us now return to Definition 1. Denote the points in P as p_1, p_2, ..., p_n, respectively, where n = |P|. We require that each point p_i(x_1, ..., x_d), i ∈ [1, n], should satisfy:

x_1 c_1 + x_2 c_2 + ... + x_d c_d ≥ c_{d+1}   if p_i is red
x_1 c_1 + x_2 c_2 + ... + x_d c_d ≤ -c_{d+1}  if p_i is blue

In this way, we obtain n inequalities with c_1, ..., c_{d+1} being the unknowns. We aim to maximize the value of c_{d+1} (to keep the maximum finite, we may additionally constrain each unknown to [-1, 1]; scaling all the unknowns does not affect separability). This is an instance of LP. The LP always returns a solution (because at least c_1 = c_2 = ... = c_{d+1} = 0 satisfies all the inequalities). Let c*_1, ..., c*_{d+1} be the values returned by the LP. We check whether c*_{d+1} = 0. If so, then we declare that P is not linearly separable. Otherwise, x_1 c*_1 + x_2 c*_2 + ... + x_d c*_d = 0 must be a separation plane (the proof is left as an exercise).
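The reduction above can be sketched in a few lines of Python (the helper names are our own; each constraint is stored as a coefficient vector a over the unknowns (c_1, ..., c_{d+1}), encoding a . c ≤ 0, and the objective is to maximize the last unknown):

```python
def classification_lp(points, colors):
    """Build the LP of the reduction.  points is a list of d-tuples, colors a
    parallel list of 'red'/'blue'.  Each returned tuple a has d+1 entries and
    encodes the constraint a . (c_1, ..., c_{d+1}) <= 0:
      red  p:  x.c >= c_{d+1}   ->  -x_1 c_1 - ... - x_d c_d + c_{d+1} <= 0
      blue p:  x.c <= -c_{d+1}  ->   x_1 c_1 + ... + x_d c_d + c_{d+1} <= 0
    An LP solver would then maximize c_{d+1} subject to these constraints."""
    constraints = []
    for p, color in zip(points, colors):
        if color == 'red':
            constraints.append(tuple(-x for x in p) + (1,))
        else:
            constraints.append(tuple(p) + (1,))
    return constraints

def satisfies(constraints, c):
    """Check that a candidate vector c = (c_1, ..., c_{d+1}) fulfils every
    constraint a . c <= 0."""
    return all(sum(aj * cj for aj, cj in zip(a, c)) <= 0 for a in constraints)

# 1d example: red points at x = 3, 4 and blue points at x = -3, -5 are
# separated by the plane 1*x = 0 with margin c_2 = 3, i.e., c = (1, 3):
cons = classification_lp([(3,), (4,), (-3,), (-5,)],
                         ['red', 'red', 'blue', 'blue'])
print(satisfies(cons, (1, 3)))   # True: c_2 > 0, so P is linearly separable
print(satisfies(cons, (0, 0)))   # True: the all-zero solution always works
```

Note that the all-zero vector is always feasible, which is exactly why the LP never reports "no solution" and the separability test reduces to checking whether the maximized c_{d+1} is positive.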
Appendix: Random Permutation

Problem: Let S be an array of n elements. Produce a random permutation of these elements, and store them still in S.

Algorithm:
for i = 2 to n
    j = a random number from 1 to i
    swap S[i] with S[j]
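The pseudocode above (an inside-out Fisher-Yates shuffle) translates directly to Python; the 1-based loop "i = 2 to n" becomes the 0-based range below, and the function name is our own:

```python
import random

def random_permutation(S):
    """In-place random permutation of the list S, as in the appendix.
    Runs in O(n) time, and produces every one of the n! permutations
    with equal probability."""
    n = len(S)
    for i in range(1, n):            # 0-based counterpart of i = 2..n
        j = random.randint(0, i)     # j uniform over positions 1..i (1-based)
        S[i], S[j] = S[j], S[i]      # swap S[i] with S[j]
    return S

print(sorted(random_permutation(list(range(10)))))  # the elements are preserved
```

Note that `random.randint(0, i)` is inclusive on both ends, matching "a random number from 1 to i"; drawing j from 1 to i-1 instead would bias the output.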