Linear Classification: Linear Programming


Yufei Tao, Department of Computer Science and Engineering, Chinese University of Hong Kong

Recall the definition of linear classification. Definition 1. Let R^d denote the d-dimensional space where the domain of each dimension is the set R of real values. Let P be a set of points in R^d, each of which is colored either red or blue. The goal of the linear classification problem is to determine whether there is a plane in R^d, x_1 c_1 + x_2 c_2 + ... + x_d c_d = 0, which separates the red points from the blue points in P. In other words, all the red points must fall on the same side of the plane, while all the blue points must fall on the other side. If the plane exists, then P is said to be linearly separable. Otherwise, P is linearly non-separable.

In this lecture, we will give an algorithm that is able to (i) detect whether P is linearly separable, and (ii) if it is, return a separation plane. Our weapon is to convert the problem to another classic problem called linear programming.

Definition 2. A half-plane in R^d is the set of all points (x_1, x_2, ..., x_d) in R^d satisfying the following inequality: x_1 c_1 + x_2 c_2 + ... + x_d c_d ≤ c_{d+1}, where c_1, c_2, ..., c_{d+1} are real-valued constants. Example 3. (Figure omitted: panel (a) shows a half-plane in R, namely 3x ≤ 6; panel (b) shows a half-plane in R^2, namely 2x + y ≤ 2.)

Definition 4 (Linear Programming (LP)). Let S be a set of n half-planes H_1, H_2, ..., H_n in R^d. Let A = H_1 ∩ H_2 ∩ ... ∩ H_n. The goal of the linear programming problem is to decide (i) whether A is empty, and (ii) if A is not empty, return a point in A whose coordinate on the first dimension is the smallest.

Example 5 (1d LP). H_1: x ≤ 10, H_2: x ≥ 0, H_3: x ≥ 1, H_4: x ≤ 3, H_5: x ≥ -10. Here A = [1, 3]; answer: x = 1. H_1: x ≤ 10, H_2: x ≥ 0, H_3: x ≥ 4, H_4: x ≤ 3, H_5: x ≥ -10. Here A = ∅; answer: no solution. Example 6 (2d LP). (Figure omitted: in the first panel, A is the shaded intersection area and the answer p is its leftmost point; in the second panel, A = ∅ and there is no solution.)

The 1d LP problem can be easily solved in O(n) time (recall that n is the number of half-planes). Think: How?
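For concreteness, here is a minimal sketch of one O(n) solution in Python. The representation of a half-line as a ('>=', b) or ('<=', b) pair and the name solve_1d_lp are conventions chosen here, not notation from these notes: a single scan keeps the tightest lower bound and the tightest upper bound.

    def solve_1d_lp(half_lines):
        # smallest feasible x, or None if the intersection is empty
        lo, hi = float('-inf'), float('inf')
        for op, b in half_lines:
            if op == '>=':
                lo = max(lo, b)   # tightest lower bound so far
            else:
                hi = min(hi, b)   # tightest upper bound so far
        # lo = -inf with lo <= hi would signal an unbounded LP
        return lo if lo <= hi else None

    # Example 5 revisited:
    print(solve_1d_lp([('<=', 10), ('>=', 0), ('>=', 1), ('<=', 3), ('>=', -10)]))  # 1
    print(solve_1d_lp([('<=', 10), ('>=', 0), ('>=', 4), ('<=', 3), ('>=', -10)]))  # None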

We now turn our attention to the 2d LP problem. To simplify our discussion, we assume:

The half-planes are in general position. Namely, (i) there do not exist 3 half-planes whose boundary lines cross the same point, and (ii) no boundary line is perpendicular to the x-axis.

The optimal solution point is unique. We can ensure this by adding two special half-planes to S. (Figure omitted: the boundary lines of the two half-planes form a wedge that opens toward larger x, for example y ≤ x + M and y ≥ -x - M for a large constant M, so that the apex of the wedge is the unique leftmost point of their intersection.)

We will assume that these half-planes are H_1 and H_2 in the discussion below.

Now, we give a randomized algorithm to solve the 2d LP problem.

Step 1. Randomly permute H_3, H_4, ..., H_n (we will give a permutation algorithm running in O(n) time in the appendix). Note that the two special half-planes H_1, H_2 are not permuted. Without loss of generality, let us assume that (H_1, ..., H_n) is the sequence of half-planes after the permutation, and that l_1, ..., l_n are their boundary lines, respectively.

Step 2. The algorithm will then process the half-planes in the order of H_1, H_2, ..., H_n. The following invariant will be maintained: after having processed H_1, ..., H_i, the algorithm will be holding a point p satisfying: if A_i = H_1 ∩ H_2 ∩ ... ∩ H_i is not empty, then p is a point with the smallest x-coordinate in A_i; otherwise, p is nil. The point p will become the final answer when the algorithm terminates at i = n. To fulfill the requirement for i = 2, we simply set p to the intersection of l_1 and l_2.

Step 3. We process each H_i (i ≥ 3) by checking whether the current p falls in H_i. If so, then the processing of H_i is done. Think: In this case, p must have the smallest x-coordinate in H_1 ∩ H_2 ∩ ... ∩ H_i. Why?

We will first prove a lemma before discussing what to do in the case where p ∉ H_i. Lemma 7. If p ∉ H_i and A_i = H_1 ∩ ... ∩ H_i is not empty, there must be a point on l_i that has the smallest x-coordinate in A_i. Proof. Let q be a point in A_i with the smallest x-coordinate in A_i. If q is on l_i, then we are done; so suppose that it is not. Let pq be the line segment connecting p and q. Define A_{i-1} = H_1 ∩ ... ∩ H_{i-1}. A_{i-1} is a convex region that contains both p and q. It thus follows that A_{i-1} contains the entire segment pq.

Proof (cont.) Since p and q lie on different sides of l_i, the segment pq must intersect l_i at a point p'. This implies that p' falls in all of H_1, ..., H_i (it lies on pq ⊆ A_{i-1}, and on l_i ⊆ H_i), namely, p' ∈ A_i. (Figure omitted: the segment pq crossing l_i at p'.) By definition of p, the x-coordinate of p is less than or equal to that of q. Since p' lies on the segment pq, its x-coordinate is between those of p and q; hence the x-coordinate of p' is also less than or equal to that of q. Therefore p' is a point on l_i with the smallest x-coordinate in A_i.

Lemma 7 shows that if p ∉ H_i, then we can focus on the following problem: find the point p' on l_i with the smallest x-coordinate that falls in all of H_1, ..., H_{i-1}. For each j ∈ [1, i-1], the intersection of H_j with l_i is a half-line; hence, there are i - 1 half-lines in total. (Figure omitted: l_j cuts l_i, leaving the part of l_i inside H_j as a half-line.) This is essentially a 1d LP problem defined by these i - 1 half-lines, which we already know can be solved in O(i) time. This completes the algorithm's description.
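Putting Steps 1-3 together, below is a self-contained Python sketch of the whole algorithm, under conventions adopted here for illustration only: a half-plane is a triple (a, b, c) meaning ax + by ≤ c, no boundary line is vertical (as general position requires), the two special half-planes are chosen as the wedge y ≤ x + M and y ≥ -x - M for a large constant M, and EPS is a floating-point tolerance (a robust implementation would need more careful numerics). The names solve_2d_lp and line_intersection are made up here.

    import random

    M, EPS = 1e6, 1e-9

    def line_intersection(h1, h2):
        # intersection of the boundary lines of half-planes h1, h2 (Cramer's rule)
        a1, b1, c1 = h1
        a2, b2, c2 = h2
        det = a1 * b2 - a2 * b1
        return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

    def solve_2d_lp(halfplanes):
        # returns the point of A with the smallest x-coordinate, or None if A is empty
        H = [(-1.0, 1.0, M), (-1.0, -1.0, M)]   # H_1, H_2: a wedge opening rightward
        rest = list(halfplanes)
        random.shuffle(rest)                    # Step 1: random permutation
        H += rest
        p = line_intersection(H[0], H[1])       # apex of the wedge: initial p
        for i in range(2, len(H)):              # Steps 2-3: process H_3, ..., H_n
            a, b, c = H[i]
            if a * p[0] + b * p[1] <= c + EPS:  # p already falls in H_i
                continue
            # By Lemma 7, the new optimum lies on l_i: solve a 1d LP on l_i,
            # parameterized by x via y = (c - a*x)/b (b != 0: no vertical lines).
            lo, hi = float('-inf'), float('inf')
            for aj, bj, cj in H[:i]:
                coef = aj - bj * a / b          # a_j*x + b_j*y <= c_j restricted to l_i
                rhs = cj - bj * c / b
                if coef > EPS:
                    hi = min(hi, rhs / coef)
                elif coef < -EPS:
                    lo = max(lo, rhs / coef)
                elif rhs < -EPS:
                    return None                 # parallel and contradictory constraint
            if lo > hi:
                return None                     # the 1d LP is infeasible, so A_i is empty
            p = (lo, (c - a * lo) / b)          # smallest feasible x on l_i
        return p

    # Usage: y >= 2 - x and y <= x bound a rightward region with leftmost point (1, 1).
    print(solve_2d_lp([(-1.0, -1.0, -2.0), (-1.0, 1.0, 0.0)]))

The sketch returns as soon as some A_i is found empty; this matches the invariant of Step 2, since once p becomes nil it stays nil until the end.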

Example 8. (Figure omitted: four panels showing (a) the half-planes after the permutation, (b) the point p after processing H_3, (c) p after processing H_4, and (d) p after processing H_5.)

Theorem 9. The algorithm runs in O(n) expected time. Proof. As before, let H_1, H_2, ..., H_n be the sequence of half-planes after the permutation. Remember that S = {H_3, H_4, ..., H_n}, while H_1, H_2 are the two half-planes we added manually. The processing of H_1, H_2 clearly takes constant time. For each integer i ∈ [3, n], let T_i be the time we spend on H_i. Denote by T the total running time. Obviously, T = Σ_{i=3}^{n} T_i. Next, we will prove that E[T_i] = O(1) for all i ∈ [3, n], which implies that E[T] = O(n). Fix any i ∈ [3, n]. Also, fix a subset Z of S with size |Z| = n - i. Let C(Z) be the event that H_3, ..., H_i is a permutation of S \ Z. Next, we will prove that E[T_i | C(Z)] = O(1). It will follow immediately from Step 1 (random permutation) that E[T_i] = Σ_Z E[T_i | C(Z)] · Pr[C(Z)] = O(1) (think: why?).

Proof (cont.) Let A_i = H_1 ∩ ... ∩ H_i. We will discuss only the case where A_i is not empty (the other case is left to you). Let p be the point in A_i with the smallest x-coordinate. p must be the intersection of the boundary lines of two half-planes, say H_{j1} and H_{j2}. Observe that: If i ≠ j_1 and i ≠ j_2, then p was already computed before processing H_i. In this case, T_i = O(1). Otherwise, the processing of H_i needs to solve a 1d LP problem, making T_i = O(i). However, due to the random permutation, i has at most 2/(i-2) probability to come from {j_1, j_2} (conditioned on C(Z), position i holds any of the i - 2 half-planes of S \ Z with equal probability, and at most two of them are H_{j1}, H_{j2}). Therefore, E[T_i | C(Z)] ≤ O(1) · (1 - 2/(i-2)) + O(i) · 2/(i-2) = O(1).

Our algorithm can be extended to any dimensionality d. The only change is in Step 3, where we solve a (d - 1)-dimensional LP problem if the current p does not fall in H_i. As long as d is a constant, the expected running time of the algorithm is still O(n) (the hidden constant is roughly d!).

Finally, we mention that LP is often defined in an alternative form: Definition 10 (Linear Programming (LP)). Let S be a set of n half-planes H_1, H_2, ..., H_n in R^d. Let A = H_1 ∩ H_2 ∩ ... ∩ H_n. Also, we are given a linear objective function f(p) that takes as input a point p(x_1, ..., x_d) in R^d, and returns a real value: f(p) = α_1 x_1 + α_2 x_2 + ... + α_d x_d. The goal of the linear programming problem is to decide whether A is empty. If A is not empty, we also need to return a point p ∈ A that minimizes f(p). In the version of Definition 4, f(p) is implicitly defined to be the first coordinate of p. In fact, the above definition, which appears to be more general, is the same as the one in Definition 4. Why?

Reduction from Linear Classification to Linear Programming

Let us now return to Definition 1. Denote the points in P as p_1, p_2, ..., p_n, respectively, where n = |P|. We require that each point p_i(x_1, ..., x_d), i ∈ [1, n], should satisfy:

x_1 c_1 + x_2 c_2 + ... + x_d c_d ≥ c_{d+1}    if p_i is red
x_1 c_1 + x_2 c_2 + ... + x_d c_d ≤ -c_{d+1}   if p_i is blue

In this way, we obtain n inequalities with c_1, ..., c_{d+1} being the unknowns. We aim to maximize the value of c_{d+1} (to keep the LP bounded, one can additionally impose c_{d+1} ≤ 1; by scaling all the c_j, this loses no generality). This is an instance of LP. The LP always returns a solution (because at least c_1 = c_2 = ... = c_{d+1} = 0 satisfies all the inequalities). Let c*_1, ..., c*_{d+1} be the values returned by the LP. We check whether c*_{d+1} = 0. If so, then we declare that P is not linearly separable. Otherwise, x_1 c*_1 + x_2 c*_2 + ... + x_d c*_d = 0 must be a separation plane (the proof is left as an exercise).
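As a concrete illustration of this reduction, here is a sketch that hands the resulting LP to an off-the-shelf solver, scipy.optimize.linprog (assuming SciPy is available; the randomized algorithm above could be used instead). The helper name separate and the cap c_{d+1} ≤ 1 are the conventions adopted here, as discussed above.

    import numpy as np
    from scipy.optimize import linprog

    def separate(red, blue):
        # red, blue: point arrays of shape (n_red, d) and (n_blue, d).
        # Returns (c_1, ..., c_d, c_{d+1}) with c_{d+1} > 0 if separable, else None.
        red, blue = np.asarray(red, float), np.asarray(blue, float)
        d = red.shape[1]
        # red p:  p.c >= c_{d+1}   becomes  -p.c + c_{d+1} <= 0
        # blue p: p.c <= -c_{d+1}  becomes   p.c + c_{d+1} <= 0
        A_ub = np.vstack([np.hstack([-red, np.ones((len(red), 1))]),
                          np.hstack([blue, np.ones((len(blue), 1))])])
        b_ub = np.zeros(len(red) + len(blue))
        obj = np.zeros(d + 1)
        obj[d] = -1.0                                 # maximize c_{d+1}
        bounds = [(None, None)] * d + [(None, 1.0)]   # free variables, cap c_{d+1} <= 1
        res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        if not res.success or res.x[d] < 1e-9:
            return None                               # c*_{d+1} = 0: not separable
        return res.x

    # Red points have positive x_1, blue negative: separable, e.g., by x_1 = 0.
    print(separate(red=[[1, 1], [2, 0.5]], blue=[[-1, -1], [-2, 0.5]]))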

Appendix: Random Permutation

Problem: Let S be an array of n elements. Produce a random permutation of these elements, and store them still in S.

Algorithm (a forward variant of the Fisher-Yates shuffle):

    for i = 2 to n
        j = a uniformly random integer in [1, i]
        swap S[i] with S[j]
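The same algorithm in Python (0-indexed; the name random_permutation is chosen here):

    import random

    def random_permutation(S):
        # in-place; after the call, every permutation of S is equally likely
        for i in range(1, len(S)):        # "i = 2 to n" in 1-indexed terms
            j = random.randint(0, i)      # uniform over positions 0..i
            S[i], S[j] = S[j], S[i]

    S = list(range(10))
    random_permutation(S)
    print(S)

Each iteration takes constant time, so the whole permutation costs O(n), as required by Step 1.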