Convex Optimization and SVM


Problem 0. Cf. lecture notes, pages 12 to 18.

Problem 1.

(i) A slab is an intersection of two half-spaces, hence convex.

(ii) A wedge is an intersection of two half-spaces, hence convex.

(iii) Since
$$\{x : \|x - x_0\|_2 \le \|x - y\|_2 \ \text{for all } y \in S\} \;=\; \bigcap_{y \in S} \{x : \|x - x_0\|_2 \le \|x - y\|_2\},$$
the set is convex as an intersection of half-spaces.

(iv) Not convex in general. Take for instance $S = \{-1, 1\}$ and $T = \{0\}$.

(v) Convex! Indeed, $x + S_2 \subseteq S_1$ if and only if $x + y \in S_1$ for all $y \in S_2$. Thus
$$\{x : x + S_2 \subseteq S_1\} \;=\; \bigcap_{y \in S_2} \{x : x + y \in S_1\} \;=\; \bigcap_{y \in S_2} (S_1 - y),$$
where each translated set $S_1 - y$ is convex, so the intersection is convex.

Problem 2.

(i) The Lagrangian of the LP
$$\text{minimize } c^T x \quad \text{subject to } Ax \preceq b$$
is
$$L(x, \lambda) = c^T x + \lambda^T (Ax - b) = -b^T \lambda + (A^T \lambda + c)^T x.$$
The dual function is
$$g(\lambda) = \inf_x L(x, \lambda) = -b^T \lambda + \inf_x\, (A^T \lambda + c)^T x =
\begin{cases} -b^T \lambda & \text{if } A^T \lambda + c = 0,\\ -\infty & \text{otherwise.} \end{cases}$$
The Lagrange dual problem is
$$\text{maximize } -b^T \lambda \quad \text{subject to } A^T \lambda + c = 0, \ \lambda \succeq 0.$$
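Before moving on to the dual of the dual in (ii), here is a quick numerical sanity check of the primal/dual pair derived above (a sketch of my own, not part of the original solution): on a randomly generated LP that is feasible for both problems by construction, the two optimal values coincide, as strong duality for LPs predicts. The use of scipy.optimize.linprog and the data-generation scheme are illustrative choices.

```python
# Verify numerically that the LP dual derived in (i) attains the same value
# as the primal:  minimize c^T x s.t. Ax <= b   vs.
#                 maximize -b^T lam s.t. A^T lam + c = 0, lam >= 0.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 20, 5
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + rng.uniform(0.1, 1.0, m)  # primal feasible by construction
c = -A.T @ rng.uniform(0.1, 1.0, m)                         # dual feasible by construction

# Primal LP (x is a free variable).
primal = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * n)

# Dual LP, written as a minimization of b^T lam (linprog minimizes).
dual = linprog(b, A_eq=A.T, b_eq=-c, bounds=[(0, None)] * m)

print("primal optimal value:", primal.fun)
print("dual optimal value:  ", -dual.fun)  # flip the sign back to the maximisation value
```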

(ii) We now derive the dual of the dual problem. To start, we rewrite the dual problem in standard (minimisation) form:
$$\text{minimize } b^T \lambda \quad \text{subject to } A^T \lambda + c = 0, \ \lambda \succeq 0.$$
The Lagrangian of this optimisation problem is
$$L(\lambda, \alpha, \beta) = b^T \lambda + \alpha^T (A^T \lambda + c) - \beta^T \lambda = (b - \beta + A\alpha)^T \lambda + c^T \alpha,$$
with $\beta \succeq 0$. The Lagrange dual function is
$$g(\alpha, \beta) = \inf_\lambda \left\{ (b - \beta + A\alpha)^T \lambda + c^T \alpha \right\} =
\begin{cases} c^T \alpha & \text{if } A\alpha + b - \beta = 0,\\ -\infty & \text{otherwise.} \end{cases}$$
The dual of the dual is thus
$$\text{maximize } c^T \alpha \quad \text{subject to } A\alpha + b - \beta = 0, \ \beta \succeq 0,$$
which is equivalent to
$$\text{minimize } -c^T \alpha \quad \text{subject to } A\alpha = \beta - b, \ \beta \succeq 0,$$
which, in turn (substituting $x = -\alpha$, so that $Ax = b - \beta \preceq b$), is equivalent to the original LP problem
$$\text{minimize } c^T x \quad \text{subject to } Ax \preceq b.$$

(iii) Using the weaker form of Slater's condition (affine constraints only need to be feasible), strong duality holds for any LP provided the primal is feasible. Applying this to the dual, strong duality also holds for LPs whenever the dual is feasible. The only possible case in which strong duality can fail is when both the primal and the dual are infeasible, for which $p^\star = +\infty$ and $d^\star = -\infty$.
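This failure case can be made concrete with a one-dimensional instance (my own illustration, not from the original solution): take $A = [0]$, $b = [-1]$, $c = [1]$. The primal constraint $0 \cdot x \le -1$ is infeasible, and the dual constraint $0 \cdot \lambda + 1 = 0$ is infeasible as well, so $p^\star = +\infty$ and $d^\star = -\infty$. A quick check with scipy (names and usage are illustrative):

```python
# A tiny LP for which both the primal and the dual are infeasible.
import numpy as np
from scipy.optimize import linprog

A, b, c = np.array([[0.0]]), np.array([-1.0]), np.array([1.0])

# Primal: minimize c^T x subject to A x <= b (x free).
primal = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)])

# Dual: maximize -b^T lam subject to A^T lam + c = 0, lam >= 0,
# written here as a minimization of b^T lam.
dual = linprog(b, A_eq=A.T, b_eq=-c, bounds=[(0, None)])

print(primal.status, primal.message)  # status 2: infeasible
print(dual.status, dual.message)      # status 2: infeasible
```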

Problem 3.

(i) The functions $g_1$ and $g_2$ are polynomials on $\mathbb{R}^2$. A quick glance at their Hessians shows that they are strictly convex functions. The sets $C_j = \{x \in \mathbb{R}^2 : g_j(x) \le 0\}$ are both convex. The feasible set is thus convex, as the intersection of two convex sets.

(ii) The KKT conditions are
$$2(\lambda_1 + \lambda_2)x + 1 - 2\lambda_1 + 8\lambda_2 = 0,$$
$$2(\lambda_1 + \lambda_2)y + 1 + 6\lambda_2 = 0,$$
$$\lambda_1 g_1(x, y) = 0, \qquad \lambda_2 g_2(x, y) = 0,$$
together with primal feasibility ($g_1(x,y) \le 0$, $g_2(x,y) \le 0$) and dual feasibility ($\lambda_1, \lambda_2 \ge 0$).

(iii) Check case by case, depending on the values of $\lambda_1, \lambda_2$: if $\lambda_1 = \lambda_2 = 0$, if $\lambda_1 = 0, \lambda_2 > 0$, or if $\lambda_1 > 0, \lambda_2 > 0$, there is no solution. The only solution occurs when $\lambda_1 > 0, \lambda_2 = 0$, which is
$$(x^\star, y^\star) = \left(1 - \tfrac{\sqrt{2}}{2},\; -\tfrac{\sqrt{2}}{2}\right), \quad \text{for } \lambda_1^\star = -\frac{1}{2y^\star}.$$

Problem 4.

(i) Linear programming.

(ii) Quadratic programming:
$$\text{minimize } \|x_1 - x_2\|_2^2 \quad \text{subject to } A_1 x_1 \preceq b_1, \ A_2 x_2 \preceq b_2.$$

Problem 6.

(i) The feasible set of the LP relaxation contains the feasible set of the Boolean LP.

(ii) It follows from (i) that the Boolean LP is infeasible if the LP relaxation is infeasible.

(iii) The Lagrangian function is
$$L(x, \lambda, \nu) = x^T \operatorname{diag}(\nu)\, x + (c + A^T \lambda - \nu)^T x - b^T \lambda.$$
Minimising over $x$, we obtain the Lagrange dual function
$$g(\lambda, \nu) = -\frac{1}{4} \sum_i \frac{(c_i + a_i^T \lambda - \nu_i)^2}{\nu_i} - b^T \lambda$$
if $\nu \succeq 0$, and $-\infty$ otherwise, where $a_i$ denotes the $i$-th column of $A$.

(iv) The dual of the LP relaxation problem can be found to be
$$\text{maximize } -b^T u - \mathbf{1}^T w \quad \text{subject to } A^T u + c + w = v, \ u \succeq 0, \ v \succeq 0, \ w \succeq 0.$$
A careful comparison between this problem and the dual problem derived in (iii) shows that they are equivalent: they give the same optimal value.
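A small numerical illustration of (i) and (ii) (a sketch of my own, not part of the original solution): on a random instance that is feasible by construction, the LP relaxation gives a lower bound on the Boolean LP, whose optimum is found here by brute-force enumeration. The data generation and the use of scipy are illustrative choices.

```python
# Compare a Boolean LP (solved by brute force) with its LP relaxation:
# the relaxation's optimal value can only be lower or equal.
import itertools
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 8, 6
A = rng.standard_normal((m, n))
x0 = rng.integers(0, 2, n)                 # a Boolean point ...
b = A @ x0 + rng.uniform(0.1, 1.0, m)      # ... kept feasible by construction
c = rng.standard_normal(n)

# LP relaxation: minimize c^T x subject to A x <= b, 0 <= x <= 1.
relax = linprog(c, A_ub=A, b_ub=b, bounds=[(0, 1)] * n)

# Boolean LP: enumerate all x in {0, 1}^n and keep the feasible ones.
best = np.inf
for bits in itertools.product([0, 1], repeat=n):
    x = np.array(bits)
    if np.all(A @ x <= b):
        best = min(best, c @ x)

print("LP relaxation value:", relax.fun)
print("Boolean LP value:   ", best)
assert relax.fun <= best + 1e-9            # the relaxation is a lower bound
```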

Problem 7.

(i) Points inside the tube $[y(x_i) - \epsilon,\ y(x_i) + \epsilon]$ have slack variables equal to zero. Points are allowed to lie outside the $\epsilon$-tube provided that their slack variables are non-zero, in which case the penalty is linear:
$$t_i > y(x_i) + \epsilon \;\Rightarrow\; \xi_i = t_i - y(x_i) - \epsilon = \text{penalty induced},$$
$$t_i < y(x_i) - \epsilon \;\Rightarrow\; \hat{\xi}_i = y(x_i) - t_i - \epsilon = \text{penalty induced}.$$
The error function
$$C \sum_i (\xi_i + \hat{\xi}_i) + \frac{1}{2} \|\beta\|^2$$
must be minimised subject to the non-negativity of the slack variables and to $t_i \le y(x_i) + \epsilon + \xi_i$, $t_i \ge y(x_i) - \epsilon - \hat{\xi}_i$.

(ii) The Lagrangian is
$$L(\beta_0, \beta, \xi, \hat{\xi}, \lambda, \hat{\lambda}, \nu, \hat{\nu}) = C \sum_i (\xi_i + \hat{\xi}_i) + \frac{1}{2} \|\beta\|^2 - \sum_i (\lambda_i \xi_i + \hat{\lambda}_i \hat{\xi}_i) - \sum_i \nu_i (\epsilon + \xi_i + y(x_i) - t_i) - \sum_i \hat{\nu}_i (\epsilon + \hat{\xi}_i - y(x_i) + t_i).$$
The primal conditions are $\xi_i \ge 0$, $\hat{\xi}_i \ge 0$, $t_i \le y(x_i) + \epsilon + \xi_i$ and $t_i \ge y(x_i) - \epsilon - \hat{\xi}_i$. The dual conditions are $\lambda_i, \hat{\lambda}_i, \nu_i, \hat{\nu}_i \ge 0$. Complementary slackness ensures $\lambda_i \xi_i = 0$, $\hat{\lambda}_i \hat{\xi}_i = 0$, $\nu_i(\epsilon + \xi_i + y(x_i) - t_i) = 0$ and $\hat{\nu}_i(\epsilon + \hat{\xi}_i - y(x_i) + t_i) = 0$. The gradient of the Lagrangian vanishes:
$$\frac{\partial L}{\partial \beta_0} = -\sum_i (\nu_i - \hat{\nu}_i) = 0, \qquad \frac{\partial L}{\partial \beta} = \beta - \sum_i (\nu_i - \hat{\nu}_i)\, x_i = 0,$$
$$\frac{\partial L}{\partial \xi_i} = C - \lambda_i - \nu_i = 0, \qquad \frac{\partial L}{\partial \hat{\xi}_i} = C - \hat{\lambda}_i - \hat{\nu}_i = 0.$$

(iii) Check that the dual problem reduces to
$$\text{maximize } -\frac{1}{2} \sum_{i,j} (\nu_i - \hat{\nu}_i)(\nu_j - \hat{\nu}_j)\, x_i^T x_j - \epsilon \sum_i (\nu_i + \hat{\nu}_i) + \sum_i (\nu_i - \hat{\nu}_i)\, t_i$$
$$\text{subject to } 0 \le \nu_i \le C, \quad 0 \le \hat{\nu}_i \le C,$$
together with $\sum_i (\nu_i - \hat{\nu}_i) = 0$, which follows from the stationarity condition for $\beta_0$.
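To make the penalty structure of (i) concrete, here is a minimal numpy sketch (my own illustration, not part of the original solution) that computes the slack variables and the regularised error function for a linear model $y(x) = x^T \beta + \beta_0$; the function name and the data are illustrative.

```python
# Epsilon-insensitive error of Problem 7(i): slacks are zero inside the tube
# and grow linearly with the distance to the tube outside of it.
import numpy as np

def svr_objective(beta, beta0, X, t, C=1.0, eps=0.1):
    """C * sum(xi + xi_hat) + 0.5 * ||beta||^2 for a linear model."""
    y = X @ beta + beta0                    # model predictions y(x_i)
    xi = np.maximum(0.0, t - y - eps)       # slack for targets above the tube
    xi_hat = np.maximum(0.0, y - t - eps)   # slack for targets below the tube
    return C * np.sum(xi + xi_hat) + 0.5 * np.dot(beta, beta)

# Tiny example: only the third point lies outside the tube and is penalised.
X = np.array([[0.0], [1.0], [2.0]])
t = np.array([0.05, 1.0, 2.5])
print(svr_objective(np.array([1.0]), 0.0, X, t, C=1.0, eps=0.1))  # 0.5*1 + 1.0*0.4
```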

(iv) $y^\star(x_i) = x_i^T \beta^\star + \beta_0 = \sum_{j=1}^{n} (\nu_j - \hat{\nu}_j)\, x_i^T x_j + \beta_0$.

(v) Points with $\nu_i > 0$ lie on or above the upper boundary of the tube; points with $\hat{\nu}_i > 0$ lie on or below the lower boundary. For each observation outside the tube, either $\nu_i$ or $\hat{\nu}_i$ is non-zero; they cannot both be non-zero at the same time (check the complementary slackness conditions). The support vectors are the points for which either $\nu_i$ or $\hat{\nu}_i$ is non-zero. Points strictly inside the tube have $\nu_i = \hat{\nu}_i = 0$ and do not contribute to the solution.

(vi) Pick a point on the upper boundary of the tube, i.e. one for which $0 < \nu_i < C$ (so that $\lambda_i = C - \nu_i > 0$, hence $\xi_i = 0$ and $\epsilon + y(x_i) - t_i = 0$). For such a point,
$$\beta_0 = t_i - \epsilon - \sum_{j=1}^{n} (\nu_j - \hat{\nu}_j)\, x_i^T x_j,$$
and average over all such points.
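As an illustration of (iv) and (v) (a sketch of my own, not part of the original solution), scikit-learn's SVR solves this dual for a linear kernel: dual_coef_ stores the non-zero differences $\nu_j - \hat{\nu}_j$, support_ indexes the points on or outside the $\epsilon$-tube, and intercept_ plays the role of $\beta_0$. The data and parameter values below are arbitrary.

```python
# Fit a linear epsilon-SVR and check that no point strictly inside the tube
# is a support vector, as stated in Problem 7(v).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(0, 4, 30).reshape(-1, 1)
t = 0.8 * X.ravel() + 0.3 + rng.normal(0.0, 0.15, 30)   # noisy linear targets

eps = 0.2
model = SVR(kernel="linear", C=1.0, epsilon=eps).fit(X, t)

residual = np.abs(t - model.predict(X))
inside = residual < eps - 1e-3          # strictly inside the tube (with solver tolerance)
support = np.zeros(len(t), dtype=bool)
support[model.support_] = True          # indices of the support vectors

assert not np.any(inside & support)
print("support vectors:", model.support_.size, "of", len(t))
print("dual coefficients (nu_j - nu_hat_j):", model.dual_coef_.ravel())
print("intercept beta_0:", model.intercept_)
```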