Chapter 6 Interior-Point Approach to Linear Programming


Chapter 6 Interior-Point Approach to Linear Programming Objectives: Introduce Basic Ideas of Interior-Point Methods. Motivate further research and applications. Slide#1

Linear Programming Problem
(Primal) Minimize c^T x
s.t. Ax = b, x ≥ 0
where A ∈ R^{m×n}, c, x ∈ R^n, b ∈ R^m.
Feasible domain P = {x ∈ R^n | Ax = b} ∩ {x ∈ R^n | x ≥ 0}, i.e., an affine subspace intersected with the non-negative orthant.
Primal interior feasible solution: x ∈ R^n s.t. Ax = b, x > 0. Slide#2

Dual Problem
Maximize b^T w
s.t. A^T w + s = c, s ≥ 0, w ∈ R^m
w: dual variables, s: dual slacks.
Dual interior feasible solution: (w, s) ∈ R^m × R^n s.t. A^T w + s = c, s > 0. Slide#3

Primal-Dual Problem
Find (x; w; s) ∈ R^n × R^m × R^n such that
Ax = b, x ≥ 0 (primal feasibility)
A^T w + s = c, s ≥ 0 (dual feasibility)
x^T s = 0 (complementary slackness) Slide#4

What's Special about LP?
P = {x ∈ R^n | Ax = b, x ≥ 0} is a polyhedral set with vertices.
For a consistent LP, either the objective is unbounded or the optimum is attained at a vertex of P. Slide#5

Solving LP Problems
Fact: some vertex x* ∈ P is optimal.
Question: how do we find x*? Slide#6

Simplex Method
Step 1: Start at a vertex x^0.
Step 2: If the current vertex x^k is optimal, STOP: x* ← x^k. Otherwise,
Step 3: Move to a better neighboring vertex x^{k+1}. Go to Step 2.
[Figure: vertex path x^0 → x^1 → x^2 → x*.] Slide#7

Is the Simplex Method Good?
In practice it visits about 0.7159 m^0.9522 n^0.3109 vertices (roughly linear in m, sub-linear in n).
In the worst case, Klee and Minty (1971) showed it can traverse 2^n − 1 vertices (an exponential-time algorithm).
Large-scale problems may take a long time to run. Slide#8

Basic Strategy of Interior-Point Approach Stay inside of P. Check more directions of movement. Shorten travelling path. i.e., Increase complexity at each iteration but reduce total number of iterations. Slide#9

Interior-Point Approach Step 1: Start with an interior feasible solution. Step 2: If the current solution is optimal, STOP! Otherwise, Step 3: Move to a better interior solution. Go to Step 2. - good direction? - right step-length? Slide#10

Interior-Point Methods Projective scaling method Affine scaling method - Primal affine scaling algorithm - Dual affine scaling algorithm - Primal-Dual algorithm Potential reduction method Path-Following method Slide#11

Primal Affine Scaling Algorithm
(P) Minimize c^T x
s.t. Ax = b, x ≥ 0
Find an interior feasible solution x^k ∈ R^n s.t. Ax^k = b and x^k > 0. Slide#12

Good direction
(A) Reduce the objective value:
c^T x^{k+1} = c^T(x^k + α_k d_x^k) = c^T x^k + α_k c^T d_x^k ≤ c^T x^k requires c^T d_x^k ≤ 0.
Candidate: d_x^k = −c (negative gradient, steepest descent). Slide#13

(B) Keep feasibility:
Ax^{k+1} = Ax^k + α_k A d_x^k = b requires A d_x^k = 0,
i.e., d_x^k ∈ N(A), the null space of A.
Candidate: the projected negative gradient d_x^k = (I − A^T (A A^T)^{-1} A)(−c). Slide#14
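
The projected negative gradient above can be computed without forming an explicit inverse. A minimal NumPy sketch (not part of the original slides; the function name is illustrative):

import numpy as np

def projected_neg_gradient(A, c):
    # Project -c onto N(A): d = -(I - A^T (A A^T)^{-1} A) c.
    # Solve (A A^T) y = A c rather than forming the inverse explicitly.
    y = np.linalg.solve(A @ A.T, A @ c)
    d = -(c - A.T @ y)
    assert np.allclose(A @ d, 0.0)   # d lies in N(A), so A x^{k+1} = b is preserved
    return d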

Valid Step-length
Fact: as long as d_x^k ∈ N(A), Ax^{k+1} = b no matter what value α_k takes.
However, x^{k+1} > 0 is also required, i.e., we have to know how far x^k is from the boundary of the non-negative orthant {x ∈ R^n | x ≥ 0}. Slide#15

(C) Scaling
Let e = (1, 1, ..., 1)^T. If x^k = e, then
(1) x^k is one unit away from the boundary;
(2) as long as α_k < 1 (for a direction of unit length), x^{k+1} > 0. Slide#16

Scale x^k to be e
Define X_k = diag(x^k), the diagonal matrix with entries x_1^k, ..., x_n^k; then X_k^{-1} x^k = e.
Moreover, the transformation y = X_k^{-1} x (with inverse x = X_k y) maps R^n_+ onto R^n_+ one-to-one: boundary to boundary, interior to interior. Slide#17

Under the change of variables x = X_k y, the problem
  Min c^T x s.t. Ax = b, x > 0
becomes
  Min (X_k c)^T y s.t. A X_k y = b, y > 0,
and x^k corresponds to y^k = e. Take
  y^{k+1} = y^k + α_k d_y^k / ||d_y^k||, with d_y^k = −[I − X_k A^T (A X_k^2 A^T)^{-1} A X_k](X_k c).
Scaling back,
  x^{k+1} = X_k y^{k+1} = X_k y^k + α_k X_k d_y^k / ||d_y^k|| = x^k + α_k d_x^k / ||d_y^k||,
where d_x^k = X_k d_y^k = −X_k [I − X_k A^T (A X_k^2 A^T)^{-1} A X_k] X_k c, and 0 < α_k < 1 (say α_k = 0.99). Slide#18

Observations
(1) Another way to determine the step-length α_k: since d_y^k = P_k(−X_k c), where P_k = I − X_k A^T (A X_k^2 A^T)^{-1} A X_k, we have A X_k d_y^k = 0 and
  A X_k y^{k+1} = A X_k y^k + α_k A X_k d_y^k = b.
To make sure that y^{k+1} = y^k + α_k d_y^k = e + α_k d_y^k > 0:
Case 1: if d_y^k ≥ 0, then any α_k ∈ (0, ∞) works.
Case 2: if (d_y^k)_i < 0 for some i, then α_k = min_i { 1/(−(d_y^k)_i) : (d_y^k)_i < 0 }. Slide#19

or α_k = min_i { α/(−(d_y^k)_i) : (d_y^k)_i < 0 } for some α ∈ (0, 1).
(2) As in (1),
  x^{k+1} = X_k y^{k+1} = X_k(e + α_k d_y^k) = x^k + α_k X_k d_y^k
  = x^k + α_k X_k(−P_k X_k c)
  = x^k − α_k X_k [I − X_k A^T (A X_k^2 A^T)^{-1} A X_k] X_k c
  = x^k − α_k X_k^2 [c − A^T (A X_k^2 A^T)^{-1} A X_k^2 c]
  = x^k − α_k X_k^2 [c − A^T w^k]   (with w^k = (A X_k^2 A^T)^{-1} A X_k^2 c)
  = x^k − α_k d_x^k   (with d_x^k = X_k^2 [c − A^T w^k]). Slide#20

(3) c^T x^{k+1} = c^T(x^k + α_k X_k d_y^k) = c^T x^k + α_k c^T X_k(−P_k X_k c) = c^T x^k − α_k ||P_k X_k c||^2 = c^T x^k − α_k ||d_y^k||^2.
Hence c^T x^{k+1} ≤ c^T x^k, and c^T x^{k+1} < c^T x^k if d_y^k ≠ 0.
[Figure: the angle θ_k between −X_k c and d_y^k = −P_k X_k c.]
Lemma 7.1: If x^k ∈ P, x^k > 0 and d_y^k > 0, then (P) is unbounded below. Slide#21

(4) For x^k ∈ P^0 = {x ∈ R^n | Ax = b, x > 0}, if d_y^k = −P_k X_k c = 0, then X_k c lies in the orthogonal complement of N(A X_k), i.e., X_k c ∈ row space of A X_k. Hence there exists u^k s.t. (A X_k)^T u^k = X_k c, i.e., (u^k)^T A X_k = c^T X_k, so (u^k)^T A = c^T.
For any x ∈ P: c^T x = (u^k)^T A x = (u^k)^T b = constant.
Any feasible solution is optimal (Lemma 7.2); in particular, x^k is optimal. Slide#22

(5) Combining (3) and (4): if the standard-form LP is bounded below and c^T x is not constant on P, then {c^T x^k, k = 1, 2, ...} is well defined and strictly decreasing (Lemma 7.3).
(6) w^k ≡ (A X_k^2 A^T)^{-1} A X_k^2 c is a dual estimate and r^k ≡ c − A^T w^k the reduced cost.
If r^k ≥ 0, then w^k is dual feasible and (x^k)^T r^k = e^T X_k r^k becomes the duality gap, i.e., c^T x^k − b^T w^k = e^T X_k r^k. Slide#23

Therefore, if r^k ≥ 0 and e^T X_k r^k = 0 (stopping rule), then x^k → x*, w^k → w*.
(7) d_y^k = −[I − X_k A^T (A X_k^2 A^T)^{-1} A X_k](X_k c) = −X_k(c − A^T (A X_k^2 A^T)^{-1} A X_k^2 c) = −X_k(c − A^T w^k) = −X_k r^k. Slide#24

Primal Affine Scaling Algorithm
Step 1: Set k ← 0, choose ε > 0 and 0 < α < 1, and find x^0 > 0 with Ax^0 = b.
Step 2: Compute w^k = (A X_k^2 A^T)^{-1} A X_k^2 c and r^k = c − A^T w^k.
If r^k ≥ 0 and e^T X_k r^k ≤ ε, STOP: x* ← x^k, w* ← w^k. Otherwise,
Step 3: Compute d_y^k = −X_k r^k.
If d_y^k > 0, STOP: the problem is unbounded. If d_y^k = 0, STOP: x* ← x^k. Otherwise, Slide#25

Step 4: Find α_k = min_i { α/(−(d_y^k)_i) : (d_y^k)_i < 0 } and set
x^{k+1} = x^k + α_k X_k d_y^k, k ← k + 1. Go to Step 2. Slide#26
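
Putting Steps 1-4 together, here is a minimal NumPy sketch of the whole iteration (an illustration under the slides' assumptions, not production code; it assumes x0 is interior feasible, i.e., A x0 = b and x0 > 0, and the names w, r, d_y follow Slides 25-26):

import numpy as np

def primal_affine_scaling(A, b, c, x0, alpha=0.95, eps=1e-8, max_iter=200):
    x = x0.astype(float).copy()
    w, status = None, "iteration limit"
    for k in range(max_iter):
        Xk = np.diag(x)
        AX = A @ Xk                                    # A X_k
        w = np.linalg.solve(AX @ AX.T, AX @ (Xk @ c))  # w^k = (A X_k^2 A^T)^{-1} A X_k^2 c
        r = c - A.T @ w                                # reduced cost r^k
        if np.all(r >= -eps) and x @ r <= eps:         # e^T X_k r^k = (x^k)^T r^k
            status = "optimal"
            break
        d_y = -Xk @ r                                  # d_y^k = -X_k r^k
        if np.all(d_y >= 0):
            status = "optimal" if np.all(np.abs(d_y) <= eps) else "unbounded"
            break
        step = alpha / np.max(-d_y[d_y < 0])           # alpha_k = min_i { alpha / -(d_y^k)_i }
        x = x + step * (Xk @ d_y)                      # x^{k+1} = x^k + alpha_k X_k d_y^k
    return x, w, status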

AN EXAMPLE
min −2x_1 + x_2
s.t. x_1 − x_2 ≤ 15
     x_2 ≤ 15
     x_1, x_2 ≥ 0
[Figure: feasible region in the (x_1, x_2) plane, bounded by x_1 = 0, x_2 = 0, x_3 = 0 (i.e., x_1 − x_2 = 15) and x_4 = 0 (i.e., x_2 = 15), with axis marks at 10, 15, 30.] Slide#27

Reformulate to standard form:
min −2x_1 + x_2
s.t. x_1 − x_2 + x_3 = 15
     x_2 + x_4 = 15
     x_1, x_2, x_3, x_4 ≥ 0
and x^0 = (10, 2, 7, 13)^T is interior feasible. Slide#28

MATRIX FORMAT
min c^T x s.t. Ax = b, x ≥ 0, where
A = [ 1 −1 1 0 ; 0 1 0 1 ], b = (15, 15)^T,
c = (−2, 1, 0, 0)^T, x^0 = (10, 2, 7, 13)^T, X_0 = diag(10, 2, 7, 13). Slide#29

SCALING
y = X_0^{-1} x, i.e., y_1 = x_1/10, y_2 = x_2/2, y_3 = x_3/7, y_4 = x_4/13.
The problem is transformed to
min −2(10 y_1) + (2 y_2) = −20 y_1 + 2 y_2
s.t. 10 y_1 − 2 y_2 + 7 y_3 = 15
     2 y_2 + 13 y_4 = 15
     y_1, y_2, y_3, y_4 ≥ 0
[Figure: the transformed feasible region in the (y_1, y_2) plane with y^0 = e.] Slide#30

The new matrix form:
min c̄^T y s.t. Ā y = b, y ≥ 0, where
Ā = A X_0 = [ 10 −2 7 0 ; 0 2 0 13 ], b = (15, 15)^T,
c̄ = X_0 c = (−20, 2, 0, 0)^T, and y^0 = X_0^{-1} x^0 = (1, 1, 1, 1)^T. Slide#31

Step direction in the transformed space:
d_y^0 = −P X_0 c ≈ (+6.66, +0.68, −9.33, −0.10)^T
y^1 = y^0 + (0.95/9.33) d_y^0 ≈ e + 0.10 d_y^0 ≈ (1.66, 1.07, 0.07, 0.99)^T Slide#32

Scale back x = X_0 y:
x_1 = 10 y_1 ≈ 16.6, x_2 = 2 y_2 ≈ 2.14, x_3 = 7 y_3 ≈ 0.49, x_4 = 13 y_4 ≈ 12.87.
[Figure: the step from x^0 toward the optimal vertex in the (x_1, x_2) plane.] Slide#33
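
The numbers on Slides 27-33 can be checked with a few lines of NumPy (values below are rounded; the slides additionally round the step 0.95/9.33 to 0.10, which explains the small differences in x^1):

import numpy as np

A = np.array([[1., -1., 1., 0.],
              [0.,  1., 0., 1.]])
b = np.array([15., 15.])
c = np.array([-2., 1., 0., 0.])
x0 = np.array([10., 2., 7., 13.])

X0 = np.diag(x0)
AX = A @ X0
P = np.eye(4) - AX.T @ np.linalg.solve(AX @ AX.T, AX)   # projector onto N(A X_0)
d_y = -P @ (X0 @ c)
print(d_y)                            # approx [ 6.66  0.65 -9.33 -0.10]
step = 0.95 / np.max(-d_y[d_y < 0])
x1 = x0 + step * (X0 @ d_y)
print(x1)                             # approx [16.78  2.13  0.35 12.87]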

How to Start?
1. Big-M method
(LP) Min c^T x s.t. Ax = b, x ≥ 0.
Objective: make e = (1, 1, ..., 1)^T feasible, even though in general Ae ≠ b.
Method: add an artificial variable x_a with a large positive cost M:
(LP') Min c^T x + M x_a
s.t. Ax + (b − Ae) x_a = b, x ≥ 0, x_a ≥ 0. Slide#34

Properties:
(1) (LP') is a standard-form LP with n + 1 variables and m constraints.
(2) (e, 1) ∈ R^{n+1} is an interior feasible solution of (LP').
(3) If x_a* > 0 in an optimal solution (x*, x_a*), then (LP) is infeasible. Otherwise, either (LP) is unbounded or x* is optimal to (LP). Slide#35

2. Two-Phase method
(LP) Min c^T x s.t. Ax = b, x ≥ 0.
Choose any x^0 > 0 and calculate V = b − Ax^0. If V = 0, then x^0 is interior feasible. Otherwise, consider
(Phase-I) Min u
s.t. Ax + Vu = b, x ≥ 0, u ≥ 0. Slide#36

Properties:
(1) (Phase-I) is a standard-form LP with n + 1 variables and m constraints.
(2) x̂^0 = (x^0, u^0) = (x^0, 1) is interior feasible for (Phase-I).
(3) (Phase-I) is bounded below by 0.
(4) Applying primal affine scaling to (Phase-I) generates an optimal (x*, u*). If u* > 0, then (LP) is infeasible. Otherwise, x* > 0 serves as an initial interior feasible solution for Phase II. Slide#37
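
Since Phase-I is itself a standard-form LP, any interior-point routine can drive it. A small sketch of the construction (the helper name phase_one_start and the solver interface are assumptions for illustration, not part of the slides):

import numpy as np

def phase_one_start(A, b, primal_solver, x0=None):
    # Two-phase start (Slides 36-37): solve  min u  s.t.  A x + V u = b, x >= 0, u >= 0.
    m, n = A.shape
    x0 = np.ones(n) if x0 is None else x0            # any x0 > 0
    V = b - A @ x0
    if np.allclose(V, 0.0):
        return x0                                    # already interior feasible
    A1 = np.hstack([A, V.reshape(-1, 1)])            # constraint matrix of (Phase-I)
    c1 = np.zeros(n + 1)
    c1[-1] = 1.0                                     # objective: minimize u
    z0 = np.append(x0, 1.0)                          # (x0, 1) is interior feasible
    z, _, _ = primal_solver(A1, b, c1, z0)           # e.g. the affine scaling sketch above
    if z[-1] > 1e-6:
        raise ValueError("(LP) appears infeasible: optimal u > 0")
    return z[:n]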

Properties of Primal Affine Scaling
(1) The convergence proof, i.e., {x^k} → x* under a non-degeneracy assumption (Theorem 7.2), was given by Vanderbei/Meketon/Freedman (VMF) in 1985.
(2) Convergence proofs without the non-degeneracy assumption: T. Tsuchiya (1991), P. Tseng/Z. Luo (1992).
(3) The computational bottleneck is forming and inverting (A X_k^2 A^T).
(4) No polynomial-time proof: J. Lagarias showed that primal affine scaling achieves only a super-linear rate. Slide#38

- N. Megiddo/M. Shub showed that primal affine scaling might visit all vertices if it moves too close to the boundary.
(5) In practice, VMF reported the number of iterations as roughly:
  Simplex: 0.7159 m^0.9522 n^0.3109
  Affine scaling: 7.3885 m^(−0.0187) n^0.1694
(6) It may lose primal feasibility due to machine accuracy (requiring Phase-I again).
(7) It may be sensitive to primal degeneracy. Slide#39

Improving Primal Affine Scaling
Objective: stay away from the boundary!
1. Potential Push Method
Min −Σ_{j=1}^n ln x_j
s.t. Ax = b, x > 0, c^T x = c^T x^k,
and use the minimizer x̃(x^k) to replace x^k.
[Figure: pushing x^k toward the center of the slice {x ∈ P : c^T x = c^T x^k}.] Slide#40

2. Logarithmic Barrier Function Method
Min c^T x − µ Σ_{j=1}^n ln x_j
s.t. Ax = b, x > 0
(1) {x*(µ) : µ > 0} → x* as µ → 0.
(2) d_µ^k = −(1/µ) X_k [I − X_k A^T (A X_k^2 A^T)^{-1} A X_k](X_k c − µe) = −(1/µ) X_k P_k (X_k c) + X_k P_k e = (1/µ) d_x^k + X_k P_k e, i.e., the affine scaling direction plus a centering force X_k P_k e.
(3) Polynomial-time proof, i.e., termination in O(√n L) iterations: C. Gonzaga (1989) (problems in the proof!), C. Roos/J. Vial (1990). Total complexity O(n^3 L). Slide#41

Dual Affine Scaling
(D) Max b^T w
s.t. A^T w + s = c, s ≥ 0
Given (w^k, s^k) dual interior feasible, i.e., A^T w^k + s^k = c, s^k > 0,
the objective is to find (d_w^k, d_s^k) and β_k > 0 such that
w^{k+1} = w^k + β_k d_w^k, s^{k+1} = s^k + β_k d_s^k
remains dual interior feasible and b^T w^{k+1} ≥ b^T w^k. Slide#42

Observations:
(1) Scaling: w^k ∈ R^m needs no scaling; s^k > 0 is scaled to e = (1, ..., 1)^T via u = S_k^{-1} s, where S_k = diag(s^k).
Thus s = S_k u, d_u = S_k^{-1} d_s, d_s = S_k d_u.
[Figure: scaling from s-space to u-space, with S_k^{-1} s^k = e.] Slide#43

(2) Dual feasibility:
A^T w^{k+1} + s^{k+1} = A^T(w^k + β_k d_w^k) + (s^k + β_k d_s^k) = (A^T w^k + s^k) + β_k (A^T d_w^k + d_s^k) = c
requires A^T d_w^k + d_s^k = 0.
In scaled form, S_k^{-1} A^T d_w^k + d_u^k = 0 with d_u^k = S_k^{-1} d_s^k. Multiplying by A S_k^{-1}:
(A S_k^{-2} A^T) d_w^k + A S_k^{-1} d_u^k = 0, so
d_w^k = −(A S_k^{-2} A^T)^{-1} A S_k^{-1} d_u^k ≡ −Q d_u^k. Slide#44

(3) Increase the objective value: we need b^T d_w^k = −b^T Q d_u^k ≥ 0. Choosing d_u^k = −Q^T b gives
b^T d_w^k = b^T Q Q^T b = ||Q^T b||^2 ≥ 0, and
d_w^k = Q Q^T b = (A S_k^{-2} A^T)^{-1} A S_k^{-1} · S_k^{-1} A^T (A S_k^{-2} A^T)^{-1} b = (A S_k^{-2} A^T)^{-1} b,
d_s^k = −A^T d_w^k = −A^T (A S_k^{-2} A^T)^{-1} b. Slide#45

(4) Step-size β_k: we need s^{k+1} = s^k + β_k d_s^k > 0.
(i) If d_s^k = 0, problem (D) has a constant objective value and (w^k, s^k) is optimal.
(ii) If d_s^k > 0, any β_k ∈ (0, ∞) is feasible and problem (D) is unbounded.
(iii) If some (d_s^k)_i < 0, take β_k = min_i { α s_i^k/(−(d_s^k)_i) : (d_s^k)_i < 0 } for α ∈ (0, 1). Slide#46

(5) Primal estimate: let x^k = −S_k^{-2} d_s^k. Then
Ax^k = −A S_k^{-2} d_s^k = A S_k^{-2} A^T d_w^k = (A S_k^{-2} A^T)(A S_k^{-2} A^T)^{-1} b = b.
Hence x^k is a primal estimate; once x^k ≥ 0, it is primal feasible.
If in addition c^T x^k − b^T w^k = 0, then x* ← x^k, w* ← w^k, s* ← s^k. Slide#47

(6) Dual Affine Scaling Algorithm
Step 1: Set k = 0 and find (w^0, s^0) s.t. A^T w^0 + s^0 = c, s^0 > 0.
Step 2: Set S_k = diag(s^k) and compute
d_w^k = (A S_k^{-2} A^T)^{-1} b, d_s^k = −A^T d_w^k.
Step 3: If d_s^k = 0, STOP: w* ← w^k, s* ← s^k. If d_s^k > 0, STOP: (D) is unbounded.
Step 4: Compute x^k = −S_k^{-2} d_s^k. If x^k ≥ 0 and c^T x^k − b^T w^k ≤ ε, STOP: w* ← w^k, s* ← s^k, x* ← x^k. Slide#48

Step 5: Compute β_k = min_i { α s_i^k/(−(d_s^k)_i) : (d_s^k)_i < 0 } for 0 < α < 1.
Step 6: Set w^{k+1} = w^k + β_k d_w^k, s^{k+1} = s^k + β_k d_s^k, k ← k + 1. Go to Step 2. Slide#49
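
A minimal NumPy sketch of Steps 1-6 (assuming a dual interior feasible start with A^T w0 + s0 = c and s0 > 0; names follow Slides 48-49, and the routine is an illustration rather than the slides' exact implementation):

import numpy as np

def dual_affine_scaling(A, b, c, w0, s0, alpha=0.95, eps=1e-8, max_iter=200):
    w, s = w0.astype(float).copy(), s0.astype(float).copy()
    x, status = None, "iteration limit"
    for k in range(max_iter):
        Sinv2 = np.diag(1.0 / s**2)                  # S_k^{-2}
        d_w = np.linalg.solve(A @ Sinv2 @ A.T, b)    # d_w^k = (A S_k^{-2} A^T)^{-1} b
        d_s = -A.T @ d_w                             # d_s^k = -A^T d_w^k
        if np.all(np.abs(d_s) <= eps):
            status = "optimal"
            break
        if np.all(d_s >= 0):
            status = "dual unbounded"
            break
        x = -Sinv2 @ d_s                             # primal estimate x^k = -S_k^{-2} d_s^k
        if np.all(x >= -eps) and c @ x - b @ w <= eps:
            status = "optimal"
            break
        beta = alpha * np.min(s[d_s < 0] / -d_s[d_s < 0])   # ratio test of Step 5
        w, s = w + beta * d_w, s + beta * d_s
    return w, s, x, status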

(7) Starting Dual Affine Scaling
Find (w^0, s^0) s.t. A^T w^0 + s^0 = c, s^0 > 0.
If c > 0, then w^0 = 0, s^0 = c will do.
(Big-M Method) Define p ∈ R^n by p_i = 1 if c_i ≤ 0 and p_i = 0 if c_i > 0.
Consider, for a large M > 0,
(Big-M Problem) Max b^T w + M w_a
s.t. A^T w + p w_a + s = c, w, w_a unrestricted, s ≥ 0. Slide#50

(a) (Big-M) is a standard dual-form LP with n constraints and m + 1 + n variables.
(b) Define c̄ = max_i |c_i| and θ > 1. Then
w = 0, w_a = −θ c̄, s = c + θ c̄ p > 0
is an initial interior feasible solution.
(c) (w_a)^0 = −θ c̄ < 0. Since M > 0 is large, (w_a)^k increases toward 0 as k increases; if (w_a)^k does not approach or cross zero, then problem (D) is infeasible. Slide#51

(8) Performance
(i) No polynomial-time proof.
(ii) Computational bottleneck: (A S_k^{-2} A^T)^{-1}.
(iii) Less sensitive to primal degeneracy and numerical errors, but sensitive to dual degeneracy.
(iv) Improves the dual objective very fast, but attaining primal feasibility is slow. Slide#52

(9) Improvement
(i) Logarithmic Barrier Function Method (µ > 0)
Max b^T w + µ Σ_{j=1}^n ln[c_j − A_j^T w]
s.t. A^T w < c
d_w = (1/µ)(A S_k^{-2} A^T)^{-1} b − (A S_k^{-2} A^T)^{-1} A S_k^{-1} e,
the first term being the dual affine scaling direction d_w^k (scaled by 1/µ) and the second a centering force. As µ → 0, w^k(µ) → w*.
Complexity: J. Renegar O(n^3.5 L), P. Vaidya O(n^3 L), C. Roos/J. Vial O(n^3 L). Slide#53

(ii) Power Series Method
[Figure: continuous trajectory from (w^0, s^0) to (w*, s*), with order-1, order-2 and order-3 approximations.]
dw(β)/dβ = lim_{β_k → 0} (w^{k+1} − w^k)/β_k = [A S(β)^{-2} A^T]^{-1} b (an O.D.E.),
ds(β)/dβ = −A^T dw(β)/dβ,
with initial condition w(0) = w^0, s(0) = s^0, where S(β) = diag(s^0 + β d_s). Slide#54

Power-Series Expansion:
w(β) = w^0 + Σ_{j=1}^∞ β^j (1/j!) [d^j w(β)/dβ^j]|_{β=0}
s(β) = s^0 + Σ_{j=1}^∞ β^j (1/j!) [d^j s(β)/dβ^j]|_{β=0}
(a) As long as [d^j w(β)/dβ^j]|_{β=0} and [d^j s(β)/dβ^j]|_{β=0}, j = 1, 2, ..., are known, w(β) and s(β) are known.
(b) Dual affine scaling is the case of a first-order approximation:
w(β) = w^0 + β [dw(β)/dβ]|_{β=0}, s(β) = s^0 + β [ds(β)/dβ]|_{β=0}.
(c) A power-series approximation of order 4 or 5 cuts the total number of iterations by about one half. Slide#55

Primal-Dual Algorithm
(P) Min c^T x s.t. Ax = b, x ≥ 0
(D) Max b^T w s.t. A^T w + s = c, s ≥ 0
Assumptions:
(A1) S = {x ∈ R^n | Ax = b, x > 0} ≠ ∅
(A2) T = {(w, s) ∈ R^{m+n} | A^T w + s = c, s > 0} ≠ ∅
(A3) A has full row rank.
(1) Consider, for µ > 0,
(P_µ) Min c^T x − µ Σ_{j=1}^n ln x_j
s.t. Ax = b, x > 0.
The objective is strictly convex with linear constraints, so there is at most one global optimum, completely characterized by the K-K-T conditions. Slide#56

L(x, w) = c^T x − µ Σ_{j=1}^n ln x_j + w^T(b − Ax), x > 0, w unrestricted.
∇_w L(x, w) = b − Ax
∇_x L(x, w) = c − µ X^{-1} e − A^T w, where µ X^{-1} e = (µ/x_1, ..., µ/x_n)^T.
Define s_j = µ/x_j > 0. Then the Kuhn-Tucker conditions become
(K-K-T) Ax − b = 0, x > 0
A^T w + s − c = 0, s > 0
XSe = µe, i.e., x_j s_j = µ for all j,
so x^T s = Σ_{j=1}^n x_j s_j = nµ. Slide#57

(2) Consider, for µ > 0,
(D_µ) Max b^T w + µ Σ_{j=1}^n ln s_j
s.t. A^T w + s = c, s > 0.
The same (K-K-T) conditions are obtained.
Basic Ideas:
(1) For µ > 0, let (x(µ), w(µ), s(µ)) be a solution to (K-K-T); then x(µ) is optimal to (P_µ) and (w(µ), s(µ)) is optimal to (D_µ).
(2) For x(µ) ∈ S and (w(µ), s(µ)) ∈ T, the duality gap is
g(µ) = c^T x(µ) − b^T w(µ) = (c^T − w(µ)^T A) x(µ) = s(µ)^T x(µ) = nµ. Slide#58

(3) As µ → 0, g(µ) → 0 (no duality gap) and x(µ) → x*, w(µ) → w*, s(µ) → s*.
(4) Define the central path
Γ = {(x(µ), w(µ), s(µ)) satisfying the K-K-T conditions, µ > 0}
and follow it from a large µ > 0, reducing µ toward zero.
[Figure: the central path Γ from µ >> 0 through (x(µ_1), s(µ_1)), (x(µ_2), s(µ_2)) toward (x*, s*) at µ = 0.] Slide#59

(5) On Γ, since x_j s_j = µ, x and s play equal roles, i.e., the path is not biased toward either x_j = 0 or s_j = 0; hence the name central path.
(6) Path-Following: stay on (or close to) Γ:
at (x(µ_1), w(µ_1), s(µ_1)), reduce µ_1 ↘ µ_2, move to (x(µ_2), w(µ_2), s(µ_2)), reduce µ_2 ↘ µ_3, ..., converging to (x*, w*, s*). Slide#60

Questions: (1) Given µ > 0, when does (x(µ), w(µ), s(µ)) exist? (2) How to find it? (3) How to reduce µ? (4) Does (x(µ), w(µ), s(µ)) converge as µ → 0? Slide#61

Answers:
(1) Lemma 7.6: Under assumptions (A1)-(A3), x(µ), w(µ), s(µ) exist, and x(µ) → x*, w(µ) → w*, s(µ) → s* as µ → 0.
(2) To find a solution, solve the K-K-T conditions
Ax − b = 0
A^T w + s − c = 0
XSe − µe = 0,
a system of nonlinear equations F(z) = 0 with z = (x, w, s): find z*(µ) s.t. F(z*(µ)) = 0 by Newton's method. Slide#62

Newton's Method
(i) One-dimensional case: given z^1, approximate f by its tangent at z^1 and take the root:
z^2 = z^1 − f(z^1)/f'(z^1).
[Figure: tangent-line iterations z^1, z^2 approaching the root z*.]
(ii) Higher-dimensional case:
z^2 = z^1 − [J_F(z^1)]^{-1} F(z^1), or J_F(z^1) Δz = −F(z^1),
where J_F(z^1) = [∂F_i(z)/∂z_j]|_{z=z^1} and Δz = z^2 − z^1. Slide#63
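
A compact Newton iteration for F(z) = 0 (a generic sketch, not specific to the K-K-T system; the function names F and J are placeholders):

import numpy as np

def newton(F, J, z0, tol=1e-10, max_iter=50):
    # Solve J_F(z) dz = -F(z) and update z <- z + dz until ||F(z)|| is small.
    z = np.atleast_1d(np.asarray(z0, dtype=float))
    for _ in range(max_iter):
        Fz = np.atleast_1d(F(z))
        if np.linalg.norm(Fz) < tol:
            break
        dz = np.linalg.solve(np.atleast_2d(J(z)), -Fz)
        z = z + dz
    return z

# One-dimensional example: a root of f(z) = z^2 - 2.
root = newton(lambda z: z**2 - 2, lambda z: 2*z, z0=1.0)   # approx sqrt(2)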

Solving
Ax − b = 0
A^T w + s − c = 0
XSe − µe = 0:
Suppose we have (x^k, w^k, s^k) with x^k ∈ S, (w^k, s^k) ∈ T and x_j^k s_j^k ≈ µ_k.
We want to find (d_x^k, d_w^k, d_s^k) and (β_P^k, β_D^k) s.t.
x^{k+1} = x^k + β_P^k d_x^k ∈ S,
(w^{k+1}, s^{k+1}) = (w^k + β_D^k d_w^k, s^k + β_D^k d_s^k) ∈ T,
and x_j^{k+1} s_j^{k+1} ≈ µ_{k+1} < µ_k. Slide#64

By Newton's method, we solve
[ A    0    0   ] [d_x^k]   [ 0 ]
[ 0    A^T  I   ] [d_w^k] = [ 0 ]
[ S_k  0    X_k ] [d_s^k]   [v^k]
where v^k = µ_k e − X_k S_k e, i.e.,
A d_x^k = 0   (1)
A^T d_w^k + d_s^k = 0   (2)
S_k d_x^k + X_k d_s^k = v^k   (3)
From (2), multiplying by A X_k S_k^{-1}:
A X_k S_k^{-1} A^T d_w^k = −A X_k S_k^{-1} d_s^k, i.e., d_w^k = −(A X_k S_k^{-1} A^T)^{-1} A X_k S_k^{-1} d_s^k   (4)
From (3), d_s^k = X_k^{-1} v^k − X_k^{-1} S_k d_x^k, so
A X_k S_k^{-1} d_s^k = A S_k^{-1} v^k − A d_x^k = A S_k^{-1} v^k   (5), using (1). Slide#65

Combining (4) and (5): d_w^k = −(A X_k S_k^{-1} A^T)^{-1} A S_k^{-1} v^k   (6)
By (2): d_s^k = −A^T d_w^k   (7)
By (3): d_x^k = S_k^{-1}(v^k − X_k d_s^k)   (8)
(Observation 1) If d_x^k > 0 and c^T d_x^k < 0, then (P) is unbounded. If d_s^k > 0 and b^T d_w^k > 0, then (D) is unbounded. Otherwise take
β_P^k = min_i { α x_i^k/(−(d_x^k)_i) : (d_x^k)_i < 0 },
β_D^k = min_i { α s_i^k/(−(d_s^k)_i) : (d_s^k)_i < 0 }.
(Observation 2) µ_{k+1} ← (x^{k+1})^T s^{k+1} / n. Slide#66

Primal-Dual Algorithm
Step 1: k ← 0; choose ε_1 > 0, 0 < α, δ < 1, and (x^0, w^0, s^0) ∈ S × T.
Step 2: µ_k ← (x^k)^T s^k / n, v^k ← µ_k e − X_k S_k e.
Step 3: If µ_k < ε_1, STOP: x* ← x^k, w* ← w^k, s* ← s^k. Otherwise, compute
d_w^k ← −(A X_k S_k^{-1} A^T)^{-1} A S_k^{-1} v^k
d_s^k ← −A^T d_w^k
d_x^k ← S_k^{-1}(v^k − X_k d_s^k)
Step 4: If d_x^k = 0 or d_s^k = 0, set µ_k ← (1 − δ)µ_k (0 < δ < 1), v^k ← µ_k e − X_k S_k e, and go to Step 3. Otherwise, Slide#67

Step 5: If d_x^k > 0 and c^T d_x^k < 0, STOP: (P) is unbounded. If d_s^k > 0 and b^T d_w^k > 0, STOP: (D) is unbounded. Otherwise,
Step 6: β_P^k = min_i { α x_i^k/(−(d_x^k)_i) : (d_x^k)_i < 0 },
β_D^k = min_i { α s_i^k/(−(d_s^k)_i) : (d_s^k)_i < 0 },
x^{k+1} ← x^k + β_P^k d_x^k, w^{k+1} ← w^k + β_D^k d_w^k, s^{k+1} ← s^k + β_D^k d_s^k, k ← k + 1. Go to Step 2. Slide#68
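
The two slides above translate almost line by line into NumPy. The sketch below follows formulas (6)-(8) and the ratio tests of Step 6; it assumes a strictly feasible start (x0, w0, s0) in S × T, and it simplifies Step 4 by taking a step length of 1 whenever a direction has no negative component:

import numpy as np

def primal_dual(A, b, c, x0, w0, s0, alpha=0.95, eps=1e-8, max_iter=200):
    x, w, s = (arr.astype(float).copy() for arr in (x0, w0, s0))
    n = len(x)
    status = "iteration limit"
    for k in range(max_iter):
        mu = x @ s / n                                # mu_k = (x^k)^T s^k / n
        if mu < eps:
            status = "optimal"
            break
        v = mu * np.ones(n) - x * s                   # v^k = mu_k e - X_k S_k e
        M = A @ np.diag(x / s) @ A.T                  # A X_k S_k^{-1} A^T
        d_w = -np.linalg.solve(M, A @ (v / s))        # (6)
        d_s = -A.T @ d_w                              # (7)
        d_x = (v - x * d_s) / s                       # (8)
        if np.all(d_x > 0) and c @ d_x < 0:
            status = "primal unbounded"
            break
        if np.all(d_s > 0) and b @ d_w > 0:
            status = "dual unbounded"
            break
        bP = alpha * np.min(x[d_x < 0] / -d_x[d_x < 0]) if np.any(d_x < 0) else 1.0
        bD = alpha * np.min(s[d_s < 0] / -d_s[d_s < 0]) if np.any(d_s < 0) else 1.0
        x, w, s = x + bP * d_x, w + bD * d_w, s + bD * d_s
    return x, w, s, status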

Properties:
(1) The computational bottleneck is forming (A X_k S_k^{-1} A^T)^{-1}.
(2) The scaling matrix X_k S_k^{-1} is the geometric mean of the primal scaling X_k^2 and the dual scaling S_k^{-2}; it is not biased toward x or s, making the method more robust than both the primal and the dual algorithms.
(3) To start the primal-dual algorithm, see Section 3.5 on p. 45 of Ch. 7.
(4) There are many different ways of choosing µ_k, β_P^k and β_D^k. Slide#69

Kojima/Mizuno/Yoshise (Feb. 1987) picked µ_k, β_P^k and β_D^k by a special formula and presented an O(n^4 L) primal-dual algorithm.
Monteiro/Adler (May 1987) improved this to O(n^3 L) with properly chosen β_P^k, β_D^k and µ_{k+1} = µ_k(1 − 0.1/√n).
(5) A practical implementation that does not require x^k ∈ S, (w^k, s^k) ∈ T is included in Section 3.6 on p. 47 of Ch. 7. Slide#70

Matrix Computation
Bottleneck per iteration: Primal (A X^2 A^T)^{-1}; Dual (A S^{-2} A^T)^{-1}; Primal-Dual (A X S^{-1} A^T)^{-1}.
Matrix inversion? Options:
- Cholesky factorization
- Conjugate gradient
- LQ factorization
- Matrix partition
- Infinitely summable series: (I − N)^{-1} = I + N + N^2 + ...
- Chebychev approximation
- Sparsity
- Parallel computing Slide#71

Cholesky Factorization & Forward/Backward Solve
Computational bottleneck: Primal (A X_k^2 A^T)^{-1}; Dual (A S_k^{-2} A^T)^{-1}; Primal-Dual (A X_k S_k^{-1} A^T)^{-1}.
Observations:
1. Basically, we are solving (A D_k A^T) u = v for u, with D_k diagonal with positive elements.
2. When A has full row rank m (< n) and D_k has positive diagonals, M = A D_k A^T is symmetric positive definite. Slide#72

3. When M is symmetric positive definite, there exists a unique lower triangular matrix L with positive diagonal s.t. M = L L^T.
4. Mu = v becomes L(L^T u) = v: first solve Lz = v for z, then solve L^T u = z for u.
5. Lz = v is the lower triangular system
[ l_11                ] [z_1]   [v_1]
[ l_21 l_22           ] [z_2] = [v_2]
[ ...                 ] [...]   [...]
[ l_m1 l_m2 ... l_mm  ] [z_m]   [v_m] Slide#73

l_11 z_1 = v_1 gives z_1 = v_1/l_11
l_21 z_1 + l_22 z_2 = v_2 gives z_2 = (v_2 − l_21 z_1)/l_22
...
l_m1 z_1 + l_m2 z_2 + ... + l_mm z_m = v_m gives z_m = (v_m − Σ_{i=1}^{m−1} l_mi z_i)/l_mm
This produces z_1, z_2, ..., z_m in order: forward solve!
6. L^T u = z is the corresponding upper triangular system:
l_mm u_m = z_m gives u_m = z_m/l_mm
l_{m−1,m−1} u_{m−1} + l_{m,m−1} u_m = z_{m−1} gives u_{m−1} = (z_{m−1} − l_{m,m−1} u_m)/l_{m−1,m−1}
...
l_11 u_1 + l_21 u_2 + ... + l_m1 u_m = z_1 gives u_1 = (z_1 − Σ_{i=2}^m l_i1 u_i)/l_11
This produces u_m, u_{m−1}, ..., u_1 in order: backward solve! Slide#74

7. Example
M = [ 1 0 2 1 ; 0 4 8 10 ; 2 8 29 22 ; 1 10 22 42 ], v = (2, 6, 16, 33)^T.
Solve Mu = v, i.e., u = M^{-1} v. Write M = L L^T and compute L column by column:
1st column: l_11^2 = 1, so l_11 = 1; l_11 l_21 = 0, so l_21 = 0; l_11 l_31 = 2, so l_31 = 2; l_11 l_41 = 1, so l_41 = 1.
2nd column: l_21^2 + l_22^2 = 4, so l_22 = √(4 − 0) = 2; l_21 l_31 + l_22 l_32 = 8, so l_32 = (8 − 0·2)/2 = 4; l_21 l_41 + l_22 l_42 = 10, so l_42 = (10 − 0·1)/2 = 5.
3rd column: l_31^2 + l_32^2 + l_33^2 = 29, so l_33 = √(29 − 2^2 − 4^2) = 3; l_31 l_41 + l_32 l_42 + l_33 l_43 = 22, so l_43 = (22 − 2·1 − 4·5)/3 = 0.
4th column: l_41^2 + l_42^2 + l_43^2 + l_44^2 = 42, so l_44 = √(42 − 1 − 25 − 0) = 4. Slide#75

L = [ 1 0 0 0 ; 0 2 0 0 ; 2 4 3 0 ; 1 5 0 4 ]
Solve Lz = v = (2, 6, 16, 33)^T: z_1 = 2, z_2 = 3, z_3 = 0, z_4 = 4.
Solve L^T u = z = (2, 3, 0, 4)^T: u_4 = 1, u_3 = 0, u_2 = −1, u_1 = 1, so u = (1, −1, 0, 1)^T. Slide#76

8. Cholesky Factorization Algorithm
l_11 ← √m_11
for i = 2 to m: l_i1 ← m_i1 / l_11
for j = 2 to m
  for i = j to m
    s ← m_ij
    for k = 1 to j−1: s ← s − l_ik · l_jk
    if i = j then l_jj ← √s else l_ij ← s / l_jj
  end
end Slide#77

9. Forward Solve
z_1 ← v_1 / l_11
for i = 2 to m
  s ← 0
  for k = 1 to i−1: s ← s + l_ik · z_k
  z_i ← (v_i − s) / l_ii
end
10. Backward Solve
u_m ← z_m / l_mm
for i = 1 to m−1
  s ← 0
  for k = m−i+1 to m: s ← s + l_{k,m−i} · u_k
  u_{m−i} ← (z_{m−i} − s) / l_{m−i,m−i}
end Slide#78
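
The three pieces of pseudocode above correspond to the following runnable NumPy version, which also reproduces the example of Slides 75-76 (indices are 0-based in the code, 1-based on the slides):

import numpy as np

def cholesky(M):
    # M = L L^T with L lower triangular (Slide 77).
    m = M.shape[0]
    L = np.zeros((m, m))
    for j in range(m):
        L[j, j] = np.sqrt(M[j, j] - np.dot(L[j, :j], L[j, :j]))
        for i in range(j + 1, m):
            L[i, j] = (M[i, j] - np.dot(L[i, :j], L[j, :j])) / L[j, j]
    return L

def forward_solve(L, v):
    # Solve L z = v (Slide 78).
    z = np.zeros(len(v))
    for i in range(len(v)):
        z[i] = (v[i] - np.dot(L[i, :i], z[:i])) / L[i, i]
    return z

def backward_solve(L, z):
    # Solve L^T u = z (Slide 78).
    m = len(z)
    u = np.zeros(m)
    for i in range(m - 1, -1, -1):
        u[i] = (z[i] - np.dot(L[i + 1:, i], u[i + 1:])) / L[i, i]
    return u

M = np.array([[1., 0., 2., 1.], [0., 4., 8., 10.], [2., 8., 29., 22.], [1., 10., 22., 42.]])
v = np.array([2., 6., 16., 33.])
L = cholesky(M)
u = backward_solve(L, forward_solve(L, v))   # u = (1, -1, 0, 1)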

Conjugate Gradient Method
Solves Mu = v in at most m iterations, where M is m × m (M.R. Hestenes / E. Stiefel, 1952).
Basic idea (error-correction method):
(1) Start from an approximate solution u^k.
(2) Evaluate an error function h_k.
(3) Move along a direction d^k which reduces the error.
(4) The moving directions are mutually conjugate w.r.t. M, i.e., (d^k)^T M d^j = 0 for j ≠ k.
(5) u^{k+1} = u^k + α_k d^k with an appropriate step-size α_k. Slide#79

(1) Given u^k, does Mu^k = v?
(2) Define r^k = v − Mu^k (residual vector) and h_k = (r^k)^T M^{-1} r^k (error function), i.e.,
h_k = (v − Mu^k)^T M^{-1}(v − Mu^k) = (u^k)^T M u^k − 2 v^T u^k + v^T M^{-1} v.
(3) Given d^k, determine α_k s.t. u^{k+1} = u^k + α_k d^k minimizes h_{k+1}. Setting ∂h_{k+1}/∂α_k = 0 gives
α_k = (d^k)^T r^k / (d^k)^T M d^k = (d^k)^T r^k / (d^k)^T p^k, where p^k = M d^k. Slide#80

How do we get a good direction d^k that reduces the error function? How about the negative gradient?
dh_k/du^k = 2(Mu^k − v) = −2 r^k.
(4) To make consecutive directions conjugate w.r.t. M, define d^k = r^k − β_k M d^{k−1} and require (M d^{k−1})^T d^k = 0. This gives
β_k = (M d^{k−1})^T r^k / (M d^{k−1})^T p^{k−1} = (p^{k−1})^T r^k / (p^{k−1})^T p^{k−1}, where p^{k−1} = M d^{k−1}. Slide#81

(5) Note that r^{k+1} = v − Mu^{k+1} = v − M(u^k + α_k d^k) = (v − Mu^k) − α_k M d^k = r^k − α_k p^k.
Algorithm CG:
Choose arbitrary u^0, set k ← 0, ε > 0. Compute d^0 = r^0 = v − Mu^0.
Repeat
  p^k = M d^k
  α_k = (d^k)^T r^k / (d^k)^T p^k
  u^{k+1} = u^k + α_k d^k
  r^{k+1} = r^k − α_k p^k
  β_{k+1} = (p^k)^T r^{k+1} / (p^k)^T p^k Slide#82

  d^{k+1} = r^{k+1} − β_{k+1} p^k
  k ← k + 1
until ||r^{k+1}|| ≤ ε. Output u^{k+1} as the solution.
Complexity: a matrix-vector multiplication costs O(m^2); terminating within m iterations gives O(m^3) in total.
If M is large and sparse with γ non-zeros per row, we need about (γ + 5)km multiplications, where k ≤ m. Slide#83
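
For reference, here is the conjugate gradient loop in NumPy, written in the standard textbook form (direction update d ← r + β d with β = ||r_new||^2/||r_old||^2). This differs in the direction-update detail from the recursion written on Slides 81-83, but it uses the same step-size and residual updates and solves the same systems Mu = v with M symmetric positive definite:

import numpy as np

def conjugate_gradient(M, v, u0=None, eps=1e-10):
    m = len(v)
    u = np.zeros(m) if u0 is None else u0.astype(float).copy()
    r = v - M @ u                      # residual r^0
    d = r.copy()
    rs = r @ r
    for k in range(m):                 # at most m iterations in exact arithmetic
        if np.sqrt(rs) <= eps:
            break
        p = M @ d
        a = rs / (d @ p)               # exact line search: alpha_k = (d^k)^T r^k / (d^k)^T M d^k
        u = u + a * d
        r = r - a * p                  # r^{k+1} = r^k - alpha_k M d^k
        rs_new = r @ r
        d = r + (rs_new / rs) * d      # new direction, conjugate to the previous ones
        rs = rs_new
    return u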

LQ Factorization
Basic Idea: if M = I − B is symmetric p.d. and the eigenvalues of B satisfy 0 < λ_i < 1, then
M^{-1} = (I − B)^{-1} = Σ_{k=0}^∞ B^k = I + B + B^2 + ...,
so Mu = v gives u = M^{-1} v = [I + B + B^2 + ...] v.
(LP) Min c^T x s.t. Ax = b, x ≥ 0. Slide#84
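
A tiny sketch of the truncated-series idea (the helper name is hypothetical; valid only when the eigenvalues of B lie strictly inside (0, 1), as assumed on this slide):

import numpy as np

def neumann_solve(B, v, terms=50):
    # Approximate u = (I - B)^{-1} v by v + B v + B^2 v + ... (truncated after `terms` powers).
    u = np.array(v, dtype=float)
    term = np.array(v, dtype=float)
    for _ in range(terms):
        term = B @ term                # B^k v
        u += term
    return u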

A is m × n with full row rank; factor A = LQ with L (m × m) lower triangular and Q (m × n) having orthonormal rows, QQ^T = I.
(LP) Min c^T x s.t. LQx = b, x ≥ 0 is equivalent to
(LP') Min c^T x s.t. Qx = L^{-1} b, x ≥ 0.
For (LP'), w^k = [Q D_k^2 Q^T]^{-1} Q D_k^2 c, and the matrix Q D_k^2 Q^T to be inverted can be written (after scaling) in the form I − B with B = α[I − Q X_k Q^T] symmetric p.d. and 0 < λ_i(B) < 1, so the series expansion above applies. Slide#85

Extensions
Quadratic Programming:
(QP) Min (1/2) x^T Q x + c^T x
s.t. Ax = b, x ≥ 0
(QCQP) Min (1/2) x^T Q x + c^T x
s.t. (1/2) x^T H_k x + h_k^T x = c_k, k = 1, ..., m, x ≥ 0
Convex Programming:
Min f(x)
s.t. Ax = b, x ≥ 0 Slide#86

Semi-Infinite Programming:
Min c^T x
s.t. Σ_{j=1}^n x_j f_j(t) ≥ g(t) for all t ∈ T, x ≥ 0
n variables, |T| constraints (infinitely many). Slide#87

Second-Order Cone Programming
(SOC) Min c^T x
s.t. Ax = b, x ∈ K
where x, c ∈ R^n, b ∈ R^m, and K = {(x_1, x_2, ..., x_n) ∈ R^n | √(x_1^2 + ... + x_{n−1}^2) ≤ x_n}.
Semidefinite Programming
(SDP) Min C • X
s.t. A X = b, X ∈ S^n_+
where X, C ∈ S^n, b ∈ R^m, A = (A_1, A_2, ..., A_m) with A_i ∈ S^n, and A X = (A_1 • X, ..., A_m • X)^T is a linear operator. Slide#88