Lecture 15: Newton's Method and Self-Concordance. October 23, 2008

Outline (Lecture 15)
- Self-concordance notion
- Self-concordant functions
- Operations preserving self-concordance
- Properties of self-concordant functions
- Implications for Newton's method
- Equality constrained minimization

Classical Convergence Analysis of Newton's Method

Given the level set $L_0 = \{x \mid f(x) \le f(x_0)\}$, the classical analysis requires, for all $x, y \in L_0$,
$$\|\nabla^2 f(x) - \nabla^2 f(y)\| \le L \|x - y\|_2, \qquad mI \preceq \nabla^2 f(x) \preceq MI$$
for some constants $L > 0$, $m > 0$, and $M > 0$.

Given a desired error level $\epsilon$, we are interested in an $\epsilon$-solution of the problem, i.e., a vector $\bar{x}$ such that $f(\bar{x}) \le f^* + \epsilon$.

An upper bound on the number of iterations needed to generate an $\epsilon$-solution is
$$\frac{f(x_0) - f^*}{\gamma} + \log_2 \log_2 \left(\frac{\epsilon_0}{\epsilon}\right)$$
where $\gamma = \sigma\beta\eta^2 \frac{m}{M^2}$, $\eta \in (0, m^2/L)$, and $\epsilon_0 = 2m^3/L^2$.
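To make the dependence on the constants concrete, here is a minimal sketch that plugs values into the bound above. All numbers (the constants $m$, $M$, $L$, the backtracking parameters $\sigma$, $\beta$, and the initial gap) are made up for illustration and are not from the lecture.

```python
# Sketch: evaluating the classical iteration bound for hypothetical constants.
# All values below are made up for illustration.
import math

m, M, L = 1.0, 10.0, 5.0          # strong convexity, Hessian bound, Hessian Lipschitz constant
sigma, beta = 0.1, 0.5            # backtracking line-search parameters
f0_gap = 100.0                    # assumed initial gap f(x0) - f*
eps = 1e-8                        # target accuracy

eta = 0.5 * m**2 / L              # any eta in (0, m^2 / L)
gamma = sigma * beta * eta**2 * m / M**2
eps0 = 2 * m**3 / L**2

iterations = f0_gap / gamma + math.log2(math.log2(eps0 / eps))
print(iterations)                 # the bound depends heavily on the (typically unknown) constants
```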

This follows from
$$\frac{L}{2m^2}\|\nabla f(x_{k+K})\| \le \left(\frac{L}{2m^2}\|\nabla f(x_k)\|\right)^{2^K} \le \left(\frac{1}{2}\right)^{2^K}$$
where $k$ is such that $\|\nabla f(x_k)\| \le \eta$ and $\eta$ is small enough so that $\frac{\eta L}{2m^2} \le \frac{1}{2}$. By the above relation and the strong convexity of $f$, we have
$$f(x_{k+K}) - f^* \le \frac{1}{2m}\|\nabla f(x_{k+K})\|^2 \le \frac{2m^3}{L^2}\left(\frac{1}{2}\right)^{2^{K+1}}$$

The bound is conceptually informative, but not practical. Furthermore, the constants $L$, $m$, and $M$ change with affine transformations of the space, while Newton's method is affine invariant.

Can a bound be obtained in terms of problem data that is affine invariant and, moreover, practically verifiable?

Self-concordance

Nesterov and Nemirovski introduced the notion of self-concordance and the class of self-concordant functions.

Importance of self-concordance:
- It is an affine invariant property
- It provides a new tool for analyzing Newton's method that exploits the affine invariance of the method
- It results in a practical upper bound on the number of Newton iterations
- It plays a crucial role in the performance analysis of interior-point methods

Def. A function $f : \mathbb{R} \to \mathbb{R}$ is self-concordant when $f$ is convex and
$$|f'''(x)| \le 2\, f''(x)^{3/2} \quad \text{for all } x \in \mathrm{dom}\, f$$
The rate of change of the curvature of $f$ is bounded by the curvature itself.

Note: One can use a constant $\kappa$ other than 2 in the definition.

Examples

- Linear and quadratic functions are self-concordant [$f'''(x) = 0$ for all $x$]
- The negative logarithm $f(x) = -\ln x$, $x > 0$, is self-concordant:
$$f''(x) = \frac{1}{x^2}, \quad f'''(x) = -\frac{2}{x^3}, \quad \frac{|f'''(x)|}{f''(x)^{3/2}} = 2 \ \text{ for all } x > 0$$
- The exponential function $e^x$ is not self-concordant:
$$f''(x) = f'''(x) = e^x, \quad \frac{f'''(x)}{f''(x)^{3/2}} = \frac{e^x}{e^{3x/2}} = e^{-x/2} \to \infty \ \text{ as } x \to -\infty$$
- The even-power monomial $x^{2p}$, $p \ge 2$, is not self-concordant:
$$f''(x) = 2p(2p-1)x^{2p-2}, \quad f'''(x) = 2p(2p-1)(2p-2)x^{2p-3},$$
$$\frac{|f'''(x)|}{f''(x)^{3/2}} \propto \frac{|x|^{2p-3}}{|x|^{3(p-1)}} = \frac{1}{|x|^p} \to \infty \ \text{ as } x \to 0$$

These ratios are checked numerically in the sketch below.
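A minimal sketch of that check, using the closed-form second and third derivatives above; the sample points are arbitrary.

```python
# Sketch: numerically checking the self-concordance ratio |f'''(x)| / f''(x)^(3/2)
# for the examples above, using their closed-form derivatives.
import numpy as np

def ratio_neg_log(x):      # f(x) = -ln x:  f'' = 1/x^2, f''' = -2/x^3
    return abs(-2.0 / x**3) / (1.0 / x**2) ** 1.5

def ratio_exp(x):          # f(x) = e^x:  f'' = f''' = e^x
    return np.exp(x) / np.exp(1.5 * x)

def ratio_quartic(x):      # f(x) = x^4 (p = 2):  f'' = 12 x^2, f''' = 24 x
    return abs(24.0 * x) / (12.0 * x**2) ** 1.5

xs = np.array([1e-3, 0.1, 1.0, 10.0])   # arbitrary sample points
print(ratio_neg_log(xs))   # exactly 2 everywhere: -ln x is self-concordant
print(ratio_exp(-xs))      # e^{-x/2} blows up as x -> -infinity: not self-concordant
print(ratio_quartic(xs))   # blows up as x -> 0: x^4 is not self-concordant
```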

Scaling and Affine Invariance

In the definition of a self-concordant function, one can use any other constant $\kappa$.

Let $f$ be self-concordant with constant $\kappa$; then $\tilde f(x) = \frac{\kappa^2}{4} f(x)$ is self-concordant with constant 2:
$$\frac{|\tilde f'''(x)|}{\tilde f''(x)^{3/2}} = \frac{\frac{\kappa^2}{4}|f'''(x)|}{\left(\frac{\kappa^2}{4}f''(x)\right)^{3/2}} = \frac{2}{\kappa}\,\frac{|f'''(x)|}{f''(x)^{3/2}} \le 2$$

Self-concordance is affine invariant: for $f$ self-concordant and $x = ay + b$, the function $\tilde f(y) = f(ay + b)$ is self-concordant with the same $\kappa$:
- Convexity is preserved: $\tilde f$ is convex
- $\tilde f'''(y) = a^3 f'''(x)$, $\tilde f''(y) = a^2 f''(x)$, hence
$$\frac{|\tilde f'''(y)|}{\tilde f''(y)^{3/2}} = \frac{|a|^3\, |f'''(x)|}{\left(a^2 f''(x)\right)^{3/2}} = \frac{|f'''(x)|}{f''(x)^{3/2}} \le \kappa$$

Self-concordant Functions in $\mathbb{R}^n$

Def. A function $f : \mathbb{R}^n \to \mathbb{R}$ is self-concordant when it is self-concordant along every line, i.e., $f$ is convex and $g(t) = f(x + tv)$ is self-concordant for all $x \in \mathrm{dom}\, f$ and all $v$.

Note: The constant $\kappa$ of self-concordance is independent of the choice of $x$ and $v$, i.e.,
$$|g'''(t)| \le 2\, g''(t)^{3/2} \quad \text{for all } x \in \mathrm{dom}\, f,\ v \in \mathbb{R}^n, \text{ and } t \in \mathbb{R} \text{ such that } x + tv \in \mathrm{dom}\, f$$

Operations Preserving Self-concordance

Note: To start with, these operations have to preserve convexity.
- Scaling by a factor of at least 1: when $f$ is self-concordant and $a \ge 1$, then $af$ is also self-concordant
- The sum $f_1 + f_2$ of two self-concordant functions is self-concordant (extends to any finite sum)
- Composition with an affine mapping: when $f(y)$ is self-concordant, then $f(Ax + b)$ is self-concordant
- Composition with the logarithm: let $g : \mathbb{R} \to \mathbb{R}$ be a convex function with $\mathrm{dom}\, g = \mathbb{R}_{++}$ and
$$|g'''(x)| \le 3\,\frac{g''(x)}{x} \quad \text{for all } x > 0.$$
Then $f(x) = -\ln(-g(x)) - \ln(x)$ over $\{x > 0 \mid g(x) < 0\}$ is self-concordant. (HW7)

Implications

When $f''(x) > 0$ for all $x$ (so $f$ is strictly convex), the self-concordance condition can be written as
$$\left|\frac{d}{dx}\left(f''(x)^{-1/2}\right)\right| \le 1 \quad \text{for all } x \in \mathrm{dom}\, f.$$
Assuming that for some $t > 0$ the interval $[0, t]$ lies in $\mathrm{dom}\, f$, we can integrate from 0 to $t$ and obtain
$$-t \le \int_0^t \frac{d}{dx}\left(f''(x)^{-1/2}\right) dx \le t.$$

This implies the following lower and upper bounds on $f''$:
$$\frac{f''(0)}{\left(1 + t\sqrt{f''(0)}\right)^2} \le f''(t) \le \frac{f''(0)}{\left(1 - t\sqrt{f''(0)}\right)^2}, \qquad (1)$$
where the lower bound is valid for all $t \ge 0$ with $t \in \mathrm{dom}\, f$, and the upper bound is valid for all $t \in \mathrm{dom}\, f$ with $0 \le t < f''(0)^{-1/2}$.
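As a quick sanity check, the sketch below evaluates the bounds (1) along a line for the self-concordant function $f(x) = -\ln x$; the starting point and direction are illustrative choices.

```python
# Sketch: numerically checking the bounds (1) for g(t) = -ln(x0 + t), a self-concordant
# function along the line through the illustrative point x0 = 1 in direction +1.
import numpy as np

x0 = 1.0
g2 = lambda t: 1.0 / (x0 + t)**2          # g''(t) in closed form
g2_0 = g2(0.0)

ts = np.linspace(0.0, 0.9, 10)            # upper bound requires t < g''(0)^{-1/2} = 1
lower = g2_0 / (1.0 + ts * np.sqrt(g2_0))**2
upper = g2_0 / (1.0 - ts * np.sqrt(g2_0))**2
# For -ln the lower bound is tight (equality); both checks print True
print(np.all(lower <= g2(ts)), np.all(g2(ts) <= upper))
```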

Bounds on f

Consider $f : \mathbb{R}^n \to \mathbb{R}$ with $\nabla^2 f(x) \succ 0$. Let $v$ be a descent direction, i.e., such that $\nabla f(x)^T v < 0$, and consider $g(t) = f(x + tv) - f(x)$ for $t > 0$. Integrating the lower bound of Eq. (1) twice, we obtain
$$g(t) \ge g(0) + t g'(0) + t\sqrt{g''(0)} - \ln\left(1 + t\sqrt{g''(0)}\right)$$

Consider now the Newton direction, $v = -[\nabla^2 f(x)]^{-1} \nabla f(x)$, and let $h(t) = f(x + tv)$. We have
$$h'(0) = \nabla f(x)^T v = -\lambda^2(x), \qquad h''(0) = v^T \nabla^2 f(x)\, v = \lambda^2(x).$$
Integrating the upper bound of Eq. (1) twice, we obtain for $0 \le t < \frac{1}{\lambda(x)}$
$$h(t) \le h(0) - t\lambda^2(x) - t\lambda(x) - \ln\left(1 - t\lambda(x)\right) \qquad (2)$$

Newton Direction for Self-concordant Functions

Let $f : \mathbb{R}^n \to \mathbb{R}$ be self-concordant with $\nabla^2 f(x) \succ 0$ for all $x \in L_0$.

The Newton decrement $\lambda(x)$ admits the variational characterization
$$\lambda(x) = \sup_{v \ne 0} \frac{-v^T \nabla f(x)}{\left(v^T \nabla^2 f(x)\, v\right)^{1/2}}$$
with the supremum attained at $v = -[\nabla^2 f(x)]^{-1} \nabla f(x)$. (HW7)
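This characterization is easy to check numerically. The sketch below compares $\lambda(x) = \left(\nabla f(x)^T [\nabla^2 f(x)]^{-1} \nabla f(x)\right)^{1/2}$ with the ratio above over random directions and at the Newton direction; the test function is an arbitrary strictly convex choice, not from the lecture.

```python
# Sketch: checking the variational form of the Newton decrement on a small example.
# The quadratic-plus-log test function below is only for illustration.
import numpy as np

def grad_hess(x):
    # f(x) = 0.5 x^T Q x - sum(log(x)), strictly convex on x > 0
    Q = np.array([[2.0, 0.5], [0.5, 1.0]])
    g = Q @ x - 1.0 / x
    H = Q + np.diag(1.0 / x**2)
    return g, H

x = np.array([0.7, 1.3])
g, H = grad_hess(x)

lam = np.sqrt(g @ np.linalg.solve(H, g))        # lambda(x) = (g^T H^{-1} g)^{1/2}

# Compare with -v^T g / (v^T H v)^{1/2} over many random directions
rng = np.random.default_rng(0)
V = rng.standard_normal((10000, 2))
vals = -(V @ g) / np.sqrt(np.einsum('ij,jk,ik->i', V, H, V))
print(lam, vals.max())                          # the sup is (approximately) attained

v_newton = -np.linalg.solve(H, g)               # attained exactly at the Newton direction
print(-(v_newton @ g) / np.sqrt(v_newton @ H @ v_newton))
```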

Self-Concordant Functions: Newton Method Analysis

Consider Newton's method, started at $x_0$, with backtracking line search. Assume that $f : \mathbb{R}^n \to \mathbb{R}$ is self-concordant and strictly convex. (If $\mathrm{dom}\, f \ne \mathbb{R}^n$, assume that the level set $\{x \mid f(x) \le f(x_0)\}$ is closed.) Also assume that $f$ is bounded from below.

Under the preceding assumptions, a minimizer $x^*$ of $f$ exists. (HW7)

The analysis is similar to the classical one, except that:
- Self-concordance replaces the strong convexity and Lipschitz-Hessian assumptions
- The Newton decrement replaces the role of the gradient norm

Convergence Result

Theorem 1. Assume that $f$ is a self-concordant, strictly convex function that is bounded below. Then there exist $\eta \in (0, 1/4)$ and $\gamma > 0$ such that:
- Damped phase: if $\lambda_k > \eta$, then $f(x_{k+1}) - f(x_k) \le -\gamma$.
- Pure Newton phase: if $\lambda_k \le \eta$, the backtracking line search selects $\alpha = 1$ and
$$2\lambda_{k+1} \le \left(2\lambda_k\right)^2,$$
where $\lambda_k = \lambda(x_k)$.

Proof

Let $h(\alpha) = f(x_k + \alpha d_k)$. As seen in Eq. (2), we have for $0 \le \alpha < \frac{1}{\lambda_k}$
$$h(\alpha) \le h(0) - \alpha\lambda_k^2 - \alpha\lambda_k - \ln\left(1 - \alpha\lambda_k\right) \qquad (3)$$
We use this relation to show that the stepsize $\hat\alpha = \frac{1}{1 + \lambda_k}$ satisfies the exit condition of the line search. In particular, letting $\alpha = \hat\alpha$, we have
$$h(\hat\alpha) \le h(0) - \hat\alpha\lambda_k^2 - \hat\alpha\lambda_k - \ln\left(1 - \hat\alpha\lambda_k\right) = h(0) - \lambda_k + \ln\left(1 + \lambda_k\right)$$
Using the inequality $-t + \ln(1 + t) \le -\frac{t^2}{2(1+t)}$ for $t \ge 0$, and $\sigma \le 1/2$, we obtain

$$h(\hat\alpha) \le h(0) - \sigma\,\frac{\lambda_k^2}{1 + \lambda_k} = h(0) - \sigma\hat\alpha\lambda_k^2.$$
Therefore, at the exit of the line search we have $\alpha_k \ge \frac{\beta}{1 + \lambda_k}$, implying
$$h(\alpha_k) \le h(0) - \sigma\beta\,\frac{\lambda_k^2}{1 + \lambda_k}.$$
When $\lambda_k > \eta$, since $h(\alpha_k) = f(x_{k+1})$ and $h(0) = f(x_k)$, we have
$$f(x_{k+1}) - f(x_k) \le -\gamma, \qquad \gamma = \sigma\beta\,\frac{\eta^2}{1 + \eta}.$$
Assume now that $\lambda_k \le \frac{1 - 2\sigma}{2}$. From Eq. (3) and the inequality $-x - \ln(1 - x) \le \frac{x^2}{2} + x^3$ for $0 \le x \le 0.81$, we have

$$h(1) \le h(0) - \frac{1}{2}\lambda_k^2 + \lambda_k^3 \le h(0) - \sigma\lambda_k^2,$$
where the last inequality follows from $\lambda_k \le \frac{1 - 2\sigma}{2}$. Thus, the unit step satisfies the sufficient decrease condition (no damping). The proof that $2\lambda_{k+1} \le (2\lambda_k)^2$ is left for HW7.

The preceding material is from Section 9.6 of the book by Boyd and Vandenberghe.

Newton's Method for Self-concordant Functions

Let $f : \mathbb{R}^n \to \mathbb{R}$ be self-concordant with $\nabla^2 f(x) \succ 0$ for all $x \in L_0$.

From the analysis of Newton's method with backtracking line search: for an $\epsilon$-solution, i.e., $\bar{x}$ with $f(\bar{x}) \le f^* + \epsilon$, an upper bound on the number of iterations is
$$\frac{f(x_0) - f^*}{\Gamma} + \log_2 \log_2 \frac{1}{\epsilon}, \qquad \Gamma = \sigma\beta\,\frac{\eta^2}{1 + \eta}, \quad \eta = \frac{1 - 2\sigma}{4}$$
Explicitly,
$$\frac{f(x_0) - f^*}{\Gamma} + \log_2 \log_2 \frac{1}{\epsilon} = \frac{20 - 8\sigma}{\sigma\beta(1 - 2\sigma)^2}\,[f(x_0) - f^*] + \log_2 \log_2 \frac{1}{\epsilon}$$
The bound is practical (when $f^*$ can be underestimated).
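A minimal sketch of damped Newton with backtracking and the $\lambda^2/2$ stopping rule, applied to a self-concordant objective: a linear term plus the log barrier of a box, which is self-concordant by the sum and affine-composition rules above. The problem data and the parameter values $\sigma$, $\beta$, $\epsilon$ are illustrative assumptions.

```python
# Sketch: damped Newton with backtracking line search for a self-concordant objective,
# stopping when lambda(x)^2 / 2 <= eps.  The test problem (linear term + log barrier of
# a box) and the parameters sigma, beta, eps are illustrative choices only.
import numpy as np

A = np.vstack([np.eye(2), -np.eye(2)])     # box -1 <= x_i <= 1 written as A x <= b
b = np.ones(4)
c = np.array([0.3, -0.2])

def f(x):  return c @ x - np.sum(np.log(b - A @ x))
def grad(x):
    s = b - A @ x
    return c + A.T @ (1.0 / s)
def hess(x):
    s = b - A @ x
    return A.T @ np.diag(1.0 / s**2) @ A

sigma, beta, eps = 0.1, 0.5, 1e-10
x = np.zeros(2)                            # strictly feasible starting point
for _ in range(50):
    g, H = grad(x), hess(x)
    d = -np.linalg.solve(H, g)             # Newton direction
    lam2 = -g @ d                          # lambda(x)^2 = g^T H^{-1} g
    if lam2 / 2.0 <= eps:
        break
    alpha = 1.0
    # backtrack, also rejecting steps that leave dom f (where the log is undefined)
    while np.any(b - A @ (x + alpha * d) <= 0) or \
          f(x + alpha * d) > f(x) + sigma * alpha * (g @ d):
        alpha *= beta
    x = x + alpha * d
print(x, lam2)
```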

Equality Constrained Minimization

minimize $f(x)$ subject to $Ax = b$

Assumptions:
- The function $f$ is convex and twice continuously differentiable
- The matrix $A \in \mathbb{R}^{p \times n}$ has $\mathrm{rank}\, A = p$
- The optimal value $f^*$ is finite and attained

If $Ax = b$ for some $x \in \mathrm{relint}(\mathrm{dom}\, f)$, there is no duality gap and there exists an optimal dual solution.

KKT optimality conditions: $x^*$ is optimal if and only if there exists $\lambda^*$ such that
$$Ax^* = b, \qquad \nabla f(x^*) + A^T \lambda^* = 0$$
Solving the problem is equivalent to solving the KKT conditions: $n + p$ equations in $n + p$ variables, $x \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}^p$.

Equality Constrained Quadratic Minimization

minimize $\frac{1}{2}x^T P x + q^T x + r$ with $P \in S_+^n$
subject to $Ax = b$

Important in extending Newton's method to equality constraints.

KKT optimality conditions:
$$\begin{bmatrix} P & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} x^* \\ \lambda^* \end{bmatrix} = \begin{bmatrix} -q \\ b \end{bmatrix}$$
The coefficient matrix is referred to as the KKT matrix. The KKT matrix is nonsingular if and only if
$$Ax = 0,\ x \ne 0 \ \Longrightarrow\ x^T P x > 0 \qquad [\text{$P$ is positive definite on the null space of $A$}]$$
Equivalent conditions for nonsingularity of the KKT matrix:
- $P + A^T A \succ 0$
- $\mathcal{N}(A) \cap \mathcal{N}(P) = \{0\}$

A small numerical solve of this KKT system is sketched below.
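A minimal NumPy sketch of forming and solving the KKT system; the data $P$, $q$, $A$, $b$ are arbitrary illustrative values.

```python
# Sketch: forming and solving the KKT system for an equality-constrained QP.
# The data P, q, A, b are arbitrary illustrative values.
import numpy as np

P = np.array([[3.0, 1.0], [1.0, 2.0]])     # positive definite here
q = np.array([1.0, -1.0])
A = np.array([[1.0, 1.0]])                 # single constraint x1 + x2 = 1
b = np.array([1.0])
n, p = P.shape[0], A.shape[0]

# KKT matrix [[P, A^T], [A, 0]] and right-hand side [-q; b]
K = np.block([[P, A.T], [A, np.zeros((p, p))]])
rhs = np.concatenate([-q, b])
sol = np.linalg.solve(K, rhs)
x_star, lam_star = sol[:n], sol[n:]

print(x_star, lam_star)
print(A @ x_star - b)                       # feasibility: ~0
print(P @ x_star + q + A.T @ lam_star)      # stationarity: ~0
```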

Newton's Method with Equality Constraints

Almost the same as Newton's method, except for two differences:
- The initial iterate $x_0$ has to be feasible: $Ax_0 = b$
- The Newton directions $d_k$ have to be feasible: $Ad_k = 0$

Given the iterate $x_k$, the Newton direction $d_k$ is determined as a solution to
$$\begin{aligned} \text{minimize} \quad & f(x_k) + \nabla f(x_k)^T d + \tfrac{1}{2} d^T \nabla^2 f(x_k)\, d \\ \text{subject to} \quad & A(x_k + d) = b \end{aligned}$$
The KKT conditions ($w_k$ is the optimal dual variable for the quadratic minimization):
$$\begin{bmatrix} \nabla^2 f(x_k) & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} d_k \\ w_k \end{bmatrix} = \begin{bmatrix} -\nabla f(x_k) \\ 0 \end{bmatrix}$$

Newton's Method with Equality Constraints (algorithm)

Given a starting point $x \in \mathrm{dom}\, f$ with $Ax = b$ and a tolerance $\epsilon > 0$, repeat:
1. Compute the Newton direction $d_N(x)$ by solving the corresponding KKT system.
2. Compute the decrement $\lambda(x)$.
3. Stopping criterion: quit if $\lambda^2/2 \le \epsilon$.
4. Line search: choose a stepsize $\alpha$ by backtracking line search.
5. Update: $x := x + \alpha\, d_N(x)$.

When $f$ is strictly convex and self-concordant, the bound on the number of iterations required to achieve $\epsilon$-accuracy is the same as in the unconstrained case.
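A sketch of the algorithm above on a small illustrative problem: minimizing the negative entropy over the probability simplex, i.e., subject to $\mathbf{1}^T x = 1$. The problem and the line-search parameters are assumptions for illustration (the entropy objective is just a convenient strictly convex test function), not part of the lecture.

```python
# Sketch: the feasible-start Newton method above, applied to
#   minimize sum_i x_i log x_i  subject to 1^T x = 1,  x > 0.
# The problem and the backtracking parameters are illustrative only.
import numpy as np

n = 4
A = np.ones((1, n)); b = np.array([1.0])       # single equality constraint 1^T x = 1
def f(x): return np.sum(x * np.log(x))
def grad(x): return np.log(x) + 1.0
def hess(x): return np.diag(1.0 / x)

sigma, beta, eps = 0.1, 0.5, 1e-10
x = np.array([0.7, 0.1, 0.1, 0.1])             # feasible starting point: A x = b

for _ in range(50):
    g, H = grad(x), hess(x)
    # 1. Newton direction from the KKT system [[H, A^T], [A, 0]] [d; w] = [-g; 0]
    K = np.block([[H, A.T], [A, np.zeros((1, 1))]])
    sol = np.linalg.solve(K, np.concatenate([-g, np.zeros(1)]))
    d = sol[:n]
    # 2./3. decrement and stopping criterion
    lam2 = d @ H @ d
    if lam2 / 2.0 <= eps:
        break
    # 4. backtracking line search (reject steps leaving x > 0)
    alpha = 1.0
    while np.any(x + alpha * d <= 0) or f(x + alpha * d) > f(x) + sigma * alpha * (g @ d):
        alpha *= beta
    # 5. update; A d = 0, so feasibility is preserved
    x = x + alpha * d
print(x)                                        # converges to the uniform distribution
```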

Newton's Method with Infeasible Points

Useful when determining an initial feasible iterate $x_0$ is difficult.

Linearize the optimality conditions $Ax^* = b$ and $\nabla f(x^*) + A^T \lambda^* = 0$ at $x^* \approx x + d$, where $x$ is possibly infeasible, and use $w \approx \lambda^*$. With
$$\nabla f(x^*) \approx \nabla f(x + d) \approx \nabla f(x) + \nabla^2 f(x)\, d,$$
the approximate KKT conditions are
$$A(x + d) = b, \qquad \nabla f(x) + \nabla^2 f(x)\, d + A^T w = 0$$
At an infeasible $x_k$, the system of linear equations determining $d_k$ and $w_k$ is
$$\begin{bmatrix} \nabla^2 f(x_k) & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} d_k \\ w_k \end{bmatrix} = -\begin{bmatrix} \nabla f(x_k) \\ Ax_k - b \end{bmatrix}$$
Note: When $x_k$ is feasible, this system is the same as in the feasible-point Newton's method.

Equality Constrained Analytic Centering

minimize $f(x) = -\sum_{i=1}^n \ln x_i$
subject to $Ax = b$

Feasible-point Newton's method: with $g = \nabla f(x)$ and $H = \nabla^2 f(x)$,
$$\begin{bmatrix} H & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} d \\ w \end{bmatrix} = \begin{bmatrix} -g \\ 0 \end{bmatrix}, \qquad g = -\begin{bmatrix} 1/x_1 \\ \vdots \\ 1/x_n \end{bmatrix}, \quad H = \mathrm{diag}\left[\frac{1}{x_1^2}, \ldots, \frac{1}{x_n^2}\right]$$
The Hessian is positive definite.

The first block row of the KKT system gives $Hd + A^T w = -g$, i.e., $d = -H^{-1}(g + A^T w)$. Substituting into the second block row, $Ad = 0$, yields $AH^{-1}(g + A^T w) = 0$. The matrix $A$ has full row rank, thus $AH^{-1}A^T$ is invertible, and hence
$$w = -\left(AH^{-1}A^T\right)^{-1} AH^{-1} g, \qquad H^{-1} = \mathrm{diag}\left[x_1^2, \ldots, x_n^2\right]$$
The matrix $AH^{-1}A^T$ is known as the Schur complement of $H$ in the KKT matrix.
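A sketch of the block-elimination formulas above in NumPy; $A$ is random illustrative data and $b$ is chosen so that the current iterate is feasible.

```python
# Sketch: the block-elimination formulas above for equality-constrained analytic centering.
# A and the current iterate x below are arbitrary illustrative data.
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2
A = rng.standard_normal((p, n))
x = rng.uniform(0.5, 1.5, n)                 # current strictly positive iterate
b = A @ x                                    # choose b so that x is feasible

def newton_step(x):
    g = -1.0 / x                             # gradient of -sum(log x_i)
    Hinv = np.diag(x**2)                     # H^{-1} = diag(x_1^2, ..., x_n^2)
    S = A @ Hinv @ A.T                       # Schur complement A H^{-1} A^T
    w = -np.linalg.solve(S, A @ Hinv @ g)    # w = -(A H^{-1} A^T)^{-1} A H^{-1} g
    d = -Hinv @ (g + A.T @ w)                # d = -H^{-1}(g + A^T w)
    return d, w

d, w = newton_step(x)
print(np.linalg.norm(A @ d))                 # ~0: the step keeps A x = b satisfied
```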

Network Flow Optimization

minimize $\sum_{l=1}^n \phi_l(x_l)$
subject to $Ax = b$

- Directed graph with $n$ arcs and $p + 1$ nodes
- Variable $x_l$: flow through arc $l$
- Cost $\phi_l$: flow cost function for arc $l$, with $\phi_l''(t) > 0$
- Node-incidence matrix $\tilde{A} \in \mathbb{R}^{(p+1) \times n}$ defined as
$$\tilde{A}_{il} = \begin{cases} 1 & \text{if arc } l \text{ originates at node } i \\ -1 & \text{if arc } l \text{ ends at node } i \\ 0 & \text{otherwise} \end{cases}$$
- Reduced node-incidence matrix $A \in \mathbb{R}^{p \times n}$ is $\tilde{A}$ with the last row removed
- $b \in \mathbb{R}^p$ is the (reduced) source vector
- $\mathrm{rank}\, A = p$ when the graph is connected

KKT system for the infeasible-start Newton's method:
$$\begin{bmatrix} H & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} d \\ w \end{bmatrix} = -\begin{bmatrix} g \\ h \end{bmatrix}$$
where
- $h = Ax - b$ is a measure of infeasibility at the current point $x$
- $g = \left[\phi_1'(x_1), \ldots, \phi_n'(x_n)\right]^T$
- $H = \mathrm{diag}\left[\phi_1''(x_1), \ldots, \phi_n''(x_n)\right]$ with positive diagonal entries

Solve via elimination:
$$w = \left(AH^{-1}A^T\right)^{-1}\left[h - AH^{-1}g\right], \qquad d = -H^{-1}\left(g + A^T w\right)$$
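A sketch of one infeasible-start Newton step computed via the elimination formulas above, on a small 3-node example graph with quadratic per-arc costs; the graph, costs, and source vector are illustrative assumptions.

```python
# Sketch: one infeasible-start Newton step for the network-flow problem via elimination.
# The 3-node graph, the quadratic arc costs phi_l(t) = 0.5 * c_l * t^2, and the source
# vector b are illustrative choices.
import numpy as np

A = np.array([[ 1.0, 1.0, 0.0],       # reduced node-incidence matrix (last row dropped)
              [-1.0, 0.0, 1.0]])      # arcs: 0->1, 0->2, 1->2
b = np.array([1.0, 0.0])              # reduced source vector
cost = np.array([1.0, 2.0, 1.0])      # phi_l''(t) = c_l > 0

x = np.zeros(3)                       # current (infeasible) flow: A x != b
g = cost * x                          # g_l = phi_l'(x_l) = c_l x_l
Hinv = np.diag(1.0 / cost)            # H = diag(phi_l''(x_l)) = diag(c_l)
h = A @ x - b                         # infeasibility residual

S = A @ Hinv @ A.T                                   # Schur complement A H^{-1} A^T
w = np.linalg.solve(S, h - A @ Hinv @ g)             # w = (A H^{-1} A^T)^{-1}[h - A H^{-1} g]
d = -Hinv @ (g + A.T @ w)                            # d = -H^{-1}(g + A^T w)

print(A @ (x + d) - b)                # ~0: a full step restores feasibility
```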