Lecture 12
Unconstrained Optimization (contd.), Constrained Optimization
October 15, 2008
Outline
- Lecture 11: gradient descent algorithm
- Improvement to the result in Lecture 11
- At what rate will it converge?
- Constrained minimization over simple sets
Unconstrained Minimization

minimize $f(x)$ subject to $x \in \mathbb{R}^n$.

Assumption 1:
- The function $f$ is convex and continuously differentiable over $\mathbb{R}^n$.
- The optimal value $f^* = \inf_{x \in \mathbb{R}^n} f(x)$ is finite.

Gradient descent algorithm:
$$x_{k+1} = x_k - \alpha \nabla f(x_k)$$
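A minimal sketch of the iteration in code; the quadratic test function, starting point, and stepsize below are illustrative choices, not part of the lecture:

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha, num_iters=1000):
    """Constant-stepsize iteration x_{k+1} = x_k - alpha * grad_f(x_k)."""
    x = np.array(x0, dtype=float)
    for _ in range(num_iters):
        x = x - alpha * grad_f(x)
    return x

# Illustrative run on f(x) = 0.5 ||x||^2, whose gradient is x and minimizer is 0.
print(gradient_descent(lambda x: x, x0=[5.0, -3.0], alpha=0.1))  # close to [0, 0]
```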
Theorem 1: Bounded Gradients

Theorem 1. Let Assumption 1 hold, and suppose that the gradients are bounded, $\|\nabla f(x)\| \le L$ for all $x$. Then, the gradient method with constant stepsize $\alpha$ generates a sequence $\{x_k\}$ such that
$$\liminf_{k \to \infty} f(x_k) \le f^* + \frac{\alpha L^2}{2}.$$

We first proved that for any $y \in \mathbb{R}^n$ and all $k$,
$$\|x_{k+1} - y\|^2 \le \|x_k - y\|^2 - 2\alpha_k \,(f(x_k) - f(y)) + \alpha_k^2 \|\nabla f(x_k)\|^2.$$
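This basic iterate relation can be checked numerically. In the sketch below, the convex quadratic $f(x) = \frac{1}{2}x^T A x$, the stepsize, and the reference point $y = 0$ are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
A = B.T @ B                      # positive semidefinite, so f is convex
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

alpha, x, y = 0.01, rng.standard_normal(3), np.zeros(3)
for k in range(100):
    x_next = x - alpha * grad(x)
    lhs = np.sum((x_next - y) ** 2)
    rhs = np.sum((x - y) ** 2) - 2 * alpha * (f(x) - f(y)) + alpha ** 2 * np.sum(grad(x) ** 2)
    assert lhs <= rhs + 1e-12    # the relation holds at every iteration
    x = x_next
```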
Theorem 2: Lipschitz Gradients

Q-approximation Lemma. For a continuously differentiable function with Lipschitz gradients (constant $M$), we have for all $x, y \in \mathbb{R}^n$:
$$f(y) \le f(x) + \nabla f(x)^T (y - x) + \frac{M}{2}\|y - x\|^2.$$

Theorem. Let Assumption 1 hold, and assume that the gradients of $f$ are Lipschitz. Then, for $\alpha$ with $\alpha < 2/M$, we have
$$\lim_{k \to \infty} \nabla f(x_k) = 0.$$
If, in addition, an optimal solution exists [i.e., $\min_x f(x)$ is attained at some $x^*$], then every accumulation point of the sequence $\{x_k\}$ is optimal.
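Before the proof, a quick numerical illustration of the stepsize rule; the quadratic test function is an illustrative choice, with $M$ taken as the largest Hessian eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = B.T @ B + np.eye(4)          # positive definite Hessian
grad = lambda x: A @ x

M = np.linalg.eigvalsh(A).max()  # Lipschitz constant of the gradient
alpha = 1.0 / M                  # any alpha < 2/M works by the theorem

x = rng.standard_normal(4)
for k in range(200):
    x = x - alpha * grad(x)
print(np.linalg.norm(grad(x)))   # the gradient norm is driven to (nearly) 0
```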
Proof: Using the Q-approximation Lemma with $y = x_{k+1}$ and $x = x_k$, we have
$$f(x_{k+1}) \le f(x_k) - \alpha \|\nabla f(x_k)\|^2 + \frac{\alpha^2 M}{2}\|\nabla f(x_k)\|^2 = f(x_k) - \frac{\alpha}{2}(2 - \alpha M)\|\nabla f(x_k)\|^2.$$
By summing these relations and using $2 - \alpha M > 0$, we can see that $\sum_k \|\nabla f(x_k)\|^2 < \infty$, implying $\lim_{k \to \infty} \nabla f(x_k) = 0$.

Suppose a solution exists, and the sequence $\{x_k\}$ has an accumulation point $\bar{x}$. Then, by continuity of the gradient, we have $\nabla f(x_k) \to \nabla f(\bar{x})$ along an appropriate subsequence.
Since $\nabla f(x_k) \to 0$, it follows that $\nabla f(\bar{x}) = 0$, implying that $\bar{x}$ is a solution.

Where do we use convexity?

Corrected Theorem 2: Assume that the gradients of $f$ are Lipschitz. Then, for $\alpha$ with $\alpha < 2/M$, we have
$$\sum_{k=1}^{\infty} \|\nabla f(x_k)\|^2 < \infty.$$
If, in addition, an optimal solution exists [i.e., $\min_x f(x)$ is attained at some $x^*$], then every accumulation point of the sequence $\{x_k\}$ is optimal.

$\{x_k\}$ need not converge. Note the difference between a limit point and an accumulation point.

Example: $f(x) = 1/(1+x^2)$. It satisfies the conditions, and $x_k \to \infty$.
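A quick sketch of this example in code; the starting point and stepsize are illustrative (any $\alpha < 2/M = 1$ works here, since $|f''(x)| \le 2$):

```python
import numpy as np

# f(x) = 1/(1+x^2): smooth, Lipschitz gradient, inf f = 0 is never attained.
grad = lambda x: -2.0 * x / (1.0 + x ** 2) ** 2

x, alpha = 1.0, 0.5
for k in range(100000):
    x = x - alpha * grad(x)
print(x, grad(x))  # x grows without bound (slowly) while f'(x_k) -> 0
```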
What does convexity buy?

Answer: convergence of the sequence $\{x_k\}$. Recall/define $X^*$ as the optimal set.

Theorem 3: Let Assumption 1 hold. Further, assume that the gradients of $f$ are Lipschitz and $\alpha < 2/M$. If $X^*$ is nonempty, then
$$\lim_{k \to \infty} x_k = x^*, \qquad x^* \in X^*.$$

Proof (outline): The previous theorem tells us a lot: every convergent subsequence of $\{x_k\}$ has its limit in $X^*$. Questions:
- Is there a convergent subsequence?
- Do all the subsequences converge to the same point in $X^*$?
The conditions of Lemma 1 are satisfied. Fix $y = x^*$, where $x^* \in X^*$:
$$\|x_{k+1} - x^*\|^2 \le \|x_k - x^*\|^2 - 2\alpha\,(f(x_k) - f^*) + \alpha^2 \|\nabla f(x_k)\|^2.$$
Drop the negative term:
$$\|x_{k+1} - x^*\|^2 \le \|x_k - x^*\|^2 + \alpha^2 \|\nabla f(x_k)\|^2.$$
Note from Theorem 2 that $\sum_{l=1}^{\infty} \|\nabla f(x_l)\|^2 < \infty$.
Therefore, adding $\alpha^2 \sum_{l=k+1}^{\infty} \|\nabla f(x_l)\|^2$ to both sides, we obtain
$$\|x_{k+1} - x^*\|^2 + \alpha^2 \sum_{l=k+1}^{\infty} \|\nabla f(x_l)\|^2 \le \|x_k - x^*\|^2 + \alpha^2 \sum_{l=k}^{\infty} \|\nabla f(x_l)\|^2.$$
Define
$$u_k = \|x_k - x^*\|^2 + \alpha^2 \sum_{l=k}^{\infty} \|\nabla f(x_l)\|^2.$$
Thus, the inequality reads $u_{k+1} \le u_k$, which implies that $\{u_k\}$ converges. We already know $\sum_{l=k}^{\infty} \|\nabla f(x_l)\|^2 \to 0$ as $k \to \infty$. Therefore, $\|x_k - x^*\|$ converges. Therefore, $\{x_k\}$ is bounded, so a convergent subsequence exists. Take a convergent subsequence $\{x_{s_k}\}$ and let its limit point be $\bar{x}$. We know
$\bar{x} \in X^*$ from the previous theorem. Thus, $\lim_{k \to \infty} \|x_{s_k} - \bar{x}\| = 0$. But we just showed that $\lim_{k \to \infty} \|x_k - \bar{x}\|$ exists. Therefore, the limit must be 0, and $\{x_k\}$ converges to $\bar{x}$.

Quick summary: without convexity, no convergence of the iterates; with convexity, convergence of the iterates. Go back and check $f(x) = 1/(1+x^2)$: it cannot be convex.

General note: what do I get from all this analysis?
Rates of convergence

Successfully solved the problem. Finish the course and go home? Unfortunately, no: gradient descent has poor rates of convergence, as is empirically observed when we simulate. See the last page for an example.

Let's consider the class of strongly convex functions. Recall that $\nabla^2 f(x) \succeq mI$ for all $x \in \mathbb{R}^n$ implies
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{m}{2}\|y - x\|^2,$$
and consequently
$$f(x) - f^* \le \frac{1}{2m}\|\nabla f(x)\|^2, \qquad f(x) - f^* \ge \frac{m}{2}\|x - x^*\|^2.$$
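These inequalities can be verified numerically on a strongly convex quadratic; the matrix and sample points below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3))
A = B.T @ B + np.eye(3)          # Hessian with eigenvalues >= 1
f = lambda x: 0.5 * x @ A @ x    # minimized at x* = 0 with f* = 0
grad = lambda x: A @ x
m = np.linalg.eigvalsh(A).min()  # strong convexity constant

x, y = rng.standard_normal(3), rng.standard_normal(3)
assert f(y) >= f(x) + grad(x) @ (y - x) + 0.5 * m * np.sum((y - x) ** 2) - 1e-12
assert f(x) <= np.sum(grad(x) ** 2) / (2 * m) + 1e-12   # f(x) - f* <= ||grad||^2/(2m)
assert f(x) >= 0.5 * m * np.sum(x ** 2) - 1e-12          # f(x) - f* >= (m/2)||x - x*||^2
```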
Theorem 4: Let Assumption 1 hold. Further, assume that $f$ is strongly convex with constant $m$, the gradients of $f$ are Lipschitz with constant $M$, and the stepsize satisfies $\alpha < \min(2/M,\ m/M^2)$. Then
$$\|x_k - x^*\| \le c\, q^k, \qquad 0 < q < 1,$$
for some $c > 0$.

Note: this is a geometric rate of convergence. (Normal people call it exponential ;))

Proof: Strong convexity implies a unique minimum. The basic iterate relation
with $y = x^*$ gives
$$\begin{aligned}
\|x_{k+1} - x^*\|^2 &\le \|x_k - x^*\|^2 - 2\alpha\,(f(x_k) - f^*) + \alpha^2\|\nabla f(x_k)\|^2 \\
&\le \|x_k - x^*\|^2 - 2\alpha\,\frac{m}{2}\|x_k - x^*\|^2 + \alpha^2\|\nabla f(x_k)\|^2 \\
&= \|x_k - x^*\|^2 - \alpha m\|x_k - x^*\|^2 + \alpha^2\|\nabla f(x_k) - \nabla f(x^*)\|^2 \\
&\le \|x_k - x^*\|^2 - \alpha m\|x_k - x^*\|^2 + \alpha^2 M^2\|x_k - x^*\|^2 \\
&= \left(1 - m\alpha + \alpha^2 M^2\right)\|x_k - x^*\|^2 \\
&\le \left(1 - m\alpha + \alpha^2 M^2\right)^{k+1}\|x_0 - x^*\|^2.
\end{aligned}$$

Remark: A sharper analysis shows that the result holds for any $0 < \alpha < 2/M$.

Geometric is not bad. But how many functions are strongly convex?
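A sketch of the geometric rate in code; the diagonal quadratic and the stepsize (chosen below $\min(2/M, m/M^2) = 0.01$) are illustrative choices:

```python
import numpy as np

A = np.diag([1.0, 10.0])         # m = 1, M = 10
grad = lambda x: A @ x           # minimizer x* = 0
alpha = 0.005                    # below min(2/M, m/M^2) = 0.01

x = np.array([5.0, -3.0])
dists = []
for k in range(300):
    x = x - alpha * grad(x)
    dists.append(np.linalg.norm(x))
ratios = np.array(dists[1:]) / np.array(dists[:-1])
print(ratios.max())              # bounded by some q < 1: geometric decay
```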
Constrained minimization over simple sets

minimize $f(x)$ subject to $x \in X$.

Assumption 2:
- $f$ is continuously differentiable and convex on an open set containing $X$.
- The set $X$ is closed and convex.
- $f^* > -\infty$.

Optimality condition: $\nabla f(x^*)^T (x - x^*) \ge 0$ for all $x \in X$.

Simple sets (easy projection): balls, hyperplanes, halfspaces.
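For instance, the projections onto these simple sets have closed forms; the helper names in this sketch are illustrative:

```python
import numpy as np

def project_ball(x, c, r):
    """Projection onto the ball {y : ||y - c|| <= r}."""
    d = x - c
    n = np.linalg.norm(d)
    return x if n <= r else c + (r / n) * d

def project_hyperplane(x, a, b):
    """Projection onto the hyperplane {y : a^T y = b}."""
    return x - ((a @ x - b) / (a @ a)) * a

def project_halfspace(x, a, b):
    """Projection onto the halfspace {y : a^T y <= b}."""
    v = a @ x - b
    return x if v <= 0 else x - (v / (a @ a)) * a

print(project_ball(np.array([3.0, 4.0]), np.zeros(2), 1.0))  # [0.6, 0.8]
```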
Recall the projection $P_X[x]$ and its basic properties:
- $(P_X[x] - x)^T (P_X[x] - y) \le 0$ when $y \in X$
- $\|P_X[x] - P_X[y]\| \le \|x - y\|$ when $x, y \in \mathbb{R}^n$ (non-expansiveness)

Gradient projection algorithm:
$$x_{k+1} = P_X[x_k - \alpha_k \nabla f(x_k)]$$

Can we obtain results similar to Theorems 1-4?
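A minimal sketch of the method; the ball-constrained quadratic used for the test run is an illustrative choice:

```python
import numpy as np

def projected_gradient(grad_f, project, x0, alpha, num_iters=1000):
    """Gradient projection: x_{k+1} = P_X[x_k - alpha * grad_f(x_k)]."""
    x = np.array(x0, dtype=float)
    for _ in range(num_iters):
        x = project(x - alpha * grad_f(x))
    return x

# Illustrative run: minimize 0.5||x - c||^2 over the unit ball. The constrained
# optimum is the projection of c onto the ball, here c/||c||.
c = np.array([3.0, 4.0])
proj = lambda x: x if np.linalg.norm(x) <= 1 else x / np.linalg.norm(x)
print(projected_gradient(lambda x: x - c, proj, x0=[0.0, 0.0], alpha=0.5))
# approx [0.6, 0.8]
```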
Lemma 1 and Theorem 1: Let Assumptions 1 and 2 hold, and suppose that the gradients are bounded. Then, for any $y \in X$ and all $k$,
$$\|x_{k+1} - y\|^2 \le \|x_k - y\|^2 - 2\alpha_k\,(f(x_k) - f(y)) + \alpha_k^2 \|\nabla f(x_k)\|^2,$$
and hence, for $\alpha_k = \alpha$,
$$\liminf_{k \to \infty} f(x_k) \le f^* + \frac{\alpha L^2}{2}.$$

Proof: By the definition of the method, it follows that for any $k$,
$$\|x_{k+1} - y\|^2 = \|P_X[x_k - \alpha_k \nabla f(x_k)] - y\|^2 \le \|x_k - \alpha_k \nabla f(x_k) - y\|^2 = \|x_k - y\|^2 - 2\alpha_k \nabla f(x_k)^T (x_k - y) + \alpha_k^2 \|\nabla f(x_k)\|^2.$$
The rest is identical to the unconstrained case.

Theorem 2: Not very interesting here; after all, $\nabla f(x^*)$ need not be $0$. What we could show is something like
$$\lim_{k \to \infty} \nabla f(x_k)^T (x - x_k) \ge 0 \quad \text{for all } x \in X.$$
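A hedged numeric illustration of this modified optimality measure: for $X$ the unit ball, $\min_{x \in X} \nabla f(x_k)^T(x - x_k)$ has a closed form, and it vanishes at the constrained optimum even though the gradient does not. The objective below is an illustrative choice:

```python
import numpy as np

# For X the unit ball, min_{x in X} g^T (x - x_k) is attained at x = -g/||g||,
# giving the value -||g|| - g^T x_k.
c = np.array([2.0, 2.0])
grad = lambda x: x - c           # f(x) = 0.5||x - c||^2, minimizer outside the ball
proj = lambda x: x if np.linalg.norm(x) <= 1 else x / np.linalg.norm(x)

x, alpha = np.zeros(2), 0.5
for k in range(30):
    x = proj(x - alpha * grad(x))
g = grad(x)
print(-np.linalg.norm(g) - g @ x)  # tends to 0 even though ||grad f(x_k)|| stays > 0
```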
Theorem 3: Let Assumption 2 hold. Further, assume that the gradients of $f$ are Lipschitz and $\alpha < 2/M$. If $X^*$ is nonempty, then
$$\lim_{k \to \infty} x_k = x^*, \qquad x^* \in X^*.$$

Theorem 4: Let Assumption 2 hold. Further, assume that $f$ is strongly convex, the gradients of $f$ are Lipschitz, and $\alpha < \min(2/M,\ m/M^2)$. Then
$$\|x_k - x^*\| \le c\, q^k, \qquad 0 < q < 1.$$
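A sketch checking the geometric rate for the constrained method; the quadratic, the ball constraint, and the stepsize are illustrative, with the reference solution obtained by running the method to high accuracy:

```python
import numpy as np

proj = lambda x: x if np.linalg.norm(x) <= 1 else x / np.linalg.norm(x)
A = np.diag([1.0, 4.0])          # m = 1, M = 4
b = np.array([3.0, 3.0])         # unconstrained minimizer A^{-1} b lies outside the ball
alpha = 0.05                     # below min(2/M, m/M^2) = 1/16
step = lambda x: proj(x - alpha * (A @ x - b))

x_ref = np.zeros(2)              # high-accuracy reference solution
for _ in range(10000):
    x_ref = step(x_ref)

x, dists = np.array([0.1, 0.1]), []
for _ in range(40):
    x = step(x)
    dists.append(np.linalg.norm(x - x_ref))
ratios = np.array(dists[1:]) / np.array(dists[:-1])
print(ratios.max())              # strictly below 1: geometric decay of the distance
```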
Proofs

Pack it. I am going home! Prof. Nedić will do it. I think.