Convex Optimization. 9. Unconstrained minimization. Prof. Ying Cui. Department of Electrical Engineering Shanghai Jiao Tong University


1 Convex Optimization 9. Unconstrained minimization Prof. Ying Cui Department of Electrical Engineering Shanghai Jiao Tong University 2017 Autumn Semester SJTU Ying Cui 1 / 40

2 Outline
- Unconstrained minimization problems
- Descent methods
- Gradient descent method
- Steepest descent method
- Newton's method
- Self-concordance
- Implementation

3 Unconstrained minimization
minimize f(x)
assumptions:
- f: Rⁿ → R is convex and twice continuously differentiable (implying that dom f is open)
- there exists an optimal point x⋆ (the optimal value p⋆ = inf_x f(x) is attained and finite)
a necessary and sufficient condition for optimality: ∇f(x⋆) = 0
- solving the unconstrained minimization problem is the same as finding a solution of the optimality equation ∇f(x) = 0
- in a few special cases, it can be solved analytically
- usually, it must be solved by an iterative algorithm: produce a sequence of points x^(k) ∈ dom f, k = 0, 1, ..., with f(x^(k)) → p⋆ as k → ∞; terminate when f(x^(k)) − p⋆ ≤ ε for some tolerance ε > 0

4 Initial point and sublevel set
algorithms in this chapter require a starting point x^(0) such that
- x^(0) ∈ dom f
- the sublevel set S = {x | f(x) ≤ f(x^(0))} is closed (hard to verify)
the 2nd condition is satisfied for all x^(0) ∈ dom f if f is closed, i.e., all its sublevel sets are closed (equivalent to epi f being closed)
- true if f is continuous and dom f = Rⁿ
- true if f(x) → ∞ as x → bd dom f
examples of differentiable functions with closed sublevel sets:
f(x) = log(Σᵢ₌₁ᵐ exp(aᵢᵀx + bᵢ)),  f(x) = −Σᵢ₌₁ᵐ log(bᵢ − aᵢᵀx)

5 Strong convexity and implications
f is strongly convex on S if there exists an m > 0 such that ∇²f(x) ⪰ mI for all x ∈ S
implications:
- for x, y ∈ S: f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2)‖y − x‖₂²
  - m = 0: recovers the basic inequality characterizing convexity
  - m > 0: a better lower bound than follows from convexity alone
- S is bounded
- p⋆ > −∞ and, for x ∈ S, f(x) − p⋆ ≤ (1/(2m))‖∇f(x)‖₂²
  - if the gradient is small at a point, then the point is nearly optimal
  - a condition for suboptimality generalizing the optimality condition: ‖∇f(x)‖₂ ≤ (2mε)^(1/2) ⟹ f(x) − p⋆ ≤ ε; useful as a stopping criterion if m is known
upper bound on ∇²f(x): there exists an M > 0 such that ∇²f(x) ⪯ MI for all x ∈ S
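The suboptimality bound above can be checked numerically. A minimal sketch (not from the slides; the quadratic, its data, and the test point are illustrative assumptions), using NumPy:

```python
import numpy as np

# Illustrative strongly convex quadratic f(x) = (1/2) x^T Q x, with
# Q positive definite, so m = lambda_min(Q) and p* = 0 at x* = 0.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
m = np.linalg.eigvalsh(Q).min()          # strong convexity constant

def f(x):    return 0.5 * x @ Q @ x
def grad(x): return Q @ x

# The bound f(x) - p* <= ||grad f(x)||_2^2 / (2m) at an arbitrary point:
x = np.array([1.0, -2.0])
gap   = f(x) - 0.0
bound = np.linalg.norm(grad(x))**2 / (2 * m)
print(gap <= bound)                      # True
```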

6 Condition number of matrix and convex set
- condition number of a matrix: the ratio of its largest eigenvalue to its smallest eigenvalue
- condition number of a convex set: square of the ratio of its maximum width to its minimum width
  - width of a convex set C in the direction q, with ‖q‖₂ = 1: W(C, q) = sup_{z∈C} qᵀz − inf_{z∈C} qᵀz
  - minimum and maximum width of C: W_min = inf_{‖q‖₂=1} W(C, q) and W_max = sup_{‖q‖₂=1} W(C, q)
  - condition number of C: cond(C) = W_max² / W_min²
  - a measure of its anisotropy or eccentricity: cond(C) small means C has approximately the same width in all directions (nearly spherical); cond(C) large means that C is far wider in some directions than in others

7 Condition number of sublevel sets
assume mI ⪯ ∇²f(x) ⪯ MI for all x ∈ S
- upper bound on the condition number of ∇²f(x): cond(∇²f(x)) ≤ M/m
- upper bound on the condition number of the sublevel set C_α = {x | f(x) ≤ α}, p⋆ < α ≤ f(x^(0)): cond(C_α) ≤ M/m
- geometric interpretation: lim_{α→p⋆} cond(C_α) = cond(∇²f(x⋆))
the condition number of the sublevel sets of f (which is bounded by M/m) has a strong effect on the efficiency of some common methods for unconstrained minimization

8 Descent methods
algorithms described in this chapter produce a minimizing sequence x^(k), k = 1, ..., where x^(k+1) = x^(k) + t^(k)Δx^(k) with f(x^(k+1)) < f(x^(k)) and t^(k) > 0
- other notations: x⁺ = x + tΔx, x := x + tΔx
- Δx is the step (or search direction); t is the step size (or step length)
- convexity of f implies ∇f(x^(k))ᵀΔx^(k) < 0 (i.e., Δx^(k) is a descent direction)
General descent method.
given a starting point x ∈ dom f.
repeat
1. Determine a descent direction Δx.
2. Line search. Choose a step size t > 0.
3. Update. x := x + tΔx.
until stopping criterion is satisfied.

9 Line search types
exact line search: t = argmin_{t>0} f(x + tΔx)
- minimizes f along the ray {x + tΔx | t ≥ 0}
- used when the cost of the minimization problem with one variable is low compared to the cost of computing the search direction itself
- in some special cases the minimizer can be found analytically, and in others it can be computed efficiently

10 Line search types
backtracking line search (with parameters α ∈ (0, 1/2), β ∈ (0, 1))
- reduces f enough along the ray {x + tΔx | t ≥ 0}
- starting at t = 1, repeat t := βt until f(x + tΔx) < f(x) + αt∇f(x)ᵀΔx
- convexity of f: f(x + tΔx) ≥ f(x) + t∇f(x)ᵀΔx
- the constant α can be interpreted as the fraction of the decrease in f predicted by linear extrapolation that we will accept
graphical interpretation: backtrack until t ≤ t₀
Figure 9.1 Backtracking line search. The curve shows f, restricted to the line over which we search. The lower dashed line shows the linear extrapolation of f, and the upper dashed line has a slope a factor of α smaller. The backtracking condition is that f lies below the upper dashed line, i.e., 0 ≤ t ≤ t₀.
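The backtracking rule above fits in a few lines of code. An illustrative NumPy sketch (the helper `backtracking` and the demo quadratic are assumptions, not from the slides):

```python
import numpy as np

def backtracking(f, grad_fx, x, dx, alpha=0.2, beta=0.5):
    """Backtracking line search: start at t = 1 and shrink t by beta
    until f(x + t*dx) <= f(x) + alpha * t * grad_fx @ dx."""
    t = 1.0
    fx, slope = f(x), grad_fx @ dx       # slope < 0 for a descent direction
    while f(x + t * dx) > fx + alpha * t * slope:
        t *= beta
    return t

# Usage on a simple quadratic, stepping along the negative gradient.
f = lambda x: 0.5 * x @ x
x = np.array([2.0, -1.0])
g = x                                    # gradient of (1/2)||x||_2^2
t = backtracking(f, g, x, -g)
print(t)                                 # the full step t = 1 is accepted here
```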

11 Gradient descent method
general descent method with Δx = −∇f(x)
Gradient descent method.
given a starting point x ∈ dom f.
repeat
1. Δx := −∇f(x).
2. Line search. Choose step size t via exact or backtracking line search.
3. Update. x := x + tΔx.
until stopping criterion is satisfied.
- stopping criterion usually of the form ‖∇f(x)‖₂ ≤ ε
- convergence result: for strongly convex f, f(x^(k)) − p⋆ ≤ c^k (f(x^(0)) − p⋆)
  - exact line search: c = 1 − m/M < 1
  - backtracking line search: c = 1 − min{2mα, 2βαm/M} < 1
- linear convergence: the error lies below a line on a log-linear plot of error versus iteration number
- very simple, but often very slow; rarely used in practice
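The loop above can be sketched directly in NumPy. A minimal illustration on an ill-conditioned quadratic (the objective and the parameter choices α = 0.3, β = 0.8 are assumptions for the demo, not from the slides):

```python
import numpy as np

# Gradient descent with backtracking on f(x) = (1/2)(x1^2 + 10 x2^2).
def f(x):    return 0.5 * (x[0]**2 + 10 * x[1]**2)
def grad(x): return np.array([x[0], 10 * x[1]])

x, alpha, beta, eps = np.array([10.0, 1.0]), 0.3, 0.8, 1e-8
for k in range(1000):
    g = grad(x)
    if np.linalg.norm(g) <= eps:         # stopping criterion ||grad f||_2 <= eps
        break
    t = 1.0                              # backtracking line search
    while f(x - t * g) > f(x) - alpha * t * (g @ g):
        t *= beta
    x = x - t * g
print(np.linalg.norm(x))                 # near the minimizer x* = 0
```

The number of iterations grows with the condition number of the Hessian (here 10), which is exactly the weakness the slides discuss.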

12 Examples
a quadratic problem in R²: f(x) = (1/2)(x₁² + γx₂²) (γ > 0)
with exact line search, starting at x^(0) = (γ, 1): closed-form expressions for the iterates
x₁^(k) = γ((γ−1)/(γ+1))^k,  x₂^(k) = (−(γ−1)/(γ+1))^k,  f(x^(k)) = ((γ−1)/(γ+1))^(2k) f(x^(0))
exact solution found in one iteration if γ = 1; convergence rapid if γ is not far from 1; convergence very slow if γ ≫ 1 or γ ≪ 1
Figure 9.2 Some contour lines of the function f(x) = (1/2)(x₁² + 10x₂²). The condition number of the sublevel sets, which are ellipsoids, is exactly 10. The figure shows the iterates of the gradient method with exact line search, started at x^(0) = (10, 1).
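The closed-form iterates can be verified against a direct implementation of exact-line-search gradient descent (an illustrative sketch; for a quadratic with Hessian H the exact step is t = ‖g‖² / (gᵀHg)):

```python
import numpy as np

# f(x) = (1/2)(x1^2 + gamma*x2^2), exact line search from x(0) = (gamma, 1).
gamma = 10.0
x = np.array([gamma, 1.0])
for k in range(5):
    g = np.array([x[0], gamma * x[1]])               # gradient
    t = (g @ g) / (g[0]**2 + gamma * g[1]**2)        # exact step: g^T g / g^T H g
    x = x - t * g

# Compare with the closed-form expressions after 5 iterations.
r = (gamma - 1) / (gamma + 1)
x1_pred = gamma * r**5
x2_pred = (-r)**5
print(np.allclose(x, [x1_pred, x2_pred]))            # True
```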

13 Examples
a nonquadratic problem in R²: f(x₁, x₂) = e^(x₁+3x₂−0.1) + e^(x₁−3x₂−0.1) + e^(−x₁−0.1)
- backtracking line search: approximately linear convergence (sublevel sets of f not too badly conditioned, M/m not too large)
- exact line search: approximately linear convergence, about twice as fast as with backtracking line search
Figure 9.3 Iterates of the gradient method with backtracking line search, for the problem in R² with objective f given in (9.20). The dashed curves are level curves of f, and the small circles are the iterates of the gradient method. The solid lines, which connect successive iterates, show the scaled steps t^(k)Δx^(k).
Figure 9.5 Iterates of the gradient method with exact line search for the problem in R² with objective f given in (9.20).
Figure 9.4 Error f(x^(k)) − p⋆ versus iteration k of the gradient method with backtracking and exact line search, for the problem in R² with objective f given in (9.20). The plot shows nearly linear convergence, with the error reduced approximately by the factor 0.4 in each iteration of the gradient method with backtracking line search, and by the factor 0.2 in each iteration of the gradient method with exact line search.

14 Examples
a problem in R¹⁰⁰: f(x) = cᵀx − Σᵢ₌₁ᵐ log(bᵢ − aᵢᵀx)
- backtracking line search: approximately linear convergence
- exact line search: approximately linear convergence, only a bit faster than with backtracking line search
Figure 9.6 Error f(x^(k)) − p⋆ versus iteration k for the gradient method with backtracking and exact line search, for a problem in R¹⁰⁰.

15 Conclusions on the gradient descent method
characteristics:
- exhibits approximately linear convergence, i.e., the error converges to zero approximately as a geometric series
- the choice of backtracking parameters α, β has a noticeable but not dramatic effect on the convergence
- exact line search sometimes improves the convergence, but not much (and probably not worth the trouble of implementing it)
- the convergence rate depends greatly on the condition number of the Hessian, or of the sublevel sets
main advantage and disadvantage:
- main advantage: simplicity
- main disadvantage: the convergence rate depends so critically on the condition number of the Hessian or sublevel sets

16 Steepest descent method
normalized steepest descent direction (at x, for the norm ‖·‖): Δx_nsd = argmin{∇f(x)ᵀv | ‖v‖ = 1}
- first-order Taylor approximation of f(x + v) around x: f(x + v) ≈ f(x) + ∇f(x)ᵀv
- directional derivative of f at x in direction v: ∇f(x)ᵀv
- the direction Δx_nsd is the unit-norm direction with the most negative directional derivative
(unnormalized) steepest descent direction: Δx_sd = ‖∇f(x)‖_* Δx_nsd satisfies ∇f(x)ᵀΔx_sd = −‖∇f(x)‖_*² (where ‖·‖_* denotes the dual norm)

17 Steepest descent method
general descent method with Δx = Δx_sd
Steepest descent method.
given a starting point x ∈ dom f.
repeat
1. Compute steepest descent direction Δx_sd.
2. Line search. Choose step size t via exact or backtracking line search.
3. Update. x := x + tΔx_sd.
until stopping criterion is satisfied.
- when exact line search is used, scale factors in the descent direction have no effect, so Δx_nsd or Δx_sd can be used
- convergence result: for strongly convex f, f(x^(k)) − p⋆ ≤ c^k (f(x^(0)) − p⋆)
  - backtracking line search: c = 1 − 2mαγ̃² min{1, βγ²/M} < 1
  - any norm can be bounded in terms of the Euclidean norm, i.e., there exist constants γ, γ̃ ∈ (0, 1] such that ‖x‖ ≥ γ‖x‖₂ and ‖x‖_* ≥ γ̃‖x‖₂
- linear convergence, same as the gradient descent method

18 Steepest descent for different norms
- Euclidean norm: Δx_sd = −∇f(x); coincides with the gradient descent method
- quadratic norm ‖x‖_P = (xᵀPx)^(1/2) (P ∈ Sⁿ₊₊): Δx_sd = −P⁻¹∇f(x); can be thought of as the gradient descent method applied to the problem after the change of coordinates x̄ = P^(1/2)x
- ℓ₁-norm: Δx_sd = −(∂f(x)/∂xᵢ)eᵢ, where i is an index for which |∂f(x)/∂xᵢ| = ‖∇f(x)‖_∞; this is a coordinate-descent algorithm (update the component with maximum absolute partial derivative)
Figure 9.9 Normalized steepest descent direction for a quadratic norm. The ellipsoid shown is the unit ball of the norm, translated to the point x. The normalized steepest descent direction Δx_nsd at x extends as far as possible in the direction −∇f(x) while staying in the ellipsoid. The gradient and normalized steepest descent directions are shown.
Figure 9.10 Normalized steepest descent direction for the ℓ₁-norm. The diamond is the unit ball of the ℓ₁-norm, translated to the point x. The normalized steepest descent direction can always be chosen in the direction of a standard basis vector; in this example we have Δx_nsd = e₁.
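A small sketch of the quadratic-norm direction Δx_sd = −P⁻¹∇f(x) (the data are illustrative assumptions): choosing P equal to the Hessian of a quadratic objective reaches the minimizer in a single exact-line-search step, previewing the Newton step.

```python
import numpy as np

# f(x) = (1/2) x^T H x with an ill-conditioned diagonal Hessian.
H = np.diag([1.0, 100.0])
grad = lambda x: H @ x

x  = np.array([100.0, 1.0])
dx = -np.linalg.solve(H, grad(x))        # P = H: dx_sd = -H^{-1} grad f(x) = -x
x_new = x + 1.0 * dx                     # exact line search gives t = 1 here
print(np.allclose(x_new, 0.0))           # True: one step to the minimizer x* = 0
```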

19 Choice of norm for steepest descent
- the choice of norm has a strong effect on the speed of convergence of the steepest descent method (consider the quadratic P-norm)
- steepest descent with the quadratic P-norm is the same as the gradient method after the change of coordinates x̄ = P^(1/2)x
- to increase the speed of convergence, choose P so that the sublevel sets of f, transformed by P^(1/2), are well conditioned
- the ellipsoid {x | xᵀPx ≤ 1} should approximate the shape of the sublevel sets
- works well in cases where we can identify a matrix P for which the transformed problem has moderate condition number
steepest descent with backtracking line search for two quadratic norms (ellipses show {x | ‖x − x^(k)‖_P = 1})
Figure 9.11 Steepest descent method with a quadratic norm ‖·‖_P₁. The ellipses are the boundaries of the norm balls {x | ‖x − x^(k)‖_P₁ ≤ 1} at x^(0) and x^(1).
Figure 9.12 Steepest descent method, with quadratic norm ‖·‖_P₂.
Figure 9.13 Error f(x^(k)) − p⋆ versus iteration k, for the steepest descent method with the quadratic norms ‖·‖_P₁ and ‖·‖_P₂. Convergence is rapid for the norm ‖·‖_P₁ and very slow for ‖·‖_P₂.

20 Newton step
Newton step for f at x: Δx_nt = −∇²f(x)⁻¹∇f(x)
- convexity of f (∇²f(x) ≻ 0) implies ∇f(x)ᵀΔx_nt < 0 unless ∇f(x) = 0; the Newton step is a descent direction unless x is optimal
- affine invariant: the Newton step of f̄(y) = f(Ty) (T nonsingular) at y and the Newton step of f at x = Ty satisfy Δx_nt = TΔy_nt

21 Interpretations of the Newton step
minimizer of the second-order approximation: x + Δx_nt minimizes the second-order Taylor approximation of f at x (a convex quadratic function of v)
f̂(x + v) = f(x) + ∇f(x)ᵀv + (1/2)vᵀ∇²f(x)v
- if f is quadratic, then x + Δx_nt is the exact minimizer of f
- if f is nearly quadratic, then x + Δx_nt should be a very good estimate of the minimizer of f
- when x is near x⋆ (where the quadratic model of f is very accurate), x + Δx_nt should be a very good approximation of x⋆
Figure 9.16 The function f (shown solid) and its second-order approximation f̂ at x (dashed). The Newton step Δx_nt is what must be added to x to give the minimizer of f̂.

22 Interpretations of the Newton step
solution of the linearized optimality condition: x + Δx_nt solves the linearized optimality condition
∇f(x + v) ≈ ∇f(x) + ∇²f(x)v = 0
- when x is near x⋆ (so the optimality condition almost holds), x + Δx_nt should be a very good approximation of x⋆
Figure 9.18 The solid curve is the derivative f′ of the function f shown in figure 9.16. f̂′ is the linear approximation of f′ at x. The Newton step Δx_nt is the difference between the root of f̂′ and the point x.

23 Interpretations of the Newton step
steepest descent direction in the Hessian norm: Δx_nt is the steepest descent direction at x for the quadratic norm defined by the Hessian ∇²f(x), i.e., ‖u‖_{∇²f(x)} = (uᵀ∇²f(x)u)^(1/2)
- when x is near x⋆, ∇²f(x) ≈ ∇²f(x⋆), so after the associated change of coordinates x̄ = ∇²f(x⋆)^(1/2)x the problem has small condition number, and steepest descent in this norm converges very rapidly
Figure 9.17 The dashed lines are level curves of a convex function. The ellipsoid shown (with solid line) is {x + v | vᵀ∇²f(x)v ≤ 1}. The arrow shows −∇f(x), the gradient descent direction. The Newton step Δx_nt is the steepest descent direction in the norm ‖·‖_{∇²f(x)}. The figure also shows Δx_nsd, the normalized steepest descent direction for the same norm.

24 Newton decrement
Newton decrement at x (a measure of the proximity of x to x⋆): λ(x) = (∇f(x)ᵀ∇²f(x)⁻¹∇f(x))^(1/2)
properties:
- λ(x)²/2 is an estimate of f(x) − p⋆, using the quadratic approximation f̂: f(x) − inf_v f̂(x + v) = f(x) − f̂(x + Δx_nt) = λ(x)²/2
- λ(x) equals the norm of the Newton step at x in the quadratic Hessian norm ‖u‖_{∇²f(x)} = (uᵀ∇²f(x)u)^(1/2): λ(x) = ‖Δx_nt‖_{∇²f(x)} = (Δx_ntᵀ∇²f(x)Δx_nt)^(1/2)
- −λ(x)² is the directional derivative of f at x in the Newton direction: −λ(x)² = ∇f(x)ᵀΔx_nt = (d/dt) f(x + tΔx_nt)|_{t=0}
- affine invariant: the Newton decrement of f̄(y) = f(Ty) (T nonsingular) at y is the same as the Newton decrement of f at x = Ty
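The identity f(x) − inf_v f̂(x + v) = λ(x)²/2 can be checked numerically. A small sketch (the test function below is an illustrative assumption, not from the slides):

```python
import numpy as np

# Illustrative smooth convex function: f(x) = exp(x1) + exp(x2) + x1^2.
def grad(x): return np.array([np.exp(x[0]) + 2 * x[0], np.exp(x[1])])
def hess(x): return np.diag([np.exp(x[0]) + 2.0, np.exp(x[1])])

x = np.array([0.5, -0.3])
g, H = grad(x), hess(x)
lam2 = g @ np.linalg.solve(H, g)         # lambda(x)^2 = g^T H^{-1} g

dx_nt = -np.linalg.solve(H, g)           # Newton step minimizes the model fhat
model_drop = -(g @ dx_nt + 0.5 * dx_nt @ H @ dx_nt)  # f(x) - fhat(x + dx_nt)
print(np.isclose(lam2 / 2, model_drop))  # True
```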

25 Newton's method
general descent method with Δx = Δx_nt
Newton's method.
given a starting point x ∈ dom f, tolerance ε > 0.
repeat
1. Compute the Newton step and decrement: Δx_nt := −∇²f(x)⁻¹∇f(x); λ² := ∇f(x)ᵀ∇²f(x)⁻¹∇f(x).
2. Stopping criterion. Quit if λ²/2 ≤ ε.
3. Line search. Choose step size t by backtracking line search.
4. Update. x := x + tΔx_nt.
Newton's method is affine invariant, due to the affine invariance of the Newton step and decrement:
- independent of linear changes of coordinates
- the Newton iterates for f̄(y) = f(Ty) with starting point y^(0) = T⁻¹x^(0) are y^(k) = T⁻¹x^(k)
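The algorithm above can be sketched in NumPy on the nonquadratic R² objective from the earlier gradient-descent example (an illustrative implementation; the backtracking parameters are the α = 0.1, β = 0.7 used in the R² Newton example later in the slides):

```python
import numpy as np

# f(x) = exp(x1+3x2-0.1) + exp(x1-3x2-0.1) + exp(-x1-0.1), i.e. sum exp(Ax + b).
A = np.array([[1.0, 3.0], [1.0, -3.0], [-1.0, 0.0]])
b = -0.1 * np.ones(3)

def f(x):    return np.sum(np.exp(A @ x + b))
def grad(x): return A.T @ np.exp(A @ x + b)
def hess(x): return A.T @ (np.exp(A @ x + b)[:, None] * A)

x, alpha, beta, eps = np.zeros(2), 0.1, 0.7, 1e-10
for k in range(50):
    g, H = grad(x), hess(x)
    dx = -np.linalg.solve(H, g)          # Newton step
    lam2 = -g @ dx                       # Newton decrement squared
    if lam2 / 2 <= eps:                  # stopping criterion
        break
    t = 1.0                              # backtracking line search
    while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
        t *= beta
    x = x + t * dx
print(k, f(x))                           # converges in a handful of iterations
```

For this objective the minimizer has x₂ = 0 and p⋆ = 2√2·e^(−0.1), which the iterate reaches to high accuracy.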

26 Classical convergence analysis
assumptions:
- f strongly convex on S with constant m, implying mI ⪯ ∇²f(x) ⪯ MI for all x ∈ S
- ∇²f is Lipschitz continuous on S with constant L > 0, i.e., ‖∇²f(x) − ∇²f(y)‖₂ ≤ L‖x − y‖₂ for all x, y ∈ S
  - L measures how well f can be approximated by a quadratic function
outline: there exist constants η ∈ (0, m²/L), γ > 0 such that
- if ‖∇f(x^(k))‖₂ ≥ η, then f(x^(k+1)) − f(x^(k)) ≤ −γ
- if ‖∇f(x^(k))‖₂ < η, then (L/(2m²))‖∇f(x^(k+1))‖₂ ≤ ((L/(2m²))‖∇f(x^(k))‖₂)², implying that ‖∇f(x^(l))‖₂ < η for all l ≥ k

27 Classical convergence analysis
damped Newton phase (‖∇f(x)‖₂ ≥ η):
- most iterations require backtracking steps
- function value decreases by at least γ, i.e., f(x^(k+1)) − f(x^(k)) ≤ −γ
- if p⋆ > −∞, this phase ends after at most (f(x^(0)) − p⋆)/γ iterations
quadratically convergent phase (‖∇f(x)‖₂ < η):
- all iterations use step size t = 1 (no backtracking steps)
- ‖∇f(x)‖₂ converges to zero quadratically: if ‖∇f(x^(k))‖₂ < η, then for l ≥ k
(L/(2m²))‖∇f(x^(l))‖₂ ≤ ((L/(2m²))‖∇f(x^(k))‖₂)^(2^(l−k)) ≤ (1/2)^(2^(l−k))
⟹ f(x^(l)) − p⋆ ≤ (1/(2m))‖∇f(x^(l))‖₂² ≤ (2m³/L²)(1/2)^(2^(l−k+1))
- this phase ends (f(x^(l)) − p⋆ ≤ ε) after at most log₂ log₂(ε₀/ε) iterations

28 Classical convergence analysis
conclusion: the number of iterations until f(x) − p⋆ ≤ ε is bounded above by
(f(x^(0)) − p⋆)/γ + log₂ log₂(ε₀/ε)
- γ = αβη² m/M², η = min{1, 3(1 − 2α)} m²/L, ε₀ = 2m³/L²
- the second term is small (of the order of 6) and almost constant for practical purposes
- in practice, the constants m, M, L (hence γ, ε₀) are usually unknown
- the analysis provides qualitative insight into the convergence properties (i.e., it explains the two algorithm phases)

29 Examples
example in R² (page 12)
- backtracking parameters α = 0.1, β = 0.7
- converges in only 5 steps
- apparent quadratic convergence
Figure 9.19 Newton's method for the problem in R², with objective f given in (9.20), and backtracking line search parameters α = 0.1, β = 0.7. Also shown are the ellipsoids {x | ‖x − x^(k)‖_{∇²f(x^(k))} ≤ 1} at the first two iterates.
Figure 9.20 Error versus iteration k of Newton's method for the problem in R². Convergence to a very high accuracy is achieved in five iterations.

30 Examples
example in R¹⁰⁰ (page 13)
- backtracking parameters α = 0.01, β = 0.5
- backtracking line search almost as fast as exact line search (and much simpler)
- clearly shows the two convergence phases (damped phase of 2 iterations)
Figure 9.21 Error versus iteration for Newton's method for the problem in R¹⁰⁰. The backtracking line search parameters are α = 0.01, β = 0.5. Here too convergence is extremely rapid: a very high accuracy is attained in only seven or eight iterations. The convergence of Newton's method with exact line search is only one iteration faster than with backtracking line search.
Figure 9.22 The step size t versus iteration for Newton's method with backtracking and exact line search, applied to the problem in R¹⁰⁰. The backtracking line search takes one backtracking step in the first two iterations. After the first two iterations it always selects t = 1.

31 Examples
example in R¹⁰⁰⁰⁰ (with sparse aᵢ): f(x) = −Σᵢ₌₁ⁿ log(1 − xᵢ²) − Σᵢ₌₁ᵐ log(bᵢ − aᵢᵀx)
- backtracking parameters α = 0.01, β = 0.5
- a linearly convergent phase of about 13 iterations, followed by a quadratically convergent phase of 4 or 5 iterations
- convergence performance similar to the small examples
Figure 9.23 Error versus iteration of Newton's method, for a problem in R¹⁰⁰⁰⁰. A backtracking line search with parameters α = 0.01, β = 0.5 is used. Even for this large-scale problem, Newton's method requires only 18 iterations to achieve very high accuracy.

32 Conclusions on Newton's method
strong advantages over gradient and steepest descent methods:
- convergence of Newton's method is rapid in general, and quadratic near x⋆
- Newton's method is affine invariant: insensitive to the choice of coordinates, or to the condition number of the sublevel sets of f
- Newton's method scales well with problem size
- the good performance of Newton's method is not dependent on the choice of algorithm parameters
main disadvantage:
- the cost of forming and storing the Hessian, and the cost of computing the Newton step, which requires solving a set of linear equations
- in many cases it is possible to exploit problem structure to substantially reduce the cost of computing the Newton step

33 Self-concordance
shortcomings of the classical convergence analysis:
- depends on the unknown constants m, M, L; only conceptually useful
- the bound is not affinely invariant (m, M, L change if the coordinates change), although Newton's method is
convergence analysis via self-concordance (Nesterov and Nemirovski):
- the analysis of Newton's method for self-concordant functions does not depend on any unknown constants
- gives an affine-invariant bound
- self-concordant functions include many logarithmic barrier functions that play an important role in interior-point methods for solving convex optimization problems

34 Self-concordant functions
definition:
- a convex f: R → R is self-concordant if |f‴(x)| ≤ 2f″(x)^(3/2) for all x ∈ dom f
- f: Rⁿ → R is self-concordant if it is self-concordant along every line in its domain, i.e., g(t) = f(x + tv) is self-concordant for all x ∈ dom f, v ∈ Rⁿ
examples on R:
- linear functions (zero second and third derivatives)
- quadratic functions (zero third derivative and nonnegative second derivative)
- negative logarithm f(x) = −log x
- negative entropy plus negative logarithm: f(x) = x log x − log x
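The definition can be checked directly for the negative logarithm, where the inequality in fact holds with equality: for f(x) = −log x, f″(x) = 1/x² and f‴(x) = −2/x³, so |f‴(x)| = 2f″(x)^(3/2) for every x > 0. A small numerical sketch:

```python
import numpy as np

# Self-concordance check for f(x) = -log(x) on a grid of x > 0.
xs = np.linspace(0.1, 10.0, 100)
f2 = 1.0 / xs**2                         # f''(x)
f3 = -2.0 / xs**3                        # f'''(x)
print(np.allclose(np.abs(f3), 2 * f2**1.5))  # True: |f'''| = 2 (f'')^{3/2}
```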

35 Self-concordant functions
remarks:
- the constant 2 is chosen for convenience, in order to simplify the formulas later on; any other positive constant could be used instead
- if f: R → R satisfies |f‴(x)| ≤ kf″(x)^(3/2), then f̃(x) = (k²/4)f(x) satisfies |f̃‴(x)| ≤ 2f̃″(x)^(3/2)
- what is important is that the third derivative of the function is bounded by some multiple of the 3/2-power of its second derivative
- self-concordance is affine invariant: if f: R → R is s.c., then f̃(y) = f(ay + b) is s.c.
- the self-concordance condition limits the third derivative of a function, in a way independent of affine coordinate changes

36 Self-concordant calculus
- preserved under scaling by α ≥ 1, and under sums: if f is s.c. and a ≥ 1, then af is s.c.; if f₁ and f₂ are s.c., then f₁ + f₂ is s.c.
- preserved under composition with an affine function: if f: Rⁿ → R is s.c. and A ∈ R^(n×m), b ∈ Rⁿ, then f(Ax + b) is s.c.
- preserved under composition with logarithm: if g: R → R is convex with dom g = R₊₊ and |g‴(x)| ≤ 3g″(x)/x for all x, then f(x) = −log(−g(x)) − log x is s.c. on {x | x > 0, g(x) < 0}
  - if |g‴(x)| ≤ 3g″(x)/x holds for g, then it also holds for g(x) + ax² + bx + c with a ≥ 0
examples: the following are s.c.
- f(x) = −Σᵢ₌₁ᵐ log(bᵢ − aᵢᵀx) on {x | aᵢᵀx < bᵢ, i = 1,...,m}
- f(X) = −log det X on Sⁿ₊₊
- f(x, y) = −log(y² − xᵀx) on {(x, y) | ‖x‖₂ < y}

37 Convergence analysis for self-concordant functions
assumption: f: Rⁿ → R is s.c.
summary: there exist constants η ∈ (0, 1/4], γ > 0 (η = (1 − 2α)/4, γ = αβη²/(1 + η)) such that
- if λ(x^(k)) > η, then f(x^(k+1)) − f(x^(k)) ≤ −γ
- if λ(x^(k)) ≤ η, then 2λ(x^(k+1)) ≤ (2λ(x^(k)))², implying f(x^(l)) − p⋆ ≤ λ(x^(l))² ≤ (1/2)^(2^(l−k+1)) for l ≥ k
complexity bound: the number of Newton iterations is bounded by
(f(x^(0)) − p⋆)/γ + log₂ log₂(1/ε) = (20 − 8α)/(αβ(1 − 2α)²) · (f(x^(0)) − p⋆) + log₂ log₂(1/ε)
- depends only on the line search parameters α, β and the final accuracy ε
- the second term is small and can safely be replaced with 6
- example: for α = 0.1, β = 0.8, ε = 10⁻¹⁰, the bound is 375(f(x^(0)) − p⋆) + 6

38 Numerical example
150 randomly generated instances of: minimize f(x) = −Σᵢ₌₁ᵐ log(bᵢ − aᵢᵀx)
- the number of iterations is much smaller than the bound 375(f(x^(0)) − p⋆) + 6
- a bound of the form c(f(x^(0)) − p⋆) + 6, with smaller c, is (empirically) valid
Figure 9.25 Number of Newton iterations required to minimize self-concordant functions versus f(x^(0)) − p⋆. The function f has the form f(x) = −Σᵢ₌₁ᵐ log(bᵢ − aᵢᵀx), where the problem data aᵢ and bᵢ are randomly generated. The circles show problems with m = 100, n = 50; the squares show problems with m = 1000, n = 500; and the diamonds show problems with m = 1000, n = 50. Fifty instances of each are shown.

39 Implementation
main effort in each iteration: evaluate the derivatives and solve the Newton system HΔx = −g, where H = ∇²f(x), g = ∇f(x)
via Cholesky factorization: H = LLᵀ, Δx_nt = −L⁻ᵀL⁻¹g, λ(x) = ‖L⁻¹g‖₂, where L is a lower triangular matrix
- cost: (1/3)n³ flops for an unstructured system
- cost: much less than (1/3)n³ if H is structured (e.g., sparse or banded)
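A sketch of this factorization-based solve on illustrative random data (`np.linalg.solve` stands in for the triangular forward/back substitutions a real implementation would use):

```python
import numpy as np

# One Newton-system solve via Cholesky: H dx = -g with H = L L^T,
# so dx_nt = -L^{-T} L^{-1} g and lambda(x) = ||L^{-1} g||_2.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
H = M @ M.T + 5 * np.eye(5)              # a positive definite Hessian
g = rng.standard_normal(5)               # a gradient vector

L = np.linalg.cholesky(H)                # H = L L^T, L lower triangular
w = np.linalg.solve(L, g)                # forward substitution: L w = g
dx = -np.linalg.solve(L.T, w)            # back substitution: L^T dx = -w
lam = np.linalg.norm(w)                  # Newton decrement lambda(x)

print(np.allclose(H @ dx, -g))           # True: dx solves the Newton system
print(np.isclose(lam**2, g @ np.linalg.solve(H, g)))  # True: lam^2 = g^T H^{-1} g
```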

40 Example of dense Newton system with structure
f(x) = Σᵢ₌₁ⁿ ψᵢ(xᵢ) + ψ₀(Ax + b),  H = D + AᵀH₀A
- assume A ∈ R^(p×n) is dense, with p ≪ n
- D is diagonal with diagonal elements ψᵢ″(xᵢ); H₀ = ∇²ψ₀(Ax + b)
method 1: form H, solve via dense Cholesky factorization (cost (1/3)n³)
method 2: factor H₀ = L₀L₀ᵀ; write the Newton system as
DΔx + AᵀL₀w = −g,  L₀ᵀAΔx − w = 0
eliminate Δx from the first equation; compute w and Δx from
(I + L₀ᵀAD⁻¹AᵀL₀)w = −L₀ᵀAD⁻¹g,  DΔx = −g − AᵀL₀w
cost: ≈ 2p²n flops (dominated by the computation of L₀ᵀAD⁻¹AᵀL₀)
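Method 2 can be sketched and checked against the direct dense solve (the dimensions and random data below are illustrative assumptions):

```python
import numpy as np

# Block elimination for H = D + A^T H0 A with D diagonal, A in R^{p x n}, p << n.
rng = np.random.default_rng(1)
n, p = 50, 3
A  = rng.standard_normal((p, n))
d  = rng.uniform(1.0, 2.0, n)            # diagonal of D (the psi_i''(x_i) > 0)
M  = rng.standard_normal((p, p))
H0 = M @ M.T + np.eye(p)                 # H0 = hess psi_0, positive definite
g  = rng.standard_normal(n)

# method 1: form H and solve directly, cost ~ (1/3) n^3
H = np.diag(d) + A.T @ H0 @ A
dx_direct = np.linalg.solve(H, -g)

# method 2: factor H0 = L0 L0^T, eliminate dx, solve the small p x p system
#   (I + L0^T A D^{-1} A^T L0) w = -L0^T A D^{-1} g,  then  D dx = -g - A^T L0 w
L0 = np.linalg.cholesky(H0)
B  = L0.T @ A                            # B = L0^T A  (p x n)
S  = np.eye(p) + B @ (B / d).T           # I + B D^{-1} B^T
w  = np.linalg.solve(S, -B @ (g / d))    # small p x p solve
dx = (-g - B.T @ w) / d                  # recover dx from D dx = -g - B^T w

print(np.allclose(dx, dx_direct))        # True: both methods agree
```

The flop count is dominated by forming B D⁻¹ Bᵀ, roughly 2p²n, far cheaper than (1/3)n³ when p ≪ n.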


Line Search Methods for Unconstrained Optimisation Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic

More information

8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems 8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

More information

The Steepest Descent Algorithm for Unconstrained Optimization

The Steepest Descent Algorithm for Unconstrained Optimization The Steepest Descent Algorithm for Unconstrained Optimization Robert M. Freund February, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 1 Steepest Descent Algorithm The problem

More information

3. Convex functions. basic properties and examples. operations that preserve convexity. the conjugate function. quasiconvex functions

3. Convex functions. basic properties and examples. operations that preserve convexity. the conjugate function. quasiconvex functions 3. Convex functions Convex Optimization Boyd & Vandenberghe basic properties and examples operations that preserve convexity the conjugate function quasiconvex functions log-concave and log-convex functions

More information

6. Proximal gradient method

6. Proximal gradient method L. Vandenberghe EE236C (Spring 2013-14) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping

More information

Self-Concordant Barrier Functions for Convex Optimization

Self-Concordant Barrier Functions for Convex Optimization Appendix F Self-Concordant Barrier Functions for Convex Optimization F.1 Introduction In this Appendix we present a framework for developing polynomial-time algorithms for the solution of convex optimization

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

14. Nonlinear equations

14. Nonlinear equations L. Vandenberghe ECE133A (Winter 2018) 14. Nonlinear equations Newton method for nonlinear equations damped Newton method for unconstrained minimization Newton method for nonlinear least squares 14-1 Set

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

Lecture 5: September 12

Lecture 5: September 12 10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 12 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Barun Patra and Tyler Vuong Note: LaTeX template courtesy of UC Berkeley EECS

More information

6. Proximal gradient method

6. Proximal gradient method L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping

More information

EE364a Homework 8 solutions

EE364a Homework 8 solutions EE364a, Winter 2007-08 Prof. S. Boyd EE364a Homework 8 solutions 9.8 Steepest descent method in l -norm. Explain how to find a steepest descent direction in the l -norm, and give a simple interpretation.

More information

Convex Functions. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Convex Functions. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Convex Functions Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Definition convex function Examples

More information

Selected Topics in Optimization. Some slides borrowed from

Selected Topics in Optimization. Some slides borrowed from Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model

More information

4. Convex optimization problems (part 1: general)

4. Convex optimization problems (part 1: general) EE/AA 578, Univ of Washington, Fall 2016 4. Convex optimization problems (part 1: general) optimization problem in standard form convex optimization problems quasiconvex optimization 4 1 Optimization problem

More information

A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution:

A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution: 1-5: Least-squares I A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution: f (x) =(Ax b) T (Ax b) =x T A T Ax 2b T Ax + b T b f (x) = 2A T Ax 2A T b = 0 Chih-Jen Lin (National

More information

EE 546, Univ of Washington, Spring Proximal mapping. introduction. review of conjugate functions. proximal mapping. Proximal mapping 6 1

EE 546, Univ of Washington, Spring Proximal mapping. introduction. review of conjugate functions. proximal mapping. Proximal mapping 6 1 EE 546, Univ of Washington, Spring 2012 6. Proximal mapping introduction review of conjugate functions proximal mapping Proximal mapping 6 1 Proximal mapping the proximal mapping (prox-operator) of a convex

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725 Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h

More information

2. Quasi-Newton methods

2. Quasi-Newton methods L. Vandenberghe EE236C (Spring 2016) 2. Quasi-Newton methods variable metric methods quasi-newton methods BFGS update limited-memory quasi-newton methods 2-1 Newton method for unconstrained minimization

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained

More information

Lecture 9 Sequential unconstrained minimization

Lecture 9 Sequential unconstrained minimization S. Boyd EE364 Lecture 9 Sequential unconstrained minimization brief history of SUMT & IP methods logarithmic barrier function central path UMT & SUMT complexity analysis feasibility phase generalized inequalities

More information

Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem

Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem min{f (x) : x R n }. The iterative algorithms that we will consider are of the form x k+1 = x k + t k d k, k = 0, 1,...

More information

Barrier Method. Javier Peña Convex Optimization /36-725

Barrier Method. Javier Peña Convex Optimization /36-725 Barrier Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: Newton s method For root-finding F (x) = 0 x + = x F (x) 1 F (x) For optimization x f(x) x + = x 2 f(x) 1 f(x) Assume f strongly

More information

Convex Functions. Pontus Giselsson

Convex Functions. Pontus Giselsson Convex Functions Pontus Giselsson 1 Today s lecture lower semicontinuity, closure, convex hull convexity preserving operations precomposition with affine mapping infimal convolution image function supremum

More information

Convex Optimization. Prof. Nati Srebro. Lecture 12: Infeasible-Start Newton s Method Interior Point Methods

Convex Optimization. Prof. Nati Srebro. Lecture 12: Infeasible-Start Newton s Method Interior Point Methods Convex Optimization Prof. Nati Srebro Lecture 12: Infeasible-Start Newton s Method Interior Point Methods Equality Constrained Optimization f 0 (x) s. t. A R p n, b R p Using access to: 2 nd order oracle

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem

Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem min{f (x) : x R n }. The iterative algorithms that we will consider are of the form x k+1 = x k + t k d k, k = 0, 1,...

More information

Numerical optimization

Numerical optimization Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

More information

Lecture 14 Barrier method

Lecture 14 Barrier method L. Vandenberghe EE236A (Fall 2013-14) Lecture 14 Barrier method centering problem Newton decrement local convergence of Newton method short-step barrier method global convergence of Newton method predictor-corrector

More information

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R

More information

Improving the Convergence of Back-Propogation Learning with Second Order Methods

Improving the Convergence of Back-Propogation Learning with Second Order Methods the of Back-Propogation Learning with Second Order Methods Sue Becker and Yann le Cun, Sept 1988 Kasey Bray, October 2017 Table of Contents 1 with Back-Propagation 2 the of BP 3 A Computationally Feasible

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

Numerical Optimization Prof. Shirish K. Shevade Department of Computer Science and Automation Indian Institute of Science, Bangalore

Numerical Optimization Prof. Shirish K. Shevade Department of Computer Science and Automation Indian Institute of Science, Bangalore Numerical Optimization Prof. Shirish K. Shevade Department of Computer Science and Automation Indian Institute of Science, Bangalore Lecture - 13 Steepest Descent Method Hello, welcome back to this series

More information

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems 1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of

More information

Fast proximal gradient methods

Fast proximal gradient methods L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient

More information

10. Ellipsoid method

10. Ellipsoid method 10. Ellipsoid method EE236C (Spring 2008-09) ellipsoid method convergence proof inequality constraints 10 1 Ellipsoid method history developed by Shor, Nemirovski, Yudin in 1970s used in 1979 by Khachian

More information

Interior Point Algorithms for Constrained Convex Optimization

Interior Point Algorithms for Constrained Convex Optimization Interior Point Algorithms for Constrained Convex Optimization Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Inequality constrained minimization problems

More information

Lecture 5: Gradient Descent. 5.1 Unconstrained minimization problems and Gradient descent

Lecture 5: Gradient Descent. 5.1 Unconstrained minimization problems and Gradient descent 10-725/36-725: Convex Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 5: Gradient Descent Scribes: Loc Do,2,3 Disclaimer: These notes have not been subjected to the usual scrutiny reserved for

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

Agenda. Interior Point Methods. 1 Barrier functions. 2 Analytic center. 3 Central path. 4 Barrier method. 5 Primal-dual path following algorithms

Agenda. Interior Point Methods. 1 Barrier functions. 2 Analytic center. 3 Central path. 4 Barrier method. 5 Primal-dual path following algorithms Agenda Interior Point Methods 1 Barrier functions 2 Analytic center 3 Central path 4 Barrier method 5 Primal-dual path following algorithms 6 Nesterov Todd scaling 7 Complexity analysis Interior point

More information

Optimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization

Optimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization 5.93 Optimization Methods Lecture 8: Optimality Conditions and Gradient Methods for Unconstrained Optimization Outline. Necessary and sucient optimality conditions Slide. Gradient m e t h o d s 3. The

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information

ECE133A Applied Numerical Computing Additional Lecture Notes

ECE133A Applied Numerical Computing Additional Lecture Notes Winter Quarter 2018 ECE133A Applied Numerical Computing Additional Lecture Notes L. Vandenberghe ii Contents 1 LU factorization 1 1.1 Definition................................. 1 1.2 Nonsingular sets

More information

Lecture: Convex Optimization Problems

Lecture: Convex Optimization Problems 1/36 Lecture: Convex Optimization Problems http://bicmr.pku.edu.cn/~wenzw/opt-2015-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/36 optimization

More information

Gradient Descent. Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh. Convex Optimization /36-725

Gradient Descent. Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh. Convex Optimization /36-725 Gradient Descent Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh Convex Optimization 10-725/36-725 Based on slides from Vandenberghe, Tibshirani Gradient Descent Consider unconstrained, smooth convex

More information

A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution:

A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution: 1-5: Least-squares I A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution: f (x) =(Ax b) T (Ax b) =x T A T Ax 2b T Ax + b T b f (x) = 2A T Ax 2A T b = 0 Chih-Jen Lin (National

More information

Second Order Optimization Algorithms I

Second Order Optimization Algorithms I Second Order Optimization Algorithms I Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 7, 8, 9 and 10 1 The

More information

Gradient Methods Using Momentum and Memory

Gradient Methods Using Momentum and Memory Chapter 3 Gradient Methods Using Momentum and Memory The steepest descent method described in Chapter always steps in the negative gradient direction, which is orthogonal to the boundary of the level set

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

A Distributed Newton Method for Network Utility Maximization, II: Convergence

A Distributed Newton Method for Network Utility Maximization, II: Convergence A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility

More information

Optimization Methods. Lecture 19: Line Searches and Newton s Method

Optimization Methods. Lecture 19: Line Searches and Newton s Method 15.93 Optimization Methods Lecture 19: Line Searches and Newton s Method 1 Last Lecture Necessary Conditions for Optimality (identifies candidates) x local min f(x ) =, f(x ) PSD Slide 1 Sufficient Conditions

More information

4. Convex optimization problems

4. Convex optimization problems Convex Optimization Boyd & Vandenberghe 4. Convex optimization problems optimization problem in standard form convex optimization problems quasiconvex optimization linear optimization quadratic optimization

More information

3. Convex functions. basic properties and examples. operations that preserve convexity. the conjugate function. quasiconvex functions

3. Convex functions. basic properties and examples. operations that preserve convexity. the conjugate function. quasiconvex functions 3. Convex functions Convex Optimization Boyd & Vandenberghe basic properties and examples operations that preserve convexity the conjugate function quasiconvex functions log-concave and log-convex functions

More information

L. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones. definition. spectral decomposition. quadratic representation. log-det barrier 18-1

L. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones. definition. spectral decomposition. quadratic representation. log-det barrier 18-1 L. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones definition spectral decomposition quadratic representation log-det barrier 18-1 Introduction This lecture: theoretical properties of the following

More information

Unconstrained optimization

Unconstrained optimization Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout

More information

Nonlinear Programming

Nonlinear Programming Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week

More information

Gradient descent. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Gradient descent. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725 Gradient descent Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Gradient descent First consider unconstrained minimization of f : R n R, convex and differentiable. We want to solve

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Algorithms for Constrained Nonlinear Optimization Problems) Tamás TERLAKY Computing and Software McMaster University Hamilton, November 2005 terlaky@mcmaster.ca

More information

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf. Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of

More information

Convex Optimization and Modeling

Convex Optimization and Modeling Convex Optimization and Modeling Introduction and a quick repetition of analysis/linear algebra First lecture, 12.04.2010 Jun.-Prof. Matthias Hein Organization of the lecture Advanced course, 2+2 hours,

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

Lecture 5: September 15

Lecture 5: September 15 10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 15 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Di Jin, Mengdi Wang, Bin Deng Note: LaTeX template courtesy of UC Berkeley EECS

More information

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17 EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 17 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 29, 2012 Andre Tkacenko

More information

The proximal mapping

The proximal mapping The proximal mapping http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/37 1 closed function 2 Conjugate function

More information

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725 Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:

More information

IE 521 Convex Optimization

IE 521 Convex Optimization Lecture 5: Convex II 6th February 2019 Convex Local Lipschitz Outline Local Lipschitz 1 / 23 Convex Local Lipschitz Convex Function: f : R n R is convex if dom(f ) is convex and for any λ [0, 1], x, y

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Examination paper for TMA4180 Optimization I

Examination paper for TMA4180 Optimization I Department of Mathematical Sciences Examination paper for TMA4180 Optimization I Academic contact during examination: Phone: Examination date: 26th May 2016 Examination time (from to): 09:00 13:00 Permitted

More information

More First-Order Optimization Algorithms

More First-Order Optimization Algorithms More First-Order Optimization Algorithms Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 3, 8, 3 The SDM

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information