Convex Optimization. 9. Unconstrained minimization. Prof. Ying Cui. Department of Electrical Engineering Shanghai Jiao Tong University
2017 Autumn Semester
Outline
- Unconstrained minimization problems
- Descent methods
- Gradient descent method
- Steepest descent method
- Newton's method
- Self-concordance
- Implementation
Unconstrained minimization
minimize f(x)
assumptions:
- f : R^n → R is convex and twice continuously differentiable (implying that dom f is open)
- there exists an optimal point x* (the optimal value p* = inf_x f(x) is attained and finite)
a necessary and sufficient condition for optimality: ∇f(x*) = 0
- solving the unconstrained minimization problem is the same as finding a solution of the optimality equation ∇f(x) = 0
- in a few special cases, it can be solved analytically
- usually, it must be solved by an iterative algorithm: produce a sequence of points x^(k) ∈ dom f, k = 0, 1, ..., with f(x^(k)) → p* as k → ∞; terminate when f(x^(k)) − p* ≤ ε for some tolerance ε > 0
Initial point and sublevel set
algorithms in this chapter require a starting point x^(0) such that
- x^(0) ∈ dom f
- the sublevel set S = {x | f(x) ≤ f(x^(0))} is closed (hard to verify in general)
the 2nd condition is satisfied for all x^(0) ∈ dom f if f is closed, i.e., all its sublevel sets are closed (equivalent to epi f being closed)
- true if f is continuous and dom f = R^n
- true if f(x) → ∞ as x → bd dom f
examples of differentiable functions with closed sublevel sets:
f(x) = log(Σ_{i=1}^m exp(a_i^T x + b_i)),   f(x) = −Σ_{i=1}^m log(b_i − a_i^T x)
Strong convexity and implications
f is strongly convex on S if there exists an m > 0 such that ∇²f(x) ⪰ mI for all x ∈ S
implications:
- for x, y ∈ S, f(y) ≥ f(x) + ∇f(x)^T (y − x) + (m/2) ||y − x||_2²
  (m = 0 recovers the basic inequality characterizing convexity; m > 0 gives a better lower bound than follows from convexity alone)
- S is bounded
- p* > −∞ and, for x ∈ S, f(x) − p* ≤ (1/(2m)) ||∇f(x)||_2²
  (if the gradient is small at a point, then the point is nearly optimal)
- a condition for suboptimality generalizing the optimality condition: ||∇f(x)||_2 ≤ (2mε)^{1/2} ⟹ f(x) − p* ≤ ε; useful as a stopping criterion if m is known
upper bound on ∇²f(x): there exists an M > 0 such that ∇²f(x) ⪯ MI for all x ∈ S
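The suboptimality bound f(x) − p* ≤ (1/(2m)) ||∇f(x)||_2² can be checked numerically. A minimal Python sketch, using a hypothetical strongly convex quadratic (the matrix Q and test points are illustrative, not from the slides); here p* = 0 at x* = 0 and m is the smallest eigenvalue of Q:

```python
import numpy as np

# Hypothetical strongly convex quadratic: f(x) = 0.5 x^T Q x, p* = 0 at x* = 0.
# The strong convexity constant on all of R^n is m = lambda_min(Q).
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
m = np.linalg.eigvalsh(Q).min()

def f(x):
    return 0.5 * x @ Q @ x

def grad(x):
    return Q @ x

rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.standard_normal(2)
    subopt = f(x) - 0.0                      # f(x) - p*
    bound = np.linalg.norm(grad(x))**2 / (2 * m)
    assert subopt <= bound + 1e-12           # f(x) - p* <= ||grad f(x)||^2 / (2m)
```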
Condition number of a matrix and of a convex set
- condition number of a matrix: the ratio of its largest eigenvalue to its smallest eigenvalue
- condition number of a convex set: the square of the ratio of its maximum width to its minimum width
width of a convex set C in the direction q, with ||q||_2 = 1: W(C, q) = sup_{z ∈ C} q^T z − inf_{z ∈ C} q^T z
minimum and maximum width of C: W_min = inf_{||q||_2 = 1} W(C, q) and W_max = sup_{||q||_2 = 1} W(C, q)
condition number of C: cond(C) = W_max² / W_min²
a measure of anisotropy or eccentricity: cond(C) small means C has approximately the same width in all directions (nearly spherical); cond(C) large means that C is far wider in some directions than in others
Condition number of sublevel sets
mI ⪯ ∇²f(x) ⪯ MI for all x ∈ S implies
- an upper bound on the condition number of ∇²f(x): cond(∇²f(x)) ≤ M/m
- an upper bound on the condition number of the sublevel sets C_α = {x | f(x) ≤ α}, p* < α ≤ f(x^(0)): cond(C_α) ≤ M/m
geometric interpretation: lim_{α → p*} cond(C_α) = cond(∇²f(x*))
the condition number of the sublevel sets of f (bounded by M/m) has a strong effect on the efficiency of some common methods for unconstrained minimization
Descent methods
the algorithms in this chapter produce a minimizing sequence x^(k), k = 1, ..., where x^(k+1) = x^(k) + t^(k) Δx^(k) with f(x^(k+1)) < f(x^(k)) and t^(k) > 0
- other notation: x⁺ = x + t Δx, or x := x + t Δx
- Δx is the step (or search direction); t is the step size (or step length)
- convexity of f implies ∇f(x^(k))^T Δx^(k) < 0 (i.e., Δx^(k) is a descent direction)

General descent method.
given a starting point x ∈ dom f.
repeat
1. Determine a descent direction Δx.
2. Line search. Choose a step size t > 0.
3. Update. x := x + t Δx.
until stopping criterion is satisfied.
Line search types
exact line search: t = argmin_{t > 0} f(x + t Δx)
- minimizes f along the ray {x + t Δx | t ≥ 0}
- used when the cost of this one-variable minimization is low compared to the cost of computing the search direction itself
- in some special cases the minimizer can be found analytically, and in others it can be computed efficiently
Line search types
backtracking line search (with parameters α ∈ (0, 1/2), β ∈ (0, 1))
- reduces f "enough" along the ray {x + t Δx | t ≥ 0}
- starting at t = 1, repeat t := βt until f(x + t Δx) < f(x) + αt ∇f(x)^T Δx
- convexity of f gives f(x + t Δx) ≥ f(x) + t ∇f(x)^T Δx, so the constant α can be interpreted as the fraction of the decrease in f predicted by linear extrapolation that we will accept
[Figure 9.1: Backtracking line search. The curve shows f restricted to the line over which we search. The lower dashed line shows the linear extrapolation of f, and the upper dashed line has a slope a factor of α smaller. The backtracking condition is that f lie below the upper dashed line, i.e., 0 ≤ t ≤ t₀.]
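The backtracking rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the test objective is a hypothetical ill-conditioned quadratic:

```python
import numpy as np

def backtracking(f, grad_f, x, dx, alpha=0.25, beta=0.5):
    """Backtracking line search: starting at t = 1, shrink t by beta until the
    sufficient-decrease condition f(x + t*dx) < f(x) + alpha*t*grad_f(x)^T dx
    holds (dx must be a descent direction)."""
    t = 1.0
    fx, g = f(x), grad_f(x)
    while f(x + t * dx) >= fx + alpha * t * g @ dx:
        t *= beta
    return t

# Illustrative objective: f(x) = x1^2 + 10 x2^2, with descent direction -grad f.
f = lambda x: x[0]**2 + 10 * x[1]**2
grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])
x = np.array([1.0, 1.0])
dx = -grad_f(x)
t = backtracking(f, grad_f, x, dx)
assert f(x + t * dx) < f(x)   # the accepted step decreases f
```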
Gradient descent method
general descent method with Δx = −∇f(x)

Gradient descent method.
given a starting point x ∈ dom f.
repeat
1. Δx := −∇f(x).
2. Line search. Choose step size t via exact or backtracking line search.
3. Update. x := x + t Δx.
until stopping criterion is satisfied.

- stopping criterion usually of the form ||∇f(x)||_2 ≤ ε
- convergence result: for strongly convex f, f(x^(k)) − p* ≤ c^k (f(x^(0)) − p*)
  - exact line search: c = 1 − m/M < 1
  - backtracking line search: c = 1 − min{2mα, 2βαm/M} < 1
- linear convergence: the error lies below a line on a log-linear plot of error versus iteration number
- very simple, but often very slow; rarely used in practice
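A minimal Python sketch of the algorithm above, with the backtracking search inlined. The test objective is the sum-of-exponentials example in R² used later in these slides (objective (9.20) of the textbook); the stopping tolerance and starting point are illustrative choices:

```python
import numpy as np

def gradient_descent(f, grad_f, x0, eps=1e-6, alpha=0.25, beta=0.5, max_iter=10000):
    """Gradient descent with backtracking line search; stops when
    ||grad f(x)||_2 <= eps (the usual stopping criterion)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:
            break
        dx = -g                                   # descent direction
        t = 1.0                                   # backtracking line search
        while f(x + t * dx) >= f(x) + alpha * t * g @ dx:
            t *= beta
        x = x + t * dx
    return x

# Nonquadratic R^2 example: f(x) = e^{x1+3x2-0.1} + e^{x1-3x2-0.1} + e^{-x1-0.1}
e = np.exp
f = lambda x: e(x[0] + 3*x[1] - 0.1) + e(x[0] - 3*x[1] - 0.1) + e(-x[0] - 0.1)
grad_f = lambda x: np.array([
    e(x[0] + 3*x[1] - 0.1) + e(x[0] - 3*x[1] - 0.1) - e(-x[0] - 0.1),
    3*e(x[0] + 3*x[1] - 0.1) - 3*e(x[0] - 3*x[1] - 0.1)])

x_star = gradient_descent(f, grad_f, np.array([-1.0, 1.0]))
assert np.linalg.norm(grad_f(x_star)) <= 1e-6
```

The true minimizer of this objective is x* = (−(ln 2)/2, 0), which the iterates approach linearly.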
Examples
a quadratic problem in R²: f(x) = (1/2)(x₁² + γx₂²) (γ > 0)
with exact line search, starting at x^(0) = (γ, 1), the iterates have closed-form expressions:
x₁^(k) = γ ((γ − 1)/(γ + 1))^k,   x₂^(k) = (−(γ − 1)/(γ + 1))^k,   f(x^(k)) = ((γ − 1)/(γ + 1))^{2k} f(x^(0))
the exact solution is found in one iteration if γ = 1; convergence is rapid if γ is not far from 1, and very slow if γ ≫ 1 or γ ≪ 1
[Figure 9.2: Some contour lines of the function f(x) = (1/2)(x₁² + 10x₂²). The condition number of the sublevel sets, which are ellipsoids, is exactly 10. The figure shows the iterates of the gradient method with exact line search, started at x^(0) = (10, 1).]
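The closed-form iterates can be verified numerically: for a quadratic, the exact line search step is t = (gᵀg)/(gᵀQg). A short Python check (γ = 10 as in the figure):

```python
import numpy as np

gamma = 10.0
Q = np.diag([1.0, gamma])               # f(x) = 0.5 * (x1^2 + gamma * x2^2)
c = (gamma - 1) / (gamma + 1)

x = np.array([gamma, 1.0])              # x^(0) = (gamma, 1)
for k in range(1, 6):
    g = Q @ x                           # gradient of the quadratic
    t = (g @ g) / (g @ Q @ g)           # exact line search for a quadratic
    x = x - t * g
    expected = np.array([gamma * c**k, (-c)**k])
    assert np.allclose(x, expected)     # matches the closed-form iterates
```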
Examples
a nonquadratic problem in R²: f(x₁, x₂) = e^{x₁+3x₂−0.1} + e^{x₁−3x₂−0.1} + e^{−x₁−0.1}
- backtracking line search: approximately linear convergence (the sublevel sets of f are not too badly conditioned, M/m not too large)
- exact line search: approximately linear convergence, about twice as fast as with backtracking line search
[Figures 9.3 and 9.5: Iterates of the gradient method with backtracking and exact line search, for the problem in R² with objective f given in (9.20). The dashed curves are level curves of f, the small circles are the iterates, and the solid lines connecting successive iterates show the scaled steps t^(k) Δx^(k). Figure 9.4: Error f(x^(k)) − p* versus iteration k for both line searches; the plot shows nearly linear convergence, with the error reduced approximately by the factor 0.4 per iteration with backtracking line search, and by the factor 0.2 with exact line search.]
Examples
a problem in R^100: f(x) = c^T x − Σ_{i=1}^m log(b_i − a_i^T x)
- backtracking line search: approximately linear convergence
- exact line search: approximately linear convergence, only a bit faster than with backtracking line search
[Figure 9.6: Error f(x^(k)) − p* versus iteration k for the gradient method with backtracking and exact line search, for a problem in R^100.]
Conclusions on the gradient descent method
characteristics:
- exhibits approximately linear convergence, i.e., the error converges to zero approximately as a geometric series
- the choice of backtracking parameters α, β has a noticeable but not dramatic effect on the convergence
- exact line search sometimes improves the convergence, but not by much (and is probably not worth the trouble of implementing)
- the convergence rate depends greatly on the condition number of the Hessian, or of the sublevel sets
main advantage and disadvantage:
- main advantage: simplicity
- main disadvantage: the convergence rate depends critically on the condition number of the Hessian or sublevel sets
Steepest descent method
normalized steepest descent direction (at x, for the norm ||·||): Δx_nsd = argmin{∇f(x)^T v | ||v|| = 1}
- the first-order Taylor approximation of f(x + v) around x is f(x + v) ≈ f(x) + ∇f(x)^T v
- the directional derivative of f at x in direction v is ∇f(x)^T v
- Δx_nsd is the unit-norm direction with the most negative directional derivative
(unnormalized) steepest descent direction: Δx_sd = ||∇f(x)||_* Δx_nsd, which satisfies ∇f(x)^T Δx_sd = −||∇f(x)||_*² (||·||_* is the dual norm)
Steepest descent method
general descent method with Δx = Δx_sd

Steepest descent method.
given a starting point x ∈ dom f.
repeat
1. Compute the steepest descent direction Δx_sd.
2. Line search. Choose step size t via exact or backtracking line search.
3. Update. x := x + t Δx_sd.
until stopping criterion is satisfied.

- when exact line search is used, scale factors in the descent direction have no effect, so Δx_nsd or Δx_sd can be used
- convergence result: for strongly convex f, f(x^(k)) − p* ≤ c^k (f(x^(0)) − p*)
  - backtracking line search: c = 1 − 2mα γ̃² min{1, βγ²/M} < 1, where γ, γ̃ ∈ (0, 1] are constants with ||x|| ≥ γ ||x||_2 and ||x||_* ≥ γ̃ ||x||_2 (any norm can be bounded in terms of the Euclidean norm)
- linear convergence, as for the gradient descent method
Steepest descent for different norms
- Euclidean norm: Δx_sd = −∇f(x), coinciding with the gradient descent method
- quadratic norm ||x||_P = (x^T P x)^{1/2} (P ∈ S^n₊₊): Δx_sd = −P^{−1} ∇f(x); can be thought of as the gradient descent method applied after the change of coordinates x̄ = P^{1/2} x
- ℓ₁-norm: Δx_sd = −(∂f(x)/∂x_i) e_i, where i is an index with |∂f(x)/∂x_i| = ||∇f(x)||_∞; this is a coordinate-descent algorithm (update the component with maximum absolute partial derivative)
[Figure 9.9: Normalized steepest descent direction for a quadratic norm. The ellipsoid shown is the unit ball of the norm, translated to the point x. The normalized steepest descent direction Δx_nsd at x extends as far as possible in the direction −∇f(x) while staying in the ellipsoid. The gradient and normalized steepest descent directions are shown. Figure 9.10: Normalized steepest descent direction for the ℓ₁-norm. The diamond is the unit ball of the ℓ₁-norm, translated to the point x. The normalized steepest descent direction can always be chosen in the direction of a standard basis vector; in this example Δx_nsd = e₁.]
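The three formulas above can be sketched directly. A small Python illustration with a hypothetical gradient and matrix P (illustrative data, not from the slides), checking that each direction is a descent direction:

```python
import numpy as np

g = np.array([1.0, -3.0])                  # grad f(x) at the current point
P = np.array([[2.0, 0.0], [0.0, 8.0]])     # P in S^n_++ for the quadratic norm

# Euclidean norm: steepest descent is the negative gradient.
dx_euclid = -g

# Quadratic norm ||x||_P = (x^T P x)^{1/2}: dx_sd = -P^{-1} grad f(x).
dx_P = -np.linalg.solve(P, g)

# l1-norm: coordinate descent on the component with the largest
# absolute partial derivative.
i = np.argmax(np.abs(g))
dx_l1 = np.zeros_like(g)
dx_l1[i] = -g[i]

# All three satisfy grad f(x)^T dx < 0, i.e., they are descent directions.
for dx in (dx_euclid, dx_P, dx_l1):
    assert g @ dx < 0
```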
Choice of norm for steepest descent
- the choice of norm has a strong effect on the speed of convergence of the steepest descent method (consider the quadratic P-norm)
- steepest descent with the quadratic P-norm is the same as the gradient method after the change of coordinates x̄ = P^{1/2} x
- to increase the speed of convergence, choose P so that the sublevel sets of f, transformed by P^{1/2}, are well conditioned; the ellipsoid {x | x^T P x ≤ 1} approximates the shape of the sublevel sets
- works well in cases where we can identify a matrix P for which the transformed problem has moderate condition number
steepest descent with backtracking line search for two quadratic norms (the ellipses show {x | ||x − x^(k)||_P = 1})
[Figure 9.11: Steepest descent method with quadratic norm P₁; the ellipses are the boundaries of the norm balls at x^(0) and x^(1). Figure 9.12: Steepest descent method with quadratic norm P₂. Figure 9.13: Error f(x^(k)) − p* versus iteration k for the steepest descent method with the quadratic norms P₁ and P₂. Convergence is rapid for the norm P₁ and very slow for P₂.]
Newton step
Newton step for f at x: Δx_nt = −∇²f(x)^{−1} ∇f(x)
- convexity of f (∇²f(x) ⪰ 0) implies ∇f(x)^T Δx_nt < 0 unless ∇f(x) = 0: the Newton step is a descent direction unless x is optimal
- affine invariance: the Newton step of f̄(y) = f(Ty) (T nonsingular) at y and the Newton step of f at x = Ty satisfy Δx_nt = T Δy_nt
Interpretations of the Newton step: minimizer of the second-order approximation
x + Δx_nt minimizes the second-order Taylor approximation of f at x (a convex quadratic function of v):
f̂(x + v) = f(x) + ∇f(x)^T v + (1/2) v^T ∇²f(x) v
- if f is quadratic, then x + Δx_nt is the exact minimizer of f
- if f is nearly quadratic, then x + Δx_nt should be a very good estimate of the minimizer of f
- when x is near x* (so the quadratic model of f is very accurate), x + Δx_nt should be a very good approximation of x*
[Figure 9.16: The function f (shown solid) and its second-order approximation f̂ at x (dashed). The Newton step Δx_nt is what must be added to x to give the minimizer of f̂.]
Interpretations of the Newton step: solution of the linearized optimality condition
x + Δx_nt solves the linearized optimality condition
∇f(x + v) ≈ ∇f(x) + ∇²f(x) v = 0
- when x is near x* (so the optimality condition almost holds), x + Δx_nt should be a very good approximation of x*
[Figure 9.18: The solid curve is the derivative f′ of the function f shown in Figure 9.16; f̂′ is the linear approximation of f′ at x. The Newton step Δx_nt is the difference between the root of f̂′ and the point x.]
Interpretations of the Newton step: steepest descent direction in the Hessian norm
Δx_nt is the steepest descent direction at x for the quadratic norm defined by the Hessian ∇²f(x), i.e., ||u||_{∇²f(x)} = (u^T ∇²f(x) u)^{1/2}
- when x is near x*, the problem after the associated change of coordinates x̄ = (∇²f(x))^{1/2} x has small condition number, so steepest descent in the norm ||·||_{∇²f(x)} converges very rapidly
[Figure 9.17: The dashed lines are level curves of a convex function. The ellipsoid shown (solid line) is {x + v | v^T ∇²f(x) v ≤ 1}. The arrow shows −∇f(x), the gradient descent direction. The Newton step Δx_nt is the steepest descent direction in the norm ||·||_{∇²f(x)}. The figure also shows Δx_nsd, the normalized steepest descent direction for the same norm.]
Newton decrement
Newton decrement at x (a measure of the proximity of x to x*):
λ(x) = (∇f(x)^T ∇²f(x)^{−1} ∇f(x))^{1/2}
properties:
- (1/2)λ(x)² is an estimate of f(x) − p*, using the quadratic approximation f̂: f(x) − inf_v f̂(x + v) = f(x) − f̂(x + Δx_nt) = (1/2)λ(x)²
- λ(x) equals the norm of the Newton step at x in the quadratic Hessian norm ||u||_{∇²f(x)} = (u^T ∇²f(x) u)^{1/2}: λ(x) = ||Δx_nt||_{∇²f(x)} = (Δx_nt^T ∇²f(x) Δx_nt)^{1/2}
- −λ(x)² is the directional derivative of f at x in the Newton direction: −λ(x)² = ∇f(x)^T Δx_nt = (d/dt) f(x + t Δx_nt) |_{t=0}
- affine invariance: the Newton decrement of f̄(y) = f(Ty) (T nonsingular) at y is the same as the Newton decrement of f at x = Ty
Newton's method
general descent method with Δx = Δx_nt

Newton's method.
given a starting point x ∈ dom f, tolerance ε > 0.
repeat
1. Compute the Newton step and decrement: Δx_nt := −∇²f(x)^{−1} ∇f(x); λ² := ∇f(x)^T ∇²f(x)^{−1} ∇f(x).
2. Stopping criterion. Quit if λ²/2 ≤ ε.
3. Line search. Choose step size t by backtracking line search.
4. Update. x := x + t Δx_nt.

- Newton's method is affine invariant, due to the affine invariance of the Newton step and decrement: it is independent of linear changes of coordinates; the Newton iterates for f̄(y) = f(Ty), started at y^(0) = T^{−1} x^(0), are y^(k) = T^{−1} x^(k)
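The algorithm above translates almost line by line into Python. A minimal sketch (the tolerance and line search parameters are illustrative), applied to the sum-of-exponentials R² example used earlier in these slides:

```python
import numpy as np

def newton(f, grad_f, hess_f, x0, eps=1e-12, alpha=0.1, beta=0.7, max_iter=100):
    """Damped Newton's method with backtracking line search,
    stopping when lambda(x)^2 / 2 <= eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, H = grad_f(x), hess_f(x)
        dx = -np.linalg.solve(H, g)        # Newton step
        lam2 = -g @ dx                     # lambda(x)^2 = g^T H^{-1} g
        if lam2 / 2 <= eps:                # stopping criterion
            break
        t = 1.0                            # backtracking line search
        while f(x + t * dx) >= f(x) + alpha * t * g @ dx:
            t *= beta
        x = x + t * dx
    return x

# f(x) = e^{x1+3x2-0.1} + e^{x1-3x2-0.1} + e^{-x1-0.1}
e = np.exp
f = lambda x: e(x[0] + 3*x[1] - 0.1) + e(x[0] - 3*x[1] - 0.1) + e(-x[0] - 0.1)
grad_f = lambda x: np.array([
    e(x[0] + 3*x[1] - 0.1) + e(x[0] - 3*x[1] - 0.1) - e(-x[0] - 0.1),
    3*e(x[0] + 3*x[1] - 0.1) - 3*e(x[0] - 3*x[1] - 0.1)])
def hess_f(x):
    a, b, c = e(x[0] + 3*x[1] - 0.1), e(x[0] - 3*x[1] - 0.1), e(-x[0] - 0.1)
    return np.array([[a + b + c, 3*a - 3*b],
                     [3*a - 3*b, 9*a + 9*b]])

x_star = newton(f, grad_f, hess_f, np.array([-1.0, 1.0]))
assert np.linalg.norm(grad_f(x_star)) < 1e-4
```

Consistent with the slides, only a handful of iterations are needed, versus dozens for gradient descent on the same problem.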
Classical convergence analysis
assumptions:
- f strongly convex on S with constant m, so that mI ⪯ ∇²f(x) ⪯ MI for all x ∈ S
- ∇²f is Lipschitz continuous on S with constant L > 0, i.e., ||∇²f(x) − ∇²f(y)||_2 ≤ L ||x − y||_2 for all x, y ∈ S (L measures how well f can be approximated by a quadratic function)
outline: there exist constants η ∈ (0, m²/L), γ > 0 such that
- if ||∇f(x^(k))||_2 ≥ η, then f(x^(k+1)) − f(x^(k)) ≤ −γ
- if ||∇f(x^(k))||_2 < η, then (L/(2m²)) ||∇f(x^(k+1))||_2 ≤ ((L/(2m²)) ||∇f(x^(k))||_2)², implying ||∇f(x^(l))||_2 < η for all l ≥ k
Classical convergence analysis
damped Newton phase (||∇f(x)||_2 ≥ η):
- most iterations require backtracking steps
- the function value decreases by at least γ, i.e., f(x^(k+1)) − f(x^(k)) ≤ −γ
- if p* > −∞, this phase ends after at most (f(x^(0)) − p*)/γ iterations
quadratically convergent phase (||∇f(x)||_2 < η):
- all iterations use step size t = 1 (no backtracking steps)
- ||∇f(x)||_2 converges to zero quadratically: if ||∇f(x^(k))||_2 < η, then for l ≥ k
  (L/(2m²)) ||∇f(x^(l))||_2 ≤ ((L/(2m²)) ||∇f(x^(k))||_2)^{2^{l−k}} ≤ (1/2)^{2^{l−k}}
  ⟹ f(x^(l)) − p* ≤ (1/(2m)) ||∇f(x^(l))||_2² ≤ (2m³/L²) (1/2)^{2^{l−k+1}}
- this phase ends (f(x^(l)) − p* ≤ ε) after at most log₂ log₂(ε₀/ε) iterations
Classical convergence analysis
conclusion: the number of iterations until f(x) − p* ≤ ε is bounded above by
(f(x^(0)) − p*)/γ + log₂ log₂(ε₀/ε),
where γ = αβη² m/M², η = min{1, 3(1 − 2α)} m²/L, and ε₀ = 2m³/L²
- the second term is small (of the order of 6) and almost constant for practical purposes
- in practice, the constants m, L (hence γ, ε₀) are usually unknown
- the analysis provides qualitative insight into the convergence properties (i.e., it explains the two phases of the algorithm)
Examples
the example in R² (page 12):
- backtracking parameters α = 0.1, β = 0.7
- converges in only 5 steps
- apparent quadratic convergence
[Figure 9.19: Newton's method for the problem in R² with objective f given in (9.20), and backtracking line search parameters α = 0.1, β = 0.7. Also shown are the ellipsoids {x | ||x − x^(k)||_{∇²f(x^(k))} ≤ 1} at the first two iterates. Figure 9.20: Error versus iteration k of Newton's method for the problem in R². Convergence to a very high accuracy is achieved in five iterations.]
Examples
the example in R^100 (page 13):
- backtracking parameters α = 0.01, β = 0.5
- backtracking line search is almost as fast as exact line search (and much simpler)
- clearly shows the two convergence phases (a damped phase of 2 iterations)
[Figure 9.21: Error versus iteration of Newton's method for the problem in R^100, with backtracking line search parameters α = 0.01, β = 0.5. Here too convergence is extremely rapid: very high accuracy is attained in only seven or eight iterations; Newton's method with exact line search is only one iteration faster than with backtracking line search. Figure 9.22: Step size t versus iteration for Newton's method with backtracking and exact line search. The backtracking line search takes one backtracking step in the first two iterations; after that it always selects t = 1.]
Examples
a large-scale example (with sparse a_i): f(x) = −Σ_i log(1 − x_i²) − Σ_i log(b_i − a_i^T x)
- backtracking parameters α = 0.01, β = 0.5
- a linearly convergent phase of about 13 iterations, followed by a quadratically convergent phase of 4 or 5 iterations
- convergence performance similar to the small examples
[Figure 9.23: Error versus iteration of Newton's method for this large sparse problem, with backtracking line search parameters α = 0.01, β = 0.5. Even at this scale, Newton's method requires only 18 iterations to achieve very high accuracy.]
Conclusions on Newton's method
strong advantages over the gradient and steepest descent methods:
- convergence of Newton's method is rapid in general, and quadratic near x*
- Newton's method is affine invariant: insensitive to the choice of coordinates and to the condition number of the sublevel sets of f
- Newton's method scales well with problem size
- good performance of Newton's method is not dependent on the choice of algorithm parameters
main disadvantage:
- the cost of forming and storing the Hessian, and the cost of computing the Newton step, which requires solving a set of linear equations
- in many cases it is possible to exploit problem structure to substantially reduce the cost of computing the Newton step
Self-concordance
shortcomings of the classical convergence analysis:
- it depends on the unknown constants m, M, L, so it is only conceptually useful
- the bound is not affinely invariant (m, M, L change if the coordinates change), although Newton's method is
convergence analysis via self-concordance (Nesterov and Nemirovski):
- the analysis of Newton's method for self-concordant functions does not depend on any unknown constants
- it gives an affine-invariant bound
- self-concordant functions include many logarithmic barrier functions that play an important role in interior-point methods for solving convex optimization problems
Self-concordant functions
definition:
- a convex f : R → R is self-concordant if |f′′′(x)| ≤ 2 f′′(x)^{3/2} for all x ∈ dom f
- f : R^n → R is self-concordant if it is self-concordant along every line in its domain, i.e., g(t) = f(x + tv) is self-concordant for all x ∈ dom f, v ∈ R^n
examples on R:
- linear functions (zero second and third derivatives)
- quadratic functions (zero third derivative and nonnegative second derivative)
- negative logarithm: f(x) = −log x
- negative entropy plus negative logarithm: f(x) = x log x − log x
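The defining inequality is easy to check numerically for the last two examples; for f(x) = −log x it in fact holds with equality. A small Python check over a grid of points:

```python
import numpy as np

# f(x) = -log(x): f''(x) = 1/x^2, f'''(x) = -2/x^3, so
# |f'''(x)| = 2 f''(x)^{3/2} -- the defining inequality holds with equality.
for x in np.linspace(0.1, 10.0, 50):
    f2 = 1.0 / x**2
    f3 = -2.0 / x**3
    assert abs(f3) <= 2 * f2**1.5 + 1e-9

# f(x) = x*log(x) - log(x): f''(x) = 1/x + 1/x^2, f'''(x) = -1/x^2 - 2/x^3.
for x in np.linspace(0.1, 10.0, 50):
    f2 = 1.0 / x + 1.0 / x**2
    f3 = -1.0 / x**2 - 2.0 / x**3
    assert abs(f3) <= 2 * f2**1.5 + 1e-9
```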
Self-concordant functions
remarks:
- the constant 2 is chosen for convenience, to simplify the formulas later on; any other positive constant could be used instead: if f : R → R satisfies |f′′′(x)| ≤ k f′′(x)^{3/2}, then f̃(x) = (k²/4) f(x) satisfies |f̃′′′(x)| ≤ 2 f̃′′(x)^{3/2}
- what is important is that the third derivative of the function is bounded by some multiple of the 3/2-power of its second derivative
- self-concordance is affine invariant: if f : R → R is s.c., then f̃(y) = f(ay + b) is s.c.
- the self-concordance condition limits the third derivative of a function in a way that is independent of affine coordinate changes
Self-concordant calculus
properties:
- preserved under scaling by α ≥ 1, and under sums: if f is s.c. and a ≥ 1, then af is s.c.; if f₁ and f₂ are s.c., then f₁ + f₂ is s.c.
- preserved under composition with an affine function: if f : R^n → R is s.c. and A ∈ R^{n×m}, b ∈ R^n, then f(Ax + b) is s.c.
- preserved under composition with the logarithm: if g : R → R is convex with dom g = R₊₊ and |g′′′(x)| ≤ 3 g′′(x)/x for all x, then f(x) = −log(−g(x)) − log x is s.c. on {x | x > 0, g(x) < 0}; if the condition |g′′′(x)| ≤ 3 g′′(x)/x holds for g, then it also holds for g(x) + ax² + bx + c with a ≥ 0
examples: the following are s.c.
- f(x) = −Σ_{i=1}^m log(b_i − a_i^T x) on {x | a_i^T x < b_i, i = 1, ..., m}
- f(X) = −log det X on S^n₊₊
- f(x, y) = −log(y² − x^T x) on {(x, y) | ||x||_2 < y}
Convergence analysis for self-concordant functions
assumption: f : R^n → R is s.c.
summary: there exist constants η ∈ (0, 1/4], γ > 0 (η = (1 − 2α)/4, γ = αβ η²/(1 + η)) such that
- if λ(x^(k)) > η, then f(x^(k+1)) − f(x^(k)) ≤ −γ
- if λ(x^(k)) ≤ η, then 2λ(x^(k+1)) ≤ (2λ(x^(k)))², which implies f(x^(l)) − p* ≤ λ(x^(l))² ≤ (1/2)^{2^{l−k+1}} for l ≥ k
complexity bound: the number of Newton iterations is bounded by
(f(x^(0)) − p*)/γ + log₂ log₂(1/ε) = ((20 − 8α)/(αβ(1 − 2α)²)) (f(x^(0)) − p*) + log₂ log₂(1/ε)
- depends only on the line search parameters α, β and the final accuracy ε
- the second term is small and can safely be replaced with 6
- example: for α = 0.1, β = 0.8, ε = 10^{−10}, the bound is 375(f(x^(0)) − p*) + 6
Numerical example
150 randomly generated instances of: minimize f(x) = −Σ_{i=1}^m log(b_i − a_i^T x)
[Figure 9.25: Number of Newton iterations required to minimize self-concordant functions versus f(x^(0)) − p*. The function f has the form f(x) = −Σ_{i=1}^m log(b_i − a_i^T x), where the problem data a_i and b_i are randomly generated. The circles show problems with m = 100, n = 50; the squares show problems with m = 1000, n = 500; and the diamonds show problems with m = 1000, n = 50. Fifty instances of each are shown.]
- the number of iterations is much smaller than the bound 375(f(x^(0)) − p*) + 6
- a bound of the form c(f(x^(0)) − p*) + 6 with a smaller c is (empirically) valid
Implementation
main effort in each iteration: evaluate the derivatives and solve the Newton system
H Δx = −g, where H = ∇²f(x), g = ∇f(x)
via Cholesky factorization:
H = LL^T,   Δx_nt = −L^{−T} L^{−1} g,   λ(x) = ||L^{−1} g||_2
where L is a lower triangular matrix
- cost: (1/3)n³ flops for an unstructured system
- cost ≪ (1/3)n³ if H is sparse or banded
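The Cholesky route can be sketched in NumPy with hypothetical data (an illustrative SPD Hessian and gradient, not from the slides); np.linalg.solve is used for the two triangular systems for self-containment, though a dedicated triangular solver such as scipy.linalg.solve_triangular would exploit their structure:

```python
import numpy as np

# Hypothetical SPD Hessian H and gradient g at the current point.
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
H = B @ B.T + 5 * np.eye(5)          # H = hess f(x), positive definite
g = rng.standard_normal(5)           # g = grad f(x)

# Cholesky factorization H = L L^T, with L lower triangular.
L = np.linalg.cholesky(H)

w = np.linalg.solve(L, g)            # forward solve: w = L^{-1} g
dx_nt = -np.linalg.solve(L.T, w)     # back solve: dx_nt = -L^{-T} L^{-1} g
lam = np.linalg.norm(w)              # lambda(x) = ||L^{-1} g||_2

# Consistency with the definitions of the Newton step and decrement.
assert np.allclose(dx_nt, -np.linalg.solve(H, g))
assert np.isclose(lam**2, g @ np.linalg.solve(H, g))
```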
Example of a dense Newton system with structure
f(x) = Σ_{i=1}^n ψ_i(x_i) + ψ₀(Ax + b),   H = D + A^T H₀ A
assume A ∈ R^{p×n} is dense, with p ≪ n; D is diagonal with diagonal elements ψ_i″(x_i); H₀ = ∇²ψ₀(Ax + b)
- method 1: form H, solve via dense Cholesky factorization (cost (1/3)n³)
- method 2: factor H₀ = L₀ L₀^T; write the Newton system as
  D Δx + A^T L₀ w = −g,   L₀^T A Δx − w = 0
  eliminate Δx from the first equation; compute w and Δx from
  (I + L₀^T A D^{−1} A^T L₀) w = −L₀^T A D^{−1} g,   D Δx = −g − A^T L₀ w
  cost: ≈ 2p²n (dominated by the computation of L₀^T A D^{−1} A^T L₀)
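The block elimination in method 2 can be checked against a direct solve. A Python sketch with hypothetical problem data (random A, D, H₀; the sizes are illustrative), confirming both methods give the same Newton step:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 10                        # p << n
A = rng.standard_normal((p, n))
d = rng.uniform(1.0, 2.0, n)          # diagonal of D (second derivatives psi_i'')
D = np.diag(d)
M = rng.standard_normal((p, p))
H0 = M @ M.T + np.eye(p)              # H0 = hess psi_0, SPD
g = rng.standard_normal(n)            # gradient at the current point

# Method 1: form H = D + A^T H0 A and solve H dx = -g directly (O(n^3)).
H = D + A.T @ H0 @ A
dx1 = np.linalg.solve(H, -g)

# Method 2: block elimination, cost dominated by ~2 p^2 n.
L0 = np.linalg.cholesky(H0)                       # H0 = L0 L0^T
AtL0 = A.T @ L0                                   # n x p
S = np.eye(p) + AtL0.T @ (AtL0 / d[:, None])      # I + L0^T A D^{-1} A^T L0
w = np.linalg.solve(S, -AtL0.T @ (g / d))         # small p x p solve
dx2 = (-g - AtL0 @ w) / d                         # D dx = -g - A^T L0 w

assert np.allclose(dx1, dx2)
```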
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More information14. Nonlinear equations
L. Vandenberghe ECE133A (Winter 2018) 14. Nonlinear equations Newton method for nonlinear equations damped Newton method for unconstrained minimization Newton method for nonlinear least squares 14-1 Set
More informationE5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization
E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained
More informationLecture 5: September 12
10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 12 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Barun Patra and Tyler Vuong Note: LaTeX template courtesy of UC Berkeley EECS
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More informationEE364a Homework 8 solutions
EE364a, Winter 2007-08 Prof. S. Boyd EE364a Homework 8 solutions 9.8 Steepest descent method in l -norm. Explain how to find a steepest descent direction in the l -norm, and give a simple interpretation.
More informationConvex Functions. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)
Convex Functions Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Definition convex function Examples
More informationSelected Topics in Optimization. Some slides borrowed from
Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model
More information4. Convex optimization problems (part 1: general)
EE/AA 578, Univ of Washington, Fall 2016 4. Convex optimization problems (part 1: general) optimization problem in standard form convex optimization problems quasiconvex optimization 4 1 Optimization problem
More informationA : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution:
1-5: Least-squares I A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution: f (x) =(Ax b) T (Ax b) =x T A T Ax 2b T Ax + b T b f (x) = 2A T Ax 2A T b = 0 Chih-Jen Lin (National
More informationEE 546, Univ of Washington, Spring Proximal mapping. introduction. review of conjugate functions. proximal mapping. Proximal mapping 6 1
EE 546, Univ of Washington, Spring 2012 6. Proximal mapping introduction review of conjugate functions proximal mapping Proximal mapping 6 1 Proximal mapping the proximal mapping (prox-operator) of a convex
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationProximal Newton Method. Ryan Tibshirani Convex Optimization /36-725
Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h
More information2. Quasi-Newton methods
L. Vandenberghe EE236C (Spring 2016) 2. Quasi-Newton methods variable metric methods quasi-newton methods BFGS update limited-memory quasi-newton methods 2-1 Newton method for unconstrained minimization
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More informationOptimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30
Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained
More informationLecture 9 Sequential unconstrained minimization
S. Boyd EE364 Lecture 9 Sequential unconstrained minimization brief history of SUMT & IP methods logarithmic barrier function central path UMT & SUMT complexity analysis feasibility phase generalized inequalities
More informationLecture 4 - The Gradient Method Objective: find an optimal solution of the problem
Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem min{f (x) : x R n }. The iterative algorithms that we will consider are of the form x k+1 = x k + t k d k, k = 0, 1,...
More informationBarrier Method. Javier Peña Convex Optimization /36-725
Barrier Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: Newton s method For root-finding F (x) = 0 x + = x F (x) 1 F (x) For optimization x f(x) x + = x 2 f(x) 1 f(x) Assume f strongly
More informationConvex Functions. Pontus Giselsson
Convex Functions Pontus Giselsson 1 Today s lecture lower semicontinuity, closure, convex hull convexity preserving operations precomposition with affine mapping infimal convolution image function supremum
More informationConvex Optimization. Prof. Nati Srebro. Lecture 12: Infeasible-Start Newton s Method Interior Point Methods
Convex Optimization Prof. Nati Srebro Lecture 12: Infeasible-Start Newton s Method Interior Point Methods Equality Constrained Optimization f 0 (x) s. t. A R p n, b R p Using access to: 2 nd order oracle
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More informationLecture 4 - The Gradient Method Objective: find an optimal solution of the problem
Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem min{f (x) : x R n }. The iterative algorithms that we will consider are of the form x k+1 = x k + t k d k, k = 0, 1,...
More informationNumerical optimization
Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal
More informationLecture 14 Barrier method
L. Vandenberghe EE236A (Fall 2013-14) Lecture 14 Barrier method centering problem Newton decrement local convergence of Newton method short-step barrier method global convergence of Newton method predictor-corrector
More informationProximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization
Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R
More informationImproving the Convergence of Back-Propogation Learning with Second Order Methods
the of Back-Propogation Learning with Second Order Methods Sue Becker and Yann le Cun, Sept 1988 Kasey Bray, October 2017 Table of Contents 1 with Back-Propagation 2 the of BP 3 A Computationally Feasible
More informationWritten Examination
Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes
More informationNumerical Optimization Prof. Shirish K. Shevade Department of Computer Science and Automation Indian Institute of Science, Bangalore
Numerical Optimization Prof. Shirish K. Shevade Department of Computer Science and Automation Indian Institute of Science, Bangalore Lecture - 13 Steepest Descent Method Hello, welcome back to this series
More informationNumerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems
1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of
More informationFast proximal gradient methods
L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient
More information10. Ellipsoid method
10. Ellipsoid method EE236C (Spring 2008-09) ellipsoid method convergence proof inequality constraints 10 1 Ellipsoid method history developed by Shor, Nemirovski, Yudin in 1970s used in 1979 by Khachian
More informationInterior Point Algorithms for Constrained Convex Optimization
Interior Point Algorithms for Constrained Convex Optimization Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Inequality constrained minimization problems
More informationLecture 5: Gradient Descent. 5.1 Unconstrained minimization problems and Gradient descent
10-725/36-725: Convex Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 5: Gradient Descent Scribes: Loc Do,2,3 Disclaimer: These notes have not been subjected to the usual scrutiny reserved for
More informationOptimization Tutorial 1. Basic Gradient Descent
E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationAgenda. Interior Point Methods. 1 Barrier functions. 2 Analytic center. 3 Central path. 4 Barrier method. 5 Primal-dual path following algorithms
Agenda Interior Point Methods 1 Barrier functions 2 Analytic center 3 Central path 4 Barrier method 5 Primal-dual path following algorithms 6 Nesterov Todd scaling 7 Complexity analysis Interior point
More informationOptimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization
5.93 Optimization Methods Lecture 8: Optimality Conditions and Gradient Methods for Unconstrained Optimization Outline. Necessary and sucient optimality conditions Slide. Gradient m e t h o d s 3. The
More informationLECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE
LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization
More informationECE133A Applied Numerical Computing Additional Lecture Notes
Winter Quarter 2018 ECE133A Applied Numerical Computing Additional Lecture Notes L. Vandenberghe ii Contents 1 LU factorization 1 1.1 Definition................................. 1 1.2 Nonsingular sets
More informationLecture: Convex Optimization Problems
1/36 Lecture: Convex Optimization Problems http://bicmr.pku.edu.cn/~wenzw/opt-2015-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/36 optimization
More informationGradient Descent. Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh. Convex Optimization /36-725
Gradient Descent Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh Convex Optimization 10-725/36-725 Based on slides from Vandenberghe, Tibshirani Gradient Descent Consider unconstrained, smooth convex
More informationA : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution:
1-5: Least-squares I A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution: f (x) =(Ax b) T (Ax b) =x T A T Ax 2b T Ax + b T b f (x) = 2A T Ax 2A T b = 0 Chih-Jen Lin (National
More informationSecond Order Optimization Algorithms I
Second Order Optimization Algorithms I Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 7, 8, 9 and 10 1 The
More informationGradient Methods Using Momentum and Memory
Chapter 3 Gradient Methods Using Momentum and Memory The steepest descent method described in Chapter always steps in the negative gradient direction, which is orthogonal to the boundary of the level set
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationA Distributed Newton Method for Network Utility Maximization, II: Convergence
A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility
More informationOptimization Methods. Lecture 19: Line Searches and Newton s Method
15.93 Optimization Methods Lecture 19: Line Searches and Newton s Method 1 Last Lecture Necessary Conditions for Optimality (identifies candidates) x local min f(x ) =, f(x ) PSD Slide 1 Sufficient Conditions
More information4. Convex optimization problems
Convex Optimization Boyd & Vandenberghe 4. Convex optimization problems optimization problem in standard form convex optimization problems quasiconvex optimization linear optimization quadratic optimization
More information3. Convex functions. basic properties and examples. operations that preserve convexity. the conjugate function. quasiconvex functions
3. Convex functions Convex Optimization Boyd & Vandenberghe basic properties and examples operations that preserve convexity the conjugate function quasiconvex functions log-concave and log-convex functions
More informationL. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones. definition. spectral decomposition. quadratic representation. log-det barrier 18-1
L. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones definition spectral decomposition quadratic representation log-det barrier 18-1 Introduction This lecture: theoretical properties of the following
More informationUnconstrained optimization
Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout
More informationNonlinear Programming
Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week
More informationGradient descent. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725
Gradient descent Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Gradient descent First consider unconstrained minimization of f : R n R, convex and differentiable. We want to solve
More information4TE3/6TE3. Algorithms for. Continuous Optimization
4TE3/6TE3 Algorithms for Continuous Optimization (Algorithms for Constrained Nonlinear Optimization Problems) Tamás TERLAKY Computing and Software McMaster University Hamilton, November 2005 terlaky@mcmaster.ca
More informationSuppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.
Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of
More informationConvex Optimization and Modeling
Convex Optimization and Modeling Introduction and a quick repetition of analysis/linear algebra First lecture, 12.04.2010 Jun.-Prof. Matthias Hein Organization of the lecture Advanced course, 2+2 hours,
More informationPart 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)
Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective
More informationLecture 5: September 15
10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 15 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Di Jin, Mengdi Wang, Bin Deng Note: LaTeX template courtesy of UC Berkeley EECS
More informationEE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17
EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 17 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 29, 2012 Andre Tkacenko
More informationThe proximal mapping
The proximal mapping http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/37 1 closed function 2 Conjugate function
More informationProximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725
Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:
More informationIE 521 Convex Optimization
Lecture 5: Convex II 6th February 2019 Convex Local Lipschitz Outline Local Lipschitz 1 / 23 Convex Local Lipschitz Convex Function: f : R n R is convex if dom(f ) is convex and for any λ [0, 1], x, y
More informationOptimization and Optimal Control in Banach Spaces
Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,
More informationExamination paper for TMA4180 Optimization I
Department of Mathematical Sciences Examination paper for TMA4180 Optimization I Academic contact during examination: Phone: Examination date: 26th May 2016 Examination time (from to): 09:00 13:00 Permitted
More informationMore First-Order Optimization Algorithms
More First-Order Optimization Algorithms Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 3, 8, 3 The SDM
More information5 Handling Constraints
5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest
More information