Proximal gradient methods
1 ELE 538B: Large-Scale Optimization for Data Science
Proximal gradient methods
Yuxin Chen, Princeton University, Spring 2018
2 Outline Proximal gradient descent for composite functions Proximal mapping / operator Convergence analysis
3 Proximal gradient descent for composite functions
4 Composite models

    minimize_x  F(x) := f(x) + h(x),  subject to x \in \mathbb{R}^n

- f: convex and smooth
- h: convex (may not be differentiable)
- let F_opt := \min_x F(x) be the optimal cost

Proximal gradient methods 6-4
5 Examples

ℓ1-regularized minimization:

    minimize_x  f(x) + \|x\|_1,    where h(x) = \|x\|_1 is the ℓ1 norm

use ℓ1 regularization to promote sparsity.

Nuclear norm regularized minimization:

    minimize_X  f(X) + \|X\|_*,    where h(X) = \|X\|_* is the nuclear norm

use nuclear norm regularization to promote low-rank structure.
6 A proximal view of gradient descent

To motivate proximal gradient methods, we first revisit gradient descent

    x^{t+1} = x^t - \eta_t \nabla f(x^t)

which can be rewritten as

    x^{t+1} = \arg\min_x \Big\{ f(x^t) + \langle \nabla f(x^t), x - x^t \rangle + \frac{1}{2\eta_t} \|x - x^t\|^2 \Big\}

where the first two terms are a first-order approximation of f and the last term is a proximal term.
7 A proximal view of gradient descent

    x^{t+1} = \arg\min_x \Big\{ f(x^t) + \langle \nabla f(x^t), x - x^t \rangle + \frac{1}{2\eta_t} \|x - x^t\|^2 \Big\}

By the optimality condition, x^{t+1} is the point where the first-order approximation f(x^t) + \langle \nabla f(x^t), x - x^t \rangle and -\frac{1}{2\eta_t}\|x - x^t\|^2 + c have the same slope. (Figure omitted.)
8 How about projected gradient descent?

    x^{t+1} = P_C( x^t - \eta_t \nabla f(x^t) )
            = \arg\min_x \Big\{ f(x^t) + \langle \nabla f(x^t), x - x^t \rangle + \frac{1}{2\eta_t}\|x - x^t\|^2 + \mathbb{1}_C(x) \Big\}
            = \arg\min_x \Big\{ \frac{1}{2}\big\| x - (x^t - \eta_t \nabla f(x^t)) \big\|^2 + \eta_t \mathbb{1}_C(x) \Big\}     (6.1)

where the indicator function

    \mathbb{1}_C(x) = 0 if x \in C, and \infty else
9 Proximal operator

Define the proximal operator

    prox_h(x) := \arg\min_z \Big\{ \frac{1}{2}\|z - x\|^2 + h(z) \Big\}

for any convex function h. This allows one to express the projected GD update (6.1) as

    x^{t+1} = prox_{\eta_t \mathbb{1}_C}( x^t - \eta_t \nabla f(x^t) )     (6.2)
10 Proximal gradient methods

One can generalize (6.2) to accommodate more general h.

Algorithm 6.1 Proximal gradient algorithm
1: for t = 0, 1, ... do
2:     x^{t+1} = prox_{\eta_t h}( x^t - \eta_t \nabla f(x^t) )

- alternates between gradient updates on f and proximal minimization on h
- useful if prox_h is inexpensive
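As a sketch, the generic iteration of Algorithm 6.1 can be written in a few lines of Python/numpy. The helper names (`grad_f`, `prox_h`) and the nonnegative-orthant example are illustrative assumptions, not part of the lecture:

```python
import numpy as np

def proximal_gradient(grad_f, prox_h, x0, eta, num_iters=100):
    # x^{t+1} = prox_{eta h}(x^t - eta * grad f(x^t))
    x = x0
    for _ in range(num_iters):
        x = prox_h(x - eta * grad_f(x), eta)
    return x

# Illustration: minimize (1/2)||x - b||^2 + indicator of C = R^n_+,
# i.e. projected gradient descent; the prox of the indicator is P_C.
b = np.array([1.0, -2.0, 3.0])
grad_f = lambda x: x - b
prox_h = lambda v, eta: np.maximum(v, 0.0)   # Euclidean projection onto R^n_+
x_hat = proximal_gradient(grad_f, prox_h, np.zeros(3), eta=1.0)
# x_hat is the projection of b onto the nonnegative orthant
```

With this toy choice of f and C, the update reaches the projection of b in a single step.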
11 Proximal mapping / operator
12 Why consider proximal operators?

    prox_h(x) := \arg\min_z \Big\{ \frac{1}{2}\|z - x\|^2 + h(z) \Big\}

- well-defined under very general conditions (including nonsmooth convex functions)
- can be evaluated efficiently for many widely used functions (in particular, regularizers)
- this abstraction is conceptually and mathematically simple, and covers many well-known optimization algorithms
13 Example: indicator functions

If h = \mathbb{1}_C is the indicator function of a closed convex set C, then

    prox_h(x) = \arg\min_{z \in C} \frac{1}{2}\|z - x\|^2 = P_C(x)     (Euclidean projection)
14 Example: ℓ1 norm

If h(x) = \|x\|_1, then

    (prox_{\lambda h}(x))_i = \psi_{st}(x_i; \lambda)     (soft-thresholding)

where the soft-thresholding operator, applied in an entrywise manner, is

    \psi_{st}(x; \lambda) = x - \lambda, if x > \lambda;  x + \lambda, if x < -\lambda;  0, else
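For concreteness, the soft-thresholding prox can be sketched in one line of Python/numpy (an equivalent compact form of the piecewise definition of ψ_st):

```python
import numpy as np

def prox_l1(x, lam):
    """Entrywise soft-thresholding: prox of h(z) = lam * ||z||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# entries shrink toward 0 by lam, and entries with |x_i| <= lam are zeroed
z = prox_l1(np.array([3.0, -0.5, 1.2, -4.0]), lam=1.0)
```

This is the prox used in the LASSO example later in the lecture.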
15 Basic rules

- scaling: if f(x) = a g(x) + b with a > 0, then prox_f(x) = prox_{a g}(x)
- affine addition: if f(x) = g(x) + \langle a, x \rangle + b, then prox_f(x) = prox_g(x - a)
16 Basic rules

- quadratic addition: if f(x) = g(x) + \frac{\rho}{2}\|x - a\|^2, then

      prox_f(x) = prox_{\frac{1}{1+\rho} g}\Big( \frac{1}{1+\rho} x + \frac{\rho}{1+\rho} a \Big)

- scaling and translation: if f(x) = g(ax + b) with a \neq 0, then

      prox_f(x) = \frac{1}{a}\Big( prox_{a^2 g}(ax + b) - b \Big)     (homework)
17 Proof for quadratic addition

    prox_f(x) = \arg\min_z \Big\{ \frac{1}{2}\|z - x\|^2 + g(z) + \frac{\rho}{2}\|z - a\|^2 \Big\}
              = \arg\min_z \Big\{ \frac{1+\rho}{2}\|z\|^2 - \langle z, x + \rho a \rangle + g(z) \Big\}
              = \arg\min_z \Big\{ \frac{1}{2}\|z\|^2 - \Big\langle z, \frac{x + \rho a}{1+\rho} \Big\rangle + \frac{1}{1+\rho} g(z) \Big\}
              = \arg\min_z \Big\{ \frac{1}{2}\Big\| z - \Big( \frac{1}{1+\rho} x + \frac{\rho}{1+\rho} a \Big) \Big\|^2 + \frac{1}{1+\rho} g(z) \Big\}
              = prox_{\frac{1}{1+\rho} g}\Big( \frac{1}{1+\rho} x + \frac{\rho}{1+\rho} a \Big)

(dropping constants that do not depend on z)
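The quadratic-addition rule can be sanity-checked numerically. The sketch below takes g = λ‖·‖₁ as an assumed example (so prox_g is soft-thresholding) and compares the rule against brute-force minimization of the prox objective coordinate by coordinate on a fine grid:

```python
import numpy as np

soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)  # prox of t*||.||_1

lam, rho = 0.7, 2.0
a = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.3, -2.5])

# prox_f via the quadratic-addition rule, for f = lam*||.||_1 + (rho/2)||. - a||^2
z_rule = soft((x + rho * a) / (1 + rho), lam / (1 + rho))

# prox_f by brute force: the objective is separable, so minimize per coordinate
grid = np.linspace(-5, 5, 200001)
z_bruteforce = np.array([
    grid[np.argmin(0.5 * (grid - xi) ** 2 + lam * np.abs(grid)
                   + 0.5 * rho * (grid - ai) ** 2)]
    for xi, ai in zip(x, a)
])
```

The two computations agree up to the grid resolution.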
18 Basic rules

- orthogonal mapping: if f(x) = g(Qx) with Q orthogonal (QQ^T = Q^T Q = I), then

      prox_f(x) = Q^T prox_g(Qx)     (homework)

- orthogonal affine mapping: if f(x) = g(Qx + b) with QQ^T = \alpha^{-1} I (does not require Q^T Q = \alpha^{-1} I), then

      prox_f(x) = (I - \alpha Q^T Q)\, x + \alpha Q^T \big( prox_{\alpha^{-1} g}(Qx + b) - b \big)

- for general Q, it is not easy to derive prox_f from prox_g
19 Basic rules

- norm composition: if f(x) = g(\|x\|) with domain(g) = [0, \infty), then

      prox_f(x) = prox_g(\|x\|) \, \frac{x}{\|x\|},     x \neq 0
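As a concrete instance of the norm-composition rule (a sketch; the choice g(α) = λα on [0, ∞), with prox_g(α) = max(α − λ, 0), is an assumed example rather than one from the slides), f(x) = λ‖x‖₂ yields the familiar block soft-thresholding operator:

```python
import numpy as np

def prox_l2norm(x, lam):
    # norm composition with g(alpha) = lam * alpha on [0, inf):
    # prox_g(alpha) = max(alpha - lam, 0), so prox_f shrinks the norm of x
    nrm = np.linalg.norm(x)
    if nrm <= lam:
        return np.zeros_like(x)
    return (1.0 - lam / nrm) * x

y = prox_l2norm(np.array([3.0, 4.0]), lam=1.0)   # ||x|| = 5, so scale by 4/5
```

Unlike the entrywise ℓ1 prox, this operator zeroes or shrinks the whole vector at once, which is why it appears in group-sparse regularization.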
20 Proof for norm composition

Observe that

    \min_z \Big\{ g(\|z\|) + \frac{1}{2}\|z - x\|^2 \Big\}
        = \min_z \Big\{ g(\|z\|) + \frac{1}{2}\|z\|^2 - \langle z, x \rangle + \frac{1}{2}\|x\|^2 \Big\}
        = \min_{\alpha \ge 0} \min_{z: \|z\| = \alpha} \Big\{ g(\alpha) + \frac{1}{2}\alpha^2 - \langle z, x \rangle + \frac{1}{2}\|x\|^2 \Big\}
        = \min_{\alpha \ge 0} \Big\{ g(\alpha) + \frac{1}{2}\alpha^2 - \alpha \|x\| + \frac{1}{2}\|x\|^2 \Big\}     (Cauchy-Schwarz)
        = \min_{\alpha \ge 0} \Big\{ g(\alpha) + \frac{1}{2}(\alpha - \|x\|)^2 \Big\}

From the above calculation, we know the optimal point is

    \alpha^* = prox_g(\|x\|)  and  z^* = \alpha^* \frac{x}{\|x\|} = prox_g(\|x\|) \frac{x}{\|x\|}

thus concluding the proof.
21 Nonexpansiveness of proximal operators

Recall that when h(x) = \mathbb{1}_C(x), prox_h(x) is the Euclidean projection P_C onto C, which is nonexpansive for convex C:

    \|P_C(x_1) - P_C(x_2)\| \le \|x_1 - x_2\|

Nonexpansiveness is a property of the general proximal operator prox_h(\cdot). (Figure omitted.)
22 Nonexpansiveness of proximal operators

Fact 6.1 (Nonexpansiveness)

    \|prox_h(x_1) - prox_h(x_2)\| \le \|x_1 - x_2\|

In some sense, the proximal operator behaves like a projection.
23 Proof of Fact 6.1

Let z_1 = prox_h(x_1) and z_2 = prox_h(x_2). The subgradient characterizations of z_1 and z_2 read

    x_1 - z_1 \in \partial h(z_1)  and  x_2 - z_2 \in \partial h(z_2)

The nonexpansiveness claim \|z_1 - z_2\| \le \|x_1 - x_2\| would follow (together with Cauchy-Schwarz) from

    (x_1 - x_2)^T (z_1 - z_2) \ge \|z_1 - z_2\|^2     (firm nonexpansiveness)

To establish firm nonexpansiveness, use the subgradient inequalities

    h(z_2) \ge h(z_1) + \langle x_1 - z_1, z_2 - z_1 \rangle
    h(z_1) \ge h(z_2) + \langle x_2 - z_2, z_1 - z_2 \rangle

and add them to obtain \langle (x_1 - z_1) - (x_2 - z_2), z_1 - z_2 \rangle \ge 0, which rearranges to the claim.
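Both Fact 6.1 and the firm-nonexpansiveness step can be checked numerically on a prox with a closed form; the sketch below uses soft-thresholding as an assumed test case:

```python
import numpy as np

soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)  # prox of t*||.||_1
rng = np.random.default_rng(0)

ok = True
for _ in range(1000):
    x1, x2 = rng.normal(size=5), rng.normal(size=5)
    z1, z2 = soft(x1, 0.3), soft(x2, 0.3)
    # nonexpansiveness (Fact 6.1)
    ok &= np.linalg.norm(z1 - z2) <= np.linalg.norm(x1 - x2) + 1e-12
    # firm nonexpansiveness used in the proof
    ok &= (x1 - x2) @ (z1 - z2) >= np.dot(z1 - z2, z1 - z2) - 1e-12
```

Every random pair satisfies both inequalities, as the proof guarantees.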
24 Resolvent of subdifferential operator

One can interpret prox via the resolvent of the subdifferential operator.

Fact 6.2

    z = prox_f(x)  \iff  z = (I + \partial f)^{-1}(x)

where (I + \partial f)^{-1} is the resolvent of the operator \partial f, and I is the identity mapping.
25 Justification of Fact 6.2

    z = \arg\min_u \Big\{ f(u) + \frac{1}{2}\|u - x\|^2 \Big\}
    \iff  0 \in \partial f(z) + z - x     (optimality condition)
    \iff  x \in (I + \partial f)(z)
    \iff  z = (I + \partial f)^{-1}(x)
26 Moreau decomposition

Fact 6.3
Suppose f is closed and convex, and f^*(x) := \sup_z \{ \langle x, z \rangle - f(z) \} is the convex conjugate of f. Then

    x = prox_f(x) + prox_{f^*}(x)

- key relationship between proximal mapping and duality
- generalization of orthogonal decomposition
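Fact 6.3 can be verified numerically for f = ‖·‖₁, whose conjugate is the indicator of the unit ℓ∞ ball, so prox_{f*} is clipping to [−1, 1] (a standard conjugate pair, used here as an illustrative sketch):

```python
import numpy as np

x = np.array([2.5, -0.4, 0.9, -3.0])
prox_f = np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)  # prox of f = ||.||_1
prox_fstar = np.clip(x, -1.0, 1.0)   # f* = indicator of {z : ||z||_inf <= 1}
decomposition = prox_f + prox_fstar  # should reproduce x exactly
```

Each entry either lies inside [−1, 1] (so the soft-threshold term is zero) or outside it (so the clip saturates), and in both cases the two terms sum back to x.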
27 Moreau decomposition for convex cones

When K is a closed convex cone, (\mathbb{1}_K)^*(x) = \mathbb{1}_{K^\circ}(x) (exercise), with

    K^\circ := \{ x : \langle x, z \rangle \le 0, \ \forall z \in K \}

the polar cone of K. This gives

    x = P_K(x) + P_{K^\circ}(x)

A special case: if K is a subspace, then K^\circ = K^\perp, and hence

    x = P_K(x) + P_{K^\perp}(x)
28 Proof of Fact 6.3

Let u = prox_f(x); then from the optimality condition we know that x - u \in \partial f(u). This together with the conjugate subgradient theorem (homework) yields

    u \in \partial f^*(x - u)

In view of the optimality condition, this means x - u = prox_{f^*}(x), and hence

    x = u + (x - u) = prox_f(x) + prox_{f^*}(x)
29 Example: prox of support function

For any closed and convex set C, the support function S_C is defined as

    S_C(x) = \sup_{z \in C} \langle x, z \rangle

Then

    prox_{S_C}(x) = x - P_C(x)     (6.3)

Proof: First of all, it is easy to verify that (exercise)

    S_C^*(x) = \mathbb{1}_C(x)

Then the Moreau decomposition gives

    prox_{S_C}(x) = x - prox_{S_C^*}(x) = x - prox_{\mathbb{1}_C}(x) = x - P_C(x)
30 Example: ℓ∞ norm

    prox_{\|\cdot\|_\infty}(x) = x - P_{B_1}(x)

where B_1 := \{ z : \|z\|_1 \le 1 \} is the unit ℓ1 ball.

Remark: projection onto the ℓ1 ball can be computed efficiently.

Proof: Since \|x\|_\infty = \sup_{z: \|z\|_1 \le 1} \langle x, z \rangle = S_{B_1}(x), we can invoke (6.3) to arrive at

    prox_{\|\cdot\|_\infty}(x) = x - P_{B_1}(x)
31 Example: max function

Let g(x) = \max\{x_1, \ldots, x_n\}; then

    prox_g(x) = x - P_\Delta(x)

where \Delta := \{ z \in \mathbb{R}^n_+ : 1^T z = 1 \} is the probability simplex.

Remark: projection onto \Delta can be computed efficiently.

Proof: Since g(x) = \max\{x_1, \ldots, x_n\} = S_\Delta(x) (support function of \Delta), we can invoke (6.3) to reach prox_g(x) = x - P_\Delta(x).
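The efficient simplex projection alluded to above can be sketched with the standard sort-based algorithm (an assumed implementation; the slides do not spell it out), which via (6.3) also gives the prox of the max function:

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based sketch)."""
    n = v.size
    u = np.sort(v)[::-1]          # sort in decreasing order
    css = np.cumsum(u)
    # largest index rho (0-based) with u[rho] > (css[rho] - 1)/(rho + 1)
    rho = np.nonzero(u * np.arange(1, n + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def prox_max(x):
    # prox of g(x) = max_i x_i, via the support-function identity (6.3)
    return x - proj_simplex(x)

p = proj_simplex(np.array([2.0, 0.0, -1.0]))
```

The projection runs in O(n log n) time, dominated by the sort.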
32 Extended Moreau decomposition

A useful extension (homework):

Fact 6.4
Suppose f is closed and convex, and \lambda > 0. Then

    x = prox_{\lambda f}(x) + \lambda\, prox_{\lambda^{-1} f^*}(x / \lambda)
33 Convergence analysis
34 Cost monotonicity

The objective value is non-increasing in t:

Lemma 6.5
Suppose f is convex and L-smooth. If \eta_t \equiv 1/L, then

    F(x^{t+1}) \le F(x^t)

- different from subgradient methods (for which the objective value might be non-monotonic in t)
- the constant stepsize rule is recommended when f is convex and smooth
35 Proof of cost monotonicity

Main pillar: a fundamental inequality

Lemma 6.6
Let y^+ = prox_{\frac{1}{L} h}( y - \frac{1}{L} \nabla f(y) ). Then

    F(y^+) \le F(x) + \frac{L}{2}\|x - y\|^2 - \frac{L}{2}\|x - y^+\|^2 - g(x, y)

where g(x, y) := f(x) - f(y) - \langle \nabla f(y), x - y \rangle \ge 0 by convexity.

Take x = y = x^t (and hence y^+ = x^{t+1}) to complete the proof.
36 Monotonicity in estimation error

Proximal gradient iterates are not only monotonic w.r.t. the cost, but also monotonic in estimation error.

Lemma 6.7
Suppose f is convex and L-smooth. If \eta_t \equiv 1/L, then

    \|x^{t+1} - x^*\| \le \|x^t - x^*\|

Proof: from Lemma 6.6, taking x = x^*, y = x^t (and hence y^+ = x^{t+1}) yields

    \underbrace{F(x^{t+1}) - F(x^*)}_{\ge 0} + \underbrace{g(x^*, x^t)}_{\ge 0} \le \frac{L}{2}\|x^* - x^t\|^2 - \frac{L}{2}\|x^* - x^{t+1}\|^2

which immediately concludes the proof.
37 Proof of Lemma 6.6

Define

    \phi(z) = f(y) + \langle \nabla f(y), z - y \rangle + \frac{L}{2}\|z - y\|^2 + h(z)

It is easily seen that y^+ = \arg\min_z \phi(z). Two important properties:

1. Since \phi(z) is L-strongly convex, one has

       \phi(x) \ge \phi(y^+) + \frac{L}{2}\|x - y^+\|^2

   Remark: we are propagating smoothness of f to strong convexity of another function \phi.

2. From smoothness of f,

       \phi(y^+) = \underbrace{f(y) + \langle \nabla f(y), y^+ - y \rangle + \frac{L}{2}\|y^+ - y\|^2}_{upper bound on f(y^+)} + h(y^+) \ge f(y^+) + h(y^+) = F(y^+)
38 Proof of Lemma 6.6 (cont.)

Taken collectively, these yield

    \phi(x) \ge F(y^+) + \frac{L}{2}\|x - y^+\|^2

which together with the definition of \phi(x) gives

    \underbrace{f(y) + \langle \nabla f(y), x - y \rangle + h(x)}_{= f(x) + h(x) - g(x, y) = F(x) - g(x, y)} + \frac{L}{2}\|x - y\|^2 \ge F(y^+) + \frac{L}{2}\|x - y^+\|^2

which finishes the proof.
39 Convergence for convex problems

Theorem 6.8 (Convergence of proximal gradient methods for convex problems)
Suppose f is convex and L-smooth. If \eta_t \equiv 1/L, then

    F(x^t) - F_opt \le \frac{L \|x^0 - x^*\|^2}{2t}

- achieves better iteration complexity (i.e. O(1/\epsilon)) than the subgradient method (i.e. O(1/\epsilon^2))
- fast if prox can be efficiently implemented
40 Proof of Theorem 6.8

With Lemma 6.6 in mind, set x = x^*, y = x^k to obtain

    F(x^{k+1}) - F(x^*) \le \frac{L}{2}\|x^k - x^*\|^2 - \frac{L}{2}\|x^{k+1} - x^*\|^2 - \underbrace{g(x^*, x^k)}_{\ge 0 by convexity}
                        \le \frac{L}{2}\|x^k - x^*\|^2 - \frac{L}{2}\|x^{k+1} - x^*\|^2

Apply it recursively and add up all the inequalities to get

    \sum_{k=0}^{t-1} \big( F(x^{k+1}) - F(x^*) \big) \le \frac{L}{2}\|x^0 - x^*\|^2 - \frac{L}{2}\|x^t - x^*\|^2 \le \frac{L}{2}\|x^0 - x^*\|^2

This combined with the monotonicity of F(x^t) (cf. Lemma 6.5) yields

    F(x^t) - F(x^*) \le \frac{1}{t} \sum_{k=0}^{t-1} \big( F(x^{k+1}) - F(x^*) \big) \le \frac{L \|x^0 - x^*\|^2}{2t}
41 Convergence for strongly convex problems

Theorem 6.9 (Convergence of proximal gradient methods for strongly convex problems)
Suppose f is \mu-strongly convex and L-smooth. If \eta_t \equiv 1/L, then

    \|x^t - x^*\|^2 \le \Big( 1 - \frac{\mu}{L} \Big)^t \|x^0 - x^*\|^2

- linear convergence: attains \epsilon accuracy within O(\log \frac{1}{\epsilon}) iterations
42 Proof of Theorem 6.9

Taking x = x^*, y = x^t (and hence y^+ = x^{t+1}) in Lemma 6.6 gives

    F(x^{t+1}) - F(x^*) \le \frac{L}{2}\|x^* - x^t\|^2 - \frac{L}{2}\|x^* - x^{t+1}\|^2 - \underbrace{g(x^*, x^t)}_{\ge \frac{\mu}{2}\|x^* - x^t\|^2}
                        \le \frac{L - \mu}{2}\|x^t - x^*\|^2 - \frac{L}{2}\|x^{t+1} - x^*\|^2

This taken collectively with F(x^{t+1}) - F(x^*) \ge 0 yields

    \|x^{t+1} - x^*\|^2 \le \Big( 1 - \frac{\mu}{L} \Big) \|x^t - x^*\|^2

Applying it recursively concludes the proof.
43 Numerical example: LASSO (taken from UCLA EE236C)

    minimize_x  f(x) = \frac{1}{2}\|Ax - b\|^2 + \|x\|_1

with i.i.d. Gaussian A, stepsize \eta_t = 1/L, and L = \lambda_{max}(A^T A).
44 Numerical example: LASSO (cont.)

(Figure: relative error (f(x^{(k)}) - f_opt)/f_opt versus iteration count k for proximal gradient descent on the LASSO example, with randomly generated A and stepsize \eta_t = 1/L, L = \lambda_{max}(A^T A).)
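A minimal reproduction of this experiment in Python/numpy (a sketch; smaller problem dimensions are assumed here since the original sizes were lost in transcription). Per Lemma 6.5, the objective should decrease monotonically with the constant stepsize 1/L:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 100
A, b = rng.normal(size=(n, p)), rng.normal(size=n)

L = np.linalg.eigvalsh(A.T @ A).max()   # f(x) = (1/2)||Ax - b||^2 is L-smooth
eta = 1.0 / L
soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)
F = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + np.abs(x).sum()

x = np.zeros(p)
costs = [F(x)]
for _ in range(500):
    # proximal gradient step: gradient update on f, then prox of ||.||_1
    x = soft(x - eta * (A.T @ (A @ x - b)), eta)
    costs.append(F(x))
```

Plotting `costs` against the iteration count reproduces the qualitative behavior of the figure above.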
45 Backtracking line search

Recall that for the unconstrained case, backtracking line search is based on the sufficient decrease criterion

    f( x^t - \eta \nabla f(x^t) ) \le f(x^t) - \frac{\eta}{2} \|\nabla f(x^t)\|^2
46 Backtracking line search (cont.)

As a result, this is equivalent to updating \eta_t = 1/L_t until

    f( x^t - \eta_t \nabla f(x^t) ) \le f(x^t) - \frac{1}{L_t} \langle \nabla f(x^t), \nabla f(x^t) \rangle + \frac{1}{2 L_t} \|\nabla f(x^t)\|^2
                                     = f(x^t) - \langle \nabla f(x^t), x^t - x^{t+1} \rangle + \frac{L_t}{2} \|x^t - x^{t+1}\|^2
47 Backtracking line search

Let T_L(x) := prox_{\frac{1}{L} h}( x - \frac{1}{L} \nabla f(x) ).

Algorithm 6.2 Backtracking line search for proximal gradient methods
1: Initialize \eta = 1, 0 < \alpha \le 1/2, 0 < \beta < 1
2: while f( T_{L_t}(x^t) ) > f(x^t) - \langle \nabla f(x^t), x^t - T_{L_t}(x^t) \rangle + \frac{L_t}{2} \|T_{L_t}(x^t) - x^t\|^2 do
3:     L_t \leftarrow L_t / \beta     (i.e. increase L_t)

Here, L_t corresponds to 1/\eta_t, and T_{L_t}(x^t) generalizes x^{t+1}.
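A sketch of Algorithm 6.2 in Python/numpy; the concrete update L ← L/β and the small LASSO test problem are illustrative assumptions:

```python
import numpy as np

soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_grad_backtracking(f, grad_f, prox_h, x0, L=1.0, beta=0.5, iters=100):
    x = x0
    for _ in range(iters):
        while True:
            z = prox_h(x - grad_f(x) / L, 1.0 / L)      # z = T_L(x)
            # sufficient decrease test from Algorithm 6.2
            if f(z) <= f(x) - grad_f(x) @ (x - z) + 0.5 * L * np.sum((x - z) ** 2):
                break
            L /= beta                                   # 0 < beta < 1: increase L
        x = z
    return x

rng = np.random.default_rng(1)
A, b = rng.normal(size=(30, 80)), rng.normal(size=30)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad_f = lambda x: A.T @ (A @ x - b)
prox_h = lambda v, t: soft(v, t)                        # h(x) = ||x||_1
x_hat = prox_grad_backtracking(f, grad_f, prox_h, np.zeros(80))
```

The inner loop terminates because the test is guaranteed to hold once L exceeds the smoothness constant of f.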
48 Summary: proximal gradient methods

    problem class                         stepsize rule     convergence rate         iteration complexity
    convex & smooth (w.r.t. f)            \eta_t = 1/L      O(1/t)                   O(1/\epsilon)
    strongly convex & smooth (w.r.t. f)   \eta_t = 1/L      O((1 - 1/\kappa)^t)      O(\kappa \log(1/\epsilon))
49 Reference

[1] "Proximal algorithms," N. Parikh and S. Boyd, Foundations and Trends in Optimization, 2013.
[2] "First-order methods in optimization," A. Beck, Vol. 25, SIAM, 2017.
[3] "Convex optimization and algorithms," D. Bertsekas, 2015.
[4] "Convex optimization: algorithms and complexity," S. Bubeck, Foundations and Trends in Machine Learning, 2015.
[5] "Mathematical optimization," MATH301 lecture notes, E. Candes, Stanford.
[6] "Optimization methods for large-scale systems," EE236C lecture notes, L. Vandenberghe, UCLA.
More informationConvex Optimization Lecture 16
Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean
More informationEE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6)
EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement to the material discussed in
More informationNesterov s Optimal Gradient Methods
Yurii Nesterov http://www.core.ucl.ac.be/~nesterov Nesterov s Optimal Gradient Methods Xinhua Zhang Australian National University NICTA 1 Outline The problem from machine learning perspective Preliminaries
More informationExponentiated Gradient Descent
CSE599s, Spring 01, Online Learning Lecture 10-04/6/01 Lecturer: Ofer Dekel Exponentiated Gradient Descent Scribe: Albert Yu 1 Introduction In this lecture we review norms, dual norms, strong convexity,
More informationMath 273a: Optimization Convex Conjugacy
Math 273a: Optimization Convex Conjugacy Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Convex conjugate (the Legendre transform) Let f be a closed proper
More informationProximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization
Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Proximal-Gradient Mark Schmidt University of British Columbia Winter 2018 Admin Auditting/registration forms: Pick up after class today. Assignment 1: 2 late days to hand in
More informationConvex Optimization. Problem set 2. Due Monday April 26th
Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining
More informationMATH 680 Fall November 27, Homework 3
MATH 680 Fall 208 November 27, 208 Homework 3 This homework is due on December 9 at :59pm. Provide both pdf, R files. Make an individual R file with proper comments for each sub-problem. Subgradients and
More informationRecent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables
Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong 2014 Workshop
More information10. Unconstrained minimization
Convex Optimization Boyd & Vandenberghe 10. Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions implementation
More informationGradient Descent. Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh. Convex Optimization /36-725
Gradient Descent Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh Convex Optimization 10-725/36-725 Based on slides from Vandenberghe, Tibshirani Gradient Descent Consider unconstrained, smooth convex
More informationLinearized Alternating Direction Method: Two Blocks and Multiple Blocks. Zhouchen Lin 林宙辰北京大学
Linearized Alternating Direction Method: Two Blocks and Multiple Blocks Zhouchen Lin 林宙辰北京大学 Dec. 3, 014 Outline Alternating Direction Method (ADM) Linearized Alternating Direction Method (LADM) Two Blocks
More informationStochastic and online algorithms
Stochastic and online algorithms stochastic gradient method online optimization and dual averaging method minimizing finite average Stochastic and online optimization 6 1 Stochastic optimization problem
More informationSparse Covariance Selection using Semidefinite Programming
Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support
More informationAccelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)
Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan
More informationDual Decomposition.
1/34 Dual Decomposition http://bicmr.pku.edu.cn/~wenzw/opt-2017-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/34 1 Conjugate function 2 introduction:
More informationAlgorithms for Nonsmooth Optimization
Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization
More informationStochastic Optimization: First order method
Stochastic Optimization: First order method Taiji Suzuki Tokyo Institute of Technology Graduate School of Information Science and Engineering Department of Mathematical and Computing Sciences JST, PRESTO
More informationLecture 5: Gradient Descent. 5.1 Unconstrained minimization problems and Gradient descent
10-725/36-725: Convex Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 5: Gradient Descent Scribes: Loc Do,2,3 Disclaimer: These notes have not been subjected to the usual scrutiny reserved for
More informationSelected Methods for Modern Optimization in Data Analysis Department of Statistics and Operations Research UNC-Chapel Hill Fall 2018
Selected Methods for Modern Optimization in Data Analysis Department of Statistics and Operations Research UNC-Chapel Hill Fall 08 Instructor: Quoc Tran-Dinh Scriber: Quoc Tran-Dinh Lecture 4: Selected
More informationA Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization
A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.
More informationI P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION
I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION Peter Ochs University of Freiburg Germany 17.01.2017 joint work with: Thomas Brox and Thomas Pock c 2017 Peter Ochs ipiano c 1
More informationUnconstrained minimization
CSCI5254: Convex Optimization & Its Applications Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions 1 Unconstrained
More informationLecture 1: Background on Convex Analysis
Lecture 1: Background on Convex Analysis John Duchi PCMI 2016 Outline I Convex sets 1.1 Definitions and examples 2.2 Basic properties 3.3 Projections onto convex sets 4.4 Separating and supporting hyperplanes
More informationLecture 23: November 21
10-725/36-725: Convex Optimization Fall 2016 Lecturer: Ryan Tibshirani Lecture 23: November 21 Scribes: Yifan Sun, Ananya Kumar, Xin Lu Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationConvex Optimization. Ofer Meshi. Lecture 6: Lower Bounds Constrained Optimization
Convex Optimization Ofer Meshi Lecture 6: Lower Bounds Constrained Optimization Lower Bounds Some upper bounds: #iter μ 2 M #iter 2 M #iter L L μ 2 Oracle/ops GD κ log 1/ε M x # ε L # x # L # ε # με f
More informationDesign and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016
Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall 206 2 Nov 2 Dec 206 Let D be a convex subset of R n. A function f : D R is convex if it satisfies f(tx + ( t)y) tf(x)
More informationLecture 6: September 17
10-725/36-725: Convex Optimization Fall 2015 Lecturer: Ryan Tibshirani Lecture 6: September 17 Scribes: Scribes: Wenjun Wang, Satwik Kottur, Zhiding Yu Note: LaTeX template courtesy of UC Berkeley EECS
More informationDouglas-Rachford Splitting: Complexity Estimates and Accelerated Variants
53rd IEEE Conference on Decision and Control December 5-7, 204. Los Angeles, California, USA Douglas-Rachford Splitting: Complexity Estimates and Accelerated Variants Panagiotis Patrinos and Lorenzo Stella
More informationELE 538B: Large-Scale Optimization for Data Science. Quasi-Newton methods. Yuxin Chen Princeton University, Spring 2018
ELE 538B: Large-Scale Opimizaion for Daa Science Quasi-Newon mehods Yuxin Chen Princeon Universiy, Spring 208 00 op ff(x (x)(k)) f p 2 L µ f 05 k f (xk ) k f (xk ) =) f op ieraions converges in only 5
More information4th Preparation Sheet - Solutions
Prof. Dr. Rainer Dahlhaus Probability Theory Summer term 017 4th Preparation Sheet - Solutions Remark: Throughout the exercise sheet we use the two equivalent definitions of separability of a metric space
More informationRelative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent
Relative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent Haihao Lu August 3, 08 Abstract The usual approach to developing and analyzing first-order
More informationConvex Optimization. Newton s method. ENSAE: Optimisation 1/44
Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)
More informationA projection algorithm for strictly monotone linear complementarity problems.
A projection algorithm for strictly monotone linear complementarity problems. Erik Zawadzki Department of Computer Science epz@cs.cmu.edu Geoffrey J. Gordon Machine Learning Department ggordon@cs.cmu.edu
More informationFunctional Analysis Exercise Class
Functional Analysis Exercise Class Week 9 November 13 November Deadline to hand in the homeworks: your exercise class on week 16 November 20 November Exercises (1) Show that if T B(X, Y ) and S B(Y, Z)
More informationOptimization for Learning and Big Data
Optimization for Learning and Big Data Donald Goldfarb Department of IEOR Columbia University Department of Mathematics Distinguished Lecture Series May 17-19, 2016. Lecture 1. First-Order Methods for
More informationWHY DUALITY? Gradient descent Newton s method Quasi-newton Conjugate gradients. No constraints. Non-differentiable ???? Constrained problems? ????
DUALITY WHY DUALITY? No constraints f(x) Non-differentiable f(x) Gradient descent Newton s method Quasi-newton Conjugate gradients etc???? Constrained problems? f(x) subject to g(x) apple 0???? h(x) =0
More informationLecture 5: September 12
10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 12 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Barun Patra and Tyler Vuong Note: LaTeX template courtesy of UC Berkeley EECS
More informationIterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem
Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Charles Byrne (Charles Byrne@uml.edu) http://faculty.uml.edu/cbyrne/cbyrne.html Department of Mathematical Sciences
More information2 Regularized Image Reconstruction for Compressive Imaging and Beyond
EE 367 / CS 448I Computational Imaging and Display Notes: Compressive Imaging and Regularized Image Reconstruction (lecture ) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement
More informationAn Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods
An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This
More informationUnconstrained minimization: assumptions
Unconstrained minimization I terminology and assumptions I gradient descent method I steepest descent method I Newton s method I self-concordant functions I implementation IOE 611: Nonlinear Programming,
More informationLecture 2: Convex Sets and Functions
Lecture 2: Convex Sets and Functions Hyang-Won Lee Dept. of Internet & Multimedia Eng. Konkuk University Lecture 2 Network Optimization, Fall 2015 1 / 22 Optimization Problems Optimization problems are
More information