Composite nonlinear models at scale
Dmitriy Drusvyatskiy, Mathematics, University of Washington.
Joint work with D. Davis (Cornell), M. Fazel (UW), A.S. Lewis (Cornell), C. Paquette (Lehigh), and S. Roy (UW).
Cornell ORIE 2017. Funding: AFOSR (FA), NSF (DMS, CCF).
Outline
1. Fast-gradient methods: complexity theory (review); new viewpoint: optimal quadratic averaging.
2. Composite nonlinear models F(x) = h(c(x)): global complexity; regularity and local rapid convergence; illustration: phase retrieval.
Notation
A function f: R^d → R is α-convex and β-smooth if q_x ≤ f ≤ Q_x for every x, where
Q_x(y) = f(x) + ⟨∇f(x), y − x⟩ + (β/2)‖y − x‖²,
q_x(y) = f(x) + ⟨∇f(x), y − x⟩ + (α/2)‖y − x‖².
Condition number: κ = β/α.
Complexity of first-order methods
Gradient descent: x_{k+1} = x_k − (1/β)∇f(x_k).
Majorization view: x_{k+1} = argmin_y Q_{x_k}(y).

Iterations until f(x_k) − f* < ɛ (Nesterov 83, Yudin-Nemirovsky 83):

                        Gradient Descent     Optimal Methods
  β-smooth              β/ɛ                  √(β/ɛ)
  β-smooth & α-convex   κ ln(1/ɛ)            √κ ln(1/ɛ)

Optimal methods have downsides:
- not intuitive
- not naturally monotone
- difficult to augment with memory
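As a quick sanity check (standard, not spelled out on the slide), minimizing the upper model Q_x recovers the gradient step:
\[
\nabla_y Q_x(y) = \nabla f(x) + \beta\,(y - x) = 0
\quad\Longleftrightarrow\quad
y = x - \tfrac{1}{\beta}\nabla f(x),
\]
so the majorization view and the usual update rule coincide.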
Optimal quadratic averaging
Optimal method by optimal averaging
Idea: use lower models of f instead.
Notation: x⁺ = x − (1/β)∇f(x) and x⁺⁺ = x − (1/α)∇f(x).
Convexity bound f ≥ q_x in canonical form:
f(y) ≥ (f(x) − ‖∇f(x)‖²/(2α)) + (α/2)‖y − x⁺⁺‖².
Lower models:
Q_A(x) = v_A + (α/2)‖x − x_A‖²,
Q_B(x) = v_B + (α/2)‖x − x_B‖².
⟹ for any λ ∈ [0, 1], the new lower model Q_λ := λQ_A + (1 − λ)Q_B = v_λ + (α/2)‖x − x_λ‖².
Key observation: v_λ ≤ f*.
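The canonical form is just the strong-convexity bound f ≥ q_x with the square completed; a short derivation, filled in here for completeness:
\[
q_x(y) = f(x) + \langle \nabla f(x),\, y - x\rangle + \frac{\alpha}{2}\|y - x\|^2
       = \Big(f(x) - \frac{\|\nabla f(x)\|^2}{2\alpha}\Big)
         + \frac{\alpha}{2}\,\Big\|\,y - \big(x - \tfrac{1}{\alpha}\nabla f(x)\big)\Big\|^2,
\]
and the centering point in the last term is exactly x⁺⁺.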
Optimal method by optimal averaging
The minimum v_λ is maximized when
λ̄ = proj_[0,1]( 1/2 + (v_A − v_B) / (α‖x_A − x_B‖²) ).
The quadratic Q_λ̄ is the optimal averaging of (Q_A, Q_B).
Related to cutting plane and bundle methods.
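Where the formula for λ̄ comes from (a routine computation, added here as a missing step): the average λQ_A + (1 − λ)Q_B is again a quadratic with curvature α, centered at x_λ = λx_A + (1 − λ)x_B, with minimum value
\[
v_\lambda = \lambda v_A + (1-\lambda)\, v_B + \frac{\alpha}{2}\,\lambda(1-\lambda)\,\|x_A - x_B\|^2 .
\]
This is a concave quadratic in λ; setting its derivative to zero and projecting onto [0, 1] yields the stated λ̄.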
Algorithm: Optimal averaging
for k = 1, ..., K do
    Set Q(x) = (f(x_k) − ‖∇f(x_k)‖²/(2α)) + (α/2)‖x − x_k⁺⁺‖².
    Let Q_k(x) = v_k + (α/2)‖x − c_k‖² be the optimal average of (Q, Q_{k−1}).
    Set x_{k+1} = line_search(c_k, x_k⁺).
end

Equivalent to geometric descent (Bubeck-Lee-Singh 15).
Optimal rate (Bubeck-Lee-Singh 15, D-Fazel-Roy 16): f(x_k⁺) − v_k ≤ ɛ after O(√(β/α) ln(1/ɛ)) iterations.
Intuitive. Monotone in f(x_k⁺) and in v_k. Memory by optimally averaging (Q, Q_{k−1}, ..., Q_{k−t}).
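A minimal numpy sketch of the scheme, under the slide's assumptions (f is α-strongly convex and β-smooth, with gradient access); the crude grid search stands in for the exact line search, and all names are illustrative rather than from the authors' code:

```python
import numpy as np

def optimal_average(vA, cA, vB, cB, alpha):
    # Optimally average two lower bounds of the form v + (alpha/2)*||x - c||^2:
    # choose lambda in [0, 1] so the averaged quadratic has the largest minimum.
    D = float(np.dot(cA - cB, cA - cB))
    if D == 0.0:
        return max(vA, vB), cA
    lam = float(np.clip(0.5 + (vA - vB) / (alpha * D), 0.0, 1.0))
    c = lam * cA + (1.0 - lam) * cB
    v = lam * vA + (1.0 - lam) * vB + 0.5 * alpha * lam * (1.0 - lam) * D
    return v, c

def quad_averaging(f, grad, x0, alpha, beta, iters=100):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    # initial lower model, from the convexity bound at x0 in canonical form
    v, c = f(x) - np.dot(g, g) / (2.0 * alpha), x - g / alpha
    for _ in range(iters):
        # crude grid line search on the segment joining the center c and x+
        x_plus = x - g / beta
        x = min((c + t * (x_plus - c) for t in np.linspace(0.0, 1.0, 21)), key=f)
        g = grad(x)
        # lower model at the new point, merged optimally with the running model
        v_new, c_new = f(x) - np.dot(g, g) / (2.0 * alpha), x - g / alpha
        v, c = optimal_average(v_new, c_new, v, c, alpha)
    return x, v
```

The returned v is a certified lower bound on min f, so f(x⁺) − v is the computable optimality gap that the rate statement refers to.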
Figure: Logistic regression with regularization α (convergence comparison).
Proximal extensions (Chen-Ma 16); underestimate sequences (Ma et al. 17).
Composite nonlinear models
Nonsmooth & nonconvex minimization
Convex composition: min_x F(x) = h(c(x)), where
- h: R^m → R is convex and 1-Lipschitz,
- c: R^d → R^m is C¹-smooth and ∇c is β-Lipschitz.
(Burke 85, Cartis-Gould-Toint 11, Fletcher 82, Lewis-Wright 15, Nesterov 06, Powell 84, Wright 90, Yuan 83, ...)
Examples of min_x h(c(x)):
Robust phase retrieval: min_x ‖(Ax)² − b‖_1
Robust PCA: min_{X ∈ R^{d×r}, Y ∈ R^{r×k}} ‖XY − D‖_1
Nonnegative factorization: min_{X,Y ≥ 0} ‖XY − D‖
Prox-linear algorithm
min_x F(x) = h(c(x))
Local model: F_x(y) := h(c(x) + ∇c(x)(y − x)).
Accuracy: |F_x(y) − F(y)| ≤ (β/2)‖y − x‖² for all x, y.
Prox-linear method (Burke, Fletcher, Nesterov, Powell, ...):
x⁺ = argmin_y { F_x(y) + (β/2)‖y − x‖² }.
Big assumption: x⁺ is computable (for now).
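A sketch of the outer loop, assuming access to a routine that solves the convex subproblem (that solver is exactly the "big assumption"; every name below is illustrative, not from the slides):

```python
import numpy as np

def prox_linear(solve_model, x0, beta, iters=50, tol=1e-8):
    """Prox-linear outer loop for F(x) = h(c(x)).

    solve_model(x, beta) is assumed to return
        argmin_y  h(c(x) + Jc(x) @ (y - x)) + (beta / 2) * ||y - x||^2,
    i.e. the minimizer of the convex local model plus a proximal term.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x_new = solve_model(x, beta)
        # beta * (x - x_new) is the prox-gradient G(x); its norm is the natural
        # stationarity measure for this problem class (see the next slides)
        if beta * np.linalg.norm(x - x_new) <= tol:
            return x_new
        x = x_new
    return x
```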
Figure: f(x) = |x² − 1| and its prox-linear models over several steps (built at x = 0.5 in the plots). No finite termination.
Sublinear rate
Prox-gradient: G(x) := β(x − x⁺).
Philosophy (Nesterov 13): ‖G(x)‖ plays the role of ‖∇F(x)‖.
Thm (D-Paquette 16): Define the Moreau envelope
F_t(x) := inf_y { F(y) + (t/2)‖y − x‖² }.
Then F_{2β} is smooth, with ‖G(x)‖ comparable to ‖∇F_{2β}(x)‖.

Complexity of driving ‖G(x)‖ below ɛ:
  Iterations:        (β/ɛ²)(F(x_0) − F*)
  Basic operations:  on the order of (1/ɛ³)(F(x_0) − F*)
Likely optimal (Carmon, Duchi, Hinder, Sidford 17).
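For orientation, a standard fact about weakly convex functions (stated here as background, not taken from the slide): if F is ρ-weakly convex and t > ρ, then F_t is differentiable with
\[
\nabla F_t(x) = t\,\big(x - \mathrm{prox}_{F/t}(x)\big),
\]
so a small value of ‖∇F_{2β}(x)‖, and hence of ‖G(x)‖, certifies that x is close to a point that is itself nearly stationary for F.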
Two regularity conditions
Fix x̄ ∈ S := {x : 0 ∈ ∂F(x)}.
1) Tilt-stability (Poliquin-Rockafellar 98):
v ↦ argmin_{x ∈ B_r(x̄)} { F(x) − ⟨v, x⟩ } is 1/α-Lipschitz near v = 0.
2) Sharpness (Burke-Ferris 93):
F(x) − F(x̄) ≥ α · dist(x, S) for x ∈ B_r(x̄).

Convergence rates:
  Regularity        Guarantee
  Tilt-stability    F(x_{k+1}) − F* ≤ (1 − α/β)(F(x_k) − F*)
  Sharpness         ‖x_{k+1} − x̄‖ ≤ O(‖x_k − x̄‖²)
(Nesterov 06, D-Lewis 15)
Illustration: phase retrieval
Example: Robust phase retrieval
Problem: find x ∈ R^d satisfying (a_i^T x)² ≈ b_i for a_1, ..., a_m ∈ R^d and b_1, ..., b_m ∈ R.
Composite formulation: min_x F(x) := (1/m)‖(Ax)² − b‖_1.
Assume a_i ~ N(0, I) independently and b = (Ax̄)².
Two key consequences: there are constants β, α > 0 such that w.h.p.
Approximation (Duchi-Ruan 17): |F(y) − F_x(y)| ≤ (β/2)‖y − x‖₂².
Sharpness (Eldar-Mendelson 14): (1/m)‖(Ax)² − (Ay)²‖_1 ≥ α ‖x − y‖₂ ‖x + y‖₂.
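To make the composite structure concrete, here is a small numpy sketch of this objective and its prox-linear model on synthetic data (the dimensions and names are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 50, 200                            # illustrative sizes
A = rng.standard_normal((m, d))           # rows a_i ~ N(0, I)
x_bar = rng.standard_normal(d)
b = (A @ x_bar) ** 2                      # noiseless measurements b = (A x_bar)^2

h = lambda z: np.abs(z).mean()            # outer convex function: (1/m) * ||.||_1
c = lambda x: (A @ x) ** 2 - b            # inner smooth map
F = lambda x: h(c(x))                     # composite objective F(x) = h(c(x))

def F_model(x, y):
    # prox-linear local model F_x(y) = h(c(x) + grad_c(x)(y - x)), using
    # grad_c(x)(y - x) = 2 * (Ax) * (A(y - x)) for this particular c
    Ax = A @ x
    return h(c(x) + 2.0 * Ax * (A @ (y - x)))
```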
Intuition
F approximates the population objective
F_P(x) = E_a |(aᵀx)² − (aᵀx̄)²|,
a convex function of the matrix xxᵀ − x̄x̄ᵀ, depending only on its eigenvalues λ(xxᵀ − x̄x̄ᵀ).
Figure: contour plot of x ↦ ‖∇F_P(x)‖.
Stationary point consistency
Thm (Davis-D-Paquette 17): Whenever m ≥ Cd, with high probability every stationary point x of F is either close to the origin or close to ±x̄, up to errors on the order of (d/m)^{1/4}.
Prox-linear and subgradient methods
Prox-linear method: x⁺ = argmin_y { F_x(y) + (β/2)‖y − x‖² }.
Polyak subgradient method: x⁺ = x − ((F(x) − inf F)/‖g‖²) g, for a subgradient g ∈ ∂F(x).

Thm: There exists R > 0 such that if ‖x_0 − x̄‖ ≤ R‖x̄‖, then w.h.p.
- prox-linear iterates converge quadratically to x̄ (Duchi-Ruan 17);
- Polyak subgradient iterates converge to x̄ at a constant linear rate (Davis-D-Paquette 17).

Spectral initialization: (Wang et al. 16), (Candès et al. 15).
Convex approach: (Candès-Strohmer-Voroninski 13).
Other nonconvex approaches: (Candès-Li-Soltanolkotabi 15), (Tan-Vershynin 17), (Wang-Giannakis-Eldar 16), (Zhang-Chi-Liang 16), (Sun-Qu-Wright 17).
Nonconvex subgradient methods: (Davis-Grimmer 17).
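A minimal sketch of the Polyak subgradient step for this objective, reusing A, b, x_bar, rng, and d from the earlier snippet and assuming noiseless measurements so that inf F = 0 (names are illustrative):

```python
import numpy as np

def polyak_subgradient(A, b, x0, iters=500, f_min=0.0):
    # Polyak subgradient method for F(x) = (1/m) * ||(Ax)^2 - b||_1.
    # With exact measurements b = (A x_bar)^2, the minimal value is f_min = 0.
    m = A.shape[0]
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        Ax = A @ x
        r = Ax ** 2 - b
        fx = np.abs(r).mean()
        g = (2.0 / m) * (A.T @ (np.sign(r) * Ax))   # a subgradient of F at x
        gg = float(np.dot(g, g))
        if gg == 0.0:
            break
        x = x - ((fx - f_min) / gg) * g             # Polyak step length
    return x

# Example use with the synthetic data above, started near x_bar:
# x_hat = polyak_subgradient(A, b, x_bar + 0.3 * rng.standard_normal(d))
```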
Figure: ‖x_k − x̄‖/‖x̄‖ versus iteration k for (d, m) ≈ (2⁹, 2¹¹).
Figure: ‖x_k − x̄‖/‖x̄‖ versus iteration k for (d, m) ≈ (2²², 2²⁴) (left) and (d, m) ≈ (2²⁴, 2²⁵) (right).
Summary:
- optimal quadratic averaging
- composite nonlinear models: complexity and regularity
- illustration: phase retrieval

References:
1. The nonsmooth landscape of phase retrieval, Davis-D-Paquette, arXiv preprint.
2. Error bounds, quadratic growth, and linear convergence of proximal methods, D-Lewis, Math. Oper. Res.
3. Efficiency of minimizing compositions of convex functions and smooth maps, D-Paquette, arXiv preprint.
4. An optimal first order method based on optimal quadratic averaging, D-Fazel-Roy, SIAM J. Optim.

Thank you!