Composite nonlinear models at scale


Composite nonlinear models at scale
Dmitriy Drusvyatskiy, Mathematics, University of Washington
Joint work with D. Davis (Cornell), M. Fazel (UW), A.S. Lewis (Cornell), C. Paquette (Lehigh), and S. Roy (UW)
Cornell ORIE 2017
Supported by AFOSR (FA) and NSF (DMS, CCF).

Outline
1. Fast-gradient methods
   - Complexity theory (review)
   - New viewpoint: optimal quadratic averaging
2. Composite nonlinear models: F(x) = h(c(x))
   - Global complexity
   - Regularity and local rapid convergence
   - Illustration: phase retrieval

Notation
A function f : ℝᵈ → ℝ is α-convex and β-smooth if q_x ≤ f ≤ Q_x for every x, where
   Q_x(y) = f(x) + ⟨∇f(x), y − x⟩ + (β/2)‖y − x‖²,
   q_x(y) = f(x) + ⟨∇f(x), y − x⟩ + (α/2)‖y − x‖².
Condition number: κ = β/α.
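To keep the two models straight, here is a small numerical sanity check of the sandwich q_x ≤ f ≤ Q_x on a toy quadratic; the matrix H and the values of α and β are made-up illustration choices, not from the slides.

```python
import numpy as np

# Sanity check of q_x <= f <= Q_x for a toy quadratic f(x) = 0.5 x^T H x,
# whose extreme eigenvalues supply alpha and beta.
rng = np.random.default_rng(0)
H = np.diag([1.0, 4.0, 10.0])
alpha, beta = 1.0, 10.0

f = lambda z: 0.5 * z @ H @ z
grad = lambda z: H @ z

x = rng.standard_normal(3)
for _ in range(1000):
    y = rng.standard_normal(3)
    lin = f(x) + grad(x) @ (y - x)                        # first-order expansion at x
    q = lin + 0.5 * alpha * np.linalg.norm(y - x) ** 2    # lower model q_x(y)
    Q = lin + 0.5 * beta * np.linalg.norm(y - x) ** 2     # upper model Q_x(y)
    assert q <= f(y) + 1e-9 and f(y) <= Q + 1e-9
```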

Complexity of first-order methods
Gradient descent: x_{k+1} = x_k − (1/β)∇f(x_k)
Majorization view: x_{k+1} = argmin_y Q_{x_k}(y)

Table: Iterations until f(x_k) − f* < ε
                        Gradient Descent   Optimal Methods
β-smooth                β/ε                √(β/ε)
β-smooth & α-convex     κ ln(1/ε)          √κ ln(1/ε)
(Nesterov 83, Yudin-Nemirovsky 83)

Optimal methods have downsides:
- not intuitive
- not naturally monotone
- difficult to augment with memory
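As a concrete instance of the majorization view, here is a minimal sketch of gradient descent on a β-smooth function; the quadratic test problem is an illustration chosen here, not taken from the slides.

```python
import numpy as np

def gradient_descent(grad_f, x0, beta, max_iter=1000, tol=1e-8):
    """x_{k+1} = x_k - (1/beta) * grad_f(x_k).

    Each step exactly minimizes the upper model
    Q_x(y) = f(x) + <grad f(x), y - x> + (beta/2) ||y - x||^2."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:
            break
        x = x - g / beta          # argmin of Q_x
    return x

# Toy problem: f(x) = 0.5 x^T diag(1, 10) x, so alpha = 1, beta = 10, kappa = 10.
H = np.diag([1.0, 10.0])
x_min = gradient_descent(lambda z: H @ z, x0=np.array([1.0, 1.0]), beta=10.0)
```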

Optimal quadratic averaging

Optimal method by optimal averaging
Idea: use lower models of f instead.
Notation: x⁺ = x − (1/β)∇f(x) and x⁺⁺ = x − (1/α)∇f(x).
Convexity bound f ≥ q_x in canonical form:
   f(y) ≥ ( f(x) − ‖∇f(x)‖²/(2α) ) + (α/2)‖y − x⁺⁺‖².
Lower models:
   Q_A(x) = v_A + (α/2)‖x − x_A‖²,   Q_B(x) = v_B + (α/2)‖x − x_B‖².
⟹ for any λ ∈ [0, 1], the average Q_λ := λQ_A + (1 − λ)Q_B = v_λ + (α/2)‖x − x_λ‖² is again a lower model.
Key observation: v_λ ≤ f*.

Optimal method by optimal averaging
The minimum v_λ is maximized when
   λ = proj_[0,1]( 1/2 + (v_A − v_B)/(α‖x_A − x_B‖²) ).
The quadratic Q_λ is the optimal averaging of (Q_A, Q_B).
Related to cutting plane and bundle methods.
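A minimal sketch of this optimal averaging step, with names (v_A, x_A, v_B, x_B) matching the slide; this is an illustration, not the authors' implementation.

```python
import numpy as np

def optimal_average(vA, xA, vB, xB, alpha):
    """Optimal averaging of Q_A(x) = vA + (alpha/2)||x - xA||^2 and
    Q_B(x) = vB + (alpha/2)||x - xB||^2: pick lambda in [0, 1] maximizing
    the minimum value of Q_lambda = lambda*Q_A + (1 - lambda)*Q_B."""
    xA, xB = np.asarray(xA, dtype=float), np.asarray(xB, dtype=float)
    gap2 = float(np.linalg.norm(xA - xB) ** 2)
    if gap2 == 0.0:
        lam = 1.0 if vA >= vB else 0.0      # identical centers: keep the larger bound
    else:
        lam = float(np.clip(0.5 + (vA - vB) / (alpha * gap2), 0.0, 1.0))
    x_lam = lam * xA + (1.0 - lam) * xB     # center of Q_lambda
    v_lam = lam * vA + (1.0 - lam) * vB + 0.5 * alpha * lam * (1.0 - lam) * gap2
    return v_lam, x_lam
```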

Algorithm: Optimal averaging
for k = 1, ..., K do
   Set Q(x) = ( f(x_k) − ‖∇f(x_k)‖²/(2α) ) + (α/2)‖x − x_k⁺⁺‖²
   Let Q_k(x) = v_k + (α/2)‖x − c_k‖² be the optimal average of (Q, Q_{k−1})
   Set x_{k+1} = line_search(c_k, x_k⁺)
end

Equivalent to geometric descent (Bubeck-Lee-Singh 15).
Optimal rate (Bubeck-Lee-Singh 15, D-Fazel-Roy 16): f(x_k⁺) − v_k ≤ ε after O(√(β/α) ln(1/ε)) iterations.
- Intuitive.
- Monotone in f(x_k⁺) and in v_k.
- Memory by optimally averaging (Q, Q_{k−1}, ..., Q_{k−t}).
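A sketch of the full loop, reusing optimal_average from the previous snippet; the line search is replaced here by a crude sampled search over the segment [c_k, x_k⁺], which is a simplification rather than the method as stated.

```python
import numpy as np

def quad_avg_method(f, grad_f, x0, alpha, beta, max_iter=200, tol=1e-10):
    """Sketch of the optimal quadratic averaging method for an alpha-convex,
    beta-smooth f. Maintains a lower bound v_k <= f* by optimally averaging
    quadratics (optimal_average is defined in the previous snippet)."""
    x = np.asarray(x0, dtype=float)
    g = grad_f(x)
    v, c = f(x) - np.linalg.norm(g) ** 2 / (2 * alpha), x - g / alpha
    for _ in range(max_iter):
        g = grad_f(x)
        v_new = f(x) - np.linalg.norm(g) ** 2 / (2 * alpha)   # min of the new lower quadratic
        c_new = x - g / alpha                                 # its center, x^{++}
        v, c = optimal_average(v_new, c_new, v, c, alpha)
        x_plus = x - g / beta                                 # gradient step, x^{+}
        if f(x_plus) - v <= tol:                              # gap f(x^+) - v_k
            return x_plus, v
        candidates = [(1 - t) * c + t * x_plus for t in np.linspace(0.0, 1.0, 11)]
        x = min(candidates, key=f)                            # crude line search
    return x, v
```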

Optimal method by optimal averaging
Figure: logistic regression with regularization parameter α.
Proximal extensions (Chen-Ma 16); underestimate sequences (Ma et al. 17).

Composite nonlinear models

Nonsmooth & nonconvex minimization
Convex composition: min_x F(x) = h(c(x)), where
- h : ℝᵐ → ℝ is convex and 1-Lipschitz,
- c : ℝᵈ → ℝᵐ is C¹-smooth and ∇c is β-Lipschitz.
(Burke 85, Cartis-Gould-Toint 11, Fletcher 82, Lewis-Wright 15, Nesterov 06, Powell 84, Wright 90, Yuan 83, ...)

Examples of min_x h(c(x)):
- Robust phase retrieval: min_x ‖(Ax)² − b‖₁
- Robust PCA: min_{X ∈ ℝ^{d×r}, Y ∈ ℝ^{r×k}} ‖XY − D‖₁
- Nonnegative factorization: min_{X, Y ≥ 0} ‖XY − D‖

Prox-linear algorithm
min_x F(x) = h(c(x))
Local model: F_x(y) := h( c(x) + ∇c(x)(y − x) )
Accuracy: |F_x(y) − F(y)| ≤ (β/2)‖y − x‖² for all x, y
Prox-linear method (Burke, Fletcher, Nesterov, Powell, ...):
   x⁺ = argmin_y { F_x(y) + (β/2)‖y − x‖² }
Big assumption: x⁺ is computable (for now).
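A hedged sketch of the outer loop, treating the convex subproblem as a black-box oracle (the "big assumption" above); solve_model is a hypothetical user-supplied solver, not something defined in the slides. The snippet after the next figure instantiates it in closed form for the scalar example f(x) = |x² − 1|.

```python
import numpy as np

def prox_linear(solve_model, x0, beta, max_iter=100, tol=1e-8):
    """Generic prox-linear outer loop (sketch).

    solve_model(x, beta) is a hypothetical oracle returning
        argmin_y  h(c(x) + grad_c(x)(y - x)) + (beta/2) ||y - x||^2,
    i.e. the convex subproblem is handed to an external solver.
    Stops once the prox-gradient G(x) = beta (x - x^+) is small."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_plus = np.asarray(solve_model(x, beta), dtype=float)
        G = beta * (x - x_plus)        # prox-gradient at the current point
        x = x_plus
        if np.linalg.norm(G) <= tol:
            break
    return x
```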

Figure: f(x) = |x² − 1| together with the prox-linear model f_{0.5}(y) + (y − 0.5)² built at the point 0.5.
No finite termination.
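For this scalar example the subproblem min_y |c(x) + c′(x)(y − x)| + (β/2)(y − x)² reduces to soft-thresholding, so the prox-linear step has a closed form; below is a sketch with β = 2 (the Lipschitz constant of c′(x) = 2x), purely for illustration.

```python
import numpy as np

def prox_linear_abs_quad(x0, beta=2.0, max_iter=25):
    """Prox-linear iteration for f(x) = |x^2 - 1|  (h = |.|, c(x) = x^2 - 1).

    The subproblem min_y |c(x) + c'(x)(y - x)| + (beta/2)(y - x)^2 reduces to
    soft-thresholding c(x) at level c'(x)^2 / beta."""
    x = float(x0)
    iterates = [x]
    for _ in range(max_iter):
        a, b = x * x - 1.0, 2.0 * x                          # c(x), c'(x)
        if b == 0.0:
            break                                            # model already minimized at y = x
        u = np.sign(a) * max(abs(a) - b * b / beta, 0.0)     # soft-threshold of c(x)
        x = x + (u - a) / b
        iterates.append(x)
    return iterates

# e.g. prox_linear_abs_quad(2.0) -> 2, 1.25, 1.025, ... approaching the solution 1.
```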

Sublinear rate
Prox-gradient: G(x) := β(x − x⁺)
Philosophy (Nesterov 13): G(x) ≈ ∇F(x)
Thm (D-Paquette 16): Define the Moreau envelope F_t(x) := inf_y { F(y) + (t/2)‖y − x‖² }. Then F_{2β} is smooth with G(x) ≈ ∇F_{2β}(x).

Complexity until ‖G(x_k)‖ ≤ ε:
   Iterations:        (β/ε²)(F(x₀) − F*)
   Basic operations:  (β‖∇c‖/ε³)(F(x₀) − F*)
Likely optimal (Carmon, Duchi, Hinder, Sidford 17).

Two regularity conditions
Fix x̄ ∈ S := {x : 0 ∈ ∂F(x)}.
1) Tilt-stability (Poliquin-Rockafellar 98): the map v ↦ argmin_{x ∈ B_r(x̄)} { F(x) − ⟨v, x⟩ } is 1/α-Lipschitz near v = 0.
2) Sharpness (Burke-Ferris 93): F(x) − F(x̄) ≥ α·dist(x, S) for x ∈ B_r(x̄).

Convergence rates (Nesterov 06, D-Lewis 15):
   Regularity       Guarantee
   Tilt-stability   (F(x_{k+1}) − F*)/(F(x_k) − F*) ≤ 1 − α/β
   Sharpness        ‖x_{k+1} − x̄‖ ≤ O(‖x_k − x̄‖²)

Illustration: phase retrieval

Example: robust phase retrieval
Problem: find x ∈ ℝᵈ satisfying (aᵢᵀx)² ≈ bᵢ for a₁, ..., a_m ∈ ℝᵈ and b₁, ..., b_m ∈ ℝ.
Composite formulation: min_x F(x) := (1/m)‖(Ax)² − b‖₁
Assume aᵢ ~ N(0, I) independently and b = (Ax̄)².
Two key consequences: there exist constants β, α > 0 such that w.h.p.
- Approximation (Duchi-Ruan 17): |F(y) − F_x(y)| ≤ (β/2)‖y − x‖₂²
- Sharpness (Eldar-Mendelson 14): (1/m)‖(Ax)² − (Ay)²‖₁ ≥ α‖x − y‖₂‖x + y‖₂.
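A minimal sketch of this objective and one of its subgradients on a synthetic noiseless instance; the dimensions and random seed are arbitrary choices for illustration.

```python
import numpy as np

def F_phase(x, A, b):
    """Composite objective F(x) = (1/m) || (Ax)^2 - b ||_1."""
    return np.abs((A @ x) ** 2 - b).mean()

def subgrad_phase(x, A, b):
    """One subgradient of F at x: (1/m) A^T ( sign((Ax)^2 - b) * 2 Ax )."""
    Ax = A @ x
    return A.T @ (np.sign(Ax ** 2 - b) * 2.0 * Ax) / len(b)

# Synthetic noiseless instance: a_i ~ N(0, I) and b = (A xbar)^2.
rng = np.random.default_rng(0)
d, m = 50, 200
A = rng.standard_normal((m, d))
xbar = rng.standard_normal(d)
b = (A @ xbar) ** 2
assert F_phase(xbar, A, b) == 0.0   # the signal (and -xbar) are global minimizers
```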

Intuition
F approximates the population objective
   F_P(x) = E_a |(aᵀx)² − (aᵀx̄)²|,
which is a convex function of the eigenvalues λ(xxᵀ − x̄x̄ᵀ).
Figure: contour plot of x ↦ ‖∇F_P(x)‖.

Stationary point consistency
Thm (Davis-D-Paquette 17): Whenever m ≥ C·d, with high probability every stationary point x of F either lies near the origin or lies near the signal set {±x̄}, with both distances controlled by the factor (d/m)^{1/4}.

Prox-linear and subgradient methods
Prox-linear method: x⁺ = argmin_y { F_x(y) + (β/2)‖y − x‖² }
Polyak subgradient: x⁺ = x − ((F(x) − inf F)/‖ζ‖²)·ζ, where ζ ∈ ∂F(x)

Thm: There exists R > 0 such that if ‖x₀ − x̄‖/‖x̄‖ ≤ R, then w.h.p.
- prox-linear iterates converge quadratically to x̄ (Duchi-Ruan 17);
- subgradient iterates converge to x̄ at a constant linear rate (Davis-D-Paquette 17).

Spectral initialization: (Wang et al. 16), (Candès et al. 15).
Convex approach: (Candès-Strohmer-Voroniski 13).
Other nonconvex approaches: (Candès-Li-Soltanolkotabi 15), (Tan-Vershynin 17), (Wang-Giannakis-Eldar 16), (Zhang-Chi-Liang 16), (Sun-Qu-Wright 17).
Nonconvex subgradient methods: (Davis-Grimmer 17).
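Since the measurements are exact, inf F = 0 and the Polyak step is implementable; below is a self-contained sketch on synthetic Gaussian data, where initializing near x̄ stands in for spectral initialization (an assumption of this illustration, not of the slides).

```python
import numpy as np

def polyak_subgradient_pr(A, b, x0, max_iter=3000, tol=1e-12):
    """Polyak subgradient method for F(x) = (1/m)||(Ax)^2 - b||_1,
    using inf F = 0 (valid for exact measurements b = (A xbar)^2)."""
    x = np.asarray(x0, dtype=float)
    m = len(b)
    for _ in range(max_iter):
        Ax = A @ x
        r = Ax ** 2 - b
        val = np.abs(r).mean()                    # F(x)
        g = A.T @ (np.sign(r) * 2.0 * Ax) / m     # a subgradient of F at x
        gg = g @ g
        if val <= tol or gg == 0.0:
            break
        x = x - (val / gg) * g                    # Polyak step (F(x) - inf F)/||g||^2
    return x

# Synthetic run; initializing near xbar stands in for spectral initialization.
rng = np.random.default_rng(0)
d, m = 50, 200
A = rng.standard_normal((m, d))
xbar = rng.standard_normal(d)
b = (A @ xbar) ** 2
x_hat = polyak_subgradient_pr(A, b, x0=xbar + 0.1 * rng.standard_normal(d))
print(np.linalg.norm(x_hat - xbar) / np.linalg.norm(xbar))   # relative error
```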

Figure: convergence plot of ‖x_k − x̄‖/‖x̄‖ versus iteration k for (d, m) ≈ (2⁹, 2¹¹).

Figure: ‖x_k − x̄‖/‖x̄‖ versus iteration for (d, m) ≈ (2²², 2²⁴) (left) and (d, m) ≈ (2²⁴, 2²⁵) (right).

Summary:
- optimal quadratic averaging
- composite nonlinear models: complexity and regularity
- illustration: phase retrieval

References:
1. The nonsmooth landscape of phase retrieval, Davis-D-Paquette, arXiv preprint.
2. Error bounds, quadratic growth, and linear convergence of proximal methods, D-Lewis, Math. Oper. Res.
3. Efficiency of minimizing compositions of convex functions and smooth maps, D-Paquette, arXiv preprint.
4. An optimal first order method based on optimal quadratic averaging, D-Fazel-Roy, SIAM J. Optim.

Thank you!
