Composite nonlinear models at scale

Composite nonlinear models at scale. Dmitriy Drusvyatskiy, Mathematics, University of Washington. Joint work with D. Davis (Cornell), M. Fazel (UW), A.S. Lewis (Cornell), C. Paquette (Lehigh), and S. Roy (UW). Cornell ORIE 2017. AFOSR: FA9550-15-1-0237; NSF: DMS 1651851, CCF 1740551. 1/24

Outline.
1. Fast-gradient methods: complexity theory (review); new viewpoint: optimal quadratic averaging.
2. Composite nonlinear models F(x) = h(c(x)): global complexity; regularity and local rapid convergence; illustration: phase retrieval. 2/24

Notation. A function f : R^d → R is α-convex and β-smooth if q_x ≤ f ≤ Q_x for every x, where
Q_x(y) = f(x) + ⟨∇f(x), y − x⟩ + (β/2)‖y − x‖²,
q_x(y) = f(x) + ⟨∇f(x), y − x⟩ + (α/2)‖y − x‖².
Condition number: κ = β/α. 3/24

Complexity of first-order methods. Gradient descent: x_{k+1} = x_k − (1/β)∇f(x_k). Majorization view: x_{k+1} = argmin_y Q_{x_k}(y).
Iterations until f(x_k) − f* < ε (Nesterov 83, Yudin-Nemirovsky 83):
                        Gradient descent    Optimal methods
β-smooth                β/ε                 √(β/ε)
β-smooth & α-convex     κ·ln(1/ε)           √κ·ln(1/ε)
Optimal methods have downsides: they are not intuitive, not naturally monotone, and difficult to augment with memory. 4/24
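Where a concrete view helps, here is a minimal sketch of the majorization view of gradient descent on a strongly convex quadratic; the test problem, dimensions, and eigenvalue range are illustrative choices, not from the talk.

```python
import numpy as np

# Minimal sketch: gradient descent x_{k+1} = x_k - (1/beta) * grad f(x_k), i.e.
# each step minimizes the quadratic upper model Q_{x_k}, on a strongly convex
# quadratic f(x) = 0.5 * x^T A x.  The problem below is illustrative only.
rng = np.random.default_rng(0)
d = 50
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
eigs = np.linspace(1.0, 100.0, d)        # alpha = 1, beta = 100, so kappa = 100
A = U @ np.diag(eigs) @ U.T

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
alpha, beta = eigs.min(), eigs.max()

x = rng.standard_normal(d)
for k in range(500):
    x = x - grad(x) / beta               # argmin of Q_{x_k} in closed form
print(f"f(x) = {f(x):.3e} after 500 steps, kappa = {beta/alpha:.0f}")
```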

Optimal quadratic averaging 5/24

Optimal method by optimal averaging. Idea: use lower models of f instead.
Notation: x⁺ = x − (1/β)∇f(x) and x⁺⁺ = x − (1/α)∇f(x).
Convexity bound f ≥ q_x in canonical form: f(y) ≥ (f(x) − ‖∇f(x)‖²/(2α)) + (α/2)‖y − x⁺⁺‖².
Lower models: Q_A(x) = v_A + (α/2)‖x − x_A‖² and Q_B(x) = v_B + (α/2)‖x − x_B‖².
⟹ for any λ ∈ [0, 1], the average Q_λ := λQ_A + (1 − λ)Q_B = v_λ + (α/2)‖x − x_λ‖² is again a lower model.
Key observation: v_λ ≤ f*. 6/24

Optimal method by optimal averaging. The minimum value v_λ is maximized at λ = proj_{[0,1]}( 1/2 + (v_A − v_B)/(α‖x_A − x_B‖²) ). The resulting quadratic Q_λ is the optimal averaging of (Q_A, Q_B). Related to cutting plane and bundle methods. 7/24
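A minimal sketch of this averaging step, using only the two formulas above; the function name and the closed-form expression for v_λ (obtained by expanding the squares) are mine, not from the slides.

```python
import numpy as np

# Optimally average two lower models Q_A(x) = v_A + (alpha/2)||x - x_A||^2 and
# Q_B(x) = v_B + (alpha/2)||x - x_B||^2: pick lambda in [0,1] maximizing the
# minimum value v_lambda of Q_lambda = lambda*Q_A + (1-lambda)*Q_B.
def optimal_average(vA, xA, vB, xB, alpha):
    gap2 = np.dot(xA - xB, xA - xB)
    if gap2 == 0.0:                      # identical centers: keep the larger bound
        return max(vA, vB), xA
    lam = np.clip(0.5 + (vA - vB) / (alpha * gap2), 0.0, 1.0)
    x_lam = lam * xA + (1 - lam) * xB    # center of Q_lambda
    # minimum value of Q_lambda, by expanding the squares
    v_lam = lam * vA + (1 - lam) * vB + (alpha / 2) * lam * (1 - lam) * gap2
    return v_lam, x_lam
```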

Optimal method by optimal averaging. Algorithm (optimal averaging):
for k = 1, ..., K:
  set Q(x) = (f(x_k) − ‖∇f(x_k)‖²/(2α)) + (α/2)‖x − x_k⁺⁺‖²;
  let Q_k(x) = v_k + (α/2)‖x − c_k‖² be the optimal average of (Q, Q_{k−1});
  set x_{k+1} = line_search(c_k, x_k⁺).
Equivalent to geometric descent (Bubeck-Lee-Singh 15). Optimal rate (Bubeck-Lee-Singh 15, D-Fazel-Roy 16): f(x_k⁺) − v_k ≤ ε after O(√(β/α)·ln(1/ε)) iterations. The method is intuitive, monotone in f(x_k⁺) and in v_k, and admits memory by optimally averaging (Q, Q_{k−1}, ..., Q_{k−t}). A sketch of the full iteration appears below. 8/24
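A minimal end-to-end sketch of the iteration above, assuming α and β are known; the exact line search is replaced here by simply keeping the better of the two candidate points, so this is an illustrative reading of the slide rather than the authors' implementation.

```python
import numpy as np

# Minimal sketch of the optimal-averaging iteration (geometric-descent style).
# Assumes alpha (strong convexity) and beta (smoothness) are known.  The exact
# line search is replaced by keeping the better of the two candidate points.
def optimal_averaging_method(f, grad, x0, alpha, beta, K=200):
    x = x0.copy()
    g = grad(x)
    c = x - g / alpha                              # center of the first lower model
    v = f(x) - np.dot(g, g) / (2 * alpha)          # its minimum value, a lower bound on f*
    for _ in range(K):
        g = grad(x)
        vQ = f(x) - np.dot(g, g) / (2 * alpha)     # new quadratic lower model Q
        cQ = x - g / alpha                         # centered at x^{++}
        gap2 = np.dot(cQ - c, cQ - c)
        lam = 1.0 if gap2 == 0 else np.clip(0.5 + (vQ - v) / (alpha * gap2), 0.0, 1.0)
        v = lam * vQ + (1 - lam) * v + (alpha / 2) * lam * (1 - lam) * gap2
        c = lam * cQ + (1 - lam) * c               # Q_k = optimal average of (Q, Q_{k-1})
        x_plus = x - g / beta                      # short gradient step x^+
        x = c if f(c) < f(x_plus) else x_plus      # crude stand-in for line_search(c_k, x_k^+)
    return x, v                                    # v is a certified lower bound on f*
```

For instance, optimal_averaging_method(f, grad, x0, alpha=1.0, beta=100.0) can be run on the quadratic from the earlier sketch.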

Optimal method by optimal averaging. Figure: logistic regression with regularization α = 10^-4. Related extensions: proximal extensions (Chen-Ma 16), underestimate sequences (Ma et al. 17). 9/24

Composite nonlinear models 10/24

Nonsmooth & nonconvex minimization. Convex composition: min_x F(x) = h(c(x)), where h: R^m → R is convex and 1-Lipschitz, and c: R^d → R^m is C¹-smooth with β-Lipschitz Jacobian ∇c. (Burke 85, Cartis-Gould-Toint 11, Fletcher 82, Lewis-Wright 15, Nesterov 06, Powell 84, Wright 90, Yuan 83, ...) 11/24

Examples of min_x h(c(x)). Robust phase retrieval: min_x ‖(Ax)² − b‖₁. Robust PCA: min_{X ∈ R^{d×r}, Y ∈ R^{r×k}} ‖XY − D‖₁. Nonnegative factorization: min_{X,Y ≥ 0} ‖XY − D‖. 12/24

Prox-linear algorithm for min_x F(x) = h(c(x)).
Local model: F_x(y) := h( c(x) + ∇c(x)(y − x) ).
Accuracy: |F_x(y) − F(y)| ≤ (β/2)‖y − x‖² for all x, y.
Prox-linear method (Burke, Fletcher, Nesterov, Powell, ...): x⁺ = argmin_y { F_x(y) + (β/2)‖y − x‖² }.
Big assumption (for now): x⁺ is computable. A one-dimensional sketch appears below. 13/24
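A minimal sketch of the prox-linear step for the one-dimensional example f(x) = |x² − 1| plotted on the next slide, where h = |·| and c(x) = x² − 1 so the subproblem has a closed form; the case analysis is my own derivation, not taken from the slides.

```python
# Prox-linear step for F(x) = |c(x)| with c(x) = x^2 - 1 in one dimension.
# The subproblem min_d |a + g*d| + (beta/2)*d^2, with a = c(x) and g = c'(x),
# splits into three cases depending on where the unconstrained minimizers land.
def prox_linear_step(x, beta=2.0):             # c'(x) = 2x is 2-Lipschitz, so beta = 2
    a, g = x * x - 1.0, 2.0 * x
    if a >= g * g / beta:                      # linearization stays nonnegative at the optimum
        d = -g / beta
    elif a <= -g * g / beta:                   # linearization stays nonpositive at the optimum
        d = g / beta
    else:                                      # optimum sits at the kink: a + g*d = 0
        d = -a / g
    return x + d

x = 2.0
for k in range(6):
    x = prox_linear_step(x)
    print(k, x, abs(x * x - 1.0))              # rapid convergence toward 1; in exact arithmetic
                                               # the iterates never reach it (no finite termination)
```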

Figure: f(x) = |x² − 1|, together with the prox-linear model built at x = 0.5, i.e. F_{0.5}(y) + (y − 0.5)². No finite termination. 14/24

Sublinear rate. Prox-gradient: G(x) := β(x − x⁺). Philosophy (Nesterov 13): ‖G(x)‖ plays the role of ‖∇F(x)‖.
Thm (D-Paquette 16): define the Moreau envelope F_t(x) := inf_y { F(y) + (t/2)‖y − x‖² }. Then F_{2β} is smooth with ‖∇F_{2β}(x)‖ ≍ ‖G(x)‖.
Complexity to drive ‖G(x)‖ below ε: on the order of (β/ε²)·(F(x₀) − F*) iterations and (β/ε³)·(F(x₀) − F*) basic operations, up to problem constants. Likely optimal (Carmon, Duchi, Hinder, Sidford 17). 15/24
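A small numerical check of the correspondence between ‖G(x)‖ and ‖∇F_{2β}(x)‖ on the one-dimensional example above; the brute-force evaluation of the Moreau envelope and the particular test points are mine, for illustration only.

```python
import numpy as np

# Numerical check (illustrative only) that the prox-gradient norm |G(x)| = beta*|x - x^+|
# is comparable, up to constant factors, to the Moreau-envelope derivative |F'_{2beta}(x)|
# for F(x) = |x^2 - 1| with beta = 2.
beta = 2.0
F = lambda x: np.abs(x * x - 1.0)

def prox_linear_point(x):
    # same closed-form prox-linear step as in the earlier sketch
    a, g = x * x - 1.0, 2.0 * x
    if a >= g * g / beta:
        return x - g / beta
    if a <= -g * g / beta:
        return x + g / beta
    return x - a / g

def moreau(x, t=2 * beta):
    ys = np.linspace(x - 3.0, x + 3.0, 200001)       # brute-force inf over a fine grid
    return np.min(F(ys) + (t / 2) * (ys - x) ** 2)

for x in [2.0, 1.3, 0.6]:
    G = beta * (x - prox_linear_point(x))
    h = 1e-4
    dM = (moreau(x + h) - moreau(x - h)) / (2 * h)   # finite-difference derivative of F_{2beta}
    print(f"x = {x}: |G(x)| = {abs(G):.3f}, |F_2beta'(x)| = {abs(dM):.3f}")
```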

Two regularity conditions. Fix x̄ ∈ S := {x : 0 ∈ ∂F(x)}.
1) Tilt-stability (Poliquin-Rockafellar 98): v ↦ argmin_{x ∈ B_r(x̄)} { F(x) − ⟨v, x⟩ } is 1/α-Lipschitz near v = 0.
2) Sharpness (Burke-Ferris 93): F(x) − F(x̄) ≥ α·dist(x, S) for x ∈ B_r(x̄).
Convergence rates (Nesterov 06, D-Lewis 15):
Tilt-stability: (F(x_{k+1}) − F*)/(F(x_k) − F*) ≤ 1 − α/β.
Sharpness: ‖x_{k+1} − x̄‖ ≤ O(‖x_k − x̄‖²). 16/24

Illustration: phase retrieval 17/24

Example: robust phase retrieval. Problem: find x ∈ R^d satisfying (a_iᵀx)² ≈ b_i for a_1, ..., a_m ∈ R^d and b_1, ..., b_m ∈ R.
Composite formulation: min_x F(x) := (1/m)‖(Ax)² − b‖₁.
Assume a_i ~ N(0, I) independently and b = (Ax̄)².
Two key consequences: there exist constants β, α > 0 such that w.h.p.
Approximation (Duchi-Ruan 17): |F(y) − F_x(y)| ≤ (β/2)‖y − x‖²₂.
Sharpness (Eldar-Mendelson 14): (1/m)‖(Ax)² − (Ay)²‖₁ ≥ α‖x − y‖₂·‖x + y‖₂.
The sketch below spells out the composite pieces h and c for this formulation. 18/24
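A minimal sketch of the composite pieces for this formulation on synthetic Gaussian data; the dimensions and variable names are illustrative, not from the talk.

```python
import numpy as np

# Composite pieces for robust phase retrieval: F(x) = h(c(x)) with
# h(z) = (1/m)*||z||_1 and c(x) = (Ax)^2 - b, on synthetic Gaussian data.
rng = np.random.default_rng(0)
d, m = 2**6, 2**8
A = rng.standard_normal((m, d))
x_bar = rng.standard_normal(d)
b = (A @ x_bar) ** 2                       # noiseless measurements b = (A x_bar)^2

h = lambda z: np.abs(z).mean()             # (1/m) * ||z||_1
c = lambda x: (A @ x) ** 2 - b
F = lambda x: h(c(x))

def model(x, y):
    # prox-linear local model F_x(y) = h(c(x) + grad c(x) (y - x)),
    # using grad c(x) v = 2 * (Ax) ⊙ (A v)
    return h(c(x) + 2.0 * (A @ x) * (A @ (y - x)))

x = rng.standard_normal(d)
y = x + 0.01 * rng.standard_normal(d)
print(F(x), model(x, y), F(y))             # model(x, y) ≈ F(y) for y near x
```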

Intuition. F approximates the population objective F_P(x) = E_a |(aᵀx)² − (aᵀx̄)²| = conv_func(λ(xxᵀ − x̄x̄ᵀ)), a convex function of the eigenvalues of xxᵀ − x̄x̄ᵀ.
Figure: contour plot of x ↦ ‖∇F_P(x)‖. 19/24

Stationary point consistency. Thm (Davis-D-Paquette 17): whenever m ≥ C·d, with high probability every stationary point x of F is either close to the origin or close to ±x̄, with the error in both cases on the order of (d/m)^{1/4} relative to ‖x̄‖, up to constants. 20/24

Prox-linear and subgradient methods.
Prox-linear method: x⁺ = argmin_y { F_x(y) + (β/2)‖y − x‖² }.
Polyak subgradient method: x⁺ = x − ((F(x) − inf F)/‖ζ‖²)·ζ, with ζ ∈ ∂F(x).
Thm: there exists R > 0 such that if ‖x₀ − x̄‖/‖x̄‖ ≤ R, then w.h.p. the prox-linear iterates converge quadratically to x̄ (Duchi-Ruan 17), and the subgradient iterates converge to x̄ at a constant linear rate (Davis-D-Paquette 17).
Spectral initialization: (Wang et al. 16), (Candès et al. 15). Convex approach: (Candès-Strohmer-Voroninski 13). Other nonconvex approaches: (Candès-Li-Soltanolkotabi 15), (Tan-Vershynin 17), (Wang-Giannakis-Eldar 16), (Zhang-Chi-Liang 16), (Sun-Qu-Wright 17). Nonconvex subgradient methods: (Davis-Grimmer 17).
A sketch of the Polyak subgradient iteration for this problem appears below. 21/24
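A minimal sketch of the Polyak subgradient method on synthetic noiseless data, where inf F = 0; the dimensions, initialization scale, and iteration count are illustrative choices, not from the experiments in the talk.

```python
import numpy as np

# Polyak subgradient method for F(x) = (1/m)*||(Ax)^2 - b||_1 on noiseless
# synthetic data, so inf F = 0.  Dimensions and initialization are illustrative.
rng = np.random.default_rng(1)
d, m = 2**6, 2**8
A = rng.standard_normal((m, d))
x_bar = rng.standard_normal(d)
b = (A @ x_bar) ** 2

F = lambda x: np.abs((A @ x) ** 2 - b).mean()

def subgrad(x):
    r = (A @ x) ** 2 - b
    return (A.T @ (np.sign(r) * 2.0 * (A @ x))) / m    # chain rule through h(z) = (1/m)||z||_1

x = x_bar + 0.1 * np.linalg.norm(x_bar) * rng.standard_normal(d) / np.sqrt(d)  # crude "good" init
for k in range(200):
    g = subgrad(x)
    x = x - (F(x) / (np.dot(g, g) + 1e-18)) * g        # Polyak step with inf F = 0
err = min(np.linalg.norm(x - x_bar), np.linalg.norm(x + x_bar)) / np.linalg.norm(x_bar)
print(f"relative error after 200 Polyak steps: {err:.2e}")
```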

Figure: ‖x_k − x̄‖/‖x̄‖ versus iteration k, for (d, m) ≈ (2^9, 2^11). 22/24

Figure: ‖x_k − x̄‖/‖x̄‖ versus iterates, for (d, m) ≈ (2^22, 2^24) (left) and (d, m) ≈ (2^24, 2^25) (right). 23/24

Summary: optimal quadratic averaging; composite nonlinear models: complexity and regularity; illustration: phase retrieval.
References:
1. The nonsmooth landscape of phase retrieval, Davis-D-Paquette, arXiv:1711.03247, 2017.
2. Error bounds, quadratic growth, and linear convergence of proximal methods, D-Lewis, Math. Oper. Res., 2017.
3. Efficiency of minimizing compositions of convex functions and smooth maps, D-Paquette, arXiv:1605.00125, 2016.
4. An optimal first order method based on optimal quadratic averaging, D-Fazel-Roy, SIAM J. Optim., 2016.
Thank you! 24/24