Convex Stochastic and Large-Scale Deterministic Programming via Robust Stochastic Approximation and its Extensions

1  Convex Stochastic and Large-Scale Deterministic Programming via Robust Stochastic Approximation and its Extensions
Arkadi Nemirovski, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology
Joint research with Anatoli Juditsky (Joseph Fourier University, Grenoble, France), Guanghui Lan (ISyE, University of Florida), and Alexander Shapiro (ISyE, Georgia Tech)
BFG 09, Leuven, September 14-18, 2009

2  Overview
- Convex Stochastic Minimization and Convex-Concave Stochastic Saddle Point problems
- Classical Stochastic Approximation
- Robust Stochastic Approximation & Stochastic Mirror Prox: construction, efficiency estimates, how it works
- Acceleration via Randomization:
  - Matrix Game & l_1-minimization with uniform fit
  - l_1-minimization with Least Squares fit
  - Minimizing maxima of convex polynomials over a simplex

3  Convex Stochastic Minimization Problem
The basic Convex Stochastic Minimization problem is
  min_{x∈X} f(x)   (SM)
X ⊂ R^n: convex compact set
f(x): X → R: convex and Lipschitz continuous objective given by a Stochastic Oracle. At the i-th call to the oracle, the input being x ∈ X, the oracle returns a stochastic gradient G(x, ξ_i) ∈ R^n of f, with independent ξ_i ~ P, i = 1, 2, ..., and
  E_{ξ~P}{G(x, ξ)} ∈ ∂f(x)  for all x ∈ X.
In many cases (but not always!) G(x, ξ) ∈ ∂_x F(x, ξ), where F(x, ξ) is convex in x. Here (SM) is the problem of minimizing the expectation f(x) = E{F(x, ξ)} of an integrand F(x, ξ), convex in x, given by a first order oracle.

4  Convex-Concave Stochastic Saddle Point Problem
The Convex Stochastic Minimization problem is a particular case of the Convex-Concave Stochastic Saddle Point problem
  min_{x∈X} max_{y∈Y} f(x, y)   (SP)
Z = X × Y ⊂ R^n = R^{n_x} × R^{n_y}: convex compact set
f(x, y): X × Y → R: Lipschitz continuous function, convex in x ∈ X and concave in y ∈ Y, given by a Stochastic Oracle. At the i-th call to the oracle, the input being z = (x, y) ∈ Z, the Stochastic Oracle returns a stochastic derivative G(z, ξ_i) = [G_x(z, ξ_i); G_y(z, ξ_i)] ∈ R^{n_x} × R^{n_y} of f, with independent ξ_i ~ P, i = 1, 2, ..., and
  F(x, y) := E_{ξ~P}{G(x, y, ξ)} ∈ ∂_x f(x, y) × ∂_y(−f(x, y))

5  Classical Stochastic Approximation
  min_{x∈X} f(x)   (SM)          min_{x∈X} max_{y∈Y} f(x, y)   (SP)
The very first method for solving (SM) is the classical Stochastic Approximation (Robbins & Monro '51), which mimics the usual Subgradient Descent:
  x_{t+1} = argmin_{u∈X} { (1/2)‖u − x_t‖_2^2 + γ_t ⟨G(x_t, ξ_t), u⟩ },  γ_t = a/t,
i.e., x_{t+1} is the point of X closest to x_t − γ_t G(x_t, ξ_t).
When f is strongly convex on X, SA with properly chosen a ensures that E{‖x_t − argmin_X f‖_2^2} ≤ O(1/t). If, in addition, argmin_X f ∈ int X and ∇f is Lipschitz continuous, then also E{f(x_t) − min_X f} ≤ O(1/t).
A similar construction, with similar results, works for (SP) with strongly convex-concave f.
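To make the recursion concrete, here is a minimal numerical sketch of the classical SA step above; the ball domain, the toy objective, and the oracle `G` are illustrative assumptions, not part of the slides.

```python
import numpy as np

def classical_sa(G, x0, a, radius, T, rng):
    """Classical SA: x_{t+1} = Proj_X(x_t - (a/t) G(x_t, xi_t)) with the
    'short' stepsize gamma_t = a/t; X is taken to be a Euclidean ball."""
    x = np.array(x0, dtype=float)
    for t in range(1, T + 1):
        g = G(x, rng)                      # stochastic (sub)gradient at x_t
        x = x - (a / t) * g                # short step gamma_t = a/t
        nrm = np.linalg.norm(x)
        if nrm > radius:                   # Euclidean projection onto X
            x *= radius / nrm
    return x

# toy strongly convex instance: f(x) = E{||x - xi||^2}/2, xi ~ N(mu, I);
# the minimizer mu lies in int X and the modulus of strong convexity is 1
rng = np.random.default_rng(0)
mu = np.array([0.3, -0.2])
G = lambda x, rng: x - (mu + rng.standard_normal(mu.size))
x_T = classical_sa(G, np.zeros(2), a=1.0, radius=1.0, T=5000, rng=rng)
```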

6  Classical Stochastic Approximation (continued)
Good news on Classical SA: with smooth strongly convex f attaining its minimum in int X, the rate of convergence E{f(x_t) − min_X f} ≤ O(1/t) is the best allowed under the circumstances by the Laws of Statistics.
Bad news on Classical SA: it requires strong convexity of the objective f and is highly sensitive to the scale factor a in the stepsize rule γ_t = a/t. To choose a properly, we need to know the modulus of strong convexity of f, and overestimating this modulus may result in dramatically slowing down the convergence.
As a result, the Stochastic Optimization community commonly treats SA as a highly unstable and impractical optimization technique.

7  Robust Stochastic Approximation
  x_{t+1} = argmin_{u∈X} { (1/2)‖u − x_t‖_2^2 + γ_t ⟨G(x_t, ξ_t), u⟩ },  γ_t = a/t   (CSA)
It was discovered long ago [Nem.&Yudin '79] that SA admits a robust version which does not require strong convexity of the objective in (SM). To make SA robust, it suffices
- to replace the short steps γ_t = a/t with long steps γ_t = a/√t, and
- to add to (CSA) the averaging x̄_t = [Σ_{τ=1}^t γ_τ]^{-1} Σ_{τ=1}^t γ_τ x_τ and to treat, as approximate solutions, the averages x̄_t rather than the search points x_t.
Another important improvement (in its rudimentary form going back to [Nem.&Yudin '79]) is to replace the quadratic prox term (1/2)‖u − x_t‖_2^2 in (CSA) with a more general prox term, which allows one to adjust, to some extent, the method to the geometry of the problem of interest.

8  Robust Stochastic Approximation [Jud.&Lan&Nem.&Shap. 09]
  min_{x∈X} max_{y∈Y} f(x, y),  G(x, y, ξ): E_ξ{G(x, y, ξ)} ∈ ∂_x f(x, y) × ∂_y[−f(x, y)]
Setup for RSA: a norm ‖·‖ on R^n ⊇ Z = X × Y and a convex continuous distance generating function ω(z): Z → R such that the set Z^o = {z ∈ Z : ∂ω(z) ≠ ∅} is convex, and ω restricted to Z^o is C^1 and strongly convex:
  for all z, z′ ∈ Z^o: ⟨ω′(z) − ω′(z′), z − z′⟩ ≥ α‖z − z′‖^2   [α > 0]
Robust Stochastic Approximation:
  z_1 = z_ω := argmin_Z ω(·)
  z_{t+1} = argmin_{u∈Z} { [ω(u) − ⟨ω′(z_t), u⟩] + γ_t ⟨G(z_t, ξ_t), u⟩ }
  z̄_t = [Σ_{τ=1}^t γ_τ]^{-1} Σ_{τ=1}^t γ_τ z_τ
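As an illustration of the recursion above, here is a minimal sketch of RSA for a simplex setup (‖·‖ = ‖·‖_1, ω(z) = Σ_i z_i ln z_i), where the prox step has a closed multiplicative form; the stepsize schedule and the oracle `G` are assumptions made for the sketch.

```python
import numpy as np

def rsa_simplex(G, n, a, T, rng):
    """Robust SA on the simplex with the entropy d.g.f.: the prox step
    z_{t+1} = argmin_u [omega(u) - <omega'(z_t),u> + gamma_t <g,u>]
    becomes the multiplicative update z_i <- z_i exp(-gamma_t g_i),
    renormalized; the answer is the stepsize-weighted average of the
    search points."""
    z = np.full(n, 1.0 / n)                # z_1 = argmin_Z omega = uniform
    z_avg, wsum = np.zeros(n), 0.0
    for t in range(1, T + 1):
        gamma_t = a / np.sqrt(t)           # "long" steps O(1/sqrt(t))
        g = G(z, rng)                      # stochastic (sub)gradient at z_t
        z_avg += gamma_t * z               # average the search points z_t
        wsum += gamma_t
        w = z * np.exp(-gamma_t * (g - g.min()))   # shift for numerical safety
        z = w / w.sum()
    return z_avg / wsum
```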

9  Stochastic Mirror Prox [Jud.&Nem.&Tauvel 08]
  min_{x∈X} max_{y∈Y} f(x, y),  G(x, y, ξ): E_ξ{G(x, y, ξ)} ∈ ∂_x f(x, y) × ∂_y[−f(x, y)]
Stochastic Mirror Prox algorithm:
  z_1 = z_ω := argmin_Z ω(·)
  w_t = argmin_{u∈Z} { [ω(u) − ⟨ω′(z_t), u⟩] + γ_t ⟨G(z_t, ξ_{2t−1}), u⟩ }
  z_{t+1} = argmin_{u∈Z} { [ω(u) − ⟨ω′(z_t), u⟩] + γ_t ⟨G(w_t, ξ_{2t}), u⟩ }
  z̄_t = [Σ_{τ=1}^t γ_τ]^{-1} Σ_{τ=1}^t γ_τ w_τ
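A matching sketch of the SMP recursion, reusing the entropy prox mapping; for brevity it treats a single simplex block (in the saddle-point case z = (x, y) and G stacks the x- and y-components), and the oracle and stepsize are assumptions.

```python
import numpy as np

def entropy_prox(z, g, gamma):
    """argmin_u [omega(u) - <omega'(z),u> + gamma<g,u>] over the simplex
    for omega(u) = sum_i u_i ln u_i: a multiplicative update."""
    w = z * np.exp(-gamma * (g - g.min()))
    return w / w.sum()

def smp_simplex(G, n, gamma, T, rng):
    """Stochastic Mirror Prox: extragradient point w_t from z_t, then z_{t+1}
    from z_t using the fresh observation at w_t; average the w_t's."""
    z = np.full(n, 1.0 / n)                     # z_1 = argmin_Z omega
    w_avg, wsum = np.zeros(n), 0.0
    for t in range(1, T + 1):
        w = entropy_prox(z, G(z, rng), gamma)   # uses xi_{2t-1}
        z = entropy_prox(z, G(w, rng), gamma)   # uses xi_{2t}
        w_avg += gamma * w
        wsum += gamma
    return w_avg / wsum
```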

10  Saddle Point accuracy measure
  min_{x∈X} max_{y∈Y} f(x, y)   (SP)
Accuracy measure for (SP):
  ErrSad(x, y) = max_{y′∈Y} f(x, y′) − min_{x′∈X} f(x′, y).
Explanation: (SP) gives rise to two optimization programs with equal optimal values:
  Primal: min_{x∈X} f̄(x),  f̄(x) = max_{y∈Y} f(x, y)
  Dual:  max_{y∈Y} f̲(y),  f̲(y) = min_{x∈X} f(x, y)
ErrSad(x, y) is the duality gap: the sum of the non-optimalities of x and y as approximate solutions to the respective problems.
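For the bilinear game used later in the talk (f(x, y) = y^T Ax over a pair of simplices), the duality gap has a simple closed form; a small hedged helper:

```python
import numpy as np

def err_sad_bilinear(A, x_bar, y_bar):
    """ErrSad for f(x,y) = y^T A x over Delta_n x Delta_m:
    max_y y^T A x_bar - min_x y_bar^T A x
    = max_i (A x_bar)_i - min_j (A^T y_bar)_j."""
    return (A @ x_bar).max() - (A.T @ y_bar).min()
```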

11  RSA & SMP: Efficiency Estimates
  min_{x∈X} max_{y∈Y} f(x, y)   (SP)
  G(x, y, ξ): E_ξ{G(x, y, ξ)} ∈ ∂_x f(x, y) × ∂_y[−f(x, y)]
Theorem: Assume that E{‖G(z, ξ)‖_*^2} ≤ M^2 for all z ∈ Z = X × Y, and let N ≥ 1. The N-step RSA with properly chosen constant steps γ_t = γ, 1 ≤ t ≤ N, ensures that
  E{ErrSad(z̄_N)} ≤ O(1) Θ M/√N,  [Θ = √( 2 max_{z∈Z}[ω(z) − ω(z_ω) − ⟨ω′(z_ω), z − z_ω⟩]/α )]
With light-tail noise (E{exp{‖G(z, ξ)‖_*^2/M^2}} ≤ exp{1} for all z ∈ Z), the probabilities of large deviations in RSA admit exponential bounds:
  for every Ω > 1: Prob{ErrSad(z̄_N) > O(1) Ω Θ M/√N} ≤ 2 exp{−Ω}
Similar results hold true for SMP.

12  Good Geometry case
Let rint Z ⊂ rint Z_+, where Z_+ is the direct product of K basic blocks:
- K_b unit Euclidean balls B_i ⊂ E_i = R^{n_i};
- K_s spectahedrons S_j ⊂ F_j;
  F_j: spaces of symmetric k_j × k_j matrices of a given block-diagonal structure, with the Frobenius inner product
  S_j: the set of all positive semidefinite matrices from F_j with unit trace
A good setup for RSA/SMP is given by
- the norm ‖z‖ = Σ_{i=1}^{K_b} ‖z^i‖_2 + Σ_{j=1}^{K_s} ‖z_j‖_tr
  (z^i: ball components of z; z_j: spectahedron components of z; ‖·‖_tr: trace norm of a symmetric matrix);
- the d.g.f. ω(z) = Σ_{i=1}^{K_b} (1/2)‖z^i‖_2^2 + Σ_{j=1}^{K_s} Entropy(z_j),
  Entropy(u) = Σ_l λ_l(u) ln λ_l(u), λ_l(u) being the eigenvalues of a symmetric matrix u.

13  Good Geometry case (continued)
With the above setup, we have
  Θ = O(1)√(K_b + Σ_{j=1}^{K_s} ln k_j),  so that  E{ErrSad(z̄_N)} ≤ O(1)[K_b + Σ_{j=1}^{K_s} ln k_j]^{1/2} M/√N
Good news: When K = O(1), the rate of convergence is nearly dimension-independent and, moreover, nearly the best allowed under the circumstances by the Laws of Statistics.
The computational cost of a step is dominated by the costs of a call to the SO and of a prox-step g ↦ argmin_{z∈Z} [g^T z + ω(z)].
When Z = Z_+, the cost C of a prox-step is C = O(1)(Σ_i dim(E_i) + Σ_j C_j), C_j: the cost of the eigenvalue decomposition of a matrix from F_j.
When Z is a simple product of K balls and simplexes, we have C = O(1) dim Z.

14  How it works: Convex Stochastic Programming
  min_{x∈X⊂R^n} f(x) := E_{ξ~P}{F(x, ξ)},  G(x, ξ) ∈ ∂_x F(x, ξ),  E_{ξ~P}{‖G(x, ξ)‖_*^2} ≤ M^2   (SM)
Till now, the workhorse of Convex Stochastic Minimization has been the Sample Average Approximation method. In SAA, problem (SM) is replaced with its empirical counterpart
  min_x f̂_N(x),  f̂_N(x) = (1/N) Σ_{i=1}^N F(x, ξ_i)   [ξ_1, ..., ξ_N: sample drawn from P]
In the Good Geometry case, the SAA sample size N resulting, with probability close to 1, in an ε-solution to (SM) is O(1) n (M/ε)^2 [Shap. 03], which is by a factor O(n) worse than the number of steps of RSA/SMP yielding a solution of the same quality.
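For comparison with the N-step RSA/SMP, here is a schematic sketch of the SAA empirical objective and one of its subgradients (the per-scenario function `F` and subgradient oracle `G` are placeholders); note that a single subgradient of f̂_N already touches all N scenarios.

```python
import numpy as np

def saa_value_and_subgrad(F, G, x, scenarios):
    """Empirical counterpart f_N(x) = (1/N) sum_i F(x, xi_i) and one of its
    subgradients (1/N) sum_i G(x, xi_i); every evaluation costs N oracle calls."""
    vals = [F(x, xi) for xi in scenarios]
    subg = np.mean([G(x, xi) for xi in scenarios], axis=0)
    return float(np.mean(vals)), subg
```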

15  How it works (continued)
  min_{x∈X⊂R^n} f(x) := E_{ξ~P}{F(x, ξ)}   (SM)
  min_{x∈X} f̂_N(x) = (1/N) Σ_{i=1}^N F(x, ξ_i)   (SAA)
A single call to the first order oracle for f̂_N is as costly as N calls to the SO. As a result, RSA and SMP are much cheaper computationally than SAA: when Z is simple, the overall computational effort in the N-step RSA/SMP is of the same order as the effort to compute a single subgradient of f̂_N(·)!
Our experimental results consistently demonstrate that while the solutions built by RSA and SAA with a common sample size N are of nearly the same quality, RSA outperforms SAA by orders of magnitude in terms of running time.

16  How it works: Typical experiment
  f_* = min_x { f(x) = E_{ξ~N(a, I_n)}{φ(ξ^T x)} : x ≥ 0, Σ_i x_i = 1 },  ‖a‖ = 1
  φ(·): piecewise linear convex loss function
Table: for n = 1000 and n = 5000, the value f(z̄_4000), the gap f(z̄_4000) − f_*, and the CPU time, for the methods SAA, N-RSA, and E-RSA.
  N-RSA: ‖·‖ = ‖·‖_1, ω(x) = Σ_i x_i ln x_i,  E{f(z̄_N) − f_*} ≤ O(1)√(ln n)/√N
  E-RSA: ‖·‖ = ‖·‖_2, ω(x) = (1/2) x^T x,  E{f(z̄_N) − f_*} ≤ O(1)√n/√N

17  SMP vs RSA
  min_{x∈X} max_{y∈Y} f(x, y)   (SP)
  F(x, y) := E_ξ{G(x, y, ξ)} ∈ ∂_x f(x, y) × ∂_y[−f(x, y)]
Question: What, if any, are the advantages of SMP as compared to RSA?
Answer: SMP can utilize the fact that some components of F(x, y) are regular and well-observable.
Assumption A:
  ‖F(z) − F(z′)‖_* ≤ L‖z − z′‖ + σ  for all z, z′ ∈ Z = X × Y
  E_ξ{‖G(z, ξ) − F(z)‖_*^2} ≤ σ^2
Fact: Under A, the quantity M^2 = sup_{z∈Z} E_ξ{‖G(z, ξ)‖_*^2} can be as large as O(1)(ΘL + σ)^2. As a result, the efficiency of RSA cannot be better than
  E{ErrSad(z̄_N^RSA)} ≤ O(1) Θ (ΘL + σ)/√N
even when σ = 0, i.e., f ∈ C^{1,1}(L) and there is no noise at all!

18  SMP vs RSA (continued)
  min_{x∈X} max_{y∈Y} f(x, y),  F(x, y) := E_ξ{G(x, y, ξ)} ∈ ∂_x f(x, y) × ∂_y[−f(x, y)]
  ‖F(z) − F(z′)‖_* ≤ L‖z − z′‖ + σ,  E_ξ{‖G(z, ξ) − F(z)‖_*^2} ≤ σ^2
Theorem: The N-step SMP with properly chosen constant stepsizes ensures that
  E{ErrSad(z̄_N^SMP)} ≤ O(1) Θ [ΘL/N + σ/√N]   (!)
When E_ξ{exp{‖G(z, ξ) − F(z)‖_*^2/σ^2}} ≤ exp{1}, for all Ω > 0 one has
  Prob{ ErrSad(z̄_N^SMP) > O(1)[Θ(ΘL/N + σ/√N) + ΩΘσ/√N] } ≤ exp{−Ω^2/3} + exp{−ΩN}
Good news: When Z is the product of O(1) high-dimensional balls, (!) is the best efficiency estimate achievable under the circumstances.

19  Applying SMP: Stochastic Feasibility problem
We want to solve a feasible system of simple and difficult convex constraints:
  S_μ(x) ≤ 0, 1 ≤ μ ≤ p;  D_ν(x) ≤ 0, 1 ≤ ν ≤ q;  x ∈ X = {x ∈ R^n : ‖x‖_2 ≤ 1}
S_μ are given by a precise First Order oracle and are smooth:
  ‖S′_μ(x)‖_2 ≤ L_μ,  ‖S′_μ(x) − S′_μ(x′)‖_2 ≤ ln(p + q) L_μ ‖x − x′‖_2.
D_ν are given by a Stochastic Oracle. At the i-th call to the SO, x ∈ X being the input, the SO returns unbiased estimates h_ν(x, ξ_i), H_ν(x, ξ_i) of D_ν(x), D′_ν(x), with independent ξ_i ~ P and
  E_ξ{max_ν [h_ν^2(x, ξ)/M_ν^2]} ≤ 1,  max_ν E_ξ{‖H_ν(x, ξ)‖_2^2/M_ν^2} ≤ ln(p + q).

20  Stochastic Feasibility problem (continued)
  S_μ(x) ≤ 0, 1 ≤ μ ≤ p;  D_ν(x) ≤ 0, 1 ≤ ν ≤ q;  x ∈ X = {x ∈ R^n : ‖x‖_2 ≤ 1}
  S_μ: ‖S′_μ‖_2 ≤ L_μ,  ‖S′_μ(x) − S′_μ(x′)‖_2 ≤ ln(p + q) L_μ ‖x − x′‖_2
  D_ν(x) = E_ξ{h_ν(x, ξ)},  E_ξ{max_ν h_ν^2(x, ξ)/M_ν^2} ≤ 1
  D′_ν(x) = E_ξ{H_ν(x, ξ)},  max_ν E_ξ{‖H_ν(x, ξ)‖_2^2/M_ν^2} ≤ ln(p + q)
Properly applying the N-step SMP to the Saddle Point problem
  min_{x∈X} max_{y≥0, Σ_i y_i=1} [ Σ_{μ=1}^p y_μ α_μ S_μ(x) + Σ_{ν=1}^q y_{p+ν} α_{p+ν} D_ν(x) ]
we build a solution x_N ∈ X such that
  S_μ(x_N) ≤ ζ ln(p + q) L_μ/N,  D_ν(x_N) ≤ ζ ln(p + q) M_ν/√N,  ζ ≥ 0, E{ζ} ≤ O(1).

21  Acceleration via Randomization
When solving large-scale well-structured convex optimization problems, e.g., the l_1-minimization problem
  min_x { ‖x‖_1 : ‖Ax − b‖ ≤ δ }
with a large-scale and possibly dense matrix A, polynomial time methods, which generate high accuracy solutions, become prohibitively time consuming; when medium accuracy solutions are sought, our best choice is computationally cheap first order methods.
There are interesting and important cases when first order methods can be accelerated significantly by randomization: by passing from computationally demanding deterministic first order oracles to their computationally cheap stochastic counterparts.

22  Example 1: Matrix Game
  min_{x∈Δ_n} max_{y∈Δ_m} f(x, y) := y^T Ax,  Δ_k = {u ∈ R^k_+ : Σ_i u_i = 1}   (MG)
  F(x, y) := [f′_x(x, y); −f′_y(x, y)] = [A^T y; −Ax],  ‖A‖ := max_{i,j} |A_ij|
When A is large and dense, solving (MG) by polynomial time algorithms is prohibitively time consuming. The best deterministic First Order algorithms find an ε-solution to (MG) in O(1) ln(m + n)(‖A‖/ε) steps, with O(1) computations of F(·) per step, so the total effort is O(1) ln(m + n) mn (‖A‖/ε) a.o.
The matrix-vector multiplications R^m ∋ y ↦ A^T y, R^n ∋ x ↦ Ax required to compute F(x, y) are easy to randomize: to build an unbiased estimate of Bu = Σ_j u_j B_j, draw at random a column index ĵ with Prob{ĵ = j} = |u_j|/‖u‖_1, and return ‖u‖_1 sign(u_ĵ) B_ĵ.
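A minimal sketch of the randomized matrix-vector multiplication just described (the estimator is unbiased: its expectation is Σ_j (|u_j|/‖u‖_1) ‖u‖_1 sign(u_j) B_j = Bu); in the matrix game it is applied once to A^T (with u = y) and once to A (with u = x) per step.

```python
import numpy as np

def sampled_matvec(B, u, rng):
    """Unbiased single-column estimate of B @ u: draw index j with
    Prob{j} = |u_j|/||u||_1 and return ||u||_1 * sign(u_j) * B[:, j]."""
    u1 = np.abs(u).sum()
    if u1 == 0.0:
        return np.zeros(B.shape[0])
    j = rng.choice(u.size, p=np.abs(u) / u1)
    return u1 * np.sign(u[j]) * B[:, j]
```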

23  Matrix Game (continued)
  min_{x∈Δ_n} max_{y∈Δ_m} y^T Ax,  ‖A‖ := max_{i,j} |A_ij|   (MG)
Equipping (MG) with the Stochastic Oracle and applying RSA, we get, with confidence ≥ 1 − β, an ε-solution to (MG) in O(1) ln(mn/β)(‖A‖/ε)^2 steps. A step reduces to extracting from A a randomly chosen row and a randomly chosen column, plus O(m + n) a.o. overhead, so an ε-solution costs O(1) ln(mn/β)(m + n)(‖A‖/ε)^2 a.o.
With ε, ‖A‖ fixed and m = O(n) large, RSA
- outperforms by orders of magnitude the best known deterministic alternatives;
- exhibits sublinear time behaviour: an ε-solution is found by inspecting just O(1) ln(mn/β)(m + n)(‖A‖/ε)^2 of the total of mn data entries, cf. Grigoriadis & Khachiyan '95.

24  Application: l_1 minimization with uniform fit
Numerous sparsity-oriented Signal Processing problems reduce to (small series of) problems
  min_u { ‖Au − b‖_p : ‖u‖_1 ≤ 1 }  [A: m × n]   (l_1)
When p = ∞, (l_1) reduces to the Matrix Game
  min_{x=[u;v]∈Δ_{2n}} max_{y=[p;q]∈Δ_{2m}} [p; q]^T [ A − b1^T, −A − b1^T; −A + b1^T, A + b1^T ] [u; v]
RSA can be used to solve l_1-minimization problems. When m, n are really large (like 10^4 and more) and A is a general-type dense analytically given matrix, (l_1) becomes prohibitively difficult for all known deterministic algorithms, and randomized algorithms become a kind of "last resort".
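A sketch of the p = ∞ reduction, writing the signal as u − v with x = [u; v] on Δ_{2n}; the block-sign convention below is reconstructed from this standard argument and is an assumption, not a quotation of the slide.

```python
import numpy as np

def uniform_fit_game_matrix(A, b):
    """2m x 2n payoff matrix B of the game equivalent to
    min{ ||Aw - b||_inf : ||w||_1 <= 1 }: with x = [u; v] on the simplex and
    w = u - v, M @ x = A w - b, and y = [p; q] picks +/- coordinates of M @ x."""
    m, n = A.shape
    ones = np.ones((1, n))
    M = np.hstack([A - b[:, None] @ ones, -A - b[:, None] @ ones])
    return np.vstack([M, -M])

# sanity check: the inner maximum over the y-simplex equals ||A w - b||_inf
rng = np.random.default_rng(1)
A, b = rng.standard_normal((5, 4)), rng.standard_normal(5)
x = rng.random(8); x /= x.sum()
w = x[:4] - x[4:]
B = uniform_fit_game_matrix(A, b)
assert np.isclose((B @ x).max(), np.abs(A @ w - b).max())
```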

25  How it works
  Opt = min_u { ‖Au − b‖_∞ : ‖u‖_1 ≤ 1 }
  [A: dense analytically given m × n matrix; b: ‖Au − b‖_∞ ≤ δ = 1.e-3 with a 16-sparse u, ‖u‖_1 = 1]
Table (per problem size m × n): errors ‖u − ũ‖_1, ‖u − ũ‖_2, ‖u − ũ‖_∞, CPU in seconds, and Mult, for the methods DMP and RSA.
DMP: Deterministic Mirror Prox;  ũ: resulting solution;  last column (Mult): (equivalent) # of matrix-vector multiplications.

26  How it works (continued)
Figure: l_1-recovery. Blue: true signal; Cyan: MDSA recovery.

27  Example 2: l_1-minimization with Least Squares fit
The Least Squares l_1 minimization problem
  min_u { ‖Au − b‖_2 : ‖u‖_1 ≤ 1 }  [A: m × n]
reduces to the Saddle Point problem
  min_{x∈Δ_n} max_{‖y‖_2≤1} y^T [Ax − b]  [F(x, y) = [A^T y; b − Ax]]
Randomizing the matrix-vector multiplications and applying RSA, we get, with confidence ≥ 1 − β, an ε-solution with an overall effort of
  O(1) ln(mn/β)(√m C_2/ε)^2 a.o.   (*)
  C_2: maximum of the ‖·‖_2-norms of the columns in C = [A, b]
Bad news: The factor m in (*) reflects the high potential noisiness of the unbiased estimate (y, ‖y‖_2 ≤ 1) ↦ ‖y‖_1 sign(y_ı)[A_{ı,:}]^T of A^T y: it may happen that ‖ ‖y‖_1 sign(y_ı)[A_{ı,:}]^T ‖_2 ≈ √m C_2.

28  l_1-minimization with Least Squares fit (continued)
  min_{x∈Δ_n} max_{‖y‖_2≤1} y^T [Ax − b]  [F(x, y) = [A^T y; b − Ax]]
Remedy: Randomized preprocessing [A, b] ↦ UD[A, b], with an orthogonal U, |U_ij| ≤ O(m^{−1/2}), and a random diagonal D with i.i.d. D_ii ~ Uniform{−1; 1},
- results in an equivalent problem,
- preserves C_2, and
- reduces the noise in the estimate of A^T y: now, with confidence ≥ 1 − β,
  ‖y‖_2 ≤ 1 ⇒ ‖ ‖y‖_1 sign(y_ı)[A_{ı,:}]^T ‖_2 ≤ O(1)√(ln(mn/β)) C_2
After preprocessing, RSA finds, with confidence ≥ 1 − β, an ε-solution to the problem at the cost of O(1) ln^2(mn/β)(C_2/ε)^2 a.o.
With a properly chosen U (e.g., a Hadamard or DFT matrix), the preprocessing costs just O(1) mn ln(m) a.o.
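A sketch of the UD preprocessing, with a fast Walsh-Hadamard transform playing the role of the orthogonal U (the 1/√m scaling gives |U_ij| = m^{−1/2}); for simplicity it assumes the number of rows m is a power of two.

```python
import numpy as np

def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform; len(v) must be a power of 2."""
    v = v.astype(float)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            a = v[i:i + h].copy()
            c = v[i + h:i + 2 * h].copy()
            v[i:i + h], v[i + h:i + 2 * h] = a + c, a - c
        h *= 2
    return v

def ud_preprocess(C, rng):
    """Column-wise map C -> U D C, with D = diag of i.i.d. +-1 signs and
    U = (1/sqrt(m)) * Hadamard matrix (orthogonal, so column 2-norms,
    hence C_2, are preserved)."""
    m = C.shape[0]
    d = rng.choice([-1.0, 1.0], size=m)
    out = np.empty_like(C, dtype=float)
    for j in range(C.shape[1]):
        out[:, j] = fwht(d * C[:, j]) / np.sqrt(m)
    return out
```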

29  Example 3: Minimizing the maximum of convex polynomials over a simplex/l_1-ball
Consider the optimization problem
  min_{x∈Δ_n} max_{1≤k≤m} p_k(x),  p_k(x) = Σ_{l=0}^d A_kl[x, ..., x]   (*)
  A_kl[x_1, ..., x_l]: symmetric l-linear forms;  p_k(·): convex.
We can reduce (*) to the Saddle Point problem
  min_{x∈Δ_n} max_{y∈Δ_m} Σ_k y_k p_k(x)
and randomize, in the aforementioned manner, the computation of p_k(·), p′_k(·): to build unbiased estimates p̂_k(x), P̂_k(x) of p_k(x) and P_k(x) = p′_k(x), x ∈ Δ_n, we generate at random indices j_1, ..., j_d, Prob{j_s = j} = x_j, and set
  p̂_k = Σ_{l=0}^d A_kl[e_{j_1}, ..., e_{j_l}],  [P̂_k]_j = Σ_{l=1}^d l A_kl[e_{j_1}, ..., e_{j_{l−1}}, e_j], 1 ≤ j ≤ n.
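A sketch of the sampling scheme just described, storing each A_kl as an order-l numpy tensor (the data layout is an assumption); by symmetry, A_kl[e_{j_1}, ..., e_{j_{l−1}}, ·] is just a slice of that tensor.

```python
import numpy as np

def sampled_value_and_grad(forms_k, x, rng):
    """Unbiased estimates of p_k(x) = sum_{l=0}^d A_kl[x,...,x] and of its
    gradient; forms_k[l] is the order-l symmetric tensor A_kl (forms_k[0] is
    a scalar), and j_1,...,j_d are drawn i.i.d. with Prob{j} = x_j."""
    n, d = x.size, len(forms_k) - 1
    js = rng.choice(n, size=max(d, 1), p=x)
    p_hat = float(forms_k[0])
    g_hat = np.zeros(n)
    for l in range(1, d + 1):
        A = np.asarray(forms_k[l])
        p_hat += A[tuple(js[:l])]            # A_kl[e_{j_1}, ..., e_{j_l}]
        g_hat += l * A[tuple(js[:l - 1])]    # l * A_kl[e_{j_1},...,e_{j_{l-1}}, .]
    return p_hat, g_hat
```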

30  Minimizing maximum of convex polynomials (continued)
  min_{x∈Δ_n} max_{1≤k≤m} Σ_{l=0}^d A_kl[x, ..., x]   (*)
  A: maximum magnitude of the coefficients in the forms A_kl[·, ..., ·]
Applying RSA, we find, with confidence ≥ 1 − β, an ε-solution to (*) at the cost of O(1) ln(mn/β) d^2 (A/ε)^2 steps, with a step reducing to extracting O(1) d(m + n) coefficients of the forms A_kl[·, ..., ·], given the addresses k, l, j_1, ..., j_l of the coefficients, plus an O(m + dn) a.o. overhead.
