Convex Stochastic and Large-Scale Deterministic Programming via Robust Stochastic Approximation and its Extensions
1 Convex Stochastic and Large-Scale Deterministic Programming via Robust Stochastic Approximation and its Extensions

Arkadi Nemirovski
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology

Joint research with Anatoli Juditsky (Joseph Fourier University, Grenoble, France), Guanghui Lan (ISyE, University of Florida), and Alexander Shapiro (ISyE, Georgia Tech)

BFG 09, Leuven, September 14-18, 2009
2 Overview

- Convex Stochastic Minimization and Convex-Concave Stochastic Saddle Point problems
- Classical Stochastic Approximation
- Robust Stochastic Approximation & Stochastic Mirror Prox: construction, efficiency estimates, how it works
- Acceleration via Randomization: Matrix Game & $\ell_1$-minimization with uniform fit; $\ell_1$-minimization with Least Squares fit; minimizing maxima of convex polynomials over a simplex
3 Convex Stochastic Minimization Problem

The basic Convex Stochastic Minimization problem is
$$\min_{x\in X} f(x) \qquad (SM)$$
- $X \subset \mathbb{R}^n$: convex compact set
- $f(x): X \to \mathbb{R}$: convex and Lipschitz continuous objective given by a Stochastic Oracle. At the $i$-th call to the oracle, the input being $x \in X$, the oracle returns a stochastic gradient $G(x,\xi_i) \in \mathbb{R}^n$ of $f$, with independent $\xi_i \sim P$, $i=1,2,\dots$, and
$$\mathbb{E}_{\xi\sim P}\{G(x,\xi)\} \in \partial f(x) \quad \forall x\in X.$$

In many cases (but not always!) $G(x,\xi) \in \partial_x F(x,\xi)$, where $F(x,\xi)$ is convex in $x$. Here (SM) is the problem of minimizing the expectation $f(x) = \mathbb{E}\{F(x,\xi)\}$ of a convex in $x$ integrand $F(x,\xi)$ given by a first order oracle.
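The stochastic-oracle contract above (unbiased subgradients built from fresh i.i.d. draws $\xi_i\sim P$) can be sketched in a few lines. The integrand $F(x,\xi)=|\xi^Tx|$ below is an illustrative choice, not one from the talk; for it, $f(x)=\mathbb{E}|\xi^Tx|$ and $G(x,\xi)=\mathrm{sign}(\xi^Tx)\,\xi$ is a valid stochastic subgradient.

```python
import numpy as np

# Sketch of a Stochastic Oracle for (SM), for the illustrative integrand
# F(x, xi) = |xi^T x|, xi ~ N(0, I).  G(x, xi) is a subgradient of F(., xi)
# at x, so E_xi G(x, xi) is a subgradient of f(x) = E F(x, xi).
def stochastic_oracle(x, rng):
    xi = rng.standard_normal(x.shape)   # fresh xi ~ P at every call
    return np.sign(xi @ x) * xi         # G(x, xi)

rng = np.random.default_rng(7)
x = np.array([1.0, 0.0])
# empirical check of unbiasedness: for this F, the exact gradient of f at
# x = (1, 0) is (E|xi_1|, 0) = (sqrt(2/pi), 0) ~ (0.7979, 0)
g_mean = np.mean([stochastic_oracle(x, rng) for _ in range(20000)], axis=0)
```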
4 Convex-Concave Stochastic Saddle Point Problem

The Convex Stochastic Minimization problem is a particular case of the Convex-Concave Stochastic Saddle Point problem
$$\min_{x\in X}\max_{y\in Y} f(x,y) \qquad (SP)$$
- $Z = X\times Y \subset \mathbb{R}^{n_x}\times\mathbb{R}^{n_y} = \mathbb{R}^n$: convex compact set
- $f(x,y): X\times Y\to\mathbb{R}$: convex in $x\in X$, concave in $y\in Y$, Lipschitz continuous function given by a Stochastic Oracle. At the $i$-th call to the oracle, the input being $z=(x,y)\in Z$, the Stochastic Oracle returns a stochastic derivative $G(z,\xi_i) = [G_x(z,\xi_i); G_y(z,\xi_i)] \in \mathbb{R}^{n_x}\times\mathbb{R}^{n_y}$ of $f$, with independent $\xi_i\sim P$, $i=1,2,\dots$, and
$$F(x,y) := \mathbb{E}_{\xi\sim P}\{G(x,y,\xi)\} \in \partial_x f(x,y) \times \partial_y(-f(x,y)).$$
5 Classical Stochastic Approximation

$$\min_{x\in X} f(x)\ \ (SM)\qquad \min_{x\in X}\max_{y\in Y} f(x,y)\ \ (SP)$$

The very first method for solving (SM) is the classical Stochastic Approximation (Robbins & Monro '51), which mimics the usual Subgradient Descent:
$$x_{t+1} = \arg\min_{u\in X}\left\{\tfrac12\|u - x_t\|_2^2 + \gamma_t\langle G(x_t,\xi_t), u\rangle\right\},\qquad \gamma_t = \frac{a}{t},$$
that is, $x_{t+1}$ is the closest to $x_t - \gamma_t G(x_t,\xi_t)$ point of $X$.

When $f$ is strongly convex on $X$, SA with properly chosen $a$ ensures that $\mathbb{E}\{\|x_t - \arg\min_X f\|_2^2\} \le O(1/t)$. If, in addition, $\arg\min_X f \in \operatorname{int} X$ and $f'$ is Lipschitz continuous, then also $\mathbb{E}\{f(x_t) - \min_X f\} \le O(1/t)$. A similar construction, with similar results, works for (SP) with strongly convex-concave $f$.
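A minimal sketch of the Robbins-Monro recursion above with the short steps $\gamma_t=a/t$. The strongly convex test problem (quadratic objective over a box, additive Gaussian gradient noise) and all parameters are illustrative assumptions, not from the talk:

```python
import numpy as np

def classical_sa(grad_oracle, project, x0, a, n_steps, rng):
    """Classical SA: x_{t+1} = Proj_X( x_t - (a/t) * G(x_t, xi_t) )."""
    x = x0.copy()
    for t in range(1, n_steps + 1):
        g = grad_oracle(x, rng)          # stochastic gradient G(x_t, xi_t)
        x = project(x - (a / t) * g)     # short step gamma_t = a/t
    return x                             # last search point, no averaging

# toy strongly convex problem: f(x) = 0.5*||x||^2 over the box [-1,1]^5,
# stochastic gradient = x + noise (hypothetical test problem)
rng = np.random.default_rng(0)
oracle = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
proj = lambda u: np.clip(u, -1.0, 1.0)
x_final = classical_sa(oracle, proj, np.ones(5), a=1.0, n_steps=5000, rng=rng)
```

With the modulus of strong convexity known (here it is 1, so $a=1$ is a proper choice), the last iterate approaches the minimizer at the $O(1/t)$ rate; misjudging that modulus is exactly what makes this scheme fragile.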
6 Classical Stochastic Approximation (continued)

Good news on Classical SA: with smooth strongly convex $f$ attaining its minimum in $\operatorname{int} X$, the rate of convergence $\mathbb{E}\{f(x_t)-\min_X f\} \le O(1/t)$ is the best allowed under the circumstances by the Laws of Statistics.

Bad news on Classical SA: it requires strong convexity of the objective $f$ and is highly sensitive to the scale factor $a$ in the stepsize rule $\gamma_t = a/t$. To choose $a$ properly, we need to know the modulus of strong convexity of $f$, and overestimating this modulus may dramatically slow down the convergence. The Stochastic Optimization community commonly treats SA as a highly unstable and impractical optimization technique.
7 Robust Stochastic Approximation

$$x_{t+1} = \arg\min_{u\in X}\left\{\tfrac12\|u-x_t\|_2^2 + \gamma_t\langle G(x_t,\xi_t),u\rangle\right\},\quad \gamma_t=\frac{a}{t}\qquad (CSA)$$

It was discovered long ago [Nem.&Yudin '79] that SA admits a robust version which does not require strong convexity of the objective in (SM). To make SA robust, it suffices
- to replace the short steps $\gamma_t = a/t$ with long steps $\gamma_t = a/\sqrt{t}$, and
- to add to (CSA) the averaging $\bar x_t = \left[\sum_{\tau=1}^t \gamma_\tau\right]^{-1}\sum_{\tau=1}^t \gamma_\tau x_\tau$ and to treat, as approximate solutions, the averages $\bar x_t$ rather than the search points $x_t$.

Another important improvement (in its rudimentary form going back to [Nem.&Yudin '79]) is to replace the quadratic prox term $\tfrac12\|u-x_t\|_2^2$ in (CSA) with a more general prox term, which allows one to adjust, to some extent, the method to the geometry of the problem of interest.
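The two modifications above (long steps plus averaging of the search points) can be sketched directly. The test problem, a convex but not strongly convex objective $f(x)=\mathbb{E}|x-\xi|$ minimized at the median, is an illustrative assumption:

```python
import numpy as np

def robust_sa(grad_oracle, project, x0, a, n_steps, rng):
    """Robust SA: long steps gamma_t = a/sqrt(t) plus weighted averaging."""
    x = x0.copy()
    num = np.zeros_like(x)
    den = 0.0
    for t in range(1, n_steps + 1):
        g = grad_oracle(x, rng)
        gamma = a / np.sqrt(t)           # long step
        x = project(x - gamma * g)
        num += gamma * x                 # accumulate gamma-weighted iterates
        den += gamma
    return num / den                     # report the average, not x itself

# toy convex (NOT strongly convex) problem: f(x) = E|x - xi|, xi ~ N(0,1),
# componentwise, minimized at the median 0 over X = [-2,2]^3 (illustrative)
rng = np.random.default_rng(1)
oracle = lambda x, rng: np.sign(x - rng.standard_normal(x.shape))
proj = lambda u: np.clip(u, -2.0, 2.0)
x_bar = robust_sa(oracle, proj, np.full(3, 2.0), a=0.5, n_steps=20000, rng=rng)
```

No strong convexity modulus enters the stepsize, which is the point: the averaged iterate converges at the $O(1/\sqrt{N})$ rate with only a scale parameter $a$ to choose.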
8 Robust Stochastic Approximation [Jud.&Lan&Nem.&Shap. '09]

$$\min_{x\in X}\max_{y\in Y} f(x,y),\qquad G(x,y,\xi):\ \mathbb{E}_\xi\{G(x,y,\xi)\}\in \partial_x f(x,y)\times\partial_y[-f(x,y)]$$

Setup for RSA: a norm $\|\cdot\|$ on $\mathbb{R}^n \supset Z = X\times Y$ and a convex continuous distance generating function $\omega(z): Z\to\mathbb{R}$ such that the set $Z^o = \{z\in Z: \partial\omega(z)\neq\emptyset\}$ is convex, and $\omega\big|_{Z^o}$ is $C^1$ and strongly convex:
$$\forall z,z'\in Z^o:\ \langle \omega'(z)-\omega'(z'),\, z-z'\rangle \ge \alpha\|z-z'\|^2\qquad [\alpha>0]$$

Robust Stochastic Approximation:
$$z_1 = z_\omega := \arg\min_Z \omega(\cdot)$$
$$z_{t+1} = \arg\min_{u\in Z}\left\{[\omega(u) - \langle\omega'(z_t),u\rangle] + \gamma_t\langle G(z_t,\xi_t),u\rangle\right\}$$
$$\bar z_t = \left[\sum_{\tau=1}^t\gamma_\tau\right]^{-1}\sum_{\tau=1}^t\gamma_\tau z_\tau$$
9 Stochastic Mirror Prox [Jud.&Nem.&Tauvel '08]

$$\min_{x\in X}\max_{y\in Y} f(x,y),\qquad G(x,y,\xi):\ \mathbb{E}_\xi\{G(x,y,\xi)\}\in\partial_x f(x,y)\times\partial_y[-f(x,y)]$$

Stochastic Mirror Prox algorithm:
$$z_1 = z_\omega := \arg\min_Z\omega(\cdot)$$
$$w_t = \arg\min_{u\in Z}\left\{[\omega(u)-\langle\omega'(z_t),u\rangle] + \gamma_t\langle G(z_t,\xi_{2t-1}),u\rangle\right\}$$
$$z_{t+1} = \arg\min_{u\in Z}\left\{[\omega(u)-\langle\omega'(z_t),u\rangle] + \gamma_t\langle G(w_t,\xi_{2t}),u\rangle\right\}$$
$$\bar z_t = \left[\sum_{\tau=1}^t\gamma_\tau\right]^{-1}\sum_{\tau=1}^t\gamma_\tau w_\tau$$
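The two prox-steps per iteration (an extrapolation step at $z_t$, then the actual update using the oracle at $w_t$) can be sketched on a bilinear saddle point $\min_x\max_y y^TAx$ over two simplexes with the entropy setup. The noisy oracle, matrix, stepsize, and iteration count below are illustrative assumptions, not the tuned choices of the talk:

```python
import numpy as np

def smp_bilinear(A, n_steps, gamma, rng, noise):
    """Stochastic Mirror Prox for min_x max_y y^T A x over two simplexes,
    entropy setup; additive Gaussian noise simulates the stochastic oracle."""
    m, n = A.shape
    x, y = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    xs, ys, total = np.zeros(n), np.zeros(m), 0.0

    def prox(z, g):                       # entropy prox-step on a simplex
        w = z * np.exp(-gamma * g)
        return w / w.sum()

    def oracle(x, y):                     # noisy F(z) = [grad_x f; -grad_y f]
        gx = A.T @ y + noise * rng.standard_normal(n)
        gy = -(A @ x) + noise * rng.standard_normal(m)
        return gx, gy

    for _ in range(n_steps):
        gx, gy = oracle(x, y)             # first oracle call, at z_t
        wx, wy = prox(x, gx), prox(y, gy) # extrapolation point w_t
        gx, gy = oracle(wx, wy)           # second oracle call, at w_t
        x, y = prox(x, gx), prox(y, gy)   # update z_{t+1} from z_t
        xs += gamma * wx                  # average the w_t, not the z_t
        ys += gamma * wy
        total += gamma
    return xs / total, ys / total

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 30))
xb, yb = smp_bilinear(A, n_steps=2000, gamma=0.05, rng=rng, noise=0.1)
gap = float((A @ xb).max() - (A.T @ yb).min())   # ErrSad duality gap
```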
10 Saddle Point accuracy measure

$$\min_{x\in X}\max_{y\in Y} f(x,y)\qquad (SP)$$

Accuracy measure for (SP):
$$\mathrm{ErrSad}(x,y) = \max_{y'\in Y} f(x,y') - \min_{x'\in X} f(x',y).$$

Explanation: (SP) gives rise to two optimization programs with equal optimal values:
- Primal: $\min_{x\in X}\bar f(x)$, $\bar f(x)=\max_{y\in Y}f(x,y)$
- Dual: $\max_{y\in Y}\underline f(y)$, $\underline f(y)=\min_{x\in X}f(x,y)$

$\mathrm{ErrSad}(x,y)$ is the duality gap: the sum of non-optimalities of $x$ and $y$ as approximate solutions to the respective problems.
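For a bilinear game on simplexes the duality gap above is computable in closed form, since $\max_{y'}f(x,y')=\max_i(Ax)_i$ and $\min_{x'}f(x',y)=\min_j(A^Ty)_j$. A small sketch on a hypothetical $2\times 2$ game:

```python
import numpy as np

def err_sad_bilinear(A, x, y):
    """Duality gap ErrSad(x,y) for f(x,y) = y^T A x over two simplexes."""
    return float((A @ x).max() - (A.T @ y).min())

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])               # symmetric matching-pennies-like game
y = np.array([0.5, 0.5])
gap_opt = err_sad_bilinear(A, np.array([0.5, 0.5]), y)   # uniform is optimal
gap_bad = err_sad_bilinear(A, np.array([1.0, 0.0]), y)   # pure strategy is not
```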
11 RSA & SMP: Efficiency Estimates

$$\min_{x\in X}\max_{y\in Y}f(x,y)\ \ (SP),\qquad G(x,y,\xi):\ \mathbb{E}_\xi\{G(x,y,\xi)\}\in\partial_x f(x,y)\times\partial_y[-f(x,y)]$$

Theorem: Assume that $\mathbb{E}\{\|G(z,\xi)\|_*^2\}\le M^2$ for all $z\in Z=X\times Y$, and let $N\ge 1$. The $N$-step RSA with properly chosen constant steps $\gamma_t=\gamma$, $1\le t\le N$, ensures that
$$\mathbb{E}\{\mathrm{ErrSad}(\bar z_N)\}\le O(1)\,\Theta M/\sqrt{N},\qquad \Theta=\sqrt{2\max_{z\in Z}[\omega(z)-\omega(z_\omega)-\langle\omega'(z_\omega),z-z_\omega\rangle]/\alpha}.$$

With light-tail $\|G(z,\xi)\|_*/M$, the probabilities of large deviations in RSA admit exponential bounds:
$$\mathbb{E}\{\exp\{\|G(z,\xi)\|_*^2/M^2\}\}\le\exp\{1\}\ \forall z\in Z\ \Rightarrow\ \forall\Omega>1:\ \mathrm{Prob}\{\mathrm{ErrSad}(\bar z_N) > O(1)\Omega\Theta M/\sqrt{N}\}\le 2\exp\{-\Omega\}.$$

Similar results hold true for SMP.
12 Good Geometry case

Let $\operatorname{rint} Z \subset \operatorname{rint} Z^+$, where $Z^+$ is the direct product of $K$ basic blocks:
- $K_b$ unit Euclidean balls $B_i \subset E_i = \mathbb{R}^{n_i}$;
- $K_s$ spectahedrons $S_j \subset F_j$, where $F_j$ are spaces of symmetric $k_j\times k_j$ matrices of a given block-diagonal structure with the Frobenius inner product, and $S_j$ is the set of all positive semidefinite matrices from $F_j$ with unit trace.

A good setup for RSA/SMP is given by
- the norm $\|z\| = \left[\sum_{i=1}^{K_b}\|z^i\|_2^2 + \sum_{j=1}^{K_s}\|z_j\|_{\mathrm{tr}}^2\right]^{1/2}$, where $z^i$ are the ball components of $z$, $z_j$ the spectahedron components, and $\|\cdot\|_{\mathrm{tr}}$ the trace norm of a symmetric matrix;
- the dgf $\omega(z) = \sum_{i=1}^{K_b}\tfrac12\|z^i\|_2^2 + \sum_{j=1}^{K_s}\mathrm{Entropy}(z_j)$, where $\mathrm{Entropy}(u) = \sum_l \lambda_l(u)\ln\lambda_l(u)$, $\lambda_l(u)$ being the eigenvalues of a symmetric matrix $u$.
13 Good Geometry case (continued)

With the above setup, we have
$$\Theta = O(1)\sqrt{K_b + \sum_{j=1}^{K_s}\ln k_j}\ \Rightarrow\ \mathbb{E}\{\mathrm{ErrSad}(\bar z_N)\}\le O(1)\Big[K_b+\sum_{j=1}^{K_s}\ln k_j\Big]^{1/2} M/\sqrt N$$

Good news: When $K=O(1)$, the rate of convergence is nearly dimension-independent and, moreover, nearly the best allowed under the circumstances by the Laws of Statistics.

The computational cost of a step is dominated by the costs of a call to the SO and a prox-step $g\mapsto \arg\min_{z\in Z}\left[\langle g,z\rangle+\omega(z)\right]$:
- When $Z = Z^+$, the cost $C$ of a prox-step is $C=O(1)\big(\sum_i \dim(E_i)+\sum_j C_j\big)$, where $C_j$ is the cost of the eigenvalue decomposition of a matrix $A\in F_j$.
- When $Z$ is a simple product of $K$ balls and simplexes, we have $C=O(1)\dim Z$.
14 How it works: Convex Stochastic Programming

$$\min_{x\in X\subset\mathbb{R}^n} f(x):=\mathbb{E}_{\xi\sim P}\{F(x,\xi)\},\qquad G(x,\xi)\in\partial_x F(x,\xi),\ \mathbb{E}_{\xi\sim P}\{\|G(x,\xi)\|_*^2\}\le M^2\qquad (SM)$$

Till now, the workhorse of Convex Stochastic Minimization is the Sample Average Approximation method. In SAA, problem (SM) is replaced with its empirical counterpart
$$\min_x \hat f_N(x),\qquad \hat f_N(x)=\frac1N\sum_{i=1}^N F(x,\xi_i)\qquad [\xi_1,\dots,\xi_N:\ \text{sample drawn from } P].$$

In the Good Geometry case, the SAA sample size $N$ resulting, with probability close to 1, in an $\epsilon$-solution to (SM) is $O(1)\,n(M/\epsilon)^2$ [Shap. '03], which is by factor $O(n)$ worse than the number of steps of RSA/SMP yielding a solution of the same quality.
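The SAA construction above is just "draw the sample once, then minimize the empirical mean". A minimal sketch, with an illustrative integrand $F(x,\xi)=|x-\xi|$ that is not from the talk:

```python
import numpy as np

def saa_objective(F, xi_sample):
    """Sample Average Approximation: replace f(x) = E F(x, xi) by the
    empirical mean over a sample xi_1..xi_N drawn once in advance."""
    def f_N(x):
        return float(np.mean([F(x, xi) for xi in xi_sample]))
    return f_N

rng = np.random.default_rng(3)
sample = rng.standard_normal(1000)                    # xi_1..xi_N, N = 1000
f_N = saa_objective(lambda x, xi: abs(x - xi), sample)
```

Note the cost asymmetry this makes explicit: one evaluation of $\hat f_N$ (or its subgradient) touches all $N$ sample points, i.e., it is as expensive as $N$ stochastic-oracle calls of RSA/SMP.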
15 How it works (continued)

$$\min_{x\in X\subset\mathbb{R}^n} f(x):=\mathbb{E}_{\xi\sim P}\{F(x,\xi)\}\ \ (SM)\qquad \min_{x\in X}\hat f_N(x)=\frac1N\sum_{i=1}^N F(x,\xi_i)\ \ (SAA)$$

A single call to the first order oracle for $\hat f_N$ is as costly as $N$ calls to the SO, so RSA and SMP are much cheaper computationally than SAA. When $Z$ is simple, the overall computational effort in the $N$-step RSA/SMP is of the same order as the effort to compute a single subgradient of $\hat f_N(\cdot)$!

Our experimental results consistently demonstrate that while the solutions built by RSA and SAA with a common sample size $N$ are of nearly the same quality, RSA outperforms SAA by orders of magnitude in terms of running time.
16 How it works: Typical experiment

$$f_* = \min_x\left\{f(x)=\mathbb{E}_{\xi\sim\mathcal N(a,I_n)}\{\phi(\xi^Tx)\}:\ x\ge 0,\ \textstyle\sum_i x_i = 1\right\},\qquad \|a\|=1$$
$\phi(\cdot)$: piecewise linear convex loss function.

[Table comparing SAA, N-RSA, and E-RSA on $n=1000$ and $n=5000$ instances, reporting $f(\bar z_{4000})$, the gap $f(\bar z_{4000})-f_*$, and CPU time; the numeric entries were lost in transcription.]

- N-RSA: $\|\cdot\|=\|\cdot\|_1$, $\omega(x)=\sum_i x_i\ln x_i$ $\Rightarrow$ $\mathbb{E}\{f(\bar z_N)-f_*\}\le O(1)\sqrt{\ln n/N}$
- E-RSA: $\|\cdot\|=\|\cdot\|_2$, $\omega(x)=\tfrac12 x^Tx$ $\Rightarrow$ $\mathbb{E}\{f(\bar z_N)-f_*\}\le O(1)\sqrt{n/N}$
17 SMP vs RSA

$$\min_{x\in X}\max_{y\in Y}f(x,y)\ \ (SP),\qquad F(x,y):=\mathbb{E}_\xi\{G(x,y,\xi)\}\in\partial_x f(x,y)\times\partial_y[-f(x,y)]$$

Question: What, if any, are the advantages of SMP as compared to RSA?
Answer: SMP can utilize the fact that some components of $F(x,y)$ are regular and well-observable.

Assumption A:
$$\|F(z)-F(z')\|_*\le L\|z-z'\|\ \ \forall z,z'\in Z=X\times Y,\qquad \mathbb{E}_\xi\{\|G(z,\xi)-F(z)\|_*^2\}\le\sigma^2$$

Fact: Under A, the quantity $M^2=\sup_{z\in Z}\mathbb{E}_\xi\{\|G(z,\xi)\|_*^2\}$ can be as large as $O(1)(\Theta L+\sigma)^2$, so the efficiency of RSA cannot be better than
$$\mathbb{E}\{\mathrm{ErrSad}(\bar z^N_{\mathrm{RSA}})\}\le O(1)\,\Theta\,\frac{\Theta L+\sigma}{\sqrt N}$$
even when $\sigma=0$, i.e., $f\in C^{1,1}(L)$ and there is no noise at all!
18 SMP vs RSA (continued)

$$\min_{x\in X}\max_{y\in Y}f(x,y),\qquad F(x,y):=\mathbb{E}_\xi\{G(x,y,\xi)\}\in\partial_x f(x,y)\times\partial_y[-f(x,y)]$$
$$\|F(z)-F(z')\|_*\le L\|z-z'\|,\qquad \mathbb{E}_\xi\{\|G(z,\xi)-F(z)\|_*^2\}\le\sigma^2$$

Theorem: The $N$-step SMP with properly chosen constant stepsizes ensures that
$$\mathbb{E}\{\mathrm{ErrSad}(\bar z^N_{\mathrm{SMP}})\}\le O(1)\,\Theta\left[\frac{\Theta L}{N}+\frac{\sigma}{\sqrt N}\right]\qquad (!)$$
When $\mathbb{E}_\xi\{\exp\{\|G(z,\xi)-F(z)\|_*^2/\sigma^2\}\}\le\exp\{1\}$, for all $\Omega>0$ one has
$$\mathrm{Prob}\left\{\mathrm{ErrSad}(\bar z^N_{\mathrm{SMP}})>O(1)\left[\Theta\Big[\frac{\Theta L}{N}+\frac{\sigma}{\sqrt N}\Big]+\frac{\Omega\Theta\sigma}{\sqrt N}\right]\right\}\le\exp\{-\Omega^2/3\}+\exp\{-\Omega N\}$$

Good news: When $Z$ is the product of $O(1)$ high-dimensional balls, (!) is the best efficiency estimate achievable under the circumstances.
19 Applying SMP: Stochastic Feasibility problem

We want to solve a feasible system of simple and difficult convex constraints:
$$S_\mu(x)\le 0,\ 1\le\mu\le p;\qquad D_\nu(x)\le 0,\ 1\le\nu\le q;\qquad x\in X=\{x\in\mathbb{R}^n:\|x\|_2\le 1\}$$
- $S_\mu$ are given by a precise First Order oracle and are smooth:
$$\|S'_\mu(x)\|_2\le L_\mu,\qquad \|S'_\mu(x)-S'_\mu(x')\|_2\le \sqrt{\ln(p+q)}\,L_\mu\|x-x'\|_2.$$
- $D_\nu$ are given by a Stochastic Oracle. At the $i$-th call to the SO, $x\in X$ being the input, the SO returns unbiased estimates $h_\nu(x,\xi_i)$, $H_\nu(x,\xi_i)$ of $D_\nu(x)$, $D'_\nu(x)$, with independent $\xi_i\sim P$ and
$$\mathbb{E}_\xi\Big\{\max_\nu \frac{h_\nu^2(x,\xi)}{M_\nu^2}\Big\}\le 1,\qquad \max_\nu \mathbb{E}_\xi\Big\{\frac{\|H_\nu(x,\xi)\|_2^2}{M_\nu^2}\Big\}\le \ln(p+q).$$
20 Stochastic Feasibility problem (continued)

$$S_\mu(x)\le 0,\ 1\le\mu\le p;\quad D_\nu(x)\le 0,\ 1\le\nu\le q;\quad x\in X=\{x\in\mathbb{R}^n:\|x\|_2\le 1\}$$
$$S_\mu:\ \|S'_\mu\|_2\le L_\mu,\quad \|S'_\mu(x)-S'_\mu(x')\|_2\le\sqrt{\ln(p+q)}\,L_\mu\|x-x'\|_2$$
$$D_\nu(x)=\mathbb{E}_\xi\{h_\nu(x,\xi)\},\ \mathbb{E}_\xi\{\max_\nu h_\nu^2(x,\xi)/M_\nu^2\}\le 1;\quad D'_\nu(x)=\mathbb{E}_\xi\{H_\nu(x,\xi)\},\ \max_\nu\mathbb{E}_\xi\{\|H_\nu(x,\xi)\|_2^2/M_\nu^2\}\le\ln(p+q)$$

Properly applying the $N$-step SMP to the Saddle Point problem
$$\min_{x\in X}\ \max_{y\ge 0,\ \sum_i y_i=1}\left[\sum_{\mu=1}^p y_\mu\alpha_\mu S_\mu(x)+\sum_{\nu=1}^q y_{p+\nu}\alpha_{p+\nu}D_\nu(x)\right]$$
we build a solution $x^N\in X$ such that
$$S_\mu(x^N)\le\sqrt{\ln(p+q)}\,\frac{L_\mu}{N}\,\zeta,\qquad D_\nu(x^N)\le\sqrt{\ln(p+q)}\,\frac{M_\nu}{\sqrt N}\,\zeta,\qquad \zeta\ge 0,\ \mathbb{E}\{\zeta\}\le O(1).$$
21 Acceleration via Randomization

When solving large-scale well-structured convex optimization problems, e.g., the $\ell_1$ minimization problem
$$\min_x\{\|x\|_1:\|Ax-b\|\le\delta\}$$
with a large-scale and possibly dense matrix $A$, polynomial time methods, which generate high accuracy solutions, become prohibitively time consuming; when medium accuracy solutions are sought, our best choice is computationally cheap first order methods.

There are interesting and important cases when first order methods can be accelerated significantly by randomization: passing from computationally demanding deterministic first order oracles to their computationally cheap stochastic counterparts.
22 Example 1: Matrix Game

$$\min_{x\in\Delta_n}\max_{y\in\Delta_m} f(x,y):=y^TAx,\qquad \Delta_k=\{u\in\mathbb{R}^k_+:\textstyle\sum_i u_i=1\}\qquad (MG)$$
$$F(x,y):=[f'_x(x,y);-f'_y(x,y)]=[A^Ty;-Ax],\qquad \|A\|:=\max_{i,j}|A_{ij}|$$

When $A$ is large and dense, solving (MG) by polynomial time algorithms is prohibitively time consuming. The best deterministic First Order algorithms find an $\epsilon$-solution to (MG) in $O(1)\sqrt{\ln(m+n)}\,\|A\|/\epsilon$ steps, with $O(1)$ computations of $F(\cdot)$ per step. The total effort is $O(1)\sqrt{\ln(m+n)}\,mn\,\|A\|/\epsilon$ a.o. (arithmetic operations).

The matrix-vector multiplications $y\mapsto A^Ty$, $x\mapsto Ax$ required to compute $F(x,y)$ are easy to randomize: to build an unbiased estimate of $Bu=\sum_j u_jB_j$ ($B_j$: columns of $B$), draw at random a column index $\jmath$, $\mathrm{Prob}\{\jmath=j\}=|u_j|/\|u\|_1$, and return $\|u\|_1\,\mathrm{sign}(u_\jmath)B_\jmath$.
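The randomized matrix-vector product just described reads one column per call instead of all $mn$ entries. A minimal sketch, with the matrix and averaging count chosen only to verify unbiasedness empirically:

```python
import numpy as np

def random_column_estimate(B, u, rng):
    """Unbiased estimate of B @ u: draw column index j with probability
    |u_j| / ||u||_1 and return ||u||_1 * sign(u_j) * B[:, j]."""
    p = np.abs(u)
    s = p.sum()                          # ||u||_1
    j = rng.choice(len(u), p=p / s)      # one column read, not a full product
    return s * np.sign(u[j]) * B[:, j]

rng = np.random.default_rng(4)
B = rng.standard_normal((5, 8))
u = rng.standard_normal(8)
# average many single-column estimates; the mean should approach B @ u
est = np.mean([random_column_estimate(B, u, rng) for _ in range(20000)], axis=0)
exact = B @ u
```

Each call costs $O(m)$ instead of $O(mn)$, which is the source of the sublinear-time behaviour claimed on the next slide.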
23 Matrix Game (continued)

$$\min_{x\in\Delta_n}\max_{y\in\Delta_m}y^TAx,\qquad\|A\|:=\max_{i,j}|A_{ij}|\qquad (MG)$$

Equipping (MG) with a Stochastic Oracle and applying RSA, we get, with confidence $\ge 1-\beta$, an $\epsilon$-solution to (MG) in $O(1)\ln(mn/\beta)(\|A\|/\epsilon)^2$ steps. A step reduces to extracting from $A$ a randomly chosen row and column, plus $O(m+n)$ a.o. overhead, so an $\epsilon$-solution costs $O(1)\ln(mn/\beta)(m+n)(\|A\|/\epsilon)^2$ a.o.

With $\epsilon$, $\|A\|$ fixed and $m=O(n)$ large, RSA
- outperforms by orders of magnitude the best known deterministic alternatives;
- exhibits sublinear time behaviour: an $\epsilon$-solution is found by inspecting just $O(1)\ln(mn/\beta)(m+n)(\|A\|/\epsilon)^2$ of the total of $mn$ data entries, cf. Grigoriadis & Khachiyan '95.
24 Application: $\ell_1$ minimization with uniform fit

Numerous sparsity-oriented Signal Processing problems reduce to (small series of) problems
$$\min_u\{\|Au-b\|_p:\|u\|_1\le 1\}\qquad[A:\ m\times n]\qquad(\ell_1)$$
When $p=\infty$, $(\ell_1)$ reduces to the Matrix Game
$$\min_{x=[u;v]\in\Delta_{2n}}\ \max_{y=[p;q]\in\Delta_{2m}}\ y^T\begin{bmatrix}A-b\mathbf 1^T & -A-b\mathbf 1^T\\[2pt] -A+b\mathbf 1^T & A+b\mathbf 1^T\end{bmatrix}[u;v]$$
so RSA can be used to solve $\ell_1$-minimization problems. When $m$, $n$ are really large (like $10^4$ and more) and $A$ is a general-type dense analytically given matrix, $(\ell_1)$ becomes prohibitively difficult for all known deterministic algorithms, and randomized algorithms become a kind of "last resort".
25 How it works

$$\mathrm{Opt}=\min_u\{\|Au-b\|_\infty:\|u\|_1\le 1\}$$
- $A$: dense analytically given $m\times n$ matrix
- $b$: $\|Au_*-b\|_\infty\le\delta=1.\mathrm{e}{-3}$ with a 16-sparse $u_*$, $\|u_*\|_1=1$

[Table comparing DMP and RSA over several sizes $m\times n$, reporting the errors $\|u_*-\tilde u\|_1$, $\|u_*-\tilde u\|_2$, $\|u_*-\tilde u\|_\infty$, CPU time in seconds, and the (equivalent) number of matrix-vector multiplications; the numeric entries were lost in transcription.]

DMP: Deterministic Mirror Prox; $\tilde u$: resulting solution; last column: (equivalent) number of matrix-vector multiplications.
26 How it works (continued)

[Figure: $\ell_1$-recovery. Blue: true signal; Cyan: MDSA recovery.]
27 Example 2: $\ell_1$-minimization with Least Squares fit

The Least Squares $\ell_1$ minimization problem
$$\min_u\{\|Au-b\|_2:\|u\|_1\le 1\}\qquad[A:\ m\times n]$$
reduces to the Saddle Point problem
$$\min_{x\in\Delta_n}\ \max_{\|y\|_2\le 1}\ y^T[Ax-b]\qquad[F(x,y)=[A^Ty;\,b-Ax]]$$

Randomizing the matrix-vector multiplications and applying RSA, we get, with confidence $\ge 1-\beta$, an $\epsilon$-solution with the overall effort of
$$O(1)\ln(mn/\beta)\big(\sqrt m\,\|C\|_2/\epsilon\big)^2\ \text{a.o.}\qquad(*)$$
where $\|C\|_2$ is the maximum of the $\|\cdot\|_2$-norms of the columns of $C=[A,b]$.

Bad news: the factor $m$ in $(*)$ reflects the high potential noisiness of the unbiased estimate $y\mapsto\|y\|_1\,\mathrm{sign}(y_\imath)[A_{\imath,:}]^T$ of $A^Ty$ on $\{y:\|y\|_2\le 1\}$: it may happen that $\big\|\,\|y\|_1\,\mathrm{sign}(y_\imath)[A_{\imath,:}]^T\big\|_2\approx\sqrt m\,\|C\|_2$.
28 $\ell_1$-minimization with Least Squares fit (continued)

$$\min_{x\in\Delta_n}\ \max_{\|y\|_2\le 1}\ y^T[Ax-b]\qquad[F(x,y)=[A^Ty;\,b-Ax]]$$

Remedy: the randomized preprocessing $[A,b]\mapsto UD[A,b]$, with an orthogonal $U$, $|U_{ij}|\le O(m^{-1/2})$, and a random diagonal $D$ with i.i.d. $D_{ii}\sim\mathrm{Uniform}\{-1;1\}$, results in an equivalent problem, preserves $\|C\|_2$, and reduces the noise in the estimate of $A^Ty$: now, with confidence $\ge 1-\beta$, for all $y$ with $\|y\|_2\le 1$,
$$\big\|\,\|y\|_1\,\mathrm{sign}(y_\imath)[A_{\imath,:}]^T\big\|_2\le O(1)\sqrt{\ln(mn/\beta)}\,\|C\|_2.$$
After preprocessing, RSA finds, with confidence $\ge 1-\beta$, an $\epsilon$-solution to the problem at the cost of $O(1)\ln^2(mn/\beta)(\|C\|_2/\epsilon)^2$ a.o. With a properly chosen $U$ (e.g., a Hadamard or DFT matrix), the preprocessing costs just $O(1)mn\ln m$ a.o.
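The remedy above can be sketched with a normalized Hadamard matrix as $U$ (so $|U_{ij}|=m^{-1/2}$) and random signs as $D$; the adversarial matrix with one huge row is an illustrative worst case for row sampling, not data from the talk:

```python
import numpy as np

def hadamard(m):
    """Normalized Hadamard matrix (m a power of 2): orthogonal, |U_ij| = m^{-1/2}."""
    H = np.array([[1.0]])
    while H.shape[0] < m:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(m)

def randomized_preprocess(A, b, rng):
    """[A, b] -> U D [A, b]: an equivalent least-squares problem with the
    same column norms, but with the rows 'spread out' so that single-row
    estimates of A^T y are far less noisy."""
    m = A.shape[0]
    D = rng.choice([-1.0, 1.0], size=m)            # i.i.d. random signs
    U = hadamard(m)
    M = U @ (D[:, None] * np.hstack([A, b[:, None]]))
    return M[:, :-1], M[:, -1]

rng = np.random.default_rng(5)
A = np.zeros((8, 3))
A[0] = 10.0                        # one huge row: worst case for row sampling
b = rng.standard_normal(8)
A2, b2 = randomized_preprocess(A, b, rng)
row_spread = float(np.abs(A2).max())   # entries now O(m^{-1/2}) * column norm
```

Since $UD$ is orthogonal, column 2-norms (hence $\|C\|_2$) are preserved exactly, while the largest entry drops from 10 to $10/\sqrt 8\approx 3.54$.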
29 Example 3: Minimizing the maximum of convex polynomials over a simplex/$\ell_1$-ball

Consider the optimization problem
$$\min_{x\in\Delta_n}\max_{1\le k\le m}p_k(x),\qquad p_k(x)=\sum_{l=0}^d A_{kl}[x,\dots,x]\qquad (*)$$
- $A_{kl}[x^1,\dots,x^l]$: symmetric $l$-linear forms; $p_k(\cdot)$: convex.

We can reduce $(*)$ to the Saddle Point problem
$$\min_{x\in\Delta_n}\ \max_{y\in\Delta_m}\ \sum_k y_k p_k(x)$$
and randomize, in the aforementioned manner, the computation of $p_k(\cdot)$, $p'_k(\cdot)$: to build unbiased estimates $\tilde p_k(x)$, $\tilde P_k(x)$ of $p_k(x)$ and $P_k(x)=p'_k(x)$, $x\in\Delta_n$, we generate at random i.i.d. indices $j_1,\dots,j_d$, $\mathrm{Prob}\{j=\jmath\}=x_\jmath$, and set
$$\tilde p_k=\sum_{l=0}^d A_{kl}[e_{j_1},\dots,e_{j_l}],\qquad[\tilde P_k]_j=\sum_{l=1}^d l\,A_{kl}[e_{j_1},\dots,e_{j_{l-1}},e_j],\ 1\le j\le n.$$
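The value part of this randomization can be sketched for a single degree-2 polynomial $p(x)=A_0+A_1[x]+A_2[x,x]$: draw $j_1,j_2$ with probabilities $x_j$ and read one coefficient per degree. The polynomial data below are illustrative, not from the talk:

```python
import numpy as np

def poly_estimate(coeff_tensors, x, rng):
    """Unbiased estimate of p(x) = sum_{l=0}^d A_l[x,...,x] on the simplex:
    draw i.i.d. indices j_1..j_d with Prob{j = jj} = x_jj and read the single
    coefficients A_l[e_{j_1},...,e_{j_l}]."""
    d = len(coeff_tensors) - 1
    idx = rng.choice(len(x), size=d, p=x)           # j_1, ..., j_d
    est = coeff_tensors[0]                          # constant term A_0
    for l in range(1, d + 1):
        est += coeff_tensors[l][tuple(idx[:l])]     # one coefficient per degree
    return est

# toy degree-2 polynomial p(x) = 1 + c^T x + x^T Q x  (illustrative data)
rng = np.random.default_rng(6)
n = 5
c = rng.standard_normal(n)
Q = rng.standard_normal((n, n))
x = np.full(n, 1.0 / n)                             # a point of the simplex
exact = float(1 + c @ x + x @ Q @ x)
avg = float(np.mean([poly_estimate([1.0, c, Q], x, rng) for _ in range(40000)]))
```

Unbiasedness follows from the independence of the indices: $\mathbb{E}\,Q[j_1,j_2]=x^TQx$ and $\mathbb{E}\,c[j_1]=c^Tx$.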
30 Minimizing maximum of convex polynomials (continued)

$$\min_{x\in\Delta_n}\max_{1\le k\le m}\sum_{l=0}^d A_{kl}[x,\dots,x]\qquad (*)$$
- $\mathcal A$: maximum magnitude of the coefficients in the forms $A_{kl}[\cdot,\dots,\cdot]$

Applying RSA, we find, with confidence $\ge 1-\beta$, an $\epsilon$-solution to $(*)$ at the cost of $O(1)\ln(mn/\beta)\,d^2(\mathcal A/\epsilon)^2$ steps, with a step reducing to extracting $O(1)d(m+n)$ coefficients of the forms $A_{kl}[\cdot,\dots,\cdot]$, given the addresses $k,l,j_1,\dots,j_l$ of the coefficients, plus an $O(m+dn)$-a.o. overhead.
More informationApplications of Linear Programming
Applications of Linear Programming lecturer: András London University of Szeged Institute of Informatics Department of Computational Optimization Lecture 9 Non-linear programming In case of LP, the goal
More informationStochastic Gradient Descent with Only One Projection
Stochastic Gradient Descent with Only One Projection Mehrdad Mahdavi, ianbao Yang, Rong Jin, Shenghuo Zhu, and Jinfeng Yi Dept. of Computer Science and Engineering, Michigan State University, MI, USA Machine
More informationVincent Guigues FGV/EMAp, Rio de Janeiro, Brazil 1.1 Comparison of the confidence intervals from Section 3 and from [18]
Online supplementary material for the paper: Multistep stochastic mirror descent for risk-averse convex stochastic programs based on extended polyhedral risk measures Vincent Guigues FGV/EMAp, -9 Rio de
More informationE5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming
E5295/5B5749 Convex optimization with engineering applications Lecture 5 Convex programming and semidefinite programming A. Forsgren, KTH 1 Lecture 5 Convex optimization 2006/2007 Convex quadratic program
More informationOptimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method
Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Davood Hajinezhad Iowa State University Davood Hajinezhad Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method 1 / 35 Co-Authors
More informationProximal Newton Method. Ryan Tibshirani Convex Optimization /36-725
Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h
More informationLecture 3: Huge-scale optimization problems
Liege University: Francqui Chair 2011-2012 Lecture 3: Huge-scale optimization problems Yurii Nesterov, CORE/INMA (UCL) March 9, 2012 Yu. Nesterov () Huge-scale optimization problems 1/32March 9, 2012 1
More informationStochastic Optimization Algorithms Beyond SG
Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods
More information10. Unconstrained minimization
Convex Optimization Boyd & Vandenberghe 10. Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions implementation
More informationOptimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison
Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big
More informationConstrained optimization
Constrained optimization DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Compressed sensing Convex constrained
More informationOn Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:
A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition
More informationOn the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,
Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,
More informationProximal and First-Order Methods for Convex Optimization
Proximal and First-Order Methods for Convex Optimization John C Duchi Yoram Singer January, 03 Abstract We describe the proximal method for minimization of convex functions We review classical results,
More informationRandom coordinate descent algorithms for. huge-scale optimization problems. Ion Necoara
Random coordinate descent algorithms for huge-scale optimization problems Ion Necoara Automatic Control and Systems Engineering Depart. 1 Acknowledgement Collaboration with Y. Nesterov, F. Glineur ( Univ.
More informationSGD and Randomized projection algorithms for overdetermined linear systems
SGD and Randomized projection algorithms for overdetermined linear systems Deanna Needell Claremont McKenna College IPAM, Feb. 25, 2014 Includes joint work with Eldar, Ward, Tropp, Srebro-Ward Setup Setup
More informationProximal Methods for Optimization with Spasity-inducing Norms
Proximal Methods for Optimization with Spasity-inducing Norms Group Learning Presentation Xiaowei Zhou Department of Electronic and Computer Engineering The Hong Kong University of Science and Technology
More informationFast proximal gradient methods
L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient
More informationPrimal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions
Primal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions Olivier Fercoq and Pascal Bianchi Problem Minimize the convex function
More informationBregman Divergence and Mirror Descent
Bregman Divergence and Mirror Descent Bregman Divergence Motivation Generalize squared Euclidean distance to a class of distances that all share similar properties Lots of applications in machine learning,
More informationPrimal-dual subgradient methods for convex problems
Primal-dual subgradient methods for convex problems Yu. Nesterov March 2002, September 2005 (after revision) Abstract In this paper we present a new approach for constructing subgradient schemes for different
More informationLecture Note 5: Semidefinite Programming for Stability Analysis
ECE7850: Hybrid Systems:Theory and Applications Lecture Note 5: Semidefinite Programming for Stability Analysis Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio State
More informationLimited Memory Kelley s Method Converges for Composite Convex and Submodular Objectives
Limited Memory Kelley s Method Converges for Composite Convex and Submodular Objectives Madeleine Udell Operations Research and Information Engineering Cornell University Based on joint work with Song
More informationOptimization methods
Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to
More informationAdaptive Restarting for First Order Optimization Methods
Adaptive Restarting for First Order Optimization Methods Nesterov method for smooth convex optimization adpative restarting schemes step-size insensitivity extension to non-smooth optimization continuation
More informationGradient methods for minimizing composite functions
Gradient methods for minimizing composite functions Yu. Nesterov May 00 Abstract In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum
More informationWe describe the generalization of Hazan s algorithm for symmetric programming
ON HAZAN S ALGORITHM FOR SYMMETRIC PROGRAMMING PROBLEMS L. FAYBUSOVICH Abstract. problems We describe the generalization of Hazan s algorithm for symmetric programming Key words. Symmetric programming,
More informationNon-asymptotic confidence bounds for the optimal value of a stochastic program
on-asymptotic confidence bounds for the optimal value of a stochastic program Vincent Guigues, Anatoli Juditsky, Arkadi emirovski To cite this version: Vincent Guigues, Anatoli Juditsky, Arkadi emirovski.
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationImproving the Convergence of Back-Propogation Learning with Second Order Methods
the of Back-Propogation Learning with Second Order Methods Sue Becker and Yann le Cun, Sept 1988 Kasey Bray, October 2017 Table of Contents 1 with Back-Propagation 2 the of BP 3 A Computationally Feasible
More informationAn accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems
An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems O. Kolossoski R. D. C. Monteiro September 18, 2015 (Revised: September 28, 2016) Abstract
More informationarxiv: v1 [math.oc] 5 Dec 2014
FAST BUNDLE-LEVEL TYPE METHODS FOR UNCONSTRAINED AND BALL-CONSTRAINED CONVEX OPTIMIZATION YUNMEI CHEN, GUANGHUI LAN, YUYUAN OUYANG, AND WEI ZHANG arxiv:141.18v1 [math.oc] 5 Dec 014 Abstract. It has been
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More information[2] (a) Develop and describe the piecewise linear Galerkin finite element approximation of,
269 C, Vese Practice problems [1] Write the differential equation u + u = f(x, y), (x, y) Ω u = 1 (x, y) Ω 1 n + u = x (x, y) Ω 2, Ω = {(x, y) x 2 + y 2 < 1}, Ω 1 = {(x, y) x 2 + y 2 = 1, x 0}, Ω 2 = {(x,
More informationSUPPORT VECTOR MACHINES VIA ADVANCED OPTIMIZATION TECHNIQUES. Eitan Rubinstein
SUPPORT VECTOR MACHINES VIA ADVANCED OPTIMIZATION TECHNIQUES Research Thesis Submitted in partial fulfillment of the requirements for the degree of Master of Science in Operations Research and System Analysis
More informationStochastic Subgradient Method
Stochastic Subgradient Method Lingjie Weng, Yutian Chen Bren School of Information and Computer Science UC Irvine Subgradient Recall basic inequality for convex differentiable f : f y f x + f x T (y x)
More informationADVANCES IN CONVEX OPTIMIZATION: CONIC PROGRAMMING. Arkadi Nemirovski
ADVANCES IN CONVEX OPTIMIZATION: CONIC PROGRAMMING Arkadi Nemirovski Georgia Institute of Technology and Technion Israel Institute of Technology Convex Programming solvable case in Optimization Revealing
More informationSemidefinite Programming
Semidefinite Programming Basics and SOS Fernando Mário de Oliveira Filho Campos do Jordão, 2 November 23 Available at: www.ime.usp.br/~fmario under talks Conic programming V is a real vector space h, i
More informationLearning with stochastic proximal gradient
Learning with stochastic proximal gradient Lorenzo Rosasco DIBRIS, Università di Genova Via Dodecaneso, 35 16146 Genova, Italy lrosasco@mit.edu Silvia Villa, Băng Công Vũ Laboratory for Computational and
More information