A. Derivation of regularized ERM duality
For completeness, in this section we derive the dual (5) of the problem of computing the proximal operator for the ERM objective (3). We can rewrite the primal problem as

$$\min_{x \in \mathbb{R}^d,\, z \in \mathbb{R}^n} \; \frac{1}{n}\sum_{i=1}^n \varphi_i(z_i) + \frac{\lambda}{2}\|x - s\|^2 \quad \text{subject to } z_i = a_i^\top x, \; i = 1, \dots, n.$$

By convex duality, this is equivalent to

$$\min_{x,\,z} \max_{y \in \mathbb{R}^n} \; \frac{1}{n}\sum_{i=1}^n \varphi_i(z_i) + \frac{\lambda}{2}\|x - s\|^2 + y^\top(Ax - z)$$
$$= \max_{y} \min_{x \in \mathbb{R}^d,\, z \in \mathbb{R}^n} \; \frac{1}{n}\sum_{i=1}^n \varphi_i(z_i) + \frac{\lambda}{2}\|x - s\|^2 + y^\top(Ax - z)$$
$$= \max_{y} \left[ \min_{z} \left\{ \frac{1}{n}\sum_i \varphi_i(z_i) - y^\top z \right\} + \min_{x} \left\{ \frac{\lambda}{2}\|x - s\|^2 + y^\top Ax \right\} \right]$$
$$= \max_{y} \left[ -\frac{1}{n}\sum_i \max_{z_i} \left\{ n y_i z_i - \varphi_i(z_i) \right\} - \max_{x} \left\{ -y^\top Ax - \frac{\lambda}{2}\|x - s\|^2 \right\} \right]$$
$$= \max_{y} \left[ -\frac{1}{n}\sum_i \varphi_i^*(n y_i) - \frac{1}{2\lambda}\|A^\top y\|^2 + s^\top A^\top y \right]$$
$$= -\min_{y} \left[ \frac{1}{n}\sum_i \varphi_i^*(n y_i) + \frac{1}{2\lambda}\|A^\top y\|^2 - s^\top A^\top y \right].$$

The final negated problem is precisely the dual formulation (5). The first problem is a Lagrangian saddle-point problem, where the Lagrangian is defined as

$$L(x, y, z) = \frac{1}{n}\sum_{i=1}^n \varphi_i(z_i) + \frac{\lambda}{2}\|x - s\|^2 + y^\top(Ax - z).$$

The dual-to-primal mapping (6) and primal-to-dual mapping (7) are implied by the KKT conditions under $L$, and can be derived by solving for $x$, $y$, and $z$ in the system $\nabla L(x, y, z) = 0$. The duality gap in this context is defined as

$$\mathrm{gap}_{s,\lambda}(x, y) = f_{s,\lambda}(x) + g_{s,\lambda}(y). \tag{9}$$

Weak duality dictates that $\mathrm{gap}_{s,\lambda}(x, y) \ge 0$ for all $x \in \mathbb{R}^d$ and $y \in \mathbb{R}^n$, and strong duality that equality is attained when $x$ is primal-optimal and $y$ is dual-optimal.

B. Additional lemmas

This section establishes technical lemmas useful in statements and proofs throughout the paper.

Lemma B.1 (Standard bounds for smooth, strongly convex functions). Let $f : \mathbb{R}^k \to \mathbb{R}$ be differentiable and let $x \in \mathbb{R}^k$. Furthermore, let $x^{\mathrm{opt}}$ be a minimizer of $f$. If $f$ is $L$-smooth then

$$\frac{1}{2L}\|\nabla f(x)\|^2 \le f(x) - f(x^{\mathrm{opt}}) \le \frac{L}{2}\|x - x^{\mathrm{opt}}\|^2.$$
If $f$ is $\mu$-strongly convex, we have

$$\frac{\mu}{2}\|x - x^{\mathrm{opt}}\|^2 \le f(x) - f(x^{\mathrm{opt}}) \le \frac{1}{2\mu}\|\nabla f(x)\|^2.$$

Proof. Apply the definitions of smoothness and strong convexity at the points $x$ and $x^{\mathrm{opt}}$ and minimize the resulting quadratic forms.

Lemma B.2 (Errors for primal-dual pairs). Consider $F$, $f_{s,\lambda}$, $g_{s,\lambda}$, $\hat{x}_s(\cdot)$, and $\hat{y}(\cdot)$, as defined in (1), (3), (5), (6), and (7), respectively. Then for all $y \in \mathbb{R}^n$,

$$f_{s,\lambda}(\hat{x}_s(y)) - f_{s,\lambda}^{\mathrm{opt}} \le \frac{R^2 L (nR^2 L + \lambda)}{\lambda^2} \left( g_{s,\lambda}(y) - g_{s,\lambda}^{\mathrm{opt}} \right).$$

Furthermore, let $x^{\mathrm{opt}}_{s,\lambda} = \mathrm{argmin}_x f_{s,\lambda}(x)$ and $y^{\mathrm{opt}}_{s,\lambda} = \mathrm{argmin}_y g_{s,\lambda}(y)$. Then for all $y \in \mathbb{R}^n$,

$$\|\hat{x}_s(y) - x^{\mathrm{opt}}_{s,\lambda}\| \le \frac{\sqrt{n}\, R}{\lambda}\, \|y - y^{\mathrm{opt}}_{s,\lambda}\|.$$

Proof. Because $F$ is $nR^2L$-smooth, $f_{s,\lambda}$ is $(nR^2L + \lambda)$-smooth. Consequently, for all $x \in \mathbb{R}^d$ we have

$$f_{s,\lambda}(x) - f_{s,\lambda}^{\mathrm{opt}} \le \frac{nR^2L + \lambda}{2} \|x - x^{\mathrm{opt}}_{s,\lambda}\|^2.$$

Since we know that $x^{\mathrm{opt}}_{s,\lambda} = s - \frac{1}{\lambda}A^\top y^{\mathrm{opt}}_{s,\lambda}$, so that $\hat{x}_s(y) - x^{\mathrm{opt}}_{s,\lambda} = \frac{1}{\lambda}A^\top(y^{\mathrm{opt}}_{s,\lambda} - y)$, and $AA^\top \preceq nR^2 I$,

$$f_{s,\lambda}(\hat{x}_s(y)) - f_{s,\lambda}^{\mathrm{opt}} \le \frac{nR^2L + \lambda}{2\lambda^2}\, (y - y^{\mathrm{opt}}_{s,\lambda})^\top AA^\top (y - y^{\mathrm{opt}}_{s,\lambda}) \le \frac{nR^2(nR^2L + \lambda)}{2\lambda^2}\, \|y - y^{\mathrm{opt}}_{s,\lambda}\|^2; \tag{10}$$

this also proves the second claim. Finally, since each $\varphi_i^*$ is $(1/L)$-strongly convex, $G$ is $(n/L)$-strongly convex and hence so is $g_{s,\lambda}$. Therefore,

$$\frac{n}{2L}\,\|y - y^{\mathrm{opt}}_{s,\lambda}\|^2 \le g_{s,\lambda}(y) - g_{s,\lambda}(y^{\mathrm{opt}}_{s,\lambda}).$$

Substituting in (10) yields the result.

By adding the dual error to both sides of the inequality in Lemma B.2, we obtain the following corollary.

Corollary B.3 (Dual error bounds gap). Consider $g_{s,\lambda}$, $\mathrm{gap}_{s,\lambda}$, $\hat{x}_s(\cdot)$, and $\hat{y}(\cdot)$, as defined in (5), (9), (6), and (7), respectively. For all $s \in \mathbb{R}^d$ and $y \in \mathbb{R}^n$,

$$\mathrm{gap}_{s,\lambda}(\hat{x}_s(y), y) \le \left( 1 + \frac{R^2L(nR^2L + \lambda)}{\lambda^2} \right) \left( g_{s,\lambda}(y) - g_{s,\lambda}^{\mathrm{opt}} \right).$$

Another corollary arises by combining Lemmas B.1 and B.2.

Corollary B.4 (Dual error bounds primal gradient). Consider $f_{s,\lambda}$, $g_{s,\lambda}$, and $\hat{x}_s(\cdot)$, as defined in (3), (5), and (6), respectively. For all $s \in \mathbb{R}^d$ and $y \in \mathbb{R}^n$,

$$\frac{1}{2(nR^2L+\lambda)}\|\nabla f_{s,\lambda}(\hat{x}_s(y))\|^2 \le f_{s,\lambda}(\hat{x}_s(y)) - f_{s,\lambda}^{\mathrm{opt}} \le \frac{R^2L(nR^2L + \lambda)}{\lambda^2} \left( g_{s,\lambda}(y) - g_{s,\lambda}^{\mathrm{opt}} \right).$$
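As a concrete sanity check of the primal-dual relationship above, the sketch below instantiates the losses as squared losses $\varphi_i(z) = \frac{1}{2}(z - b_i)^2$, whose conjugates are $\varphi_i^*(u) = \frac{1}{2}u^2 + b_i u$. This choice of loss, and all helper names, are illustrative assumptions, not part of the text. The code verifies weak duality at a random pair, the dual-to-primal mapping at the optimum, and that the duality gap vanishes there:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 5, 3, 0.7
A = rng.normal(size=(n, d))          # rows are the a_i
b = rng.normal(size=n)
s = rng.normal(size=d)

# Primal: f_s(x) = (1/n) sum_i phi_i(a_i^T x) + (lam/2)||x - s||^2,
# with squared loss phi_i(z) = 0.5*(z - b_i)^2.
def f_primal(x):
    return 0.5 * np.mean((A @ x - b) ** 2) + 0.5 * lam * np.sum((x - s) ** 2)

# Negated dual: g_s(y) = G(y) + (1/(2*lam))||A^T y||^2 - s^T A^T y,
# where G(y) = (1/n) sum_i phi_i^*(n y_i) and phi_i^*(u) = 0.5*u^2 + b_i*u.
def g_dual(y):
    G = np.mean(0.5 * (n * y) ** 2 + b * (n * y))
    ATy = A.T @ y
    return G + ATy @ ATy / (2 * lam) - s @ ATy

# Weak duality: gap(x, y) = f_s(x) + g_s(y) >= 0 for any pair.
x0, y0 = rng.normal(size=d), rng.normal(size=n)
assert f_primal(x0) + g_dual(y0) >= 0

# At the optimum: x* solves (A^T A / n + lam I) x = A^T b / n + lam s,
# and the KKT maps give y_hat(x)_i = phi_i'(a_i^T x)/n, x_hat(y) = s - A^T y / lam.
x_star = np.linalg.solve(A.T @ A / n + lam * np.eye(d), A.T @ b / n + lam * s)
y_star = (A @ x_star - b) / n
assert np.allclose(x_star, s - A.T @ y_star / lam)     # dual-to-primal map
assert abs(f_primal(x_star) + g_dual(y_star)) < 1e-10  # zero gap at optimum
```

The check confirms that the negated maximization problem above really is a valid upper bound that is tight at the primal-dual optimal pair.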
Lemma B.5 (Gap for primal-dual pairs). Consider $F$, $f_{s,\lambda}$, $g_{s,\lambda}$, $\mathrm{gap}_{s,\lambda}$, $\hat{x}_s(\cdot)$, and $\hat{y}(\cdot)$, as defined in (1), (3), (5), (9), (6), and (7), respectively. For any $x, s \in \mathbb{R}^d$,

$$\mathrm{gap}_{s,\lambda}(x, \hat{y}(x)) = \frac{1}{2\lambda}\|\nabla F(x) + \lambda(x - s)\|^2. \tag{12}$$

Separately, for any $y \in \mathbb{R}^n$ and $s \in \mathbb{R}^d$,

$$\mathrm{gap}_{s,\lambda}(\hat{x}_s(y), y) = F(\hat{x}_s(y)) + g_{s,\lambda}(y) + \frac{1}{2\lambda}\|A^\top y\|^2. \tag{13}$$

Proof. To prove the first identity, let $\hat{y} = \hat{y}(x)$ for brevity. Recall that

$$\hat{y}_i = \frac{1}{n}\varphi_i'(a_i^\top x) \in \mathrm{argmax}_{y_i} \left\{ x^\top a_i y_i - \frac{1}{n}\varphi_i^*(n y_i) \right\} \tag{14}$$

by definition, and hence $x^\top a_i \hat{y}_i - \frac{1}{n}\varphi_i^*(n \hat{y}_i) = \frac{1}{n}\varphi_i(a_i^\top x)$. Observe that

$$\mathrm{gap}_{s,\lambda}(x, \hat{y}) = \frac{1}{n}\sum_i \varphi_i(a_i^\top x) + \frac{1}{n}\sum_i \varphi_i^*(n\hat{y}_i) - x^\top A^\top \hat{y} + \frac{1}{2\lambda}\|A^\top \hat{y} + \lambda(x - s)\|^2$$
$$= \underbrace{\sum_i \left[ \frac{1}{n}\varphi_i(a_i^\top x) + \frac{1}{n}\varphi_i^*(n\hat{y}_i) - x^\top a_i \hat{y}_i \right]}_{=\,0 \text{ by } (14)} + \frac{1}{2\lambda}\|A^\top \hat{y} + \lambda(x - s)\|^2$$
$$= \frac{1}{2\lambda}\left\| \frac{1}{n}\sum_i a_i \varphi_i'(a_i^\top x) + \lambda(x - s) \right\|^2 = \frac{1}{2\lambda}\|\nabla F(x) + \lambda(x - s)\|^2.$$

For the second identity (13), let $\hat{x} = \hat{x}_s(y)$ for brevity. Fix $s \in \mathbb{R}^d$ and $\lambda > 0$. Define $r(x) = \frac{\lambda}{2}\|x - s\|^2$, and note that its Fenchel conjugate is $r^*(\xi) = \frac{1}{2\lambda}\|\xi\|^2 + s^\top \xi$. With this notation we can write

$$f_{s,\lambda}(x) = F(x) + r(x), \qquad g_{s,\lambda}(y) = G(y) + r^*(-A^\top y).$$

Observe that

$$\hat{x} = s - \frac{1}{\lambda}A^\top y = \mathrm{argmin}_x \left\{ \frac{\lambda}{2}\|x - s\|^2 + x^\top A^\top y \right\} = \mathrm{argmax}_x \left\{ x^\top(-A^\top y) - r(x) \right\} = \nabla r^*(-A^\top y).$$

Note also that $g_{s,\lambda}$ may be rewritten as

$$g_{s,\lambda}(y) = G(y) + \frac{1}{2\lambda}\|A^\top y\|^2 - s^\top A^\top y = G(y) - \frac{1}{2\lambda}\|A^\top y\|^2 - \hat{x}^\top A^\top y.$$

Combining the above two observations, and noting that the first implies equality in the Fenchel-Young inequality, i.e. $r(\hat{x}) + r^*(-A^\top y) = -\hat{x}^\top A^\top y$,

$$\mathrm{gap}_{s,\lambda}(\hat{x}, y) = f_{s,\lambda}(\hat{x}) + g_{s,\lambda}(y) = F(\hat{x}) + G(y) + r(\hat{x}) + r^*(-A^\top y) = F(\hat{x}) + G(y) - \hat{x}^\top A^\top y = F(\hat{x}) + g_{s,\lambda}(y) + \frac{1}{2\lambda}\|A^\top y\|^2,$$
proving the claim.

C. Convergence analysis of Dual APPA

The goal of this section is to establish the convergence rate of Algorithm 3, Dual APPA. It is structured as follows. Mirroring Lemma 3.1, we establish Lemma C.1, concerning contraction of the norm of the gradient, $\|\nabla F\|$, rather than of the error in function value $F - F^{\mathrm{opt}}$. Similar to Lemma 3.1, this lemma is self-contained and assumes only that $F$ is $\mu$-strongly convex, but not that it is the ERM objective. We then establish directly a geometric rate upper bound on the value of $\|\nabla F\|$ over the course of the algorithm, relying, in the process, on Lemma C.1. Finally, we invoke the strong convexity of $F$ to establish that the function error too is subject to the same rate, up to an extra multiplicative factor.

Lemma C.1 (Gradient-norm reduction implies contraction). Suppose $F : \mathbb{R}^d \to \mathbb{R}$ is $\mu$-strongly convex and that $\lambda > 0$. Define $f_s : \mathbb{R}^d \to \mathbb{R}$ by $f_s(x) = F(x) + \frac{\lambda}{2}\|x - s\|^2$. For any $x_0, x_1 \in \mathbb{R}^d$,

$$\|\nabla F(x_1)\| \le \left(1 + \frac{\lambda}{\mu + \lambda}\right) \|\nabla f_{x_0}(x_1)\| + \frac{\lambda}{\mu + \lambda}\|\nabla F(x_0)\|. \tag{15}$$

Corollary C.2. Define $\gamma := \lambda/(\mu + \lambda)$ and let $\tau$ be any scalar in the interval $[\gamma, 1)$. For any $x_0, x_1 \in \mathbb{R}^d$,

$$\|\nabla F(x_1)\| \le \tau \|\nabla F(x_0)\| \tag{16}$$

whenever $\|\nabla f_{x_0}(x_1)\| \le \frac{\tau - \gamma}{1 + \gamma}\|\nabla F(x_0)\|$.

Proof of Lemma C.1. Taking gradients of $f_{x_0}$ and $f_{x_1}$, we have that

$$\nabla F(x_1) = \nabla f_{x_0}(x_1) - \lambda(x_1 - x_0).$$

By the triangle inequality,

$$\|\nabla F(x_1)\| \le \|\nabla f_{x_0}(x_1)\| + \lambda\|x_1 - x_0\|.$$

Because $f_{x_0}$ is $(\mu + \lambda)$-strongly convex,

$$(\mu + \lambda)\|x_1 - x_0\| \le \|\nabla f_{x_0}(x_1) - \nabla f_{x_0}(x_0)\| \le \|\nabla f_{x_0}(x_1)\| + \|\nabla f_{x_0}(x_0)\| = \|\nabla f_{x_0}(x_1)\| + \|\nabla F(x_0)\|.$$

Combining the previous two observations proves the claim.

Throughout the remainder of this section, we consider the functions defined in the main setup (Section 2): namely, $F$, $f_{s,\lambda}$, $g_{s,\lambda}$, $\hat{x}_s(\cdot)$, and $\hat{y}(\cdot)$, as defined in (1), (3), (5), (6), and (7), respectively, where $F$ is a $\mu$-strongly convex sum whose constituent summands are each $L$-smooth scalar functions operating on the inner product of the variable $x \in \mathbb{R}^d$ with a vector $a_i$ of Euclidean norm at most $R$. For notational brevity, we omit the subscript $\lambda$, and we denote iterates of the algorithm with numerical subscripts (e.g. $x_1, x_2, \dots$ and $y_1, y_2, \dots$) rather than parenthesized superscripts.
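Lemma C.1 can be sanity-checked numerically on a strongly convex quadratic, for which $\mu$ is the smallest Hessian eigenvalue. The sketch below is illustrative only; all variable names are assumptions of this example:

```python
import numpy as np

rng = np.random.default_rng(3)
d, lam = 4, 0.9
B = rng.normal(size=(d, d))
Q = B @ B.T + 0.3 * np.eye(d)     # F(x) = 0.5 x^T Q x, positive definite
mu = np.linalg.eigvalsh(Q)[0]     # strong convexity parameter of F

grad_F = lambda x: Q @ x
x0, x1 = rng.normal(size=d), rng.normal(size=d)

# grad f_{x0}(x1) = grad F(x1) + lam*(x1 - x0)
g_prox = Q @ x1 + lam * (x1 - x0)

# Lemma C.1 bound: ||grad F(x1)|| <= (1 + gamma)||grad f_{x0}(x1)|| + gamma||grad F(x0)||
gamma = lam / (mu + lam)
lhs = np.linalg.norm(grad_F(x1))
bound = (1 + gamma) * np.linalg.norm(g_prox) + gamma * np.linalg.norm(grad_F(x0))
assert lhs <= bound + 1e-12
```

Since $x_0$ and $x_1$ are arbitrary here, this exercises the lemma exactly as stated: no relationship between the two points is assumed.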
We begin by bounding the error of the dual regularized ERM problem when the center of regularization changes. This characterizes the initial error at the beginning of each Dual APPA iteration.
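The exact effect of re-centering on the dual value, used in the upcoming proof, can be sketched numerically. As before, the squared loss is an illustrative assumption (its conjugate is $\varphi_i^*(u) = \frac{1}{2}u^2 + b_i u$), and the names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, lam = 5, 3, 0.8
A, b = rng.normal(size=(n, d)), rng.normal(size=n)
y = rng.normal(size=n)

# Dual objective for squared loss; the center s enters only through -s^T A^T y.
def g(s, y):
    G = np.mean(0.5 * (n * y) ** 2 + b * (n * y))
    return G + (A.T @ y) @ (A.T @ y) / (2 * lam) - s @ (A.T @ y)

x0 = rng.normal(size=d)
x1 = x0 - A.T @ y / lam    # re-center to x1 = x_hat_{x0}(y)

# Re-centering increases the dual value by exactly lam * ||x1 - x0||^2,
# since A^T y = lam * (x0 - x1) at this choice of x1.
increase = g(x1, y) - g(x0, y)
assert np.isclose(increase, lam * np.sum((x1 - x0) ** 2))
```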
Lemma C.3 (Dual error is bounded across re-centering). Fix $y \in \mathbb{R}^n$ and $x_0 \in \mathbb{R}^d$. Let $x_1 = \hat{x}_{x_0}(y)$. Then

$$g_{x_1}(y) - g_{x_1}^{\mathrm{opt}} \le c_1 \left( g_{x_0}(y) - g_{x_0}^{\mathrm{opt}} \right) + c_2 \|\nabla F(x_0)\|^2,$$

where $c_1 = 1 + c_3 + \frac{2\lambda(nR^2L+\lambda)c_3}{(\mu+\lambda)^2}$ and $c_2 = \frac{\lambda}{(\mu+\lambda)^2}$, with $c_3 = \frac{R^2L(nR^2L+\lambda)}{\lambda^2}$ the constant of Lemma B.2. In other words, the dual error $g_s - g_s^{\mathrm{opt}}$ is bounded across a re-centering step by multiples of previous sub-optimality measurements, namely dual error and gradient norm.

Proof. We first show how re-centering, from $x_0$ to $x_1$, increases the duality gap between $y$ and the new center $x_1$. The dual function value changes from

$$g_{x_0}(y) = G(y) + \frac{1}{2\lambda}\|A^\top y\|^2 - x_0^\top A^\top y$$

to

$$g_{x_1}(y) = G(y) + \frac{1}{2\lambda}\|A^\top y\|^2 - x_1^\top A^\top y = g_{x_0}(y) + (x_0 - x_1)^\top A^\top y = g_{x_0}(y) + \lambda\|x_1 - x_0\|^2,$$

where the last step uses $A^\top y = \lambda(x_0 - x_1)$, by the definition of $x_1 = \hat{x}_{x_0}(y)$; i.e., it increases by $\lambda\|x_1 - x_0\|^2$. Meanwhile, the primal function value changes from $f_{x_0}(x_1) = F(x_1) + \frac{\lambda}{2}\|x_1 - x_0\|^2$ to $f_{x_1}(x_1) = F(x_1)$, i.e., it decreases by $\frac{\lambda}{2}\|x_1 - x_0\|^2$. Hence, in total, the new duality gap is

$$\mathrm{gap}_{x_1}(x_1, y) = \mathrm{gap}_{x_0}(x_1, y) + \frac{\lambda}{2}\|x_1 - x_0\|^2.$$

Moreover, by the $(\mu + \lambda)$-strong convexity of $f_{x_0}$ (as in the proof of Lemma C.1), $(\mu + \lambda)\|x_1 - x_0\| \le \|\nabla f_{x_0}(x_1)\| + \|\nabla F(x_0)\|$, so that

$$\frac{\lambda}{2}\|x_1 - x_0\|^2 \le \frac{\lambda}{(\mu+\lambda)^2}\left( \|\nabla f_{x_0}(x_1)\|^2 + \|\nabla F(x_0)\|^2 \right),$$

where the inequality follows from the $(\mu + \lambda)$-strong convexity of $f_{x_0}$.
Combining the last two chains of inequalities, and abbreviating $\bar{L} = nR^2L + \lambda$, we bound the re-centered dual error of $y$:

$$g_{x_1}(y) - g_{x_1}^{\mathrm{opt}} \le \mathrm{gap}_{x_1}(x_1, y) = \mathrm{gap}_{x_0}(x_1, y) + \frac{\lambda}{2}\|x_1 - x_0\|^2$$
$$\le (1 + c_3)\left( g_{x_0}(y) - g_{x_0}^{\mathrm{opt}} \right) + \frac{\lambda}{(\mu+\lambda)^2}\left( \|\nabla f_{x_0}(x_1)\|^2 + \|\nabla F(x_0)\|^2 \right)$$
$$\le (1 + c_3)\left( g_{x_0}(y) - g_{x_0}^{\mathrm{opt}} \right) + \frac{\lambda}{(\mu+\lambda)^2}\left( 2\bar{L}c_3 \left( g_{x_0}(y) - g_{x_0}^{\mathrm{opt}} \right) + \|\nabla F(x_0)\|^2 \right)$$
$$= c_1 \left( g_{x_0}(y) - g_{x_0}^{\mathrm{opt}} \right) + c_2 \|\nabla F(x_0)\|^2,$$

where the first inequality uses $g_{x_1}^{\mathrm{opt}} = -f_{x_1}^{\mathrm{opt}} \ge -f_{x_1}(x_1)$, the second follows from Corollary B.3, and the third relies on the $\bar{L}$-smoothness of $f_{x_0}$ together with Lemma B.2, that is:

$$\|\nabla f_{x_0}(x_1)\|^2 \le 2\bar{L}\left( f_{x_0}(x_1) - f_{x_0}^{\mathrm{opt}} \right) \le 2\bar{L}c_3 \left( g_{x_0}(y) - g_{x_0}^{\mathrm{opt}} \right). \tag{18}$$

Recalling the choices of the numerical constants $c_1$ and $c_2$, the claim is proven.

Finally, we prove that for an appropriate choice of oracle parameter $\sigma$, independent of the target error $\epsilon$, the iterates produced by Dual APPA are bounded above by a geometric sequence. In doing so, it will be convenient to abbreviate the same numerical constant as in (18),

$$c_3 = \frac{R^2L(nR^2L + \lambda)}{\lambda^2}.$$

Proposition C.4 (Convergence rate of Dual APPA in gradient norm). Define $\gamma = \lambda/(\mu + \lambda)$ and let $\tau$ be any scalar in $[\gamma, 1)$. Define

$$r = \max\left\{ 1/\sqrt{2},\; \sqrt{\tau} \right\} \tag{19}$$

and

$$\sigma = \max\left\{ \frac{4c_3}{\lambda},\; \frac{\bar{L}c_3(1+\gamma)^2}{\lambda(\tau-\gamma)^2},\; \frac{c_1 + 4c_2c_3}{r^4},\; \frac{(1+\gamma)^2\bar{L}(c_1 + 4c_2c_3)}{2(\tau-\gamma)^2 r^2} \right\}. \tag{20}$$

In the execution of Dual APPA (Algorithm 3), for every iteration $t \ge 1$,

$$g_{x_{t-1}}(y_t) - g_{x_{t-1}}^{\mathrm{opt}} \le \frac{c^2}{4c_3}\, r^{2t} \qquad \text{and} \qquad \|\nabla F(x_t)\| \le c\, r^t,$$

where $c = \|\nabla F(x_0)\|$.

Proof. The argument proceeds by strong induction on $t$. We begin the inductive argument with the base case $t = 1$. For the first invariant, the oracle guarantee and the fact that the algorithm explicitly takes $y_0 = \hat{y}(x_0)$ give, by Lemma B.5,

$$g_{x_0}(y_1) - g_{x_0}^{\mathrm{opt}} \le \frac{1}{\sigma}\left( g_{x_0}(y_0) - g_{x_0}^{\mathrm{opt}} \right) \le \frac{1}{\sigma}\,\mathrm{gap}_{x_0}(x_0, y_0) = \frac{1}{2\lambda\sigma}\|\nabla F(x_0)\|^2 \le \frac{c^2}{4c_3}\, r^2,$$

because $\sigma \ge 4c_3/\lambda$ and $r^2 \ge 1/2$. So this part of the claim holds.
The second invariant holds immediately at $t = 0$, as $c = \|\nabla F(x_0)\|$. We also handle the case $t = 1$ explicitly: by (18) and our choice of $\sigma$, we have

$$\|\nabla f_{x_0}(x_1)\|^2 \le 2\bar{L}c_3\left( g_{x_0}(y_1) - g_{x_0}^{\mathrm{opt}} \right) \le \frac{\bar{L}c_3}{\lambda\sigma}\|\nabla F(x_0)\|^2 \le \left( \frac{\tau-\gamma}{1+\gamma} \right)^2 \|\nabla F(x_0)\|^2,$$

and so by Corollary C.2,

$$\|\nabla F(x_1)\| \le \tau\|\nabla F(x_0)\| = \tau c \le c\,r,$$

and the claim follows as $\tau \le \sqrt{\tau} \le r$.

Now consider $t \ge 2$. Concerning the first invariant, the oracle guarantee, Lemma C.3, and the inductive hypotheses give

$$g_{x_{t-1}}(y_t) - g_{x_{t-1}}^{\mathrm{opt}} \le \frac{1}{\sigma}\left( g_{x_{t-1}}(y_{t-1}) - g_{x_{t-1}}^{\mathrm{opt}} \right) \le \frac{1}{\sigma}\left[ c_1\left( g_{x_{t-2}}(y_{t-1}) - g_{x_{t-2}}^{\mathrm{opt}} \right) + c_2\|\nabla F(x_{t-2})\|^2 \right]$$
$$\le \frac{1}{\sigma}\left[ c_1\,\frac{c^2}{4c_3}\,r^{2(t-1)} + c_2\,c^2 r^{2(t-2)} \right] \le \frac{c_1 + 4c_2c_3}{4c_3\,\sigma}\, c^2 r^{2(t-2)} \le \frac{c^2}{4c_3}\, r^{2t},$$

where the final inequality is due to the choice of $\sigma \ge (c_1 + 4c_2c_3)/r^4$.

For the second invariant, by Lemma C.1,

$$\|\nabla F(x_t)\| \le (1+\gamma)\|\nabla f_{x_{t-1}}(x_t)\| + \gamma\|\nabla F(x_{t-1})\|,$$

and by (18) together with the intermediate bound of the chain above,

$$\|\nabla f_{x_{t-1}}(x_t)\|^2 \le 2\bar{L}c_3\left( g_{x_{t-1}}(y_t) - g_{x_{t-1}}^{\mathrm{opt}} \right) \le \frac{\bar{L}(c_1 + 4c_2c_3)}{2\sigma}\, c^2 r^{2(t-2)} \le \left( \frac{\tau-\gamma}{1+\gamma} \right)^2 c^2 r^{2(t-1)},$$

the last step again by the choice of $\sigma$. Therefore

$$\|\nabla F(x_t)\| \le (1+\gamma)\cdot\frac{\tau-\gamma}{1+\gamma}\,c\,r^{t-1} + \gamma\,c\,r^{t-1} = \tau\,c\,r^{t-1} \le c\,r^t,$$

where the final inequality is due to the choice of $r \ge \sqrt{\tau} \ge \tau$. This completes the inductive argument.

Equipped with Proposition C.4, we translate it into a statement about the convergence rate of Dual APPA in error, rather than in gradient norm, to prove Theorem 2.10.

Proof of Theorem 2.10. Define $\gamma$, $\tau$, $r$, and $\sigma$ as in Proposition C.4. In the execution of Dual APPA (Algorithm 3), at each iteration $t$, we claim that

$$F(x_t) - F^{\mathrm{opt}} \le \frac{nR^2L}{\mu}\,\epsilon_0\, r^{2t}, \tag{24}$$
where $\epsilon_0 = F(x_0) - F^{\mathrm{opt}}$. Consequently, for any $\epsilon > 0$, in order to achieve $\epsilon$ error in Dual APPA, it suffices to take a number of iterations

$$T \ge \frac{1}{2\log(1/r)}\left( \log\frac{nR^2L}{\mu} + \log\frac{\epsilon_0}{\epsilon} \right).$$

The first part of the claim follows directly from Proposition C.4, using the smoothness and strong convexity of $F$: by Lemma B.1, $F(x_t) - F^{\mathrm{opt}} \le \frac{1}{2\mu}\|\nabla F(x_t)\|^2 \le \frac{c^2}{2\mu}r^{2t}$, and $c^2 = \|\nabla F(x_0)\|^2 \le 2nR^2L\,\epsilon_0$. The second part follows from a calculation sufficient to make the right-hand side of (24) smaller than a given $\epsilon$.

D. Convergence analysis of Accelerated APPA

The goal of this section is to establish the convergence rate of Algorithm 1, Accelerated APPA, and prove Theorem 2.6. Note that the results in this section use nothing about the structure of $F$ other than strong convexity, and thus they apply to the general ERM problem; they are written in greater generality to make this distinction clear. Aspects of the proofs in this section bear resemblance to the analysis in Shalev-Shwartz & Zhang (2014), which achieves similar results in a more specialized setting.

The rest of this section is structured as follows. In Lemma D.1 we show that applying a primal oracle to the inner minimization problem gives us a quadratic lower bound on $F(x)$. In Lemma D.2 we use this lower bound to construct a series of lower bounds for the main objective function $f$, and accelerate the APPA algorithm; this comprises the bulk of the analysis. In Lemma D.3 we show that the requirements of Lemma D.2 can be met by using a primal oracle that decreases the error by a constant factor. In Lemma D.4 we analyze the initial error requirements of Lemma D.2. In Lemma D.5 we provide an auxiliary lemma, used in the earlier proofs, that combines two quadratic functions. The proof of Theorem 2.6 follows immediately from these lemmas.

Lemma D.1. Let $f : \mathbb{R}^n \to \mathbb{R}$ be $\mu$-strongly convex and let $f_{x_0,\lambda}(x) := f(x) + \frac{\lambda}{2}\|x - x_0\|^2$. Suppose $x^+$ satisfies

$$f_{x_0,\lambda}(x^+) \le \min_{x \in \mathbb{R}^n} f_{x_0,\lambda}(x) + \epsilon.$$

Then for $\mu' := \mu/2$, $g := \lambda(x_0 - x^+)$, and all $x \in \mathbb{R}^n$ we have

$$f(x) \ge f(x^+) + \langle g, x - x_0 \rangle + \frac{\mu'}{2}\|x - x_0\|^2 + \frac{1}{2\lambda}\|g\|^2 - 2\epsilon.$$

Note that as $\mu' = \mu/2$, we are only losing a factor of 2 in the strong convexity parameter for our lower bound.
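A minimal numerical sketch of the quadratic lower bound of Lemma D.1, using an exact proximal step (so $\epsilon = 0$) on a strongly convex quadratic, and assuming $\lambda \ge \mu$; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
d = 4
B = rng.normal(size=(d, d))
Q = B @ B.T + 0.4 * np.eye(d)      # f(x) = 0.5 x^T Q x + c^T x
c = rng.normal(size=d)
mu = np.linalg.eigvalsh(Q)[0]      # strong convexity parameter of f
lam = 2.0 * mu                     # regularization weight (lam >= mu assumed)

f = lambda x: 0.5 * x @ Q @ x + c @ x
x0 = rng.normal(size=d)

# Exact proximal step: x_plus = argmin f(x) + (lam/2)||x - x0||^2.
x_plus = np.linalg.solve(Q + lam * np.eye(d), lam * x0 - c)
g = lam * (x0 - x_plus)            # equals grad f(x_plus) for the exact step

# Lower bound with mu' = mu/2 and epsilon = 0:
# f(x) >= f(x_plus) + <g, x-x0> + (mu/4)||x-x0||^2 + ||g||^2/(2*lam)
for _ in range(5):
    x = rng.normal(size=d)
    lower = f(x_plus) + g @ (x - x0) + mu / 4 * np.sum((x - x0) ** 2) \
            + g @ g / (2 * lam)
    assert f(x) >= lower - 1e-9
```

With an exact step, $g = \nabla f(x^+)$, so the bound is a shifted strong-convexity inequality; the $-2\epsilon$ slack in the lemma absorbs inexactness of the oracle.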
This allows us to account for the oracle error without loss in our ultimate convergence rates.

Proof. Let $x^{\mathrm{opt}} = \mathrm{argmin}_{x \in \mathbb{R}^n} f_{x_0,\lambda}(x)$. Since $f$ is $\mu$-strongly convex, clearly $f_{x_0,\lambda}$ is $(\mu + \lambda)$-strongly convex, and by Lemma B.1,

$$f_{x_0,\lambda}(x) \ge f_{x_0,\lambda}(x^{\mathrm{opt}}) + \frac{\mu + \lambda}{2}\|x - x^{\mathrm{opt}}\|^2. \tag{25}$$

By Cauchy-Schwarz we know

$$\frac{\mu+\lambda}{2}\|x - x^+\|^2 \le (\mu+\lambda)\|x - x^{\mathrm{opt}}\|^2 + (\mu+\lambda)\|x^{\mathrm{opt}} - x^+\|^2,$$
which implies

$$\frac{\mu+\lambda}{2}\|x - x^{\mathrm{opt}}\|^2 \ge \frac{\mu+\lambda}{4}\|x - x^+\|^2 - \frac{\mu+\lambda}{2}\|x^{\mathrm{opt}} - x^+\|^2.$$

On the other hand, since $f_{x_0,\lambda}(x^+) \le f_{x_0,\lambda}(x^{\mathrm{opt}}) + \epsilon$, by (25) we have $\frac{\mu+\lambda}{2}\|x^+ - x^{\mathrm{opt}}\|^2 \le \epsilon$, and therefore

$$f_{x_0,\lambda}(x) - f_{x_0,\lambda}(x^+) \ge f_{x_0,\lambda}(x) - f_{x_0,\lambda}(x^{\mathrm{opt}}) - \epsilon \ge \frac{\mu+\lambda}{2}\|x - x^{\mathrm{opt}}\|^2 - \epsilon \ge \frac{\mu+\lambda}{4}\|x - x^+\|^2 - 2\epsilon.$$

Now since $x - x^+ = x - x_0 + \frac{1}{\lambda}g$, we have $\|x - x^+\|^2 = \|x - x_0\|^2 + \frac{2}{\lambda}\langle g, x - x_0 \rangle + \frac{1}{\lambda^2}\|g\|^2$, and using the fact that $f_{x_0,\lambda}(x) = f(x) + \frac{\lambda}{2}\|x - x_0\|^2$ and collecting terms, we have

$$f(x) \ge f(x^+) + \langle g, x - x_0 \rangle + \frac{\mu'}{2}\|x - x_0\|^2 + \frac{1}{2\lambda}\|g\|^2 - 2\epsilon.$$

The right-hand side of the above inequality is a quadratic function of $x$. Looking at its gradient with respect to $x$, we see that it obtains its minimum at $x = x_0 - \frac{1}{\mu'}g$, with minimum value $f(x^+) + \left( \frac{1}{2\lambda} - \frac{1}{2\mu'} \right)\|g\|^2 - 2\epsilon$.

Lemma D.2. Let $f : \mathbb{R}^n \to \mathbb{R}$ be $\mu$-strongly convex, let $\gamma := \mu/2$, and suppose that in each iteration $t$ we have a lower bound $\psi_t(x) = \psi_t^{\mathrm{opt}} + \frac{\gamma}{2}\|x - v_t\|^2$ such that $f(x) \ge \psi_t(x)$ for all $x$. Let $\rho := \gamma/(\gamma + \lambda)$, write $f_{\lambda,s}(x) := f(x) + \frac{\lambda}{2}\|x - s\|^2$, and define

$$\theta_t := \frac{x_t + \rho^{1/2} v_t}{1 + \rho^{1/2}}, \qquad g_t := \lambda(\theta_t - x_{t+1}), \qquad v_{t+1} := (1 - \rho^{1/2})\, v_t + \rho^{1/2}\left[ \theta_t - \frac{1}{\gamma}\, g_t \right],$$

where $x_{t+1}$ is obtained from the oracle and satisfies

$$\mathbb{E}\left[ f_{\lambda,\theta_t}(x_{t+1}) \right] - f_{\lambda,\theta_t}^{\mathrm{opt}} \le \frac{\rho^{3/2}}{4}\left( f(x_t) - \psi_t^{\mathrm{opt}} \right).$$

Then we have that

$$\mathbb{E}\left[ f(x_t) - \psi_t^{\mathrm{opt}} \right] \le \left( 1 - \rho^{1/2}/2 \right)^t \left( f(x_0) - \psi_0^{\mathrm{opt}} \right).$$

Proof. Regardless of how $\theta_t$ is chosen, we know by Lemma D.1 that for $\gamma = \mu/2$ and all $x \in \mathbb{R}^n$,

$$f(x) \ge f(x_{t+1}) + \langle g_t, x - \theta_t \rangle + \frac{\gamma}{2}\|x - \theta_t\|^2 + \frac{1}{2\lambda}\|g_t\|^2 - 2\left( f_{\lambda,\theta_t}(x_{t+1}) - f_{\lambda,\theta_t}^{\mathrm{opt}} \right). \tag{26}$$
Thus, for $\beta = 1 - \rho^{1/2}$ we can let

$$\psi_{t+1}(x) := \beta\,\psi_t(x) + (1-\beta)\left[ f(x_{t+1}) + \langle g_t, x - \theta_t \rangle + \frac{\gamma}{2}\|x - \theta_t\|^2 + \frac{1}{2\lambda}\|g_t\|^2 - 2\left( f_{\lambda,\theta_t}(x_{t+1}) - f_{\lambda,\theta_t}^{\mathrm{opt}} \right) \right].$$

The bracketed term is itself a quadratic of curvature $\gamma$ centered at $\theta_t - \frac{1}{\gamma}g_t$, so by Lemma D.5,

$$\psi_{t+1}(x) = \psi_{t+1}^{\mathrm{opt}} + \frac{\gamma}{2}\|x - v_{t+1}\|^2 \qquad \text{with} \qquad v_{t+1} = \beta\, v_t + (1-\beta)\left[ \theta_t - \frac{1}{\gamma}g_t \right],$$

matching the definition of $v_{t+1}$ above; by (26), $\psi_{t+1}$ is again a valid lower bound for $f$. Again by Lemma D.5, $\psi_{t+1}^{\mathrm{opt}}$ equals $\beta\,\psi_t^{\mathrm{opt}}$ plus $(1-\beta)$ times the minimum value of the bracketed quadratic, plus the coupling term $\frac{\gamma}{2}\beta(1-\beta)\|v_t - \theta_t + \frac{1}{\gamma}g_t\|^2$. Furthermore, instantiating $x$ with $x_t$ in (26) yields

$$f(x_{t+1}) \le f(x_t) - \langle g_t, x_t - \theta_t \rangle - \frac{\gamma}{2}\|x_t - \theta_t\|^2 - \frac{1}{2\lambda}\|g_t\|^2 + 2\left( f_{\lambda,\theta_t}(x_{t+1}) - f_{\lambda,\theta_t}^{\mathrm{opt}} \right).$$

Consequently, combining the last two observations and collecting terms, we know

$$f(x_{t+1}) - \psi_{t+1}^{\mathrm{opt}} \le \beta\left( f(x_t) - \psi_t^{\mathrm{opt}} \right) + \langle g_t,\, \beta v_t + (1-\beta)\theta_t - x_t \rangle + 4\left( f_{\lambda,\theta_t}(x_{t+1}) - f_{\lambda,\theta_t}^{\mathrm{opt}} \right).$$

Note that we have chosen $\theta_t$ precisely so that the inner product term equals 0, and we choose $\beta = 1 - \rho^{1/2}$, which ensures that the remaining quadratic terms in $g_t$ have nonpositive total coefficient. Also, by assumption we know

$$\mathbb{E}\left[ f_{\lambda,\theta_t}(x_{t+1}) - f_{\lambda,\theta_t}^{\mathrm{opt}} \right] \le \frac{\rho^{3/2}}{4}\left( f(x_t) - \psi_t^{\mathrm{opt}} \right),$$

which implies

$$\mathbb{E}\left[ f(x_{t+1}) - \psi_{t+1}^{\mathrm{opt}} \right] \le \left( \beta + \rho^{3/2} \right)\left( f(x_t) - \psi_t^{\mathrm{opt}} \right) \le \left( 1 - \rho^{1/2}/2 \right)\left( f(x_t) - \psi_t^{\mathrm{opt}} \right).$$

In the final step we are using the facts that $\beta = 1 - \rho^{1/2}$ and $\rho \le 1/2$ (as $\lambda \ge \gamma$), so that $\beta + \rho^{3/2} \le 1 - \rho^{1/2} + \rho^{1/2}/2 = 1 - \rho^{1/2}/2$.
Lemma D.3. Under the setting of Lemma D.2, we have

$$f_{\lambda,\theta_t}(x_t) - f_{\lambda,\theta_t}^{\mathrm{opt}} \le f(x_t) - \psi_t^{\mathrm{opt}}.$$

In particular, in order to achieve $\mathbb{E}\left[ f_{\lambda,\theta_t}(x_{t+1}) \right] - f_{\lambda,\theta_t}^{\mathrm{opt}} \le \frac{\rho^{3/2}}{8}\left( f(x_t) - \psi_t^{\mathrm{opt}} \right)$, we only need an oracle that shrinks the function error by a factor of $\rho^{3/2}/8$ in expectation.

Proof. We know

$$f_{\lambda,\theta_t}(x_t) - f(x_t) = \frac{\lambda}{2}\|x_t - \theta_t\|^2 = \frac{\lambda}{2}\cdot\frac{\rho}{(1 + \rho^{1/2})^2}\,\|x_t - v_t\|^2.$$

We will show that the lower bound $f_{\lambda,\theta_t}^{\mathrm{opt}}$ is larger than $\psi_t^{\mathrm{opt}}$ by the same amount. This is because for all $x$ we have

$$f_{\lambda,\theta_t}(x) = f(x) + \frac{\lambda}{2}\|x - \theta_t\|^2 \ge \psi_t^{\mathrm{opt}} + \frac{\gamma}{2}\|x - v_t\|^2 + \frac{\lambda}{2}\|x - \theta_t\|^2.$$

The right-hand side is a quadratic function, whose optimal point is at $x = \frac{\gamma v_t + \lambda\theta_t}{\gamma + \lambda}$ and whose optimal value is equal to

$$\psi_t^{\mathrm{opt}} + \frac{\gamma\lambda}{2(\gamma+\lambda)}\|v_t - \theta_t\|^2 = \psi_t^{\mathrm{opt}} + \frac{\gamma\lambda}{2(\gamma+\lambda)}\cdot\frac{1}{(1+\rho^{1/2})^2}\,\|x_t - v_t\|^2.$$

By the definition of $\rho$, $\frac{\gamma\lambda}{\gamma+\lambda}$ is exactly equal to $\rho\lambda$; therefore $f_{\lambda,\theta_t}^{\mathrm{opt}} - \psi_t^{\mathrm{opt}} \ge \frac{\lambda}{2}\cdot\frac{\rho}{(1+\rho^{1/2})^2}\|x_t - v_t\|^2$, and hence

$$f_{\lambda,\theta_t}(x_t) - f_{\lambda,\theta_t}^{\mathrm{opt}} \le f(x_t) - \psi_t^{\mathrm{opt}}.$$

Remark. In the previous lemma we showed that moving to the regularized problem has the same effect on the primal function value and the lower bound. This is a result of the choice of $\beta$ in the proof of Lemma D.2. However, this does not mean the choice of $\beta$ is very fragile: we can choose any $\beta$ between the current value and 1; the effect on this lemma would be that the increase in the primal function value becomes smaller than the increase in the lower bound, so the lemma continues to hold.

Lemma D.4. Let $\psi_0^{\mathrm{opt}} = f(x_0) - 2\left( f(x_0) - f^{\mathrm{opt}} \right)$ and $v_0 = x_0$; then $\psi_0(x) := \psi_0^{\mathrm{opt}} + \frac{\gamma}{2}\|x - v_0\|^2$ is a valid lower bound for $f$. In particular, when $\lambda = LR^2$, the initial error satisfies $f(x_0) - \psi_0^{\mathrm{opt}} = 2\left( f(x_0) - f^{\mathrm{opt}} \right)$.

Proof. This lemma is a direct corollary of Lemma D.1 with $x^+ = x_0$, so that $g = 0$ and we may take $\epsilon = f_{x_0,\lambda}(x_0) - f_{x_0,\lambda}^{\mathrm{opt}} \le f(x_0) - f^{\mathrm{opt}}$.

Lemma D.5. Suppose that for all $x$ we have $f_1(x) = \psi_1 + \frac{\gamma}{2}\|x - v_1\|^2$ and $f_2(x) = \psi_2 + \frac{\gamma}{2}\|x - v_2\|^2$. Then

$$\alpha f_1(x) + (1-\alpha) f_2(x) = \psi_\alpha + \frac{\gamma}{2}\|x - v_\alpha\|^2,$$

where

$$v_\alpha = \alpha v_1 + (1-\alpha) v_2 \qquad \text{and} \qquad \psi_\alpha = \alpha\psi_1 + (1-\alpha)\psi_2 + \frac{\gamma}{2}\,\alpha(1-\alpha)\,\|v_1 - v_2\|^2.$$

Proof. Setting the gradient of $\alpha f_1(x) + (1-\alpha) f_2(x)$ to 0, we see that $v_\alpha$ must satisfy $\alpha\gamma(v_\alpha - v_1) + (1-\alpha)\gamma(v_\alpha - v_2) = 0$, and thus $v_\alpha = \alpha v_1 + (1-\alpha) v_2$. Finally,

$$\psi_\alpha = \alpha\left[ \psi_1 + \frac{\gamma}{2}\|v_\alpha - v_1\|^2 \right] + (1-\alpha)\left[ \psi_2 + \frac{\gamma}{2}\|v_\alpha - v_2\|^2 \right] = \alpha\psi_1 + (1-\alpha)\psi_2 + \frac{\gamma}{2}\left[ \alpha(1-\alpha)^2 + (1-\alpha)\alpha^2 \right]\|v_1 - v_2\|^2$$
$$= \alpha\psi_1 + (1-\alpha)\psi_2 + \frac{\gamma}{2}\,\alpha(1-\alpha)\,\|v_1 - v_2\|^2.$$
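Lemma D.5 is a direct identity about mixing two quadratics of equal curvature; a quick numerical sketch (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
d, gamma, alpha = 3, 0.6, 0.3
psi1, psi2 = rng.normal(), rng.normal()
v1, v2 = rng.normal(size=d), rng.normal(size=d)

f1 = lambda x: psi1 + 0.5 * gamma * np.sum((x - v1) ** 2)
f2 = lambda x: psi2 + 0.5 * gamma * np.sum((x - v2) ** 2)

# Lemma D.5: the convex combination is again a quadratic of the same
# curvature, with center v_alpha and constant psi_alpha as below.
v_a = alpha * v1 + (1 - alpha) * v2
psi_a = alpha * psi1 + (1 - alpha) * psi2 \
        + 0.5 * gamma * alpha * (1 - alpha) * np.sum((v1 - v2) ** 2)

for _ in range(5):
    x = rng.normal(size=d)
    mix = alpha * f1(x) + (1 - alpha) * f2(x)
    assert np.isclose(mix, psi_a + 0.5 * gamma * np.sum((x - v_a) ** 2))
```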
More informationThe Entropy Power Inequality and Mrs. Gerber s Lemma for Groups of order 2 n
The Entrop Power Inequalit and Mrs. Gerber s Lemma for Groups of order 2 n Varun Jog EECS, UC Berkele Berkele, CA-94720 Email: varunjog@eecs.berkele.edu Venkat Anantharam EECS, UC Berkele Berkele, CA-94720
More informationOptimality Conditions for Constrained Optimization
72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)
More informationLecture: Duality of LP, SOCP and SDP
1/33 Lecture: Duality of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:
More informationSec. 1.1: Basics of Vectors
Sec. 1.1: Basics of Vectors Notation for Euclidean space R n : all points (x 1, x 2,..., x n ) in n-dimensional space. Examples: 1. R 1 : all points on the real number line. 2. R 2 : all points (x 1, x
More informationA Beckman-Quarles type theorem for linear fractional transformations of the extended double plane
Rose- Hulman Undergraduate Mathematics Journal A Beckman-Quarles tpe theorem for linear fractional transformations of the extended double plane Andrew Jullian Mis a Josh Keilman b Volume 12, No. 2, Fall
More informationImplications of the Constant Rank Constraint Qualification
Mathematical Programming manuscript No. (will be inserted by the editor) Implications of the Constant Rank Constraint Qualification Shu Lu Received: date / Accepted: date Abstract This paper investigates
More informationFinite Dimensional Optimization Part III: Convex Optimization 1
John Nachbar Washington University March 21, 2017 Finite Dimensional Optimization Part III: Convex Optimization 1 1 Saddle points and KKT. These notes cover another important approach to optimization,
More informationUnconstrained minimization of smooth functions
Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and
More informationSecond Proof: Every Positive Integer is a Frobenius Number of Three Generators
International Mathematical Forum, Vol., 5, no. 5, - 7 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/.988/imf.5.54 Second Proof: Ever Positive Integer is a Frobenius Number of Three Generators Firu Kamalov
More informationExamination paper for TMA4180 Optimization I
Department of Mathematical Sciences Examination paper for TMA4180 Optimization I Academic contact during examination: Phone: Examination date: 26th May 2016 Examination time (from to): 09:00 13:00 Permitted
More informationPrimal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions
Primal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions Olivier Fercoq and Pascal Bianchi Problem Minimize the convex function
More informationMath 361: Homework 1 Solutions
January 3, 4 Math 36: Homework Solutions. We say that two norms and on a vector space V are equivalent or comparable if the topology they define on V are the same, i.e., for any sequence of vectors {x
More informationREVIEW OF DIFFERENTIAL CALCULUS
REVIEW OF DIFFERENTIAL CALCULUS DONU ARAPURA 1. Limits and continuity To simplify the statements, we will often stick to two variables, but everything holds with any number of variables. Let f(x, y) be
More informationGradient Sliding for Composite Optimization
Noname manuscript No. (will be inserted by the editor) Gradient Sliding for Composite Optimization Guanghui Lan the date of receipt and acceptance should be inserted later Abstract We consider in this
More informationLMI Methods in Optimal and Robust Control
LMI Methods in Optimal and Robust Control Matthew M. Peet Arizona State University Lecture 15: Nonlinear Systems and Lyapunov Functions Overview Our next goal is to extend LMI s and optimization to nonlinear
More informationA sharp Blaschke Santaló inequality for α-concave functions
Noname manuscript No will be inserted b the editor) A sharp Blaschke Santaló inequalit for α-concave functions Liran Rotem the date of receipt and acceptance should be inserted later Abstract We define
More informationPDE Constrained Optimization selected Proofs
PDE Constrained Optimization selected Proofs Jeff Snider jeff@snider.com George Mason University Department of Mathematical Sciences April, 214 Outline 1 Prelim 2 Thms 3.9 3.11 3 Thm 3.12 4 Thm 3.13 5
More informationarxiv: v7 [math.oc] 22 Feb 2018
A SMOOTH PRIMAL-DUAL OPTIMIZATION FRAMEWORK FOR NONSMOOTH COMPOSITE CONVEX MINIMIZATION QUOC TRAN-DINH, OLIVIER FERCOQ, AND VOLKAN CEVHER arxiv:1507.06243v7 [math.oc] 22 Feb 2018 Abstract. We propose a
More informationNewton s Method. Ryan Tibshirani Convex Optimization /36-725
Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x
More informationLecture 8. Strong Duality Results. September 22, 2008
Strong Duality Results September 22, 2008 Outline Lecture 8 Slater Condition and its Variations Convex Objective with Linear Inequality Constraints Quadratic Objective over Quadratic Constraints Representation
More informationSecond Order Optimality Conditions for Constrained Nonlinear Programming
Second Order Optimality Conditions for Constrained Nonlinear Programming Lecture 10, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk)
More informationOn the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method
Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical
More information8. Conjugate functions
L. Vandenberghe EE236C (Spring 2013-14) 8. Conjugate functions closed functions conjugate function 8-1 Closed set a set C is closed if it contains its boundary: x k C, x k x = x C operations that preserve
More informationSome Results Concerning Uniqueness of Triangle Sequences
Some Results Concerning Uniqueness of Triangle Sequences T. Cheslack-Postava A. Diesl M. Lepinski A. Schuyler August 12 1999 Abstract In this paper we will begin by reviewing the triangle iteration. We
More informationLecture 5. The Dual Cone and Dual Problem
IE 8534 1 Lecture 5. The Dual Cone and Dual Problem IE 8534 2 For a convex cone K, its dual cone is defined as K = {y x, y 0, x K}. The inner-product can be replaced by x T y if the coordinates of the
More informationShiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient
Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 4 Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 2 4.1. Subgradients definition subgradient calculus duality and optimality conditions Shiqian
More informationAccelerated Randomized Primal-Dual Coordinate Method for Empirical Risk Minimization
Accelerated Randomized Primal-Dual Coordinate Method for Empirical Risk Minimization Lin Xiao (Microsoft Research) Joint work with Qihang Lin (CMU), Zhaosong Lu (Simon Fraser) Yuchen Zhang (UC Berkeley)
More informationA GLOBALLY CONVERGENT STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE
A GLOBALLY CONVERGENT STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-14-1 June 30,
More informationStability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games
Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Alberto Bressan ) and Khai T. Nguyen ) *) Department of Mathematics, Penn State University **) Department of Mathematics,
More informationRelative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent
Relative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent Haihao Lu August 3, 08 Abstract The usual approach to developing and analyzing first-order
More informationStochastic model-based minimization under high-order growth
Stochastic model-based minimization under high-order growth Damek Davis Dmitriy Drusvyatskiy Kellie J. MacPhee Abstract Given a nonsmooth, nonconvex minimization problem, we consider algorithms that iteratively
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationminimize x subject to (x 2)(x 4) u,
Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for
More informationLECTURE 10 LECTURE OUTLINE
LECTURE 10 LECTURE OUTLINE Min Common/Max Crossing Th. III Nonlinear Farkas Lemma/Linear Constraints Linear Programming Duality Convex Programming Duality Optimality Conditions Reading: Sections 4.5, 5.1,5.2,
More informationOrthogonality. 6.1 Orthogonal Vectors and Subspaces. Chapter 6
Chapter 6 Orthogonality 6.1 Orthogonal Vectors and Subspaces Recall that if nonzero vectors x, y R n are linearly independent then the subspace of all vectors αx + βy, α, β R (the space spanned by x and
More informationProblem 1 (Exercise 2.2, Monograph)
MS&E 314/CME 336 Assignment 2 Conic Linear Programming January 3, 215 Prof. Yinyu Ye 6 Pages ASSIGNMENT 2 SOLUTIONS Problem 1 (Exercise 2.2, Monograph) We prove the part ii) of Theorem 2.1 (Farkas Lemma
More information