Dual Methods. Lecturer: Ryan Tibshirani Convex Optimization /36-725
|
|
- Gerald Collins
- 6 years ago
- Views:
Transcription
1 Dual Methods Lecturer: Ryan Tibshirani Conve Optimization /
2 Last time: proimal Newton method Consider the problem min g() + h() where g, h are conve, g is twice differentiable, and h is simple. Proimal Newton method: let (0) R n, and repeat: v (k) = argmin v g( (k 1) ) T v vt 2 g( (k 1) )v + h( (k 1) + v) (k) = (k 1) + t k v (k), k = 1, 2, 3,... Step sizes are typically chosen by backtracking Iterations here are typically very epensive (computing d (k) is typically a formidable task) But typically very few iterations are needed until convergence: under appropriate conditions, get local quadratic convergence 2
3 Reminder: conjugate functions Recall that given f : R n R, the function f (y) = ma y T f() is called its conjugate Conjugates appear frequently in dual programs, since f (y) = min f() y T If f is closed and conve, then f = f. Also, f (y) y f() argmin z f(z) y T z If f is strictly conve, then f (y) = argmin z f(z) y T z 3
4 Proof details We will show that f (y) y f(), assuming that f is conve and closed Proof of = : Suppose y f(). Then M y, the set of maimizers of y T z f(z) over z. But f (y) = ma z y T z f(z) and f (y) = cl(conv( z My {z})). Thus f (y). Proof of = : From what we showed above, if f (y), then y f (), but f = f. Clearly y f() argmin z f(z) y T z Lastly if f is strictly conve, then we know that f(z) y T z has a unique minimizer over z, and this must be f (y) 4
5 Outline Today: Dual (sub)gradient methods Dual decomposition Augmented Lagrangians A peak at ADMM 5
6 Dual (sub)gradient methods Even if we can t derive dual (conjugate) in closed form, we can still use dual-based subgradient or gradient methods Eample: consider the problem min f() subject to A = b Its dual problem is ma u f ( A T u) b T u where f is conjugate of f. Defining g(u) = f ( A T u) b T u, note that g(u) = A f ( A T u) b 6
7 Therefore, using what we know about conjugates g(u) = A b where argmin z f(z) + u T Az The dual subgradient method (for maimizing the dual objective) starts with an initial dual guess u (0), and repeats for k = 1, 2, 3,... (k) argmin f() + (u (k 1) ) T A u (k) = u (k 1) + t k (A (k) b) Step sizes t k, k = 1, 2, 3,..., are chosen in standard ways 7
8 Recall that if f is strictly conve, then f is differentiable, and so this becomes dual gradient ascent, which repeats for k = 1, 2, 3,... (k) = argmin f() + (u (k 1) ) T A u (k) = u (k 1) + t k (A (k) b) (Difference is that each (k) is unique, here.) Again, step sizes t k, k = 1, 2, 3,... are chosen in standard ways Also, proimal gradients and acceleration can be applied as they would usually 8
9 Lipschitz gradients and strong conveity Assume that f is a closed and conve function. Then f is strongly conve with parameter d f Lipschitz with parameter 1/d Proof of = : Recall, if g strongly conve with minimizer, then g(y) g() + d 2 y 2, for all y Hence defining u = f (u), v = f (v), f( v ) u T v f( u ) u T u + d 2 u v 2 2 f( u ) v T u f( v ) v T v + d 2 u v 2 2 Adding these together, using Cauchy-Schwartz, and rearranging shows that u v 2 u v 2 /d 9
10 10 Convergence guarantees The following results hold from combining the last fact with what we already know about gradient descent: If f is strongly conve with parameter d, then dual gradient ascent with fied step sizes t k = d, k = 1, 2, 3,..., converges at the rate O(1/ɛ) If f is strongly conve with parameter d, and f is Lipschitz with parameter L, then dual gradient ascent with step sizes t k = 2/(1/d + 1/L), k = 1, 2, 3,..., converges at the rate O(log(1/ɛ)) Note that these results describe convergence of the dual objective to its optimal value
11 11 Consider min Dual decomposition B f i ( i ) subject to A = b i=1 Here = ( 1,... B ) R n divides into B blocks of variables, with each i R n i. We can also partition A accordingly A = [A 1... A B ], where A i R m n i Simple but powerful observation, in calculation of (sub)gradient, is that the minimization decomposes into B separate problems: + i + argmin B f i ( i ) + u T A i=1 argmin f i ( i ) + u T A i i, i = 1,... B i
12 12 Dual decomposition algorithm: repeat for k = 1, 2, 3,... (k) i argmin f i ( i ) + (u (k 1) ) T A i i, i = 1,... B i u (k) = u (k 1) + t k ( B i=1 A i (k) i ) b Can think of these steps as: Broadcast: send u to each of the B processors, each optimizes in parallel to find i Gather: collect A i i from each processor, update the global dual variable u 1 u 2 u 3 u
13 13 Dual decomposition with inequality constraints Consider min B f i ( i ) subject to i=1 B A i i b i=1 Dual decomposition (projected subgradient method): repeat for k = 1, 2, 3,... argmin f i ( i ) + (u (k 1) ) T A i i, i = 1,... B i ( ( B ) ) u (k) = u (k 1) + t k A i (k) i b (k) i i=1 where u + denotes the positive part of u, i.e., (u + ) i = ma{0, u i }, i = 1,..., m +
14 14 Price coordination interpretation (Vandenberghe): Have B units in a system, each unit chooses its own decision variable i (how to allocate its goods) Constraints are limits on shared resources (rows of A), each component of dual variable u j is price of resource j Dual update: u + j = (u j ts j ) +, j = 1,... m where s = b B i=1 A i i are slacks Increase price uj if resource j is over-utilized, s j < 0 Decrease price uj if resource j is under-utilized, s j > 0 Never let prices get negative
15 15 Augmented Lagrangian method also known as: method of multipliers Disadvantage of dual ascent: require strong conditions to ensure convergence. Improved by augmented Lagrangian method, also called method of multipliers. We transform the primal problem: min f() + ρ 2 A b 2 2 subject to A = b where ρ > 0 is a parameter. Clearly equivalent to original problem, and objective is strongly conve when A has full column rank. Use dual gradient ascent: repeat for k = 1, 2, 3,... (k) = argmin u (k) = u (k 1) + ρ(a (k) b) f() + (u (k 1) ) T A + ρ 2 A b 2 2
16 16 Notice step size choice t k = ρ, k = 1, 2, 3,... in dual algorithm. Why? Since (k) minimizes f() + (u (k 1) ) T A + ρ 2 A b 2 2 over, we have ( ) 0 f( (k) ) + A T u (k 1) + ρ(a (k) b) = f( (k) ) + A T u (k) This is the stationarity condition for the original primal problem; can show under mild conditions that A (k) b approaches zero (i.e., primal iterates approach feasibility), hence in the limit KKT conditions are satisfied and (k), u (k) approach optimality Advantage: much better convergence properties. Disadvantage: lose decomposability! (Separability is compromised by augmented Lagrangian...)
17 17 Alternating direction method of multipliers Alternating direction method of multipliers or ADMM: the best of both worlds, i.e., we get strong convergence properties, along with decomposability. Consider min,z f() + g(z) subject to A + Bz = c As before, we augment the objective min f() + g(z) + ρ 2 A + Bz c 2 2 subject to A + Bz = c for a parameter ρ > 0. We define augmented Lagrangian L ρ (, z, u) = f() + g(z) + u T (A + Bz c) + ρ 2 A + Bz c 2 2
18 18 ADMM repeats the steps, for k = 1, 2, 3,... (k) = argmin z (k) = argmin z L ρ (, z (k 1), u (k 1) ) L ρ ( (k), z, u (k 1) ) u (k) = u (k 1) + ρ(a (k) + Bz (k) c) Note that the usual method of multipliers would have replaced the first two steps by a joint minimization ( (k), z (k) ) = argmin,z L ρ (, z, u (k 1) )
19 19 Convergence guarantees Under modest assumptions on f, g (these do not require A, B to be full rank), the ADMM iterates satisfy, for any ρ > 0: Residual convergence: r (k) = A (k) Bz (k) c 0 as k, i.e., primal iterates approach feasibility Objective convergence: f( (k) ) + g(z (k) ) f + g, where f + g is the optimal primal objective value Dual convergence: u (k) u, where u is a dual solution For details, see Boyd et al. (2010). Note that we do not generically get primal convergence, but this is true under more assumptions Convergence rate: not known in general, theory is currently being developed, e.g., in Hong and Luo (2012), Deng and Yin (2012), Iutzeler et al. (2014), Nishihara et al. (2015). Roughly, it behaves like a first-order method (or a bit faster)
20 20 ADMM in scaled form It is often easier to epress the ADMM algorithm in a scaled form, where we replace the dual variable u by a scaled variable w = u/ρ. In this parametrization, the ADMM steps are: (k) = argmin z (k) = argmin z f() + ρ 2 A + Bz(k 1) c + w (k 1) 2 2 g(z) + ρ 2 A(k) + Bz c + w (k 1) 2 2 w (k) = w (k 1) + A (k) + Bz (k) c Note that here the kth iterate w (k) is just given by a running sum of residuals: w (k) = w (0) + k ( A (i) + Bz (i) c ) i=1
21 21 Eample: alternating projections Consider finding a point in intersection of conve sets C, D R n : min I C () + I D () To get this into ADMM form, we epress it as min,z I C () + I D (z) subject to z = 0 Each ADMM cycle involves two projections: (k) ( = argmin P C z (k 1) w (k 1)) z (k) ( = argmin P D (k) + w (k 1)) z w (k) = w (k 1) + (k) z (k) Like the classical alternating projections method, but more efficient
22 References S. Boyd and N. Parikh and E. Chu and B. Peleato and J. Eckstein (2010), Distributed optimization and statistical learning via the alternating direction method of multipliers W. Deng and W. Yin (2012), On the global and linear convergence of the generalized alternating direction method of multipliers M. Hong and Z. Luo (2012), On the linear convergence of the alternating direction method of multipliers F. Iutzeler and P. Bianchi and Ph. Ciblat and W. Hachem, (2014), Linear convergence rate for distributed optimization with the alternating direction method of multipliers R. Nishihara and L. Lessard and B. Recht and A. Packard and M. Jordan (2015), A general analysis of the convergence of ADMM L. Vandenberghe, Lecture Notes for EE 236C, UCLA, Spring
Dual Ascent. Ryan Tibshirani Convex Optimization
Dual Ascent Ryan Tibshirani Conve Optimization 10-725 Last time: coordinate descent Consider the problem min f() where f() = g() + n i=1 h i( i ), with g conve and differentiable and each h i conve. Coordinate
More informationDual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725
Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More information10-725/36-725: Convex Optimization Spring Lecture 21: April 6
10-725/36-725: Conve Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 21: April 6 Scribes: Chiqun Zhang, Hanqi Cheng, Waleed Ammar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course otes for EE7C (Spring 018): Conve Optimization and Approimation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Ma Simchowitz Email: msimchow+ee7c@berkeley.edu October
More informationAlternating Direction Method of Multipliers. Ryan Tibshirani Convex Optimization
Alternating Direction Method of Multipliers Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last time: dual ascent min x f(x) subject to Ax = b where f is strictly convex and closed. Denote
More informationIntroduction to Alternating Direction Method of Multipliers
Introduction to Alternating Direction Method of Multipliers Yale Chang Machine Learning Group Meeting September 29, 2016 Yale Chang (Machine Learning Group Meeting) Introduction to Alternating Direction
More informationDuality Uses and Correspondences. Ryan Tibshirani Convex Optimization
Duality Uses and Correspondences Ryan Tibshirani Conve Optimization 10-725 Recall that for the problem Last time: KKT conditions subject to f() h i () 0, i = 1,... m l j () = 0, j = 1,... r the KKT conditions
More informationLecture 23: November 19
10-725/36-725: Conve Optimization Fall 2018 Lecturer: Ryan Tibshirani Lecture 23: November 19 Scribes: Charvi Rastogi, George Stoica, Shuo Li Charvi Rastogi: 23.1-23.4.2, George Stoica: 23.4.3-23.8, Shuo
More informationMath 273a: Optimization Lagrange Duality
Math 273a: Optimization Lagrange Duality Instructor: Wotao Yin Department of Mathematics, UCLA Winter 2015 online discussions on piazza.com Gradient descent / forward Euler assume function f is proper
More information1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method
L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order
More informationDual Proximal Gradient Method
Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method
More informationGradient descent. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725
Gradient descent Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Gradient descent First consider unconstrained minimization of f : R n R, convex and differentiable. We want to solve
More informationGradient Descent. Ryan Tibshirani Convex Optimization /36-725
Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like
More informationNewton s Method. Ryan Tibshirani Convex Optimization /36-725
Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x
More informationDistributed Optimization via Alternating Direction Method of Multipliers
Distributed Optimization via Alternating Direction Method of Multipliers Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato Stanford University ITMANET, Stanford, January 2011 Outline precursors dual decomposition
More informationLecture 23: Conditional Gradient Method
10-725/36-725: Conve Optimization Spring 2015 Lecture 23: Conditional Gradient Method Lecturer: Ryan Tibshirani Scribes: Shichao Yang,Diyi Yang,Zhanpeng Fang Note: LaTeX template courtesy of UC Berkeley
More informationDecentralized Quadratically Approximated Alternating Direction Method of Multipliers
Decentralized Quadratically Approimated Alternating Direction Method of Multipliers Aryan Mokhtari Wei Shi Qing Ling Alejandro Ribeiro Department of Electrical and Systems Engineering, University of Pennsylvania
More informationLecture 16: October 22
0-725/36-725: Conve Optimization Fall 208 Lecturer: Ryan Tibshirani Lecture 6: October 22 Scribes: Nic Dalmasso, Alan Mishler, Benja LeRoy Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationThe Alternating Direction Method of Multipliers
The Alternating Direction Method of Multipliers With Adaptive Step Size Selection Peter Sutor, Jr. Project Advisor: Professor Tom Goldstein December 2, 2015 1 / 25 Background The Dual Problem Consider
More informationSubgradient Method. Ryan Tibshirani Convex Optimization
Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial
More informationA SIMPLE PARALLEL ALGORITHM WITH AN O(1/T ) CONVERGENCE RATE FOR GENERAL CONVEX PROGRAMS
A SIMPLE PARALLEL ALGORITHM WITH AN O(/T ) CONVERGENCE RATE FOR GENERAL CONVEX PROGRAMS HAO YU AND MICHAEL J. NEELY Abstract. This paper considers convex programs with a general (possibly non-differentiable)
More informationLecture 5: Gradient Descent. 5.1 Unconstrained minimization problems and Gradient descent
10-725/36-725: Convex Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 5: Gradient Descent Scribes: Loc Do,2,3 Disclaimer: These notes have not been subjected to the usual scrutiny reserved for
More informationarxiv: v1 [math.oc] 23 May 2017
A DERANDOMIZED ALGORITHM FOR RP-ADMM WITH SYMMETRIC GAUSS-SEIDEL METHOD JINCHAO XU, KAILAI XU, AND YINYU YE arxiv:1705.08389v1 [math.oc] 23 May 2017 Abstract. For multi-block alternating direction method
More informationDuality in Linear Programs. Lecturer: Ryan Tibshirani Convex Optimization /36-725
Duality in Linear Programs Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: proximal gradient descent Consider the problem x g(x) + h(x) with g, h convex, g differentiable, and
More informationLecture 14: Newton s Method
10-725/36-725: Conve Optimization Fall 2016 Lecturer: Javier Pena Lecture 14: Newton s ethod Scribes: Varun Joshi, Xuan Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes
More informationNewton s Method. Javier Peña Convex Optimization /36-725
Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and
More informationLecture 24 November 27
EE 381V: Large Scale Optimization Fall 01 Lecture 4 November 7 Lecturer: Caramanis & Sanghavi Scribe: Jahshan Bhatti and Ken Pesyna 4.1 Mirror Descent Earlier, we motivated mirror descent as a way to improve
More informationSolving Dual Problems
Lecture 20 Solving Dual Problems We consider a constrained problem where, in addition to the constraint set X, there are also inequality and linear equality constraints. Specifically the minimization problem
More information2 Regularized Image Reconstruction for Compressive Imaging and Beyond
EE 367 / CS 448I Computational Imaging and Display Notes: Compressive Imaging and Regularized Image Reconstruction (lecture ) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement
More information1 Kernel methods & optimization
Machine Learning Class Notes 9-26-13 Prof. David Sontag 1 Kernel methods & optimization One eample of a kernel that is frequently used in practice and which allows for highly non-linear discriminant functions
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationRecent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables
Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong 2014 Workshop
More informationPrimal/Dual Decomposition Methods
Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationLecture 6: September 17
10-725/36-725: Convex Optimization Fall 2015 Lecturer: Ryan Tibshirani Lecture 6: September 17 Scribes: Scribes: Wenjun Wang, Satwik Kottur, Zhiding Yu Note: LaTeX template courtesy of UC Berkeley EECS
More informationKarush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725
Karush-Kuhn-Tucker Conditions Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given a minimization problem Last time: duality min x subject to f(x) h i (x) 0, i = 1,... m l j (x) = 0, j =
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More informationMotivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem:
CDS270 Maryam Fazel Lecture 2 Topics from Optimization and Duality Motivation network utility maximization (NUM) problem: consider a network with S sources (users), each sending one flow at rate x s, through
More information9. Dual decomposition and dual algorithms
EE 546, Univ of Washington, Spring 2016 9. Dual decomposition and dual algorithms dual gradient ascent example: network rate control dual decomposition and the proximal gradient method examples with simple
More informationLecture 5: September 15
10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 15 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Di Jin, Mengdi Wang, Bin Deng Note: LaTeX template courtesy of UC Berkeley EECS
More informationPreconditioning via Diagonal Scaling
Preconditioning via Diagonal Scaling Reza Takapoui Hamid Javadi June 4, 2014 1 Introduction Interior point methods solve small to medium sized problems to high accuracy in a reasonable amount of time.
More informationProximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725
Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2013-14) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More informationIn applications, we encounter many constrained optimization problems. Examples Basis pursuit: exact sparse recovery problem
1 Conve Analsis Main references: Vandenberghe UCLA): EECS236C - Optimiation methods for large scale sstems, http://www.seas.ucla.edu/ vandenbe/ee236c.html Parikh and Bod, Proimal algorithms, slides and
More informationCoordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /
Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods
More informationLecture 6: September 12
10-725: Optimization Fall 2013 Lecture 6: September 12 Lecturer: Ryan Tibshirani Scribes: Micol Marchetti-Bowick Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have not
More informationNumerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725
Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: proximal gradient descent Consider the problem min g(x) + h(x) with g, h convex, g differentiable, and h simple
More information10701 Recitation 5 Duality and SVM. Ahmed Hefny
10701 Recitation 5 Duality and SVM Ahmed Hefny Outline Langrangian and Duality The Lagrangian Duality Eamples Support Vector Machines Primal Formulation Dual Formulation Soft Margin and Hinge Loss Lagrangian
More informationDual Decomposition.
1/34 Dual Decomposition http://bicmr.pku.edu.cn/~wenzw/opt-2017-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/34 1 Conjugate function 2 introduction:
More informationSubgradient Method. Guest Lecturer: Fatma Kilinc-Karzan. Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization /36-725
Subgradient Method Guest Lecturer: Fatma Kilinc-Karzan Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization 10-725/36-725 Adapted from slides from Ryan Tibshirani Consider the problem Recall:
More informationConvex Optimization. Newton s method. ENSAE: Optimisation 1/44
Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)
More informationNumerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization
Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725 Consider Last time: proximal Newton method min x g(x) + h(x) where g, h convex, g twice differentiable, and h simple. Proximal
More informationCoordinate Update Algorithm Short Course Subgradients and Subgradient Methods
Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n
More informationConvex Optimization. Lecture 12 - Equality Constrained Optimization. Instructor: Yuanzhang Xiao. Fall University of Hawaii at Manoa
Convex Optimization Lecture 12 - Equality Constrained Optimization Instructor: Yuanzhang Xiao University of Hawaii at Manoa Fall 2017 1 / 19 Today s Lecture 1 Basic Concepts 2 for Equality Constrained
More informationMath 273a: Optimization Subgradients of convex functions
Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions
More informationALADIN An Algorithm for Distributed Non-Convex Optimization and Control
ALADIN An Algorithm for Distributed Non-Convex Optimization and Control Boris Houska, Yuning Jiang, Janick Frasch, Rien Quirynen, Dimitris Kouzoupis, Moritz Diehl ShanghaiTech University, University of
More informationLecture 14: Optimality Conditions for Conic Problems
EE 227A: Conve Optimization and Applications March 6, 2012 Lecture 14: Optimality Conditions for Conic Problems Lecturer: Laurent El Ghaoui Reading assignment: 5.5 of BV. 14.1 Optimality for Conic Problems
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Proximal-Gradient Mark Schmidt University of British Columbia Winter 2018 Admin Auditting/registration forms: Pick up after class today. Assignment 1: 2 late days to hand in
More informationLecture 7: September 17
10-725: Optimization Fall 2013 Lecture 7: September 17 Lecturer: Ryan Tibshirani Scribes: Serim Park,Yiming Gu 7.1 Recap. The drawbacks of Gradient Methods are: (1) requires f is differentiable; (2) relatively
More informationOn the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,
Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,
More informationNonparametric Quantile Regression for Prediction of Parking Data
Nonparametric Quantile Regression for Prediction of Parking Data Tanner Fiez Department of Electrical Engineering University of Washington I. INTRODUCTION In this paper we will eplore the approach taken
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationLecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem
Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R
More informationLecture 25: November 27
10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have
More informationA Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations
A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations Chuangchuang Sun and Ran Dai Abstract This paper proposes a customized Alternating Direction Method of Multipliers
More informationAccelerated primal-dual methods for linearly constrained convex problems
Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize
More informationConvex Optimization Overview (cnt d)
Conve Optimization Overview (cnt d) Chuong B. Do November 29, 2009 During last week s section, we began our study of conve optimization, the study of mathematical optimization problems of the form, minimize
More informationTopic 8c Multi Variable Optimization
Course Instructor Dr. Raymond C. Rumpf Office: A 337 Phone: (915) 747 6958 E Mail: rcrumpf@utep.edu Topic 8c Multi Variable Optimization EE 4386/5301 Computational Methods in EE Outline Mathematical Preliminaries
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.
More informationUVA CS 4501: Machine Learning
UVA CS 4501: Machine Learning Lecture 16 Extra: Support Vector Machine Optimization with Dual Dr. Yanjun Qi University of Virginia Department of Computer Science Today Extra q Optimization of SVM ü SVM
More informationSelected Topics in Optimization. Some slides borrowed from
Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model
More informationDistributed Optimization: Analysis and Synthesis via Circuits
Distributed Optimization: Analysis and Synthesis via Circuits Stephen Boyd Prof. S. Boyd, EE364b, Stanford University Outline canonical form for distributed convex optimization circuit intepretation primal
More informationA Unified Approach to Proximal Algorithms using Bregman Distance
A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department
More informationConvexity II: Optimization Basics
Conveity II: Optimization Basics Lecturer: Ryan Tibshirani Conve Optimization 10-725/36-725 See supplements for reviews of basic multivariate calculus basic linear algebra Last time: conve sets and functions
More informationNonlinear Optimization for Optimal Control
Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]
More informationAsynchronous Non-Convex Optimization For Separable Problem
Asynchronous Non-Convex Optimization For Separable Problem Sandeep Kumar and Ketan Rajawat Dept. of Electrical Engineering, IIT Kanpur Uttar Pradesh, India Distributed Optimization A general multi-agent
More informationConvex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014
Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,
More informationConstrained Optimization
1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange
More informationLecture 23: November 21
10-725/36-725: Convex Optimization Fall 2016 Lecturer: Ryan Tibshirani Lecture 23: November 21 Scribes: Yifan Sun, Ananya Kumar, Xin Lu Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationGradient Descent. Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh. Convex Optimization /36-725
Gradient Descent Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh Convex Optimization 10-725/36-725 Based on slides from Vandenberghe, Tibshirani Gradient Descent Consider unconstrained, smooth convex
More informationDual and primal-dual methods
ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method
More informationSparse Gaussian conditional random fields
Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian
More informationECE Optimization for wireless networks Final. minimize f o (x) s.t. Ax = b,
ECE 788 - Optimization for wireless networks Final Please provide clear and complete answers. PART I: Questions - Q.. Discuss an iterative algorithm that converges to the solution of the problem minimize
More informationLecture 8: February 9
0-725/36-725: Convex Optimiation Spring 205 Lecturer: Ryan Tibshirani Lecture 8: February 9 Scribes: Kartikeya Bhardwaj, Sangwon Hyun, Irina Caan 8 Proximal Gradient Descent In the previous lecture, we
More informationOptimization for Machine Learning
Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html
More informationRandomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints
Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints By I. Necoara, Y. Nesterov, and F. Glineur Lijun Xu Optimization Group Meeting November 27, 2012 Outline
More informationEE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6)
EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement to the material discussed in
More informationComputational Optimization. Constrained Optimization Part 2
Computational Optimization Constrained Optimization Part Optimality Conditions Unconstrained Case X* is global min Conve f X* is local min SOSC f ( *) = SONC Easiest Problem Linear equality constraints
More informationQuasi-Newton Methods. Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization
Quasi-Newton Methods Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization 10-725 Last time: primal-dual interior-point methods Given the problem min x f(x) subject to h(x)
More informationMath 273a: Optimization Subgradient Methods
Math 273a: Optimization Subgradient Methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Nonsmooth convex function Recall: For ˉx R n, f(ˉx) := {g R
More informationHomework 4. Convex Optimization /36-725
Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationLecture 11 and 12: Penalty methods and augmented Lagrangian methods for nonlinear programming
Lecture 11 and 12: Penalty methods and augmented Lagrangian methods for nonlinear programming Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lecture 11 and
More informationBregman Alternating Direction Method of Multipliers
Bregman Alternating Direction Method of Multipliers Huahua Wang, Arindam Banerjee Dept of Computer Science & Engg, University of Minnesota, Twin Cities {huwang,banerjee}@cs.umn.edu Abstract The mirror
More informationAn Optimization-based Approach to Decentralized Assignability
2016 American Control Conference (ACC) Boston Marriott Copley Place July 6-8, 2016 Boston, MA, USA An Optimization-based Approach to Decentralized Assignability Alborz Alavian and Michael Rotkowitz Abstract
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More informationDistributed Smooth and Strongly Convex Optimization with Inexact Dual Methods
Distributed Smooth and Strongly Convex Optimization with Inexact Dual Methods Mahyar Fazlyab, Santiago Paternain, Alejandro Ribeiro and Victor M. Preciado Abstract In this paper, we consider a class of
More informationSparse Optimization Lecture: Dual Methods, Part I
Sparse Optimization Lecture: Dual Methods, Part I Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know dual (sub)gradient iteration augmented l 1 iteration
More informationDescent methods. min x. f(x)
Gradient Descent Descent methods min x f(x) 5 / 34 Descent methods min x f(x) x k x k+1... x f(x ) = 0 5 / 34 Gradient methods Unconstrained optimization min f(x) x R n. 6 / 34 Gradient methods Unconstrained
More information