Primal/Dual Decomposition Methods

Size: px

Start display at page:

Download "Primal/Dual Decomposition Methods"

Nathaniel Lawson
5 years ago
Views:

1 Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC Convex Optimization Fall , HKUST, Hong Kong

2 Outline of Lecture Subgradients Subgradient methods Primal decomposition Dual decomposition Summary (Acknowledgement to Stephen Boyd for material for this lecture.) Daniel P. Palomar 1

3 Gradient and First-Order Approximation Recall basic inequality for convex differentiable f: f (y) f (x) + f (x) T (y x) y First-order approximation of f at x is a global underestimator, a supporting hyperplane. What if f is not differentiable? The answer is given by the concept of subgradient. Daniel P. Palomar 2

4 Subgradient of a Function g is a subgradient of f (not necessarily convex) at x if f (y) f (x) + g T (y x) y Daniel P. Palomar 3

5 Subgradient of a Function g is a subgradient if and only if f (x) + g T (y x) is a global affine underestimator of f. If f is convex and differentiable, then f (x) is a subgradient of f at x. Subgradients come up in several contexts: algorithms for nondifferentiable convex optimization convex analysis, e.g., optimality conditions and duality for nondifferentiable problems. Daniel P. Palomar 4

6 Example of Subgradient Consider f = max {f 1, f 2 } with f 1 and f 2 convex and differentiable Subgradient at point x: If f 1 (x) > f 2 (x), the subgradient is unique g = f 1 (x). If f 2 (x) > f 1 (x), the subgradient is unique g = f 2 (x). If f 1 (x) = f 2 (x), the subgradients form an interval [ f 1 (x), f 2 (x)]. Daniel P. Palomar 5

7 Subdifferential The set of all subgradients of f at x is called the subdifferential of f at x and is denoted f (x). f (x) is a closed convex set (can be empty). If f is convex: f (x) is nonempty for x relint dom f if f is differentiable at x, then f (x) = { f (x)} (i.e., a singleton). If f (x) = {g}, then f is differentiable at x and g = f (x). Daniel P. Palomar 6

8 Consider f (x) = x : Example of Subdifferential Daniel P. Palomar 7

9 Subgradient Calculus Weak subgradient calculus: formulas for finding one subgradient g f (x). Strong subgradient calculus: formulas for finding the whole subdifferential f (x), i.e., all subgradients of f at x. Many algorithms for nondifferentiable convex optimization require only one subgradient, so weak calculus suffices. Some algorithms and optimality conditions need the whole subdifferential. In practice, if you can compute f (x), you can usually compute a subgradient g f (x). Daniel P. Palomar 8

10 Some Basic Rules (From now on, we will assume that f is convex and x relint dom f.) f (x) = { f (x)} if f is differentiable at x Scaling: (αf) = α f (assuming α > 0) Addition: (f 1 + f 2 ) = f 1 + f 2 (addition of sets) Affine transformation of variables: g (x) = A T f (Ax + b) Finite pointwise maximum: if f = max i f i, then if g (x) = f (Ax + b), then f (x) = Co { f i f i (x) = f (x)}, i.e., convex hull of union of subdifferentials of active functions at x. Daniel P. Palomar 9

11 Optimality Conditions: Unconstrained Case Recall that for f convex and differentiable, x minimizes f (x) if and only if: 0 = f (x ). The generalization to nondifferentiable convex f is 0 f (x ). Proof. By definition: f (y) f (x ) + 0 T (y x ) y. Daniel P. Palomar 10

12 Example: Piecewise Linear Maximization We want to minimize f (x) = max i { a T i x + b i }. x minimizes f (x) 0 f (x ) = Co { a i a T i x + b i = f (x ) } there is a λ with λ 0, 1 T λ = 1, λ i a i = 0 where λ i = 0 if a T i x + b i < f (x ). Interestingly, these are exactly the KKT conditions for the problem in epigraph form: minimize t,x t subject to a T i x + b i t, i = 1,, m. i Daniel P. Palomar 11

13 Optimality Conditions: Constrained Case minimize x f 0 (x) subject to f i (x) 0, i = 1,, m where each f i is convex, defined on R n (hence subdifferentiable) and strict feasibility holds (Slater s condition). The KKT necessary and sufficient conditions are f i (x ) 0, λ i 0 0 f 0 (x ) + m λ i f i (x ) i=1 λ i f i (x ) = 0 (complementary slackness). Daniel P. Palomar 12

14 Numerical Methods for Nondifferentiable Problems In R, we can always use the bisection method for nondifferentiable functions to reduce the interval (uncertainty) by half at each step. Can we generalize this to R n? The problem is that R n is not ordered, as opposed to R. The answer is: yes. These methods are called localization methods: cutting-plane methods (date back to the 1960s in the Russian literature): the uncertainty set is a polyhedron ellipsoid method (goes back to the 1970s in the Russian literature): the uncertainty set is an ellipsoid. Another convenient method is the subgradient method. Daniel P. Palomar 13

15 Subgradient Method Subgradient method is a simple algorithm (looks similar to a gradient method) to minimize a nondifferentiable convex function f: x (k+1) = x (k) α k g (k) where x (k) is the kth iterate g (k) is any subgradient of f at x (k) α k > 0 is the kth stepsize Note that it is not a descent method (unlike a gradient method), so we need to keep track of the best point so far: f (k) best = ( min f x (i)). i=1,,k Daniel P. Palomar 14

16 Different stepsize methods: Stepsize Rules constant stepsize: α k = α constant step length: α k = γ/ g (k) (so x (k+1) x (k) 2 = γ) square summable but not summable: stepsizes satisfying α 2 k <, α k = k=1 k=1 nonsummable diminishing: stepsizes satisfying lim α k = 0, k k=1 α k =. Daniel P. Palomar 15

17 Convergence Results Under some technical conditions (boundedness of optimum value, boundedness of subgradients by G), the limiting value of the subgradient method f = lim k f (k) best satisfies: constant stepsize: f f G 2 α/2, i.e., converges to G 2 α/2- suboptimal (converges to f if f differentiable and α small enough) constant step length: suboptimal f f Gγ/2, i.e., converges to Gγ/2- diminishing stepsize rule: f = f, i.e., converges. Daniel P. Palomar 16

18 Example: Piecewise Linear Minimization Consider the following nondifferentiable optimization problem: minimize f (x) = max i { a T i x + b i }. To find a subgradient, simply choose an index j for which and take g = a j. The subgradient update is a T j x + b j = max i { a T i x + b i } x (k+1) = x (k) α k a (k) j. Daniel P. Palomar 17

19 Example: Piecewise Linear Minimization (II) Problem instance with n = 20 variables, m = 100 terms, f 1.1 Constant step length, first 100 iterations, f (k) f : Daniel P. Palomar 18

20 Example: Piecewise Linear Minimization (III) Constant step length, f (k) best f : Daniel P. Palomar 19

21 Example: Piecewise Linear Minimization (IV) Diminishing stepsize rule (α k stepsize rule (α k = 1/k): = 0.1/ k) and square summable Daniel P. Palomar 20

22 Projected Subgradient Method for Constrained Optimization Consider the following constrained optimization problem: minimize x f (x) subject to x X. The projected subgradient method guarantees that each update produces a feasible point via a projection: [ x (k+1) = x (k) α k g (k)] X where [ ] X denotes projection onto the convex set X. Daniel P. Palomar 21

23 Decomposition Methods The idea of a decomposition method is to solve a problem by solving smaller subproblems coordinated by a master problem. Different reasons to use a decomposition method: to allow the resolution of a problem otherwise unsolvable for memory reasons (useful in areas such as biology or image processing) to speed up the resolution of the problem via parallel computation to solve the problem in a distributed way (desirable for some wireless networks) to derive nice, insightful, and efficient numerical algorithms as alternative to the use of general-purpose interior-point methods. Daniel P. Palomar 22

24 Decomposition Methods (II) In some (few) fortunate cases, problems decouple naturally: minimize f 1 (x 1 ) + f 2 (x 2 ) x subject to x 1 X 1, x 2 X 2. We can solve for x 1 and x 2 separately (in parallel). Even if they are solved sequentially, there is still the advantage of memory and of speed (if the computational effort is superlinear in problem size). Daniel P. Palomar 23

25 Decomposition Methods (III) In general, however, problems do not decouple so easily and that s when can resort to decomposition methods. There are two main types of decompostion methods: primal decomposition: deals with complicating variables that couple the subproblems the primal master problem controls directly the resources dual decomposition: deals with complicating constraints that couple the subproblems the dual master problem controls the prices of the resources. Daniel P. Palomar 24

26 Primal Decomposition Consider the following problem with a coupling variable: minimize x,y f 1 (x 1, y) + f 2 (x 2, y) subject to x 1 X 1, x 2 X 2 y Y. y is the complicating variable or coupling variable. When y is fixed the problem is separable in x 1 and x 2 and then decouples into two subproblems that can be solved independently. x 1 and x 2 are local or private variables; and y can be interpreted as a global or public variable that serves as an interface or boundary variable between the two subproblems. Daniel P. Palomar 25

27 Primal Decomposition (II) For a given fixed y we define the subproblems: subproblem 1 : minimize x1 X 1 f 1 (x 1, y) subproblem 2 : minimize x2 X 2 f 2 (x 2, y) with optimal values f 1 (y) and f 2 (y). Since min x,y f min y min x f, it follows that the original problem is equivalent to the master primal problem: minimize y Y f 1 (y) + f 2 (y). Daniel P. Palomar 26

28 Primal Decomposition (III) Observations: subproblems can be solved independently for a given y we don t have a closed-form expression for each function fi its corresponding gradient or subgradient (differentiable?) (y) and instead, to evaluate each function fi to solve an optimization problem (y) at some point y we need interestingly, we can easily obtain a subgradient of fi when we evaluate the function. (y) for free Daniel P. Palomar 27

29 Primal Decomposition: Solving the Master Problem If the original problem is convex, so is master problem. To solve the master problem, we can use different methods such as bisection (if y is scalar) gradient or Newton method (if f i differentiable) subgradient, cutting-plane, or ellipsoid method. The subgradient method is very simple and allows itself to distributed implementation; however, its converge is slow in practice. Projected subgradient method: y(k + 1) = [y(k) α k (s 1 (k) + s 2 (k))] Y where s i (k) is a subgradient of f i (y (k)). Daniel P. Palomar 28

30 Primal Decomposition Algorithm repeat 1. Solve the suproblems: Find x 1 (k) X 1 that minimizes f 1 (x 1 (k), y(k)), and a subgradient s 1 (k) f 1 (y(k)). Find x 2 (k) X 2 that minimizes f 2 (x 2 (k), y(k)), and a subgradient s 2 (k) f 2 (y(k)). 2. Update the complicating variable to minimize the primal master subproblem: 3. k = k + 1 y(k + 1) = [y(k) α k (s 1 (k) + s 2 (k))] Y. Daniel P. Palomar 29

31 Primal Decomposition: Subgradients Lemma: Let f (y) be the optimal value of the convex problem minimize x f 0 (x) subject to h i (x) y i, i = 1,, m. A subgradient of f (y) is λ (y). Proof. The Lagrangian is L (x, λ) = f 0 (x) + λ T (h (x) y). Then, f (y 0 ) = f 0 (x (y 0 )) = g (λ (y 0 )) L (x, λ (y 0 )) = f 0 (x) + λ T (h (x) y 0 ) = f 0 (x) + λ T (h (x) y) + λ T (y y 0 ) f 0 (x) + λ T (y y 0 ) y Daniel P. Palomar 30

32 where the last inequality holds for any x such that h (x) y. In particular, f (y 0 ) min h(x) y f 0 (x) + λ T (y y 0 ) = f (y) + λ T (y y 0 ) or f (y) f (y 0 ) λ T (y y 0 ). Lemma: Let f (y) be the optimal value of the convex problem minimize x f 0 (x, y) subject to h i (x) y i, i = 1,, m. A subgradient of f (y) is (s 0 (x (y), y) λ (y)). Daniel P. Palomar 31

33 Dual Decomposition Consider the following problem with a coupling constraint: minimize f 1 (x 1 ) + f 2 (x 2 ) x subject to x 1 X 1, x 2 X 2 h 1 (x 1 ) + h 2 (x 2 ) h 0 h 1 (x 1 ) + h 2 (x 2 ) h 0 is the complicating or coupling constraint. If the coupling constraint is relaxed with a Lagrange multiplier λ, then the problem decouples into two subproblems that can be solved independently. x 1 and x 2 are local or private variables; and λ can be interpreted as a global or public price variable that serves as an interface or boundary variable between the two subproblems. Daniel P. Palomar 32

34 Dual Decomposition (II) The partial Lagrangian after relaxing the coupling constraint is: L (x, λ) = f 1 (x 1 ) + f 2 (x 2 ) + λ T (h 1 (x 1 ) + h 2 (x 2 ) h 0 ). The dual function is g (λ) = inf x X L (x, λ) } = inf x1 X 1 {f 1 (x 1 ) + λ T h 1 (x 1 ) } + inf x2 X 2 {f 2 (x 2 ) + λ T h 2 (x 2 ) λ T h 0 which clearly decouples. Daniel P. Palomar 33

35 Dual Decomposition (III) For a given fixed λ we define the subproblems: subproblem 1 : minimize x1 X 1 f 1 (x 1 ) + λ T h 1 (x 1 ) subproblem 2 : minimize x2 X 2 f 2 (x 2 ) + λ T h 2 (x 2 ) with optimal values g 1 (λ) and g 2 (λ). From strong duality, the original problem is equivalent to the master dual problem: maximize λ 0 g 1 (λ) + g 2 (λ) λ T h 0. Daniel P. Palomar 34

36 Dual Decomposition (IV) Observations: subproblems can be solved independently for a given λ we don t have a closed-form expression for each function g i (λ) and its corresponding gradient or subgradient (differentiable?) instead, to evaluate each function g i (λ) at some point λ we need to solve an optimization problem as in the primal case, we can easily obtain a subgradient of g i (λ) for free when we evaluate the function. Daniel P. Palomar 35

37 Dual Decomposition: Solving the Master Problem The dual master problem is always convex regardless of the original problem. However, we still need convexity to have strong duality (under some constraint qualifications like Slater s condition). To solve the master problem, we can use different methods such as bisection (if λ is scalar) gradient or Newton method (if g i differentiable) subgradient, cutting-plane, or ellipsoid method. Projected subgradient method: λ(k + 1) = [λ(k) + α k (s 1 (k) + s 2 (k) h 0 )] + where s i (k) is a subgradient of g i (λ (k)). Daniel P. Palomar 36

38 Dual Decomposition Algorithm repeat 1. Solve the suproblems: Find x 1 (k) X 1 that minimizes f 1 (x 1 (k)) + λ(k) T h 1 (x 1 (k)), and a subgradient s 1 (k) g 1 (λ(k)). Find x 2 (k) X 2 that minimizes f 2 (x 2 (k)) + λ(k) T h 2 (x 2 (k)), and a subgradient s 2 (k) g 2 (λ(k)). 2. Update the complicating variable to minimize the primal master subproblem: 3. k = k + 1 λ(k + 1) = [λ(k) + α k (s 1 (k) + s 2 (k) h 0 )] +. Daniel P. Palomar 37

39 Dual Decomposition: Subgradients Lemma: Let g (λ) be the dual function corresponding to the problem minimize x f 0 (x) subject to h i (x) 0, i = 1,, m. A subgradient of g (λ) is h (x (λ)). Proof. The Lagrangian is L (x, λ) = f 0 (x) + λ T h (x) and the dual function g (λ) = inf x L (x, λ). Then, g (λ) = inf x f 0 (x) + λ T h (x) f 0 (x (λ 0 )) + λ T h (x (λ 0 )) = f 0 (x (λ 0 )) + λ T 0 h (x (λ 0 )) + (λ λ 0 ) T h (x (λ 0 )) = g (λ 0 ) + (λ λ 0 ) T h (x (λ 0 )). Daniel P. Palomar 38

40 Summary We have described the concept of subgradient as a generalization of gradient. We have considered subgradient methods as formally similar to gradient methods. Finally, we have derived the two basic decomposition techniques: primal decomposition dual decomposition. Daniel P. Palomar 39

Subgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives

Subgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives Subgradients subgradients and quasigradients subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE392o, Stanford University Basic inequality recall basic