Improved Fast Dual Gradient Methods for Embedded Model Predictive Control


Preprints of the 19th World Congress of the International Federation of Automatic Control

Pontus Giselsson, Electrical Engineering, Stanford University

Abstract: Recently, several authors have suggested the use of first order methods, such as fast dual ascent and the alternating direction method of multipliers, for embedded model predictive control. The main reason is that they can be implemented using simple arithmetic operations only. In this paper, we present results that enable significant improvements when using fast dual ascent for embedded model predictive control. These improvements rely on a new characterization of the set of matrices that can be used to describe a quadratic upper bound to the negative dual function. For many interesting formulations, it is shown that the provided bound cannot be improved and that it is tighter than bounds previously presented in the literature. The improved quadratic upper bound is used in fast dual gradient methods as an approximation of the dual function. Since it approximates the dual function better than previous upper bounds, the convergence is improved. The performance enhancement is most significant for ill-conditioned problems. This is illustrated by a numerical evaluation on an AFTI-16 aircraft model, where the algorithm outperforms previous proposals of first order methods for embedded control by one to three orders of magnitude.

(During the preparation of this paper, the author was a member of the LCCC Linnaeus Center at Lund University. Financial support from the Swedish Research Council for the author's postdoctoral studies at Stanford University is gratefully acknowledged. Further, Eric Chu is gratefully acknowledged for constructive feedback, and Alexander Domahidi for suggesting the AFTI-16 control problem as a benchmark.)

1. INTRODUCTION

Several authors, including O'Donoghue et al. (2013); Jerez et al. (2013); Richter et al. (2013); Patrinos and Bemporad (2014), have recently proposed first order optimization methods as appropriate for embedded model predictive control. In O'Donoghue et al. (2013); Jerez et al. (2013), the alternating direction method of multipliers (ADMM, see Boyd et al. (2011)) was used and high computational speeds were reported when implemented on embedded hardware. In Richter et al. (2013); Patrinos and Bemporad (2014), the optimal control problems arising in model predictive control were solved using different formulations of fast dual gradient methods. In Richter et al. (2013), the equality constraints, i.e. the dynamic constraints, are dualized, and a diagonal cost and box constraints are assumed; the resulting dual problem is solved using a fast gradient method. In Patrinos and Bemporad (2014), the same splitting as in O'Donoghue et al. (2013); Jerez et al. (2013) is used, but a fast gradient method is used to solve the resulting problem instead of ADMM. In this paper, we show how to generalize the fast dual gradient methods presented in Richter et al. (2013); Patrinos and Bemporad (2014) to achieve faster convergence.

Fast gradient methods as used in Richter et al. (2013); Patrinos and Bemporad (2014) have been around since the early 1980s, when the seminal paper Nesterov (1983) was published. However, fast gradient methods did not receive much attention before the mid 2000s, after which interest has increased. Several extensions and generalizations of the fast gradient method have been proposed, e.g. in Nesterov (2003, 2005).
In Beck and Teboulle (2009), the fast gradient method was generalized to allow for minimization of composite objective functions. Further, a unified framework for fast gradient methods and their generalizations was presented in Tseng (2008). To use fast gradient methods for composite minimization, one objective term should be convex and differentiable with a Lipschitz continuous gradient, while the other should be proper, closed, and convex. The former condition is equivalent to the existence of a quadratic upper bound to the function with the same curvature in all directions; the curvature is specified by the Lipschitz constant of the gradient. In fast gradient methods, the quadratic upper bound serves as an approximation of the function to be minimized, since the bound is minimized in every iteration of the algorithm. If the quadratic upper bound approximates the function to be minimized poorly, slow convergence is expected. By instead allowing a quadratic upper bound with different curvature in different directions, as in generalized fast gradient methods (Zuo and Lin, 2011), the bound can approximate the function to be minimized more closely. For an appropriate choice of non-uniform quadratic upper bound, this can significantly improve the performance of the algorithm.

In (Nesterov, 2005, Theorem 1), a Lipschitz constant of the gradient of the dual function of strongly convex problems is presented. This result quantifies the curvature of a uniform quadratic upper bound to the negative dual function. The result was improved in (Richter et al., 2013, Theorem 7) for the case where the primal cost is restricted to being quadratic. Using these quadratic upper bounds, with the same curvature in all directions, as dual function approximations in a fast dual gradient method may result in slow convergence, especially for ill-conditioned problems where the upper bound approximates the negative dual function poorly.

In this paper, the main result is a new characterization of the set of matrices that can be used to describe quadratic upper bounds to the negative dual function. This result generalizes and improves the previous results in Nesterov (2005); Richter et al. (2013) by allowing a bound with different curvature in different directions. Using such a quadratic upper bound instead of the traditional uniform upper bound in fast dual gradient methods, significant improvements can be achieved. We also show that in many cases, the resulting quadratic upper bounds cannot be made tighter by using matrices outside this set.

In model predictive control, much offline computational effort can be devoted to improving the online execution time of the solver. This is done, e.g., in explicit MPC, see Bemporad et al. (2002), where the explicit parametric solution is computed beforehand and found through a lookup table online. In this paper, the offline computational effort is devoted to choosing a matrix from the provided set that minimizes the difference, in some metric, between the negative dual function and the resulting quadratic upper bound. The computed matrix is the same in all samples in the controller and can therefore be computed offline.

The algorithm is evaluated on a pitch control problem for an AFTI-16 aircraft that has previously been studied in Kapasouris et al. (1990); Bemporad et al. (1997). This is a challenging problem for first order methods since it is very ill-conditioned. The numerical evaluation shows that the method presented in this paper outperforms the methods presented in O'Donoghue et al. (2013); Jerez et al. (2013); Richter et al. (2013); Patrinos and Bemporad (2014) by one to three orders of magnitude. For space considerations, all proofs are omitted from this paper; they can be found in the full version of the paper, Giselsson (2014).

2. PRELIMINARIES AND NOTATION

2.1 Notation

We denote by R, R^n, and R^{m x n} the sets of real numbers, vectors, and matrices. S^n \subseteq R^{n x n} is the set of symmetric matrices, and S^n_{++} \subseteq S^n and S^n_{+} \subseteq S^n are the sets of positive definite and positive semidefinite matrices respectively. Further, L \succ M and L \succeq M, where L, M \in S^n, denote L - M \in S^n_{++} and L - M \in S^n_{+} respectively. We also use the notation <x, y> = x^T y, \|x\| = \sqrt{x^T x}, and \|x\|_H = \sqrt{x^T H x}. Finally, I_X denotes the indicator function of the set X, i.e. I_X(x) = 0 if x \in X and I_X(x) = \infty otherwise.

2.2 Preliminaries

In this section, we introduce generalizations of commonly used concepts. We generalize the notion of strong convexity as well as the notion of Lipschitz continuity of the gradient of convex functions. We also define conjugate functions and state a known result on dual properties of a function and its conjugate.

For differentiable and convex functions f : R^n -> R that have a Lipschitz continuous gradient with constant L, we have that

    \|\nabla f(x_1) - \nabla f(x_2)\| <= L \|x_1 - x_2\|                         (1)

holds for all x_1, x_2 \in R^n. This is equivalent to requiring that

    f(x_1) <= f(x_2) + <\nabla f(x_2), x_1 - x_2> + (L/2) \|x_1 - x_2\|^2        (2)

holds for all x_1, x_2 \in R^n (Nesterov, 2003, Theorem 2.1.5). In this paper, we allow for a generalized version of the quadratic upper bound (2) to f, namely that

    f(x_1) <= f(x_2) + <\nabla f(x_2), x_1 - x_2> + (1/2) \|x_1 - x_2\|_L^2      (3)

holds for all x_1, x_2 \in R^n, where L \in S^n_{++}. The bound (2) is obtained by setting L = L I in (3).

Remark 1. For concave functions f, i.e. functions f such that -f is convex, the Lipschitz condition (1) is equivalent to the quadratic lower bound

    f(x_1) >= f(x_2) + <\nabla f(x_2), x_1 - x_2> - (L/2) \|x_1 - x_2\|^2        (4)

holding for all x_1, x_2 \in R^n. The generalized counterpart naturally becomes that

    f(x_1) >= f(x_2) + <\nabla f(x_2), x_1 - x_2> - (1/2) \|x_1 - x_2\|_L^2      (5)

holds for all x_1, x_2 \in R^n.
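As a concrete illustration of why the generalized bound (3) can be preferable to the uniform bound (2), consider a convex quadratic f(x) = (1/2) x^T H x. Its gradient is Lipschitz with constant lambda_max(H), and the bound (3) holds with equality for L = H. The following sketch is a minimal numerical check of this (it is not from the paper; the test matrix and random points are chosen for illustration only).

```python
import numpy as np

# Illustration only: for f(x) = 0.5 x'Hx, compare the uniform quadratic upper
# bound (2) with L = lambda_max(H) to the generalized bound (3) with L = H.
rng = np.random.default_rng(0)
H = np.diag([1.0, 10.0, 100.0])            # ill-conditioned quadratic cost (assumed example)
f = lambda x: 0.5 * x @ H @ x
grad_f = lambda x: H @ x

L_scalar = np.max(np.linalg.eigvalsh(H))   # Lipschitz constant of grad f

x1, x2 = rng.standard_normal(3), rng.standard_normal(3)
d = x1 - x2
linear_part = f(x2) + grad_f(x2) @ d

uniform_bound = linear_part + 0.5 * L_scalar * (d @ d)   # bound (2)
matrix_bound = linear_part + 0.5 * (d @ H @ d)           # bound (3) with L = H

print(matrix_bound <= uniform_bound)       # True: the generalized bound (3) is tighter
print(np.isclose(f(x1), matrix_bound))     # True: (3) with L = H is exact for quadratics
```

The more ill-conditioned H is, the larger the gap between the two bounds becomes, which is exactly the situation exploited later for the dual function.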
Next, we state a lemma on equivalent characterizations of the condition (3).

Lemma 2. Assume that f : R^n -> R is convex and differentiable. The condition that

    f(x_1) <= f(x_2) + <\nabla f(x_2), x_1 - x_2> + (1/2) \|x_1 - x_2\|_L^2      (6)

holds for some L \in S^n_{++} and all x_1, x_2 \in R^n is equivalent to the condition that

    <\nabla f(x_1) - \nabla f(x_2), x_1 - x_2> <= \|x_1 - x_2\|_L^2              (7)

holds for all x_1, x_2 \in R^n.

The standard definition of a differentiable and strongly convex function f : R^n -> R is that it satisfies

    f(x_1) >= f(x_2) + <\nabla f(x_2), x_1 - x_2> + (sigma/2) \|x_1 - x_2\|^2    (8)

for all x_1, x_2 \in R^n, where the modulus sigma > 0 describes a lower bound on the curvature of the function. In this paper, the definition (8) is generalized to allow for a quadratic lower bound with different curvature in different directions.

Definition 3. A differentiable function f : R^n -> R is strongly convex with matrix H if and only if

    f(x_1) >= f(x_2) + <\nabla f(x_2), x_1 - x_2> + (1/2) \|x_1 - x_2\|_H^2

holds for all x_1, x_2 \in R^n, where H \in S^n_{++}.

Remark 4. The traditional definition of strong convexity (8) is obtained from Definition 3 by setting H = sigma I.

Lemma 5. Assume that f : R^n -> R is differentiable and strongly convex with matrix H. The condition that

    f(x_1) >= f(x_2) + <\nabla f(x_2), x_1 - x_2> + (1/2) \|x_1 - x_2\|_H^2      (9)

holds for all x_1, x_2 \in R^n is equivalent to the condition that

    <\nabla f(x_1) - \nabla f(x_2), x_1 - x_2> >= \|x_1 - x_2\|_H^2              (10)

holds for all x_1, x_2 \in R^n.

The condition (9) is a quadratic lower bound on the function value, while the condition (3) is a quadratic upper bound on the function value. These two properties are linked through the conjugate function

    f^*(y) := sup_x { <y, x> - f(x) }.

More precisely, we have the following result.

Proposition 6. Assume that f : R^n -> R \cup {\infty} is closed, proper, and strongly convex with modulus sigma on the relative interior of its domain. Then the conjugate function f^* is convex and differentiable, and \nabla f^*(y) = x^*(y), where x^*(y) = argmax_x { <y, x> - f(x) }. Further, \nabla f^* is Lipschitz continuous with constant L = 1/sigma.

A straightforward generalization is given by the chain rule and was proven in (Nesterov, 2005, Theorem 1), which also proves the less general Proposition 6.

Corollary 7. Assume that f : R^n -> R \cup {\infty} is closed, proper, and strongly convex with modulus sigma on the relative interior of its domain. Further, define g^*(y) := f^*(A^T y). Then g^* is convex and differentiable, and \nabla g^*(y) = A x^*(A^T y), where x^*(A^T y) = argmax_x { <A^T y, x> - f(x) }. Further, \nabla g^* is Lipschitz continuous with constant L = \|A\|^2 / sigma.

For the case when f(x) = (1/2) x^T H x + g^T x, i.e. when f is quadratic, a tighter Lipschitz constant to \nabla g^*(y) = \nabla f^*(A^T y) was provided in (Richter et al., 2013, Theorem 7), namely L = \|A H^{-1} A^T\|.

3. PROBLEM FORMULATION

We consider optimization problems of the form

    minimize    f(x) + h(x) + g(Bx)
    subject to  Ax = b                                                           (11)

where x \in R^n, A \in R^{m x n}, B \in R^{p x n}, and b \in R^m. We assume that the following assumption holds throughout the paper.

Assumption 8.
(a) The function f : R^n -> R is differentiable and strongly convex with matrix H.
(b) The extended valued functions h : R^n -> R \cup {\infty} and g : R^p -> R \cup {\infty} are proper, closed, and convex.
(c) A \in R^{m x n} has full row rank.

Remark 9. Examples of functions that satisfy Assumptions 8(a) and 8(b) are f(x) = (1/2) x^T H x + g^T x with H \in S^n_{++} for Assumption 8(a), and g = I_X, g = \|.\|_1, g = \|.\|_1 + I_X, or g = 0 for Assumption 8(b). If Assumption 8(c) is not satisfied, redundant equality constraints can be removed, without affecting the solution of (11), to satisfy the assumption.

The optimization problem (11) can equivalently be written as

    minimize    f(x) + h(x) + g(y)
    subject to  Ax = b,  Bx = y.                                                 (12)

We introduce dual variables lambda \in R^m for the equality constraints Ax = b and dual variables mu \in R^p for the equality constraints Bx = y. This gives the following Lagrange dual problem:

    sup_{lambda, mu} inf_{x, y} { f(x) + h(x) + lambda^T (Ax - b) + g(y) + mu^T (Bx - y) }
      = sup_{lambda, mu} { - sup_x [ (-A^T lambda - B^T mu)^T x - f(x) - h(x) ] - b^T lambda - sup_y [ mu^T y - g(y) ] }
      = sup_{lambda, mu} { - F^*(-A^T lambda - B^T mu) - b^T lambda - g^*(mu) }               (13)

where F^* is the conjugate function of F := f + h and g^* is the conjugate function of g. For ease of exposition, we introduce nu = (lambda, mu) \in R^{m+p}, C = [A^T B^T]^T \in R^{(m+p) x n}, c = (b, 0) \in R^{m+p}, and the function

    d(nu) := -F^*(-C^T nu) - c^T nu = -F^*(-A^T lambda - B^T mu) - b^T lambda.                (14)

The function F^* is evaluated by solving an optimization problem; the minimizer of this problem is denoted by

    x^*(nu) := argmin_x { F(x) + nu^T C x }                                                   (15)
             = argmin_x { f(x) + h(x) + lambda^T A x + mu^T B x }.

From Corollary 7 we get that the function d is concave and differentiable with gradient \nabla d(nu) = C x^*(nu) - c, and that \nabla d is Lipschitz continuous with constant L = \|C\|^2 / lambda_min(H), i.e., that

    \|\nabla d(nu_1) - \nabla d(nu_2)\| <= L \|nu_1 - nu_2\|                                  (16)

holds for all nu_1, nu_2 \in R^{m+p}. As stated in Remark 1, (16) is equivalent to the following quadratic lower bound on the concave function d holding for all nu_1, nu_2 \in R^{m+p}:

    d(nu_1) >= d(nu_2) + <\nabla d(nu_2), nu_1 - nu_2> - (L/2) \|nu_1 - nu_2\|^2.

In the following section we show that the function d satisfies the tighter condition

    d(nu_1) >= d(nu_2) + <\nabla d(nu_2), nu_1 - nu_2> - (1/2) \|nu_1 - nu_2\|_L^2            (17)

for all nu_1, nu_2 \in R^{m+p} and all L \in S^{m+p}_{+} that satisfy L \succeq C H^{-1} C^T.
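To make (14)-(17) concrete, the following sketch (an illustration under assumed problem sizes, not code from the paper) evaluates x^*(nu) and \nabla d(nu) for the simplest case f(x) = (1/2) x^T H x + q^T x with h = g = 0, forms the curvature matrix C H^{-1} C^T appearing in (17), and checks that the bound is tight in this case.

```python
import numpy as np

# Illustration (not from the paper): x*(nu), grad d(nu), and the curvature
# matrix C H^{-1} C^T for f(x) = 0.5 x'Hx + q'x with h = g = 0.
rng = np.random.default_rng(1)
n, m, p = 6, 3, 2                                   # assumed sizes
H = np.diag(rng.uniform(0.1, 10.0, n))              # strong convexity matrix
q = rng.standard_normal(n)
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
B = rng.standard_normal((p, n))
C, c = np.vstack([A, B]), np.concatenate([b, np.zeros(p)])

def x_star(nu):                                     # minimizer in (15), closed form
    return -np.linalg.solve(H, q + C.T @ nu)

def d(nu):                                          # dual function (14)
    x = x_star(nu)
    return 0.5 * x @ H @ x + q @ x + nu @ (C @ x - c)

def grad_d(nu):                                     # gradient of d
    return C @ x_star(nu) - c

L_mat = C @ np.linalg.solve(H, C.T)                 # curvature matrix C H^{-1} C^T

nu1, nu2 = rng.standard_normal(m + p), rng.standard_normal(m + p)
dv = nu1 - nu2
rhs = d(nu2) + grad_d(nu2) @ dv - 0.5 * dv @ L_mat @ dv
print(np.isclose(d(nu1), rhs))                      # True: (17) holds with equality here
```

In this quadratic case d is itself quadratic with Hessian -C H^{-1} C^T, so the bound (17) with L = C H^{-1} C^T is exact; Theorem 10 below states that the same L gives a valid bound in the general case of Assumption 8.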
4. DUAL FUNCTION PROPERTIES

The following theorem states that the function d defined in (14) satisfies (17) for any L \succeq C H^{-1} C^T. The results of this section are proven in Giselsson (2014).

Theorem 10. The function d defined in (14) is concave and differentiable and satisfies

    d(nu_1) >= d(nu_2) + <\nabla d(nu_2), nu_1 - nu_2> - (1/2) \|nu_1 - nu_2\|_L^2            (18)

for every nu_1, nu_2 \in R^{m+p} and every L \in S^{m+p}_{+} that satisfies L \succeq C H^{-1} C^T.

Next, we show that if f is a strongly convex quadratic function and h satisfies certain conditions, then Theorem 10 gives the best possible bound of the form (18).

Proposition 11. Assume that f(x) = (1/2) x^T H x + zeta^T x with H \in S^n_{++} and zeta \in R^n, and that there exists a set X \subseteq R^n with non-empty interior on which h is linear, i.e. h(x) = xi_X^T x + theta_X for all x \in X. Further, assume that there exists a dual point nu_bar such that x^*(nu_bar) \in int(X). Then for any matrix L that does not satisfy L \succeq C H^{-1} C^T, there exist nu_1 and nu_2 such that (18) does not hold.

Proposition 11 shows that the bound in Theorem 10 is indeed the best obtainable bound of the form (18) if f is quadratic and h satisfies the stated assumptions. Examples of functions that satisfy the assumptions on h in Proposition 11 include linear functions, indicator functions of closed convex constraint sets with non-empty interior, and the 1-norm.

5. FAST DUAL GRADIENT METHODS

In this section, we describe generalized fast gradient methods and show how they can be applied to solve the dual problem (13). Generalized fast gradient methods can be applied to solve problems of the form

    minimize  l(y) + psi(y)                                                                   (19)

where y \in R^n, psi : R^n -> R \cup {\infty} is proper, closed, and convex, and l : R^n -> R is convex, differentiable, and satisfies

    l(y_1) <= l(y_2) + <\nabla l(y_2), y_1 - y_2> + (1/2) \|y_1 - y_2\|_L^2                   (20)

for all y_1, y_2 \in R^n and some L \in S^n_{++}. Before we state the algorithm, we define the generalized prox operator

    prox_psi^L(y) := argmin_x { psi(x) + (1/2) \|x - y\|_L^2 }                                (21)

and note that

    prox_psi^L( y - L^{-1} \nabla l(y) )
      = argmin_x { (1/2) \|x - y + L^{-1} \nabla l(y)\|_L^2 + psi(x) }
      = argmin_x { l(y) + <\nabla l(y), x - y> + (1/2) \|x - y\|_L^2 + psi(x) }.              (22)

The generalized fast gradient method is stated below.

Algorithm 1. Generalized fast gradient method
Set: y^1 = x^0 \in R^n, t^1 = 1
For k >= 1:
    x^k     = prox_psi^L( y^k - L^{-1} \nabla l(y^k) )
    t^{k+1} = ( 1 + sqrt(1 + 4 (t^k)^2) ) / 2
    y^{k+1} = x^k + ((t^k - 1)/t^{k+1}) (x^k - x^{k-1})

The standard fast gradient method as presented in Beck and Teboulle (2009) is obtained by setting L = L I in Algorithm 1, where L is the Lipschitz constant of \nabla l. The main step of the fast gradient method is the prox-step, i.e., minimizing (22), which can be seen as minimizing an approximation of the function l + psi. For the standard fast gradient method, l is approximated by a quadratic upper bound that has the same curvature, described by L, in all directions. If this quadratic upper bound approximates the function to be minimized poorly, slow convergence is expected. The generalization to a matrix L allows for quadratic upper bounds with different curvature in different directions. This enables quadratic upper bounds that approximate the function l much more closely and consequently gives improved convergence. The generalized fast gradient method has the convergence rate (see Zuo and Lin (2011))

    (l + psi)(x^k) - (l + psi)(x^*) <= 2 \|x^* - x^0\|_L^2 / (k + 1)^2                        (23)

where x^* is a minimizer of l + psi. The convergence rate of the standard fast gradient method, as given in Beck and Teboulle (2009), is obtained by setting L = L I in (23).
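A compact implementation sketch of Algorithm 1 is given below (illustrative only; the gradient, the generalized prox operator, and the scaling matrix are supplied by the caller, and a fixed iteration count stands in for a proper termination test).

```python
import numpy as np

def generalized_fast_gradient(grad_l, prox_psi_L, L, y0, iters=100):
    """Algorithm 1: generalized fast gradient method for min l(y) + psi(y).

    grad_l     : callable returning the gradient of l
    prox_psi_L : callable implementing prox_psi^L from (21)
    L          : positive definite scaling matrix from the bound (20)
    """
    y = y0.copy()
    x_prev = y0.copy()
    t = 1.0
    for _ in range(iters):
        # Prox step: minimize the quadratic upper bound (22) of l plus psi.
        x = prox_psi_L(y - np.linalg.solve(L, grad_l(y)))
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t**2)) / 2.0
        # Nesterov-type extrapolation (momentum) step.
        y = x + ((t - 1.0) / t_next) * (x - x_prev)
        x_prev, t = x, t_next
    return x_prev
```

With L a multiple of the identity and prox_psi_L the standard prox operator, this reduces to the fast gradient method of Beck and Teboulle (2009); in practice, L would be factorized once offline instead of solved against in every iteration.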
The objective here is to apply the generalized fast gradient method to solve the dual problem (13). By introducing gbar^*(nu) := g^*([0 I] nu), the dual problem (13) can be expressed as max_nu { d(nu) - gbar^*(nu) }, where d is defined in (14). As shown in Theorem 10, the function d satisfies the properties required to apply generalized fast gradient methods, namely that -d satisfies (20) for any L \in S^{m+p}_{++} such that L \succeq C H^{-1} C^T. Further, since g is a closed, proper, and convex function, so is g^*, see (Rockafellar, 1970, Theorem 12.2), and by (Rockafellar, 1970, Theorem 5.7) so is gbar^*. This implies that generalized fast gradient methods, i.e. Algorithm 1, can be used to solve the dual problem (13). We set l = -d and psi = gbar^*, and restrict L = blkdiag(L_lambda, L_mu) to get the following algorithm.

Algorithm 2. Generalized fast dual gradient method
Set: z^1 = lambda^0 \in R^m, v^1 = mu^0 \in R^p, t^1 = 1
For k >= 1:
    x^k       = argmin_x { f(x) + h(x) + (z^k)^T A x + (v^k)^T B x }
    lambda^k  = z^k + L_lambda^{-1} (A x^k - b)
    mu^k      = prox_{g^*}^{L_mu}( v^k + L_mu^{-1} B x^k )
    t^{k+1}   = ( 1 + sqrt(1 + 4 (t^k)^2) ) / 2
    z^{k+1}   = lambda^k + ((t^k - 1)/t^{k+1}) (lambda^k - lambda^{k-1})
    v^{k+1}   = mu^k + ((t^k - 1)/t^{k+1}) (mu^k - mu^{k-1})

Here x^k is the primal variable at iteration k that is used to compute the gradient \nabla d(nu^k), where nu^k = (z^k, v^k). To arrive at the lambda^k and mu^k iterations, we let xi^k = (lambda^k, mu^k) and note that

    xi^k = prox_{gbar^*}^L( nu^k + L^{-1} \nabla d(nu^k) )                                    (24)
         = argmin_nu { (1/2) \|nu - nu^k - L^{-1} \nabla d(nu^k)\|_L^2 + g^*([0 I] nu) }
         = [ argmin_z { (1/2) \|z - z^k - L_lambda^{-1} \nabla_z d(nu^k)\|_{L_lambda}^2 } ;
             argmin_v { (1/2) \|v - v^k - L_mu^{-1} \nabla_v d(nu^k)\|_{L_mu}^2 + g^*(v) } ]
         = [ z^k + L_lambda^{-1} (A x^k - b) ;
             prox_{g^*}^{L_mu}( v^k + L_mu^{-1} B x^k ) ].

In the following proposition we state the convergence rate properties of Algorithm 2.

Proposition 12. Suppose that Assumption 8 holds and that L = blkdiag(L_lambda, L_mu) \in S^{m+p}_{++} is chosen such that L \succeq C H^{-1} C^T. Then Algorithm 2 converges with the rate

    D(nu^*) - D(nu^k) <= 2 \|nu^* - nu^0\|_L^2 / (k + 1)^2,   k >= 1                          (25)

where D := d - gbar^* and k is the iteration number.
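A direct transcription of Algorithm 2 is sketched below (illustrative only). The inner minimization over x and the prox of g^* are problem dependent and are therefore passed in as callables; Examples 17 and 18 below give concrete instances of both.

```python
import numpy as np

def fast_dual_gradient(x_update, prox_gstar_Lmu, A, b, B,
                       Llam_solve, Lmu_solve, lam0, mu0, iters=500):
    """Algorithm 2: generalized fast dual gradient method.

    x_update       : callable (z, v) -> argmin_x f(x) + h(x) + z'Ax + v'Bx
    prox_gstar_Lmu : callable implementing prox_{g*}^{L_mu}
    Llam_solve     : callable r -> L_lambda^{-1} r
    Lmu_solve      : callable r -> L_mu^{-1} r
    """
    z, v = lam0.copy(), mu0.copy()
    lam_prev, mu_prev = lam0.copy(), mu0.copy()
    t = 1.0
    for _ in range(iters):
        x = x_update(z, v)                               # defines grad d at (z, v)
        lam = z + Llam_solve(A @ x - b)                  # gradient step in lambda
        mu = prox_gstar_Lmu(v + Lmu_solve(B @ x))        # prox step in mu
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t**2)) / 2.0
        z = lam + ((t - 1.0) / t_next) * (lam - lam_prev)
        v = mu + ((t - 1.0) / t_next) * (mu - mu_prev)
        lam_prev, mu_prev, t = lam, mu, t_next
    return x, lam_prev, mu_prev
```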

Remark 13. By forming a specific running average of previous primal variables, it is possible to prove an O(1/k) convergence rate for the distance to the primal optimal point and an O(1/k^2) convergence rate for the worst case primal infeasibility, see Patrinos and Bemporad (2014).

For some choices of conjugate functions g^*, prox_{g^*}^{L_mu} can be difficult to evaluate. For standard prox operators (given by prox_{g^*}^{I}), the Moreau decomposition (Rockafellar, 1970, Theorem 31.5) states that prox_{g^*}^{I}(y) + prox_{g}^{I}(y) = y. In the following proposition, we generalize this result to the generalized prox operator used here.

Proposition 14. Assume that g : R^n -> R \cup {\infty} is a proper, closed, and convex function. Then

    prox_{g^*}^{L}(y) + L^{-1} prox_{g}^{L^{-1}}(L y) = y

for every y \in R^n and any L \in S^n_{++}.

Remark 15. If g = I_X, where I_X is the indicator function of X, then g^* is the support function of X. Evaluating the prox operator (21) with g^* being a support function is difficult. However, through Proposition 14, this can be rewritten to require only a projection onto the set X. If X is a box constraint and L is diagonal, the projection becomes a max-operation and is hence very cheap to implement.

Remark 16. We are not restricted to having one auxiliary term g only. We can have any number of auxiliary terms g_i that all decompose according to the computations in (24), i.e., we get one prox-operation in the algorithm for every auxiliary term g_i.

6. MODEL PREDICTIVE CONTROL

In this section, we pose some standard model predictive control problems and show how they can be solved using the methods presented in this paper. The resulting algorithms use simple arithmetic operations only, which allows for easier implementation in embedded systems. We also show how to choose the L-matrix in each case.

Example 17. We consider MPC optimization problems of the form

    minimize    sum_{t=0}^{N-1} ( (1/2) x_t^T Q x_t + (1/2) u_t^T R u_t ) + (1/2) x_N^T Q_f x_N
    subject to  x_{t+1} = Phi x_t + Gamma u_t,   t = 0, ..., N-1
                x_min <= x_t <= x_max,           t = 0, ..., N
                u_min <= u_t <= u_max,           t = 0, ..., N-1
                x_0 = xbar

where x_t \in R^{n_x}, u_t \in R^{n_u}, Phi \in R^{n_x x n_x}, Gamma \in R^{n_x x n_u}, and Q \in S^{n_x}_{++}, R \in S^{n_u}_{++}, Q_f \in S^{n_x}_{++} are all diagonal. Letting x = (x_0, ..., x_N, u_0, ..., u_{N-1}), this can be cast as

    minimize    (1/2) x^T H x
    subject to  Ax = b,  x_min <= x <= x_max

where H, A, b, x_min, and x_max are structured according to the problem above. We choose f(x) = (1/2) x^T H x, g = 0, and h = I_Y, where I_Y is the indicator function of Y = { x \in R^{(N+1) n_x + N n_u} : x_min <= x <= x_max }. This implies that we introduce dual variables lambda only for the equality constraints Ax = b. The algorithm becomes:

    x^k       = argmin_x { (1/2) x^T H x + I_Y(x) + (z^k)^T A x }                             (26)
    lambda^k  = z^k + L_lambda^{-1} (A x^k - b)                                               (27)
    t^{k+1}   = ( 1 + sqrt(1 + 4 (t^k)^2) ) / 2                                               (28)
    z^{k+1}   = lambda^k + ((t^k - 1)/t^{k+1}) (lambda^k - lambda^{k-1})                      (29)

where the first step (26) can be implemented as

    x^k = max( min( -H^{-1} A^T z^k, x_max ), x_min )                                         (30)

due to the structure of the problem. From Theorem 10, we know that L_lambda must satisfy L_lambda \succeq A H^{-1} A^T. By Assumption 8, A has full row rank, which is common in MPC. Further, A is sparse in MPC, which makes L_lambda = A H^{-1} A^T a good choice. The algorithm requires the computation of L_lambda^{-1} z, where z = A x^k - b, in each iteration. Since L_lambda = A H^{-1} A^T is sparse, this can be implemented efficiently by storing, offline, the sparse Cholesky factorization R^T R = S^T L_lambda S, where R is sparse and upper triangular and S is a permutation matrix. The online computation of L_lambda^{-1} z then reduces to one forward and one backward solve, which can be implemented very efficiently. The algorithm in this example is a generalization of the algorithm in Richter et al. (2013), where the matrix L is chosen as L = \|A H^{-1} A^T\| I. In the numerical section, we will see that this generalization can significantly improve the convergence rate.
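The iterations (26)-(29) involve only a componentwise clipping, sparse matrix-vector products, and two triangular solves. A sketch is given below (illustrative only; it assumes the stacked matrices H (diagonal), A, b, and the bounds have already been built, with A stored as a scipy sparse matrix, and it uses scipy's sparse LU factorization in place of the sparse Cholesky factorization described above; a fixed iteration count stands in for the termination test of Section 7).

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def fast_dual_gradient_box(Hdiag, A, b, x_min, x_max, iters=200):
    """Iterations (26)-(29) for Example 17 with L_lambda = A H^{-1} A^T.

    Hdiag : 1-D array with the diagonal of H (diagonal cost assumed)
    A, b  : sparse equality constraint data (dynamics and initial state)
    """
    Hinv = 1.0 / Hdiag
    L_lam = (A @ sp.diags(Hinv) @ A.T).tocsc()       # L_lambda = A H^{-1} A^T
    L_solve = spla.splu(L_lam).solve                 # offline factorization (LU as a Cholesky stand-in)

    m = A.shape[0]
    z = np.zeros(m)
    lam_prev = np.zeros(m)
    t = 1.0
    for _ in range(iters):
        # (26)/(30): clip the unconstrained minimizer (H diagonal, box constraints)
        x = np.clip(-Hinv * (A.T @ z), x_min, x_max)
        lam = z + L_solve(A @ x - b)                        # (27)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t**2)) / 2.0    # (28)
        z = lam + ((t - 1.0) / t_next) * (lam - lam_prev)   # (29)
        lam_prev, t = lam, t_next
    return x, lam_prev
```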
Next, we present an algorithm that works for arbitrary positive definite cost matrices and arbitrary linear inequality constraints.

Example 18. We consider MPC optimization problems of the form

    minimize    sum_{t=0}^{N-1} ( (1/2) x_t^T Q x_t + (1/2) u_t^T R u_t ) + (1/2) x_N^T Q_f x_N
    subject to  x_{t+1} = Phi x_t + Gamma u_t,   t = 0, ..., N-1
                B_x x_t <= d_x,                  t = 0, ..., N-1
                B_u u_t <= d_u,                  t = 0, ..., N-1
                x_0 = xbar,   B_N x_N <= d_N

where x_t \in R^{n_x}, u_t \in R^{n_u}, Phi \in R^{n_x x n_x}, Gamma \in R^{n_x x n_u}, B_x \in R^{p_x x n_x}, B_u \in R^{p_u x n_u}, B_N \in R^{p_N x n_x}, d_x \in R^{p_x}, d_u \in R^{p_u}, d_N \in R^{p_N}, Q \in S^{n_x}_{++}, R \in S^{n_u}_{++}, and Q_f \in S^{n_x}_{++}. We let x = (x_0, ..., x_N, u_0, ..., u_{N-1}) and define B = blkdiag(Bbar_x, B_N, Bbar_u), where Bbar_x = blkdiag(B_x, ..., B_x) and Bbar_u = blkdiag(B_u, ..., B_u). We also introduce d = (d_x, ..., d_x, d_N, d_u, ..., d_u). This implies that all inequality constraints are described by Bx <= d. Using this notation, the optimization problem can be rewritten as

    minimize    (1/2) x^T H x
    subject to  Ax = b,  Bx = y,  y <= d.

We let f(x) = (1/2) x^T H x, h = I_{Ax=b}, and g = I_Y with Y = { y : y <= d }. Since h is the indicator function of the equality constraints Ax = b, we do not need to introduce dual variables for those constraints; we introduce dual variables mu only for the constraints Bx = y.
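For concreteness, one way to assemble the stacked matrices used in Examples 17 and 18 from the stage data is sketched below. This is only an illustration of the variable ordering x = (x_0, ..., x_N, u_0, ..., u_{N-1}) assumed above, not code from the paper; an embedded implementation would use sparse storage throughout.

```python
import numpy as np
from scipy.linalg import block_diag

def build_stacked_mpc(Phi, Gamma, Q, R, Qf, x0, N):
    """Stack the MPC data for the ordering x = (x_0..x_N, u_0..u_{N-1}).

    Returns H (cost) and A, b (initial state and dynamics: A x = b).
    """
    nx, nu = Phi.shape[0], Gamma.shape[1]
    nz = (N + 1) * nx + N * nu                      # number of decision variables

    # Cost: H = blkdiag(Q, ..., Q, Qf, R, ..., R)
    H = block_diag(*([Q] * N + [Qf] + [R] * N))

    # Equality constraints: x_0 = x0 and x_{t+1} - Phi x_t - Gamma u_t = 0
    A = np.zeros(((N + 1) * nx, nz))
    b = np.zeros((N + 1) * nx)
    A[0:nx, 0:nx] = np.eye(nx)
    b[0:nx] = x0
    for t in range(N):
        r = (t + 1) * nx                            # row block for x_{t+1}
        A[r:r + nx, (t + 1) * nx:(t + 2) * nx] = np.eye(nx)
        A[r:r + nx, t * nx:(t + 1) * nx] = -Phi
        A[r:r + nx, (N + 1) * nx + t * nu:(N + 1) * nx + (t + 1) * nu] = -Gamma
    return H, A, b
```

The inequality data B and d of Example 18 can be stacked in the same way, e.g. B = block_diag(B_x, ..., B_x, B_N, B_u, ..., B_u) with d concatenated accordingly.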

Table 1. Numerical evaluation of the dual gradient methods proposed in this paper. The table reports average and maximum execution time (ms) and average and maximum number of iterations for: the iterations (26)-(29) with L_lambda = A H^{-1} A^T; Richter et al. (2013), i.e. L_lambda = \|A H^{-1} A^T\| I; the iterations (31)-(34) with L_mu = B H^{-1} B^T + 10^{-4} I; Patrinos and Bemporad (2014), i.e. L_mu = \|B H^{-1} B^T\| I; and ADMM as in O'Donoghue et al. (2013); Jerez et al. (2013) for three choices of rho.

Letting H_A = A H^{-1} A^T, the algorithm becomes

    x^k       = H^{-1} ( A^T H_A^{-1} ( A H^{-1} B^T v^k + b ) - B^T v^k )                    (31)
    mu^k      = prox_{g^*}^{L_mu}( v^k + L_mu^{-1} B x^k )                                    (32)
    t^{k+1}   = ( 1 + sqrt(1 + 4 (t^k)^2) ) / 2                                               (33)
    v^{k+1}   = mu^k + ((t^k - 1)/t^{k+1}) (mu^k - mu^{k-1})                                  (34)

where the x^k iterate follows from solving min_x { f(x) + I_{Ax=b}(x) + (v^k)^T B x }. In an implementation, the x^k-update can be implemented as in (31); for efficiency, the matrix products should then be computed offline and stored for online use. Depending on the sparsity of H, A, and B, it might be more efficient to use the KKT system from which (31) is deduced, namely

    [ H   A^T ] [ x^k ]   [ -B^T v^k ]
    [ A   0   ] [ xi  ] = [  b       ].

A sparse LDL factorization of the KKT matrix [H A^T; A 0] is then computed offline for online use, and the online cost of the x^k-update reduces to one forward and one backward solve. Whichever method has the lower number of flops should be chosen. By restricting L_mu to be diagonal, the second step, i.e. (32), can be implemented as

    mu^k = max( 0, v^k + L_mu^{-1} (B x^k - d) ).

To implement the algorithm, the matrix L_mu must also be chosen. From Theorem 10 we know that L_mu \succeq B H^{-1} B^T. Usually in MPC, B has more rows than columns, which implies that B H^{-1} B^T is positive semidefinite only. One option is to choose L_mu = B H^{-1} B^T + epsilon I and to solve prox_{g^*}^{L_mu}( v^k + L_mu^{-1} B x^k ) parametrically for fast execution. Another option is to choose L_mu by solving the semidefinite program

    minimize    tr L_mu
    subject to  L_mu \succeq B H^{-1} B^T
                L_mu \in calL_mu

where calL_mu describes the sparsity structure of L_mu, e.g., diagonal. The splitting method used here is the same as the one used in Patrinos and Bemporad (2014); however, our method is more general since we allow for L_mu-matrices that are not a multiple of the identity. Also, the same splitting is used in O'Donoghue et al. (2013); Jerez et al. (2013), where ADMM (see Boyd et al. (2011)) is used to solve the optimization problem.
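The structured choice of L_mu described above can be computed offline with any SDP solver. A minimal sketch using CVXPY is given below (illustrative only; it assumes B and H are dense numpy arrays and that a diagonal structure is requested).

```python
import numpy as np
import cvxpy as cp

def diagonal_L_mu(B, H):
    """Offline choice of a diagonal L_mu >= B H^{-1} B^T by minimizing tr(L_mu)."""
    M = B @ np.linalg.solve(H, B.T)                 # B H^{-1} B^T
    M = 0.5 * (M + M.T)                             # symmetrize against rounding
    p = M.shape[0]
    ell = cp.Variable(p, nonneg=True)               # diagonal entries of L_mu
    constraints = [cp.diag(ell) - M >> 0]           # L_mu - B H^{-1} B^T is PSD
    cp.Problem(cp.Minimize(cp.sum(ell)), constraints).solve()
    return np.diag(ell.value)
```

With such a diagonal L_mu, the prox step (32) reduces to the componentwise max-operation given above, so the online iteration remains free of matrix factorizations in mu.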
7. NUMERICAL EXAMPLE

The proposed algorithms are evaluated by applying them to the AFTI-16 aircraft model in Kapasouris et al. (1990); Bemporad et al. (1997). This problem is also a tutorial example in the MPC toolbox in MATLAB. As in Bemporad et al. (1997) and the MPC toolbox tutorial, the continuous-time model from Kapasouris et al. (1990) is sampled using a zero-order hold every 0.05 s. The system has four states x = (x_1, x_2, x_3, x_4), two outputs y = (y_1, y_2), two inputs u = (u_1, u_2), and obeys the dynamics

    x^+ = Phi x + Gamma u,    y = C x

where x^+ denotes the state at the next time step and Phi, Gamma, C denote the dynamics, input, and output matrices of the sampled model (their numerical values follow from the references above). The system is unstable; the magnitude of the largest eigenvalue of the dynamics matrix is greater than one. The outputs are the attack and pitch angles, while the inputs are the elevator and flaperon angles. The inputs are physically constrained to satisfy |u_i| <= 25, i = 1, 2. The outputs are soft constrained, with slack variables s = (s_1, s_2, s_3, s_4) >= 0 relaxing the lower and upper output bounds. The cost in each time step is

    l(x, u, s) = (x - x_r)^T Q (x - x_r) + u^T R u + s^T S s

where Q = C^T Q_y C + Q_x with diagonal weights Q_y and Q_x, x_r is such that y_r = C x_r where y_r is the output reference that may change at each time step, R is a small multiple of the identity, and S is a large multiple of the identity that penalizes the slack variables. The resulting full cost matrix is very ill-conditioned. Further, the terminal cost is Q, and the control and prediction horizon is N = 10.

The numerical data in Table 1 is obtained by following a reference trajectory on the output. The objective is to change the pitch angle from 0 to 10 degrees and then back to 0, while the angle of attack satisfies the output constraints. The constraints on the angle of attack limit the rate at which the pitch angle can be changed. All algorithms in the comparison are implemented in MATLAB on a Linux machine using a single core. To create an easily transferable and fair termination criterion, the optimal solution x^* of each optimization problem is computed to high accuracy using an interior point solver. The termination condition is \|x^k - x^*\| / \|x^*\| <= 0.005, where x^k is the primal iterate in the algorithm, i.e. a relative accuracy of 0.5% in the primal solution is required.

The algorithms in Example 17, i.e. (26)-(29), and in Example 18, i.e. (31)-(34), have been applied to this problem. Due to the slack variables, (30) cannot replace (26) for the x^k-update. However, the x^k-minimization is separable in the constraints, and each of the projections can be solved by a multiparametric program with two regions. This is almost as computationally inexpensive as the x^k-update in (30). Further, we use L_lambda = A H^{-1} A^T. The algorithm (26)-(29) is a generalization of Richter et al. (2013) that allows for general matrices L_lambda; the algorithm in Richter et al. (2013) is obtained by setting L_lambda = \|A H^{-1} A^T\| I. The numerical evaluation in Table 1 reveals that this generalization improves the execution time by more than three orders of magnitude for this problem.

The formulation in Example 18, i.e. (31)-(34), directly covers this MPC formulation with soft constraints. For this algorithm, we choose L_mu = B H^{-1} B^T + 10^{-4} I and solve (32), i.e. prox_{g^*}^{L_mu}( v^k + L_mu^{-1} B x^k ), parametrically. It turns out that also this parametric program is cheap to implement, since it requires one max-operation only. The resulting algorithm is a generalization of the algorithm in Patrinos and Bemporad (2014), which is obtained by setting L_mu = \|B H^{-1} B^T\| I in the iterations (31)-(34). Table 1 indicates that this generalization improves the algorithm by one to two orders of magnitude compared to Patrinos and Bemporad (2014).

Further, (31)-(34) is based on the same splitting as the method in O'Donoghue et al. (2013); Jerez et al. (2013). The difference is that here the problem is solved with a generalized fast dual gradient method, while in O'Donoghue et al. (2013); Jerez et al. (2013) it is solved using ADMM. In ADMM, the rho-parameter needs to be chosen. However, no exact guidelines are yet known for this choice, and the performance of the algorithm often relies heavily on this parameter. We compare our algorithm with ADMM using the best rho that we found, rho = 4, and with one larger and one smaller rho. Table 1 reports that the execution time for our method is one to two orders of magnitude smaller (or more, if the rho-parameter in O'Donoghue et al. (2013); Jerez et al. (2013) is chosen suboptimally) than that of the algorithm proposed in O'Donoghue et al. (2013); Jerez et al. (2013).

8. CONCLUSIONS

We have proposed a generalization of fast dual gradient methods. This generalization allows the algorithm, in each iteration, to minimize a quadratic upper bound to the negative dual function with different curvature in different directions. This is in contrast to the standard fast dual gradient method, where a quadratic upper bound to the negative dual function with the same curvature in all directions is minimized in each iteration. The generalization is made possible by the main contribution of this paper, which characterizes the set of matrices that can be used to describe a quadratic upper bound to the negative dual function. The numerical evaluation on an ill-conditioned aircraft problem reveals that the proposed algorithms outperform, by one to three orders of magnitude, other first order algorithms that have recently been proposed as suitable for embedded model predictive control.

REFERENCES

A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183-202, 2009.

A. Bemporad, A. Casavola, and E. Mosca. Nonlinear control of constrained linear systems via predictive reference management. IEEE Transactions on Automatic Control, 42(3):340-349, 1997.
A. Bemporad, M. Morari, V. Dua, and E. N. Pistikopoulos. The explicit linear quadratic regulator for constrained systems. Automatica, 38(1):3-20, January 2002.

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.

P. Giselsson. Improving fast dual ascent for MPC - Part II: The embedded case. Automatica, 2014. Submitted.

J. L. Jerez, P. J. Goulart, S. Richter, G. A. Constantinides, E. C. Kerrigan, and M. Morari. Embedded online optimization for model predictive control at megahertz rates. IEEE Transactions on Automatic Control, 2013. Submitted.

P. Kapasouris, M. Athans, and G. Stein. Design of feedback control systems for unstable plants with saturating actuators. In Proceedings of the IFAC Symposium on Nonlinear Control System Design. Pergamon Press, 1990.

Y. Nesterov. A method of solving a convex programming problem with convergence rate O(1/k^2). Soviet Mathematics Doklady, 27(2):372-376, 1983.

Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer Netherlands, 1st edition, 2003.

Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127-152, May 2005.

B. O'Donoghue, G. Stathopoulos, and S. Boyd. A splitting method for optimal control. IEEE Transactions on Control Systems Technology, 21(6):2432-2442, 2013.

P. Patrinos and A. Bemporad. An accelerated dual gradient-projection algorithm for embedded linear model predictive control. IEEE Transactions on Automatic Control, 59(1):18-33, 2014.

S. Richter, C. N. Jones, and M. Morari. Certification aspects of the fast gradient method for solving the dual of parametric convex programs. Mathematical Methods of Operations Research, 77(3):305-321, 2013.

R. T. Rockafellar. Convex Analysis, volume 28. Princeton University Press, Princeton, NJ, 1970.

P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Technical report. Available: papers/apgm.pdf, May 2008.

W. Zuo and Z. Lin. A generalized accelerated proximal gradient approach for total-variation-based image restoration. IEEE Transactions on Image Processing, 20(10):2748-2759, October 2011.


More information

Distributed and Real-time Predictive Control

Distributed and Real-time Predictive Control Distributed and Real-time Predictive Control Melanie Zeilinger Christian Conte (ETH) Alexander Domahidi (ETH) Ye Pu (EPFL) Colin Jones (EPFL) Challenges in modern control systems Power system: - Frequency

More information

On convergence rate of the Douglas-Rachford operator splitting method

On convergence rate of the Douglas-Rachford operator splitting method On convergence rate of the Douglas-Rachford operator splitting method Bingsheng He and Xiaoming Yuan 2 Abstract. This note provides a simple proof on a O(/k) convergence rate for the Douglas- Rachford

More information

Lecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan

Lecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan Lecture 3: Latent Variables Models and Learning with the EM Algorithm Sam Roweis Tuesday July25, 2006 Machine Learning Summer School, Taiwan Latent Variable Models What to do when a variable z is always

More information

Douglas-Rachford Splitting: Complexity Estimates and Accelerated Variants

Douglas-Rachford Splitting: Complexity Estimates and Accelerated Variants 53rd IEEE Conference on Decision and Control December 5-7, 204. Los Angeles, California, USA Douglas-Rachford Splitting: Complexity Estimates and Accelerated Variants Panagiotis Patrinos and Lorenzo Stella

More information

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17 EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 17 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 29, 2012 Andre Tkacenko

More information

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R

More information

Operators. Pontus Giselsson

Operators. Pontus Giselsson Operators Pontus Giselsson 1 Today s lecture forward step operators gradient step operators (subclass of forward step operators) resolvents proimal operators (subclass of resolvents) reflected resolvents

More information

Operations Research Letters

Operations Research Letters Operations Research Letters 37 (2009) 1 6 Contents lists available at ScienceDirect Operations Research Letters journal homepage: www.elsevier.com/locate/orl Duality in robust optimization: Primal worst

More information

Chapter Adequacy of Solutions

Chapter Adequacy of Solutions Chapter 04.09 dequac of Solutions fter reading this chapter, ou should be able to: 1. know the difference between ill-conditioned and well-conditioned sstems of equations,. define the norm of a matri,

More information

A Smoothing SQP Framework for a Class of Composite L q Minimization over Polyhedron

A Smoothing SQP Framework for a Class of Composite L q Minimization over Polyhedron Noname manuscript No. (will be inserted by the editor) A Smoothing SQP Framework for a Class of Composite L q Minimization over Polyhedron Ya-Feng Liu Shiqian Ma Yu-Hong Dai Shuzhong Zhang Received: date

More information

The Entropy Power Inequality and Mrs. Gerber s Lemma for Groups of order 2 n

The Entropy Power Inequality and Mrs. Gerber s Lemma for Groups of order 2 n The Entrop Power Inequalit and Mrs. Gerber s Lemma for Groups of order 2 n Varun Jog EECS, UC Berkele Berkele, CA-94720 Email: varunjog@eecs.berkele.edu Venkat Anantharam EECS, UC Berkele Berkele, CA-94720

More information

Lasso: Algorithms and Extensions

Lasso: Algorithms and Extensions ELE 538B: Sparsity, Structure and Inference Lasso: Algorithms and Extensions Yuxin Chen Princeton University, Spring 2017 Outline Proximal operators Proximal gradient methods for lasso and its extensions

More information

Computation fundamentals of discrete GMRF representations of continuous domain spatial models

Computation fundamentals of discrete GMRF representations of continuous domain spatial models Computation fundamentals of discrete GMRF representations of continuous domain spatial models Finn Lindgren September 23 2015 v0.2.2 Abstract The fundamental formulas and algorithms for Bayesian spatial

More information

CONSERVATION LAWS AND CONSERVED QUANTITIES FOR LAMINAR RADIAL JETS WITH SWIRL

CONSERVATION LAWS AND CONSERVED QUANTITIES FOR LAMINAR RADIAL JETS WITH SWIRL Mathematical and Computational Applications,Vol. 15, No. 4, pp. 742-761, 21. c Association for Scientific Research CONSERVATION LAWS AND CONSERVED QUANTITIES FOR LAMINAR RADIAL JETS WITH SWIRL R. Naz 1,

More information

Research Article On Sharp Triangle Inequalities in Banach Spaces II

Research Article On Sharp Triangle Inequalities in Banach Spaces II Hindawi Publishing Corporation Journal of Inequalities and Applications Volume 2010, Article ID 323609, 17 pages doi:10.1155/2010/323609 Research Article On Sharp Triangle Inequalities in Banach Spaces

More information

LOCAL LINEAR CONVERGENCE OF ADMM Daniel Boley

LOCAL LINEAR CONVERGENCE OF ADMM Daniel Boley LOCAL LINEAR CONVERGENCE OF ADMM Daniel Boley Model QP/LP: min 1 / 2 x T Qx+c T x s.t. Ax = b, x 0, (1) Lagrangian: L(x,y) = 1 / 2 x T Qx+c T x y T x s.t. Ax = b, (2) where y 0 is the vector of Lagrange

More information

Lecture 23: Conditional Gradient Method

Lecture 23: Conditional Gradient Method 10-725/36-725: Conve Optimization Spring 2015 Lecture 23: Conditional Gradient Method Lecturer: Ryan Tibshirani Scribes: Shichao Yang,Diyi Yang,Zhanpeng Fang Note: LaTeX template courtesy of UC Berkeley

More information

Simple and Certifiable Quadratic Programming Algorithms for Embedded Linear Model Predictive Control

Simple and Certifiable Quadratic Programming Algorithms for Embedded Linear Model Predictive Control Preprints of 4th IFAC Nonlinear Model Predictive Control Conference International Federation of Automatic Control Simple and Certifiable Quadratic Programming Algorithms for Embedded Linear Model Predictive

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html

More information

Preconditioning via Diagonal Scaling

Preconditioning via Diagonal Scaling Preconditioning via Diagonal Scaling Reza Takapoui Hamid Javadi June 4, 2014 1 Introduction Interior point methods solve small to medium sized problems to high accuracy in a reasonable amount of time.

More information