GRASP in Switching Input Optimal Control Synthesis


MIC th Metaheuristics International Conference, p. 381

Paola Festa, Giancarlo Raiconi
Dept. of Mathematics and Computer Science, University of Salerno, Via S. Allende, Baronissi (SA), Italy

1 Switching input control

This paper introduces several optimal control problems based on the switching input control scheme. The system has several control variables, and at any time only one of them can be actuated. At each time step, the control decision therefore consists of choosing which input variable to activate and what value to give it. Under suitable assumptions, in both the finite time case and the infinite time periodic case, the optimal control can be computed as a standard feedback law once the switching policy is fixed. The problem then becomes a high-dimensional discrete optimization problem, which can be tackled with a search procedure. Following the approach used in the companion problem of optimal selection of output variables for filtering [1], we apply the Greedy Randomized Adaptive Search Procedure (GRASP) metaheuristic, with good results.

The system is modeled as linear, time-invariant and discrete time, either deterministic or stochastic. In the deterministic case, we assume that the state is fully observed:

\[ x(k+1) = A x(k) + b(k) u(k) \tag{1} \]
\[ x(0) = x_0, \quad x(k) \in \mathbb{R}^n, \quad k = 0, 1, 2, \ldots, \quad A \in \mathbb{R}^{n \times n}, \quad b(k) \in \mathbb{R}^n \tag{2} \]

In the stochastic case, the state can be either fully observed or partially observed with possible output error:

\[ x(k+1) = A x(k) + b(k) u(k) + w(k) \tag{3} \]
\[ y(k) = C x(k) + v(k) \tag{4} \]
\[ E(x(0)) = 0, \quad E(x(0) x^T(0)) = P_0 \in \mathbb{R}^{n \times n} \tag{5} \]
\[ w(k) \in \mathbb{R}^n, \quad E(w(k)) = 0, \quad E(w(k) w^T(j)) = Q \delta_{kj} \tag{6} \]
\[ C \in \mathbb{R}^{p \times n}, \quad v(k) \in \mathbb{R}^p, \quad E(v(k)) = 0, \quad E(v(k) v^T(j)) = R \delta_{kj} \tag{7} \]
\[ E(w(k) x^T(0)) = 0, \quad E(v(k) x^T(0)) = 0, \quad E(v(k) w^T(j)) = 0, \tag{8} \]

where $P_0$ and $Q$ are assigned symmetric positive semidefinite $n \times n$ covariance matrices and $R$ is a symmetric positive definite $p \times p$ covariance matrix. In all cases, the term $b(k)u(k)$ represents the switching control term. In particular, $b(k) \in \mathbb{R}^n$ represents the input variable to be activated at time $k$.
It is a vector that at any time can take only one value from a finite preassigned set, $b(k) \in \{b_1, b_2, \ldots, b_q\}$, $b_i \in \mathbb{R}^n$. The real scalar variable $u(k)$ represents the value of the input variable itself. An alternative description of the control scheme uses binary control variables $\alpha_i(k)$, $i = 1, 2, \ldots, q$. In this case, the control action consists of determining, at each time $k$, which one of the $\alpha_i$ is set to 1 and the value of the control $u$:

\[ b(k) u(k) = \sum_{i=1}^{q} \alpha_i(k) b_i u(k); \quad \alpha_i(k) \in \{0, 1\}, \quad \sum_{i=1}^{q} \alpha_i(k) = 1, \quad k = 0, 1, 2, \ldots \tag{9} \]
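As an illustration of the switching scheme in (1) and (9), the following minimal Python sketch advances the state one step at a time with exactly one active input channel. The function names (`one_hot`, `step`), the matrices and the schedule are hypothetical choices made for this example only; they do not come from the paper.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def one_hot(j: int, q: int) -> List[int]:
    """Binary selector alpha(k): alpha_j = 1, all other entries 0."""
    return [1 if i == j else 0 for i in range(q)]


def step(A: Matrix, inputs: Sequence[Vector], alpha: Sequence[int],
         u: float, x: Vector) -> Vector:
    """One step of x(k+1) = A x(k) + sum_i alpha_i(k) b_i u(k)."""
    if sum(alpha) != 1 or any(a not in (0, 1) for a in alpha):
        raise ValueError("exactly one input channel must be active")
    n = len(x)
    # b(k) is the single vector b_i selected by alpha(k).
    b = [sum(a * b_i[r] for a, b_i in zip(alpha, inputs)) for r in range(n)]
    return [sum(A[r][c] * x[c] for c in range(n)) + b[r] * u for r in range(n)]


# Hypothetical data: n = 2 states, q = 2 candidate input vectors.
A = [[1.0, 1.0], [0.0, 1.0]]
inputs = [[1.0, 0.0], [0.0, 1.0]]
x = [0.0, 0.0]
schedule = [(0, 2.0), (1, -1.0), (0, 0.5)]  # (active channel j, value u(k))
trajectory = [x]
for j, u in schedule:
    x = step(A, inputs, one_hot(j, len(inputs)), u, x)
    trajectory.append(x)
print(trajectory)
```

Storing only the index of the active channel, as `schedule` does, is equivalent to storing the full binary vector $\alpha(k)$ and keeps the constraint $\sum_i \alpha_i(k) = 1$ satisfied by construction.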

2 Finite time optimal control

The optimality criterion is the sum of a quadratic state error term and a control energy term. In the finite time deterministic case, the problem is

\[ \min_{\alpha_i(k),\, u(k),\; k = 0, 1, \ldots, N-1} \left\{ J = \sum_{k=0}^{N-1} \left[ u^2(k) + x^T(k) M x(k) \right] + x^T(N) W_N x(N) \right\} \tag{10} \]

under the constraints defined by equations (1), (2), (9), where $M$ and $W_N$ are fixed symmetric positive semidefinite $n \times n$ matrices. The problem can be transformed into a combinatorial optimization problem by minimizing first with respect to the control variable $u(\cdot)$ and then with respect to the switching variables $\alpha$. Once the sequence $\alpha_i(k)$, $k = 0, 1, \ldots, N-1$, is fixed, the problem is a classical linear quadratic control problem for the time-varying system (1)-(2). For any $k$, the optimal control is given in the feedback form

\[ \hat{u}(k) = b^T(k) W(k) x(k), \tag{11} \]

where $W(k)$ satisfies the discrete time Riccati equation

\[ W(k) = A^T W(k+1) A + M - \frac{(A^T W(k+1) A + M)\, b(k) b^T(k)\, (A^T W(k+1) A + M)}{1 + b^T(k) (A^T W(k+1) A + M) b(k)} \tag{12} \]

with endpoint condition $W(N) = W_N$. The corresponding cost is

\[ J^*(\alpha) = \min_{u(\cdot)} J(\alpha, u) = J(\alpha, \hat{u}) = x^T(0) W(0) x(0). \tag{13} \]

In summary, the problem reduces to the discrete problem

\[ \min_{\alpha_i(k),\; k = 0, 1, \ldots, N-1} \left\{ J^* = x_0^T W(0) x_0 \right\} \]

under constraints (9). Since this formulation requires a priori knowledge of the initial state, it is not suitable for on-line applications, where the initial state is usually unknown. In these cases, it is useful to replace the cost function with a scalar measure of the symmetric positive semidefinite matrix $W(0)$, such as the trace or the determinant. Minimizing such indexes is equivalent to maximizing the quality indexes introduced by Muller and Weber [6] for the controllability of deterministic systems. Indeed, defining as in [6] the scalar function of the eigenvalues of a symmetric positive semidefinite matrix $W$,

\[ \mu_s(W) = \left( \frac{1}{n} \sum_{i=1}^{n} \lambda_i^s(W) \right)^{1/s}, \]

we have

\[ J_1 = \mu_{-1}^{-1}(W^{-1}) = \frac{1}{n} \mathrm{Tr}(W), \qquad J_0 = \mu_0^{-1}(W^{-1}) = \lim_{s \to 0} \mu_s^{-1}(W^{-1}) = |W|^{1/n}. \]

Finite time stochastic optimal control

In the stochastic case, the system is described by equation (3) and the optimization criterion is the expected value of (10):

\[ \min_{\alpha_i(k),\, u(k),\; k = 0, 1, \ldots, N-1} \left\{ E(J) = E\left[ \sum_{k=0}^{N-1} \left( u^2(k) + x^T(k) M x(k) \right) + x^T(N) W_N x(N) \right] \right\} \tag{14} \]

Given any valid sequence of $\alpha$, the optimal control is still expressed by formulas (11), (12), and the criterion value is

\[ \bar{J}(\alpha) = \min_{u(\cdot)} E[J(\alpha, u)] = E[J(\alpha, \hat{u})] = \mathrm{Tr}\left( P_0 W(0) + \sum_{k=0}^{N-1} W(k) Q \right). \tag{15} \]
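To make the evaluation of a fixed switching sequence concrete, the sketch below iterates the Riccati recursion (12) backward from $W(N) = W_N$ and then evaluates the costs (13) and (15). It is a minimal pure-Python illustration under stated assumptions: the state weight is the matrix $M$ of the cost (10), all matrices are symmetric nested lists, and the helper names (`matmul`, `riccati_step`, `riccati_sequence`, `deterministic_cost`, `stochastic_cost`) and the scalar test data are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def trace(X: Matrix) -> float:
    return sum(X[i][i] for i in range(len(X)))


def riccati_step(A: Matrix, M: Matrix, W_next: Matrix, b: Vector) -> Matrix:
    """One backward step of (12): W(k) = S - S b b^T S / (1 + b^T S b),
    where S = A^T W(k+1) A + M."""
    n = len(A)
    A_t = [list(col) for col in zip(*A)]
    AtWA = matmul(matmul(A_t, W_next), A)
    S = [[AtWA[i][j] + M[i][j] for j in range(n)] for i in range(n)]
    Sb = [sum(S[i][j] * b[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(b[i] * Sb[i] for i in range(n))
    # S is symmetric, so S b b^T S is the outer product of S b with itself.
    return [[S[i][j] - Sb[i] * Sb[j] / denom for j in range(n)] for i in range(n)]


def riccati_sequence(A: Matrix, M: Matrix, W_N: Matrix,
                     b_seq: Sequence[Vector]) -> List[Matrix]:
    """Return [W(0), ..., W(N)] given b(0), ..., b(N-1) and W(N) = W_N."""
    W = [W_N]
    for b in reversed(b_seq):
        W.insert(0, riccati_step(A, M, W[0], b))
    return W


def deterministic_cost(W: List[Matrix], x0: Vector) -> float:
    """Cost (13): x0^T W(0) x0."""
    n = len(x0)
    return sum(x0[i] * W[0][i][j] * x0[j] for i in range(n) for j in range(n))


def stochastic_cost(W: List[Matrix], P0: Matrix, Q: Matrix) -> float:
    """Cost (15): Tr(P0 W(0) + sum_{k=0}^{N-1} W(k) Q)."""
    N = len(W) - 1
    return trace(matmul(P0, W[0])) + sum(trace(matmul(W[k], Q)) for k in range(N))


# Hypothetical scalar example: A = 1, M = 1, W_N = 0, input b = 1 at both steps.
W = riccati_sequence([[1.0]], [[1.0]], [[0.0]], [[1.0], [1.0]])
print([w[0][0] for w in W])                  # W(0), W(1), W(2)
print(deterministic_cost(W, [2.0]))          # initial state x0 = 2
print(stochastic_cost(W, [[1.0]], [[1.0]]))  # P0 = Q = 1
```

In the scalar example the recursion can be checked by hand: $W(2) = 0$ gives $S = 1$ and $W(1) = 1 - 1/2$, then $S = 1.5$ and $W(0) = 1.5 - 2.25/2.5$.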

In the presence of incomplete state information, the separation principle can be used; it is valid under the further hypothesis of Gaussian disturbances. In this case, $\hat{u}(\cdot)$ can be expressed as a function of the optimal Kalman filter estimate $\hat{x}(k)$ of the state, $\hat{u}(k) = b^T(k) W(k) \hat{x}(k)$, and the value $\bar{J}$ of the optimality criterion also depends on the covariance of the filter.

Discrete formulation

Even in the stochastic case, the final problem consists of minimizing a function $J^*$ or $\bar{J}$ of the 0-1 variables $\alpha_i(k)$, $i = 1, 2, \ldots, q$, $k = 0, 1, \ldots, N-1$, subject to constraints (9). If the time horizon is long, the size of the solution space ($q^N$) can be very large. Moreover, evaluating the performance index is computationally costly, so an optimization method that finds an approximate solution with a relatively small number of function evaluations is appropriate. A further characteristic observed in this problem is the presence of many local minima, so that a classical local search procedure cannot give good results. Our approach is to use a randomized multi-start heuristic algorithm in order to limit the number of function evaluations.

3 Infinite time horizon problems (periodic)

In the stochastic case, the optimal performance index (15) diverges as $N \to \infty$, and two approaches are possible: either introduce a discount factor $0 < \beta < 1$ in the index,

\[ \min_{\alpha_i(k),\, u(k),\; k = 0, 1, \ldots} \left\{ \lim_{N \to \infty} E(J) = \lim_{N \to \infty} E\left[ \sum_{k=0}^{N} \beta^k \left( u^2(k) + x^T(k) M x(k) \right) \right] \right\} \tag{16} \]

or consider the average cost per stage,

\[ \min_{\alpha_i(k),\, u(k),\; k = 0, 1, \ldots} \left\{ \lim_{N \to \infty} \frac{E(J)}{N} = \lim_{N \to \infty} \frac{1}{N} E\left[ \sum_{k=0}^{N-1} \left( u^2(k) + x^T(k) M x(k) \right) \right] \right\} \tag{17} \]

Whichever method is chosen, it can be shown that the limits in (16) and (17) converge under suitable hypotheses on the system matrices and on the sequence $\alpha_i(k)$ (i.e. the sequence is such that the system $A, b(k)$ is stabilizable in the time-varying sense [2]).
Therefore, at least in principle, the reasoning used for the finite time case can be extended to the infinite time case. Unfortunately, such an approach is not applicable in practice, for two reasons:

1. The sequence $\alpha_i(k)$ is infinite, so it makes no sense to look for a minimum with any search procedure.
2. Since the optimal sequence $\alpha_i(k)$ is infinite, it is not practically possible to embed the optimal switching sequence in the feedback controller.

From a practical point of view, it is much more attractive to have a periodic switching policy, which can easily be embedded in the controller programming. Moreover, by choosing a sufficiently long period, the performance index of any non-periodic sequence can be closely approximated. The infinite time periodic optimal control problem (stochastic problem with complete state information) is formulated as follows:

\[ \min_{\alpha_i(k),\, u(k),\; k = 0, 1, \ldots, N-1} \; \lim_{m \to \infty} \frac{1}{N} E\left[ \sum_{k=(m-1)N}^{mN-1} \left( u^2(k) + x^T(k) M x(k) \right) \right] \tag{18} \]
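For a fixed periodic switching pattern, the average cost in (18) can be evaluated numerically. The sketch below is one possible approach, not a definitive implementation: it sweeps the Riccati recursion (12) backward over successive periods until $W(0)$ matches $W(N)$, and then returns the steady-state average $\frac{1}{N} \sum_k \mathrm{Tr}(W(k) Q)$ that the text gives as (19). The function name `periodic_average_cost`, the tolerance, the iteration limit and the trace threshold used to abandon diverging patterns are all assumptions introduced for this example.

```python
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def trace(X: Matrix) -> float:
    return sum(X[i][i] for i in range(len(X)))


def riccati_step(A: Matrix, M: Matrix, W_next: Matrix, b: Vector) -> Matrix:
    """One backward step of (12) with S = A^T W(k+1) A + M."""
    n = len(A)
    A_t = [list(col) for col in zip(*A)]
    AtWA = matmul(matmul(A_t, W_next), A)
    S = [[AtWA[i][j] + M[i][j] for j in range(n)] for i in range(n)]
    Sb = [sum(S[i][j] * b[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(b[i] * Sb[i] for i in range(n))
    return [[S[i][j] - Sb[i] * Sb[j] / denom for j in range(n)] for i in range(n)]


def periodic_average_cost(A: Matrix, M: Matrix, Q: Matrix,
                          b_period: Sequence[Vector], tol: float = 1e-9,
                          max_periods: int = 1000,
                          trace_cap: float = 1e8) -> Optional[float]:
    """Average cost per stage of a periodic pattern b(0), ..., b(N-1).

    Returns None when the iteration does not settle (assumed thresholds).
    """
    n = len(A)
    W_end = [[0.0] * n for _ in range(n)]   # any PSD terminal matrix W(N)
    for _ in range(max_periods):
        W, total = W_end, 0.0
        for b in reversed(b_period):        # k = N-1, ..., 0
            W = riccati_step(A, M, W, b)
            if trace(W) > trace_cap:        # growing trace: stop early
                return None
            total += trace(matmul(W, Q))
        gap = max(abs(W[i][j] - W_end[i][j]) for i in range(n) for j in range(n))
        if gap < tol:                       # W(0) matches W(N): periodic solution
            return total / len(b_period)
        W_end = W
    return None


# Hypothetical scalar example: A = 1, M = Q = 1, period N = 2, b alternates 1, 0.
print(periodic_average_cost([[1.0]], [[1.0]], [[1.0]], [[1.0], [0.0]]))
# A pattern that never actuates an unstable system is rejected.
print(periodic_average_cost([[2.0]], [[1.0]], [[1.0]], [[0.0]]))
```

The early exit on a large trace mirrors the observation in the text that a growing trace signals a poorly controllable pattern, so a search procedure can discard it without waiting for convergence.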

It is easy to show that if the periodic system $A, b(k)$, $k = 0, 1, \ldots, N-1$, is stabilizable [3], then the following hold:

- Taking any symmetric positive semidefinite matrix $W_N$ as terminal condition, the solution of equation (12) converges to a periodic symmetric positive definite matrix sequence $W(k)$ as $k \to -\infty$, independently of $W_N$;
- $W(k)$ satisfies the Riccati equation with periodic two-point boundary conditions $W(0) = W(N)$;
- The average cost per stage computed with the optimal control corresponding to the steady-state matrix, $\hat{u}(k) = b^T(k) W(k) x(k)$, is given by

\[ J_p(\alpha) = \lim_{m \to \infty} \frac{1}{N} E\left[ \sum_{k=(m-1)N}^{mN-1} \left( u^2(k) + x^T(k) M x(k) \right) \right] = \frac{1}{N} \sum_{k=0}^{N-1} \mathrm{Tr}(W(k) Q). \tag{19} \]

Even in this case, the problem reduces to minimizing a function with respect to the variables $\alpha$ under constraints (9). Here, evaluating the function is more complicated because of the split boundary conditions. The most straightforward way to compute the periodic function $W(k)$ is to iterate the backward Riccati equation, starting from a positive semidefinite matrix, until convergence to a periodic solution is attained. In most cases this takes relatively few iterations, but if the sequence $A, b(k)$ is poorly controllable, convergence can be extremely slow (if $A, b(k)$ is not stabilizable, the equation does not converge at all). Fortunately, in such cases the trace of the matrix solution keeps growing, and this can be used in the search procedure to stop the computation early, saving a lot of time. In the incomplete information case, the situation is even more complex, because the function value also depends on the steady-state value of the filter covariance; a function evaluation then involves iterating until convergence both the backward Riccati equation for the controller and the forward Riccati equation for the filter.

4 GRASP

A Greedy Randomized Adaptive Search Procedure (GRASP) is a metaheuristic for finding approximate solutions to difficult combinatorial problems.
GRASP is a multistart method characterized by two phases: a construction phase and a local search phase, also known as the local improvement phase. During the construction phase, a feasible solution is built iteratively: one element at a time is randomly chosen from a Restricted Candidate List (RCL), whose elements are ranked according to some greedy criterion, and added to the partial solution under construction. Since the solution found may not be locally optimal with respect to the adopted neighborhood definition, the local search phase tries to improve it. These two phases are iterated, and the best solution found is kept as an approximation of the optimal one. GRASP was proposed in 1989 by Feo and Resende [4]. Since then, numerous papers on the basic aspects of GRASP, as well as enhancements to the basic metaheuristic, have appeared in the literature. GRASP has been applied to a wide range of combinatorial optimization problems, from scheduling and routing to drawing and turbine balancing, as reported in a recent annotated bibliography of the GRASP literature by Festa and Resende [5].

Greedy construction phase: At each GRASP construction phase, a starting point for the local search procedure is built by following the first $N$ steps of the sequential greedy procedure. Any step of the Riccati equation iteration can be split into two steps:

\[ W(k) = A^T W(k+1) A + M - L(k, b(k)), \]
\[ L(k, b(k)) = \frac{(A^T W(k+1) A + M)\, b(k) b^T(k)\, (A^T W(k+1) A + M)}{1 + b^T(k) (A^T W(k+1) A + M) b(k)}. \]

The greedy criterion is to maximize the trace of the term $L$, which is subtracted from the quadratic cost matrix. The GRASP construction phase paradigm requires choosing at each step a member of the RCL, defined on the basis of the adopted greedy criterion. In our case, the complete procedure for choosing the randomized greedy initial $\alpha$ for a fixed $N$ is the following.

Construction phase:
1 Choose a large value for $W_N$, for example $\frac{1}{\epsilon} I$; set $0 < \lambda < 1$; put $k = N$.
2 For $i = 1, 2, \ldots, q$, compute $c_i = \mathrm{Tr}(L(k-1, b_i))$ from the current $W(k)$ and set $c_m = \min_i \{c_i\}$, $c_M = \max_i \{c_i\}$.
3 Choose at random $j \in \{1, 2, \ldots, q\}$ such that $c_M - \lambda (c_M - c_m) \le c_j$; set $\alpha_l(k-1) = \delta_{jl}$.
4 If $k > 1$, compute $W(k-1)$, put $k = k-1$ and go to step 2; else exit with the matrix $\{\alpha_{ls}\}$ full.

Local search phase: Once a starting point $\alpha$ is found, it must be checked for local optimality, and an improving solution, if any, must be searched for in a prescribed neighborhood of $\alpha$. For this purpose, a distance $d(\alpha, \eta)$ must be introduced to measure the similarity of two candidate solutions $\alpha$, $\eta$. The definition of $d$ depends on the physical meaning of the relevant parameters. We used the Hamming distance, defined as

\[ d_H(\alpha, \eta) = \frac{1}{2} \sum_{i=1}^{q} \sum_{j=1}^{N} |\alpha_i(j) - \eta_i(j)|. \]

In our application, $d_H$ counts the number of columns in which the matrices $\alpha$ and $\eta$ differ. Like other similarly defined metrics, $d_H$ compares two strings of length $N$ and is unable to recognize equivalent sequences (see footnote 1), but this is not a drawback when it is used in a local search procedure. For an efficient local search phase, the neighborhood must not be too large, because this phase is performed repeatedly, starting from a great number of initial values. The pseudo-code of the whole GRASP algorithm is the following.

1 Read system data. Put $it = 0$ and $\hat{J}$ equal to a large number.
2 Build an initial GRASP sequence $\alpha_0$.
3 Find the minimum of $J$ on a neighborhood of $\alpha_0$. Put $\alpha^* = \arg\min_{\{\alpha :\, d(\alpha, \alpha_0) \le s\}} \{J(\alpha)\}$, $J^* = J(\alpha^*)$.
4 Put $it = it + 1$; if $J^* < \hat{J}$, put $\hat{J} = J^*$, $\hat{\alpha} = \alpha^*$.
5 If $it = maxit$, stop and return the solution $\hat{\alpha}$; else go to step 2.
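The construction phase, the local search and the outer loop described above can be sketched in a few dozen lines of Python. This is an illustrative sketch under several assumptions, not the authors' implementation: the objective is taken to be $\mathrm{Tr}\, W(0)$ for a finite horizon, a switching sequence is stored as a list of channel indices (so the Hamming distance is the number of differing positions), the local search is a repeated descent over neighbors at Hamming distance 1 (one possible reading of step 3 with $s = 1$), and all names and numerical data are hypothetical.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def trace(X: Matrix) -> float:
    return sum(X[i][i] for i in range(len(X)))


def split_step(A, M, W_next, b) -> Tuple[Matrix, Matrix]:
    """Return (W(k), L) with W(k) = S - L, S = A^T W(k+1) A + M,
    L = S b b^T S / (1 + b^T S b)."""
    n = len(A)
    AtWA = matmul(matmul([list(c) for c in zip(*A)], W_next), A)
    S = [[AtWA[i][j] + M[i][j] for j in range(n)] for i in range(n)]
    Sb = [sum(S[i][j] * b[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(b[i] * Sb[i] for i in range(n))
    L = [[Sb[i] * Sb[j] / denom for j in range(n)] for i in range(n)]
    W = [[S[i][j] - L[i][j] for j in range(n)] for i in range(n)]
    return W, L


def cost(A, M, W_N, inputs, seq) -> float:
    """Objective assumed here: Tr W(0) for the channel sequence seq."""
    W = W_N
    for j in reversed(seq):
        W, _ = split_step(A, M, W, inputs[j])
    return trace(W)


def construct(A, M, W_N, inputs, N, lam, rng) -> List[int]:
    """Greedy randomized construction: fill alpha(N-1), ..., alpha(0)."""
    W, seq = W_N, [0] * N
    for k in range(N, 0, -1):
        steps = [split_step(A, M, W, b) for b in inputs]
        c = [trace(L) for _, L in steps]
        threshold = max(c) - lam * (max(c) - min(c))
        rcl = [j for j, c_j in enumerate(c) if c_j >= threshold]
        j = rng.choice(rcl)                 # random member of the RCL
        seq[k - 1] = j
        W = steps[j][0]
    return seq


def local_search(A, M, W_N, inputs, seq) -> Tuple[List[int], float]:
    """Repeated descent over all sequences at Hamming distance 1."""
    best, best_cost = list(seq), cost(A, M, W_N, inputs, seq)
    improved = True
    while improved:
        improved = False
        for k in range(len(best)):
            for j in range(len(inputs)):
                cand = best[:k] + [j] + best[k + 1:]
                cand_cost = cost(A, M, W_N, inputs, cand)
                if cand_cost < best_cost - 1e-12:
                    best, best_cost, improved = cand, cand_cost, True
    return best, best_cost


def grasp(A, M, W_N, inputs, N, lam=0.5, maxit=20, seed=0):
    """Multistart loop: keep the best locally optimal sequence found."""
    rng = random.Random(seed)
    best_seq, best_cost = None, float("inf")
    for _ in range(maxit):
        start = construct(A, M, W_N, inputs, N, lam, rng)
        seq, c = local_search(A, M, W_N, inputs, start)
        if c < best_cost:
            best_seq, best_cost = seq, c
    return best_seq, best_cost


# Hypothetical data: n = 2, q = 3 candidate input vectors, horizon N = 6.
A = [[1.0, 0.5], [0.0, 1.0]]
M = [[1.0, 0.0], [0.0, 1.0]]
W_N = [[100.0, 0.0], [0.0, 100.0]]          # "large" terminal weight
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
best_seq, best_cost = grasp(A, M, W_N, inputs, N=6)
print(best_seq, best_cost)
```

With $\lambda$ close to 0 the candidate list shrinks to the purely greedy choice, while $\lambda$ close to 1 makes the construction almost uniformly random; the value 0.5 used here is only a placeholder.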
Numerical experiments carried out with the proposed metaheuristic-based algorithm have given very good results compared with a standard local search method, which tends to get trapped in local minima.

References

[1] Festa P., Raiconi G.: Using GRASP for choosing best periodic observation strategy in stochastic systems filtering. To appear in: Cooperative Control and Optimization, P. Pardalos et al. (eds.), Kluwer A. P.
[2] Anderson B.D.O., Moore J.B.: Detectability and stabilizability of time varying discrete time linear systems. SIAM Journal on Control and Optimization, 19 (1981).
[3] Bittanti S., Colaneri P., De Nicolao G.: The difference periodic Riccati equation for the periodic prediction problem. IEEE Transactions on Automatic Control, 33 (1988).
[4] Feo T.A., Resende M.G.C.: A probabilistic heuristic for computationally difficult set covering problems. Operations Research Letters, 8 (1989).
[5] Festa P., Resende M.G.C.: GRASP: A bibliography. To appear in: Essays and surveys on metaheuristics, P. Hansen, C. C. Ribeiro (eds.), Kluwer A. P., 2001.
[6] Muller P.C., Weber H.I.: Analysis and optimization of certain qualities of controllability and observability for linear dynamical systems. Automatica, 8 (1972).

Footnote 1: In our application, two sequences are equivalent if one is a cyclic shift of the other, i.e. $\beta_{i,j} = \alpha_{i,j+\nu}$, $j = 1, \ldots, N-\nu$, and $\beta_{i,N-\nu+j} = \alpha_{i,j}$, $j = 1, 2, \ldots, \nu$.
