Minimax Optimal Control of Stochastic Uncertain Systems with Relative Entropy Constraints


[IEEE Trans. Aut. Control, 45, , 2000]

Minimax Optimal Control of Stochastic Uncertain Systems with Relative Entropy Constraints

Ian R. Petersen, Matthew R. James, Paul Dupuis

Abstract

This paper considers a new class of discrete time stochastic uncertain systems in which the uncertainty is described by a constraint on the relative entropy between a nominal noise distribution and the perturbed noise distribution. This uncertainty description is a natural extension, to the case of stochastic uncertain systems, of the sum quadratic constraint uncertainty description. The paper solves problems of worst case robust performance analysis and output feedback minimax optimal controller synthesis in a general nonlinear setting. Specializing these results to the linear case leads to a minimax LQG optimal controller. This controller is defined in terms of Riccati difference equations and a Kalman Filter like state equation.

1 Introduction

One of the key ideas to emerge in the field of modern control is the use of optimization and optimal control theory to give a systematic procedure for the design of feedback control systems.

Acknowledgments: This work was supported by the Australian Research Council and in part by the Office of Naval Research (ONR N ). Part of this research took place while the last author was a Visiting Fellow of the Faculty for Engineering and Information Technology at The Australian National University. Support through the ANU 1995 Faculties Research Grants Scheme is gratefully acknowledged.

Author addresses: Department of Electrical Engineering, Australian Defence Force Academy, Campbell, 2600, Australia, email: rp@ee.adfa.oz.au. Department of Engineering, Faculty of Engineering and Information Technology, Australian National University, Canberra, ACT 0200, Australia; e-mail: Matthew.James@anu.edu.au. LCDS, Division of Applied Mathematics, Box F, Brown University, 192 George St., Providence, RI 02912, USA.

For example, in the case of linear systems with full state measurements, the linear quadratic regulator (LQR) approach provides one of the most useful techniques for designing state feedback controllers; e.g., see [1]. Also, in the case of linear systems with partial information, the solution to the linear quadratic Gaussian (LQG) stochastic optimal control problem provides a useful technique for the design of multivariable output feedback controllers; see [1]. However, the above optimal control techniques suffer from a major disadvantage in that they do not provide a systematic means for addressing the issue of robustness. Indeed, in the partial information case, an LQG controller may lead to very poor robustness properties; e.g., see [2].

The lack of robustness which may result when designing a control system via standard optimal control methods such as mentioned above has been a major motivation for research in the area of robust control. Research in this area has resulted in a large number of approaches being available to deal with the issue of robustness in control system design. These methods include those based on Kharitonov's Theorem (see [3]), H-infinity control theory (e.g., see [4]) and the quadratic stabilizability approach (e.g., see [5]).

One important idea found in robust control theory is the notion of an uncertain system. This involves modelling not only the nominal dynamics of the system to be controlled but also modelling the uncertainty in the system. Many different methods of modelling uncertainty have been considered. These include H-infinity norm bounded linear time invariant uncertainty blocks (e.g., see [6]), time-varying norm bounded uncertainty blocks (e.g., see [5]) and, more recently, uncertainty blocks satisfying an integral quadratic constraint (e.g., see [7]). By considering the problem of controlling uncertain systems, the issue of robustness with respect to specified uncertainties can be systematically addressed. In particular, results such as [7] on minimax optimal control of uncertain systems provide a method of incorporating the issue of robustness into the optimal controller design methodologies mentioned above. However, it should be noted that [7] is a state feedback result, and in the partial information case the existing results available are much less satisfactory.

One of the underlying motivations for the results to be presented in this paper is to develop an uncertain system framework whereby the standard LQG stochastic optimal controller design methodology can be extended into a minimax stochastic optimal control methodology for uncertain systems with partial information. This would allow the designer to retain all of the existing methods for choosing LQG weighting functions to achieve the desired nominal performance (e.g., see [1]), whilst the issue of robustness can be addressed by appropriate choice of the uncertainty structure in the uncertain system model.

One of the main contributions of this paper is to introduce a new class of nonlinear discrete time stochastic uncertain systems. The class of stochastic uncertain systems to be introduced involves modelling uncertainty in terms of a constraint referred to as a relative entropy constraint. This uncertainty description is a modification of the uncertainty description which was presented in [8].

As in [8], our uncertainty description allows us to model structured uncertain dynamics in discrete time systems subject to stochastic noise processes. However, our uncertainty structure is slightly more restricted than that considered in [8] in that we must assume that all uncertainty inputs affect the system via the same channel as the stochastic noise process.

Our motivation for considering uncertain systems which are subject to stochastic noise process disturbances is twofold. In the first instance, many engineering control problems involve dealing with systems which are subject to disturbances and measurement noise which can be well modelled by stochastic processes. A second motivation is that in going from the state feedback LQR optimal control problem to the measurement feedback LQG optimal control problem, a critical change in the model is the introduction of noise disturbances. Hence, it would be expected that in order to obtain a reasonable generalization of LQG control for uncertain systems, it would be necessary to consider uncertain systems which are subject to disturbances in the form of stochastic noise processes.

The reason why we use a slightly different uncertain system model in this paper as opposed to that used in [8] is that in the partial information case, the uncertain system model chosen in [8] led to an apparently intractable partial information stochastic differential game problem. By using the uncertainty description proposed here, we are led to a more tractable partial information risk sensitive control problem.

The organization of the paper is as follows. In Section 2 of the paper, we consider a problem of worst case performance analysis for stochastic uncertain systems. This section begins by defining a very general class of nonlinear discrete time stochastic uncertain systems considered over a finite time interval. The uncertain systems considered are described by a nominal system which is driven by a stochastic noise process with a specified statistical description. Also considered is a perturbed system in which a general class of stochastic noise processes is allowed. These perturbed noise processes must satisfy a certain constraint on the relative entropy between the nominal noise process statistics and the perturbed noise process statistics; see Section 2 for a definition of relative entropy. This relative entropy constraint is an extension of the stochastic uncertainty constraint considered in [8] and the integral quadratic constraint uncertainty description such as considered in [9]. This uncertainty description allows for uncertainty in both the nonlinear system dynamics and the statistics of the applied noise process.

The problem considered in Section 2 is a problem of worst case performance analysis with respect to a specified cost function. This is a constrained maximization problem where the maximization is with respect to the uncertainty in the system or, equivalently, with respect to the perturbed noise probability measure. The first main result presented in this section uses a Lagrange multiplier technique to convert the constrained maximization problem into an unconstrained maximization problem. This result is an extension of S-procedure ideas such as presented in [8]. The second main result of this section uses a result on the duality between free energy and relative entropy (e.g., see [10]) to convert this unconstrained maximization problem into a problem of evaluating a risk sensitive cost functional.

Both of the results of Section 2 are used in subsequent sections on controller synthesis.

Section 3 of the paper considers a problem of minimax optimal controller synthesis for a general class of discrete time nonlinear stochastic uncertain systems with partial information. Using the results of Section 2, it is first shown that these problems can be converted into corresponding unconstrained stochastic game problems with partial information. The stochastic game problem is then converted into an equivalent risk sensitive optimal control problem with partial information. The subsequent sections show how existing results can be applied to solve these partial information risk sensitive optimal control problems.

Section 4 of the paper applies the results of [11] and [12] to solve the above mentioned risk sensitive optimal control problems in a general nonlinear setting. In particular, the dynamic programming approach of [11] is applied in the special case of full state information and the information state approach of [12] is applied in the measurement feedback case.

Section 5 of the paper considers the special case in which the underlying nominal system is linear. In this case, the results of [13] and [14] are applied to solve the corresponding risk sensitive optimal control problems. In the state feedback case, the results of [13] lead to a Riccati equation solution to the risk sensitive control problem. Indeed, the corresponding control algorithm obtained in this case is virtually identical to that which was obtained in the previous paper [8]. However, [8] deals with a somewhat different uncertainty description than that which is considered in this paper.

As mentioned above, one of the main motivations for this paper was to consider the minimax stochastic optimal control problem in the linear partial information case. This problem is the robust generalization of the LQG problem. In Section 5 of the paper, we use the results of [14] to solve the risk sensitive control problem corresponding to this special case. The solution to the discrete time, linear, partial information risk sensitive control problem was originally obtained in [15]. However, for our purposes it was more convenient to use the results of [14] since they give an explicit characterization of the optimal risk sensitive cost. The application of the results of [14] leads to a partial information controller which is obtained via the solution of two Riccati type recursions and involves a state estimator based controller structure of the type which occurs in H-infinity control. Indeed, the solution to the linear partial information risk sensitive optimal control problem is known to be closely related to the H-infinity control problem; e.g., see [16].

The main achievement of this paper is the solution of a partial information minimax optimal control problem for stochastic uncertain linear systems. A number of other authors have considered related control problems in the linear partial information case. In particular, the mixed H2/H-infinity problem (e.g., see [17, 18, 19, 20]) is a closely related problem to that considered in this paper.

However, in many cases our problem formulation, which involves minimizing the worst case cost for a specific uncertain system, is the natural way to formulate a robust performance problem. This is in contrast to the problems considered in H2/H-infinity control theory, which typically involve minimizing nominal performance subject to a robust stability constraint. Also, our solution, which is given in terms of Riccati equations, is a direct extension of the standard LQG method.

2 Worst Case Performance for Stochastic Uncertain Systems

The results of this section are concerned with the problem of characterizing the worst case performance for a stochastic uncertain system. In order to apply the results of this section in our subsequent consideration of minimax optimal control problems, it will be convenient to introduce a very general definition of a stochastic uncertain system. This definition will be specialized in the sequel to more concrete classes of stochastic uncertain system.

Reference System

The stochastic uncertain systems under consideration are defined in terms of a reference or nominal system and a perturbed system. In this section, the reference system to be considered is defined by a sequence of operators mapping the state sequence and noise sequence into the next value of the state:

$$x_{k+1} = G_k(x_{0,k}, w_{0,k}). \qquad (1)$$

Here, $x_{0,N} = \{x_0, x_1, \ldots, x_N\}$ denotes the state sequence and $w_{0,N} = \{w_0, w_1, \ldots, w_N\}$ denotes the noise sequence. For each $k \in \{0, 1, \ldots, N\}$, $w_k \in \mathbf{R}^r$ and $x_k \in \mathbf{R}^n$. Thus, we can write $w_{0,N} \in \mathbf{R}^{r(N+1)}$ and $x_{0,N+1} \in \mathbf{R}^{n(N+2)}$. It is assumed that for each $k$, the operator $G_k(\cdot)$ defines a Borel measurable mapping $\mathbf{R}^{n(k+1)} \times \mathbf{R}^{r(k+1)} \to \mathbf{R}^n$. The initial condition vector $x_0$ and the noise input sequence $w_{0,N}$ are random variables with a specified joint probability measure $\mu(dw_{0,N}\,dx_0)$. In the sequel, we will make use of the following decomposition of the measure $\mu$:

$$\mu(dw_{0,N}\,dx_0) = \mu_{x_0}(dx_0)\,\mu_0(dw_0 \mid x_0)\,\mu_1(dw_1 \mid x_0, w_0)\,\mu_2(dw_2 \mid x_0, w_0, w_1)\cdots\mu_N(dw_N \mid x_0, w_0, \ldots, w_{N-1}).$$

Perturbed System

The corresponding perturbed system is also defined by a sequence of operators mapping the state sequence and noise sequence into the next value of the state:

$$x_{k+1} = G_k(x_{0,k}, \tilde w_{0,k}), \qquad z_k = b_k(x_k) \qquad (2)$$

where $\tilde w_{0,N}$ is a perturbed noise sequence and $z_{0,N+1}$ is the output sequence. For each $k$, $z_k \in \mathbf{R}^q$ and the function $b_k(\cdot)$ is assumed to be Borel measurable. The joint probability measure of $\tilde w_{0,N}$ and $x_0$ is denoted $\nu(d\tilde w_{0,N}\,dx_0)$ and is contained in the set $\mathcal{P}$ of all probability measures on the variables $\tilde w_{0,N}$ and $x_0$. In the sequel, we will make use of the following decomposition of the measure $\nu$:

$$\nu(d\tilde w_{0,N}\,dx_0) = \nu_{x_0}(dx_0)\,\nu_0(d\tilde w_0 \mid x_0)\,\nu_1(d\tilde w_1 \mid x_0, \tilde w_0)\,\nu_2(d\tilde w_2 \mid x_0, \tilde w_0, \tilde w_1)\cdots\nu_N(d\tilde w_N \mid x_0, \tilde w_0, \ldots, \tilde w_{N-1}).$$

It follows immediately from this definition that $\mathcal{P}$ is a convex set of probability measures.

Remarks

Note that the class of probability measures $\mathcal{P}$ defined above is actually equivalent to the class of probability measures defined in [8]. The interpretation of equations (1) and (2), together with the corresponding set of noise probability measures $\mathcal{P}$, as a stochastic uncertain system is as follows. The set $\mathcal{P}$ will be further restricted in the sequel with the introduction of the relative entropy constraint on the noise probability measures. This restricted set together with equation (2) will define a family of stochastic systems. This family of stochastic systems is a stochastic uncertain system. The true system is assumed to be a member of this family.

Stochastic Uncertain System

Associated with the systems (1), (2) and the set of probability measures $\mathcal{P}$, we now define a stochastic uncertain dynamical system. This is achieved by specifying the class of admissible perturbed noise processes for the perturbed system (2). The admissible perturbed noise processes will be random processes such that a certain Relative Entropy Constraint is satisfied. To define this relative entropy constraint, first let $d_1 > 0, d_2 > 0, \ldots, d_s > 0$ be given positive constants. We also recall the following standard definition of relative entropy; e.g., see [10]. Given any two probability measures $\gamma(\cdot)$ and $\theta(\cdot)$ defined on the same measurable space, the relative entropy $R(\gamma(\cdot)\,\|\,\theta(\cdot))$ is defined by

$$R(\gamma(\cdot)\,\|\,\theta(\cdot)) = \begin{cases} \displaystyle\int \log\left(\frac{d\gamma}{d\theta}\right) d\gamma & \text{if } \gamma(\cdot) \ll \theta(\cdot) \text{ and } \log\left(\dfrac{d\gamma}{d\theta}\right) \in L^1(d\gamma); \\[1.5ex] +\infty & \text{otherwise.} \end{cases}$$
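As a point of reference (this illustration is an addition to the transcription and is not part of the original paper), the relative entropy between two Gaussian measures has the following standard closed form; this is the case most relevant to the Gaussian reference measures used in Sections 3 and 5.

```latex
% Standard closed form (added illustration): relative entropy between two Gaussian
% measures gamma = N(m_1, S_1) and theta = N(m_2, S_2) on R^r, with S_1, S_2 > 0.
\[
  R(\gamma(\cdot)\,\|\,\theta(\cdot))
    = \tfrac{1}{2}\Bigl[\operatorname{tr}\bigl(S_2^{-1}S_1\bigr) - r
      + (m_1 - m_2)^T S_2^{-1}(m_1 - m_2)
      + \log\tfrac{\det S_2}{\det S_1}\Bigr].
\]
% A pure mean shift (S_1 = S_2 = \Upsilon) gives
% R = (1/2)(m_1 - m_2)^T \Upsilon^{-1}(m_1 - m_2), which is the quadratic form that
% reappears in the sum quadratic constraint discussion of Section 3.
```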

The relative entropy can be regarded as a measure of the distance between the probability measure $\gamma(\cdot)$ and the probability measure $\theta(\cdot)$. The properties of relative entropy can be found in Section 1.4 of [10].

Associated with the systems (1) and (2) is the following set of non-negative valued Borel measurable functions:

$$q^j_k(\cdot) : \mathbf{R}^p \to \mathbf{R}, \qquad k = 0, 1, \ldots, N, \quad j = 1, 2, \ldots, s. \qquad (3)$$

These functions will determine the admissible perturbed noise processes in the system (2). Indeed, the admissible perturbed noise random processes are defined by the elements of the set $\mathcal{P}$ such that the following condition is satisfied:

Relative Entropy Constraint

A probability measure $\nu \in \mathcal{P}$ defines an admissible perturbed noise random process if

$$R(\nu_{x_0}(\cdot)\,\|\,\mu_{x_0}(\cdot)) - d_i + \mathbf{E}^\nu\left[\sum_{k=0}^{N}\left( R(\nu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1})\,\|\,\mu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1})) - q^i_k(z_k)\right)\right]$$
$$= R(\nu(\cdot)\,\|\,\mu(\cdot)) - \mathbf{E}^\nu\left[\sum_{k=0}^{N} q^i_k(z_k)\right] - d_i \le 0 \qquad (4)$$

for all $i = 1, 2, \ldots, s$. Note that the equality above follows from the chain rule for relative entropy; see [10]. The set of all probability measures $\nu \in \mathcal{P}$ defining admissible perturbed noise random processes is denoted $\Xi$. Note that it follows from one of the properties of relative entropy and the assumed non-negativity of the functions $q^i_k(\cdot)$ that the set $\Xi$ is non-empty. Indeed, $\Xi$ will always contain the probability measure $\mu(\cdot)$ as defined for the reference system.

Remarks

The uncertainty description embodied in the above relative entropy constraint is a generalization of the stochastic sum quadratic constraint considered in [8]. Indeed, as in the sum quadratic constraint case, this uncertainty description allows for nonlinear time varying uncertainty blocks together with exogenous noise entering into the perturbed noise process. The special case of a stochastic system without uncertainty is obtained if we let

$$q^j_k(\cdot) \equiv 0 \quad \text{and} \quad d_i = 0. \qquad (5)$$

In this case, the relative entropy constraint (4) forces the relative entropy $R(\nu(\cdot)\,\|\,\mu(\cdot))$ to be zero. Strictly, this case does not satisfy the assumption on the constants $d_i$. However, it is easy to see that the case of a stochastic system without uncertainty can be obtained in the limit as we let the constants $d_i$ and the functions $q^j_k(\cdot)$ approach zero.

Note that the relative entropy $R(\nu(\cdot)\,\|\,\mu(\cdot))$ can be thought of as a measure of the difference between the reference probability measure $\mu(\cdot)$ and the perturbed probability measure $\nu(\cdot)$. In particular, the relative entropy is zero if and only if $\mu(\cdot) \equiv \nu(\cdot)$. Typical perturbations which are allowed under the above relative entropy constraint are perturbations in the mean of the probability measure $\mu$, such as illustrated in Figure 1.

[Figure 1: Uncertain System Block Diagram. Uncertain dynamics blocks and an uncertain noise input $\xi_k$ enter the nominal system through the same channel as the reference noise $w_k$.]

More specific examples of the uncertainties which can be described via this relative entropy constraint will be given in Section 3. Our motivation for extending the uncertainty description used in [8] to the relative entropy uncertainty description considered here is that it enables us to convert problems of minimax optimal control into equivalent risk sensitive control problems.

This is in contrast to the approach of [8] in which the minimax optimal control problem was converted into an equivalent stochastic game problem. The approach taken in this paper may give significant computational advantages since risk sensitive control problems are often easier to solve than stochastic game problems.

Cost Functional

In this section, we consider the problem of characterizing, for a stochastic uncertain system defined as above, the worst case performance with respect to a cost functional defined to be

$$J(z_{0,N+1}) = \Phi(z_{N+1}) + \sum_{k=0}^{N} L(k, z_k) \qquad (6)$$

where the functions $\Phi(\cdot)$ and $L(\cdot)$ are Borel measurable. The problem under consideration in this section is to find

$$\sup_{\nu \in \Xi} \mathbf{E}^\nu\{J(z_{0,N+1})\}. \qquad (7)$$

Our first step in evaluating this quantity is to use a Lagrange multiplier technique to convert the problem from a constrained optimization problem into an unconstrained optimization problem. Indeed, given a vector $\tau = [\tau_1, \tau_2, \ldots, \tau_s] \in \mathbf{R}^s$, we define an augmented cost function as follows:

$$J_\tau(z_{0,N+1}) = \Phi(z_{N+1}) + \sum_{k=0}^{N} L(k, z_k) - \sum_{i=1}^{s}\tau_i\left\{ R(\nu_{x_0}(\cdot)\,\|\,\mu_{x_0}(\cdot)) - d_i + \sum_{k=0}^{N}\left[ R(\nu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1})\,\|\,\mu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1})) - q^i_k(z_k)\right]\right\}.$$

Now, we define $V_\tau$ to be the value of the corresponding unconstrained optimization problem:

$$V_\tau = \sup_{\nu \in \mathcal{P}} \mathbf{E}^\nu\{J_\tau(z_{0,N+1})\}.$$

Also, we define a set $\Gamma \subset \mathbf{R}^s$ as

$$\Gamma = \{\tau = [\tau_1\ \tau_2\ \cdots\ \tau_s] \in \mathbf{R}^s : \tau_1 \ge 0, \tau_2 \ge 0, \ldots, \tau_s \ge 0 \ \text{and}\ V_\tau < \infty\}.$$

Using these definitions, we are now in a position to state the following theorem relating the constrained optimization problem and the unconstrained optimization problem. This theorem is an extension of the S-procedure ideas presented in [8]. The proof of this theorem will follow from a standard result on convex analysis and the convexity of the relative entropy $R(\gamma(\cdot)\,\|\,\theta(\cdot))$ with respect to the probability measure $\gamma(\cdot)$ (e.g., see [10]).

Theorem 2.1 Consider the stochastic uncertain system (1), (2), (4) with cost functional $J(z_{0,N+1})$. Then the following conditions hold:

(i) The supremum $\sup_{\nu \in \Xi} \mathbf{E}^\nu[J(z_{0,N+1})]$ is finite if and only if the set $\Gamma$ is non-empty.

(ii) If the set $\Gamma$ is non-empty, then

$$\sup_{\nu \in \Xi} \mathbf{E}^\nu[J(z_{0,N+1})] = \min_{\tau \in \Gamma} V_\tau. \qquad (8)$$

Proof. We will establish this result using Theorem 1 on page 217 of [21]. In order to apply this result to the constrained optimization problem (7), we must first establish that all of the conditions required to apply this result are satisfied. First note that the reference probability measure $\mu(\cdot)$ is contained in the set $\mathcal{P}$ and that when $\nu(\cdot) = \mu(\cdot)$, we have

$$R(\nu_{x_0}(\cdot)\,\|\,\mu_{x_0}(\cdot)) = 0 \quad \text{and} \quad R(\nu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1})\,\|\,\mu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1})) = 0 \ \ \forall k.$$

Hence, using the fact that $q^i_k(\cdot) \ge 0$ and $d_i > 0$, it follows that

$$R(\nu_{x_0}(\cdot)\,\|\,\mu_{x_0}(\cdot)) - d_i + \mathbf{E}^\nu\left[\sum_{k=0}^{N}\left( R(\nu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1})\,\|\,\mu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1})) - q^i_k(z_k)\right)\right] = \mathbf{E}^\nu\left[\sum_{k=0}^{N}\left(-q^i_k(z_k)\right)\right] - d_i \le -d_i < 0$$

for all $i$. That is, the relative entropy constraint (4) is strictly satisfied. This is the constraint qualification condition required in order to apply the result of [21]. In order to apply the result of [21], we will also use the fact that the function

$$\sum_{i=1}^{s}\tau_i\left\{ R(\nu_{x_0}(\cdot)\,\|\,\mu_{x_0}(\cdot)) - d_i + \mathbf{E}^\nu\left[\sum_{k=0}^{N}\left( R(\nu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1})\,\|\,\mu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1})) - q^i_k(z_k)\right)\right]\right\}$$

is a convex function of the probability measure $\nu$.

This follows since

$$\mathbf{E}^\nu\left[ R(\nu_{x_0}(\cdot)\,\|\,\mu_{x_0}(\cdot)) + \sum_{k=0}^{N} R(\nu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1})\,\|\,\mu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1}))\right]$$

is a representation of the relative entropy $R(\nu(\cdot)\,\|\,\mu(\cdot))$, which is a convex function of the probability measure $\nu$; see Theorem C.3.1 and Theorem of [10]. Furthermore, $\mathbf{E}^\nu J(z_{0,N+1})$ is a linear and hence concave function of the probability measure $\nu$. Moreover, the set $\mathcal{P}$ is convex. Combining all of the above facts, we have established that all of the conditions required in order to apply Theorem 1 on page 217 of [21] to the constrained optimization problem (7) are satisfied.

We now establish statement (i) of the theorem. First, suppose that

$$\sup_{\nu \in \Xi} \mathbf{E}^\nu\{J(z_{0,N+1})\} = c < \infty.$$

Then it follows directly from the result of [21] that there exists a vector $\tau \ge 0$ such that

$$V_\tau = \sup_{\nu \in \mathcal{P}} \mathbf{E}^\nu\{J_\tau(z_{0,N+1})\} = c < \infty.$$

Conversely, if there exists a vector $\tau \ge 0$ such that

$$V_\tau = \sup_{\nu \in \mathcal{P}} \mathbf{E}^\nu\{J_\tau(z_{0,N+1})\} = c < \infty, \qquad (9)$$

then given any $\nu \in \Xi$, it follows from (9) and the relative entropy constraint (4) that $\mathbf{E}^\nu\{J(z_{0,N+1})\} \le c$. Hence,

$$\sup_{\nu \in \Xi} \mathbf{E}^\nu\{J(z_{0,N+1})\} \le c < \infty.$$

This completes the proof of (i).

To establish (ii), we first observe that, given any vector $\tau \ge 0$, it follows from the relative entropy constraint (4) that for any $\nu \in \Xi$,

$$\mathbf{E}^\nu\{J_\tau(z_{0,N+1})\} \ge \mathbf{E}^\nu\{J(z_{0,N+1})\}.$$

Hence,

$$\sup_{\nu \in \mathcal{P}} \mathbf{E}^\nu\{J_\tau(z_{0,N+1})\} \ge \sup_{\nu \in \Xi} \mathbf{E}^\nu\{J_\tau(z_{0,N+1})\} \ge \sup_{\nu \in \Xi} \mathbf{E}^\nu\{J(z_{0,N+1})\}$$

for all $\tau \ge 0$. Also, it follows from the result of [21] that there exists a $\tau \ge 0$ such that

$$\sup_{\nu \in \mathcal{P}} \mathbf{E}^\nu\{J_\tau(z_{0,N+1})\} = \sup_{\nu \in \Xi} \mathbf{E}^\nu\{J(z_{0,N+1})\}.$$

Thus, part (ii) of the theorem has been established.

Remark

In the case of no uncertainty (5), it follows from the positivity of the relative entropy $R(\nu(\cdot)\,\|\,\mu(\cdot))$ that the minimum in (8) will be achieved in the limit as $\sum_{i=1}^{s}\tau_i \to \infty$. Note that this does not contradict the existence of a minimum in (8) since, as mentioned above, the conditions (5) are strictly not permitted under the standing assumptions but can only be approached in the limit.

In the sequel, we will require that the system (1) satisfies the following assumption.

Assumption 2.1

$$\sup_{\nu \in \mathcal{P}} \mathbf{E}^\nu\{J(z_{0,N+1})\} = \infty.$$

In this assumption, note that we are effectively maximizing the cost functional (6) with respect to the noise input $w_k$. Hence, this assumption amounts to a controllability type assumption with respect to the input $w_k$ and an observability type assumption with respect to the cost functional (6). For example, suppose (2) is a linear system of the form

$$x_{k+1} = Ax_k + D\tilde w_k, \qquad z_k = Ex_k$$

and (6) is a quadratic cost functional of the form

$$J(z_{0,N+1}) = \sum_{k=0}^{N} z_k^T Q z_k$$

where the pair $(A, D)$ is controllable and the pair $(A, E^TQE)$ is observable. Then, it is straightforward to verify that this assumption will be satisfied. This assumption has been introduced in order to rule out the case in which the minimum in (8) is achieved at $\tau = 0$. Alternatively, we could have explicitly ruled out this case in the results to follow.

Remarks

It follows from the above assumption that the zero vector is not contained in the set $\Gamma$. Indeed, if $\tau = 0$ then $J_\tau(z_{0,N+1}) = J(z_{0,N+1})$ and hence $V_\tau = \infty$. Therefore, the zero vector is not contained in $\Gamma$. For any non-zero vector $\tau \ge 0$, it is straightforward to verify that $V_\tau$ can be rewritten as

$$V_\tau = \sum_{i=1}^{s}\tau_i\,(W_\tau + d_i) \qquad (10)$$

where

$$W_\tau = \sup_{\nu \in \mathcal{P}} \mathbf{E}^\nu\left[\frac{\Phi(z_{N+1}) + \sum_{k=0}^{N} L(k, z_k) + \sum_{i=1}^{s}\tau_i\sum_{k=0}^{N} q^i_k(z_k)}{\sum_{i=1}^{s}\tau_i} - R(\nu_{x_0}(\cdot)\,\|\,\mu_{x_0}(\cdot)) - \sum_{k=0}^{N} R(\nu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1})\,\|\,\mu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1}))\right].$$

Hence, it follows from Theorem 2.1 that if $\Gamma \neq \emptyset$, we can write

$$\sup_{\nu \in \Xi} \mathbf{E}^\nu[J(z_{0,N+1})] = \min_{\tau \in \Gamma}\sum_{i=1}^{s}\tau_i\,(W_\tau + d_i). \qquad (11)$$

We now look at a risk sensitive method for evaluating the quantity $W_\tau$. This is obtained directly using the duality between relative entropy and free energy which occurs in the theory of large deviations:

Lemma 2.2 For each $\tau \ge 0$,

$$W_\tau = \log \mathbf{E}^\mu\left\{\exp\left[\frac{\Phi(z_{N+1}) + \sum_{k=0}^{N} L(k, z_k) + \sum_{i=1}^{s}\tau_i\sum_{k=0}^{N} q^i_k(z_k)}{\sum_{i=1}^{s}\tau_i}\right]\right\}$$

where the probability measure $\mu(\cdot)$ is as defined for the reference system (1).

Proof. When $\Phi$, $L$, and the $q^i_k$ are bounded, this result follows from the relative entropy representation for logarithms of exponential integrals (Proposition in [10]) and an application of the chain rule for relative entropy (Theorem C.3.1 in [10]). When one or more of these non-negative functions is unbounded, we use Proposition of [10] in place of Proposition .

Remarks

By evaluating $W_\tau$ using this formula, the required worst case cost (7) can be found by solving the finite dimensional optimization problem corresponding to (11). Also note that it follows from (10) and the definition of $W_\tau$ that $V_\tau$ is a convex function of $\tau$. Hence, the finite dimensional optimization problem in (11) is a convex optimization problem. At this point, we recall that for the case of no uncertainty (5), the minimum in (11) is achieved in the limit as $\sum_{i=1}^{s}\tau_i \to \infty$. Furthermore, in this case, it follows from a standard result on risk sensitive cost functions that

$$\lim_{\sum_{i=1}^{s}\tau_i \to \infty}\ \left(\sum_{i=1}^{s}\tau_i\right) W_\tau = \mathbf{E}^\mu\left\{\Phi(z_{N+1}) + \sum_{k=0}^{N} L(k, z_k)\right\}.$$

That is, as expected, in the case of no uncertainty, the formula (11) reduces to a simple evaluation of the cost function.
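To illustrate how Lemma 2.2 and the convex optimization (11) can be combined in practice, the following Python sketch (an illustration added to this transcription; the scalar system, cost weights, constraint level and grid are all hypothetical) estimates $W_\tau$ by Monte Carlo under a scalar Gaussian reference measure and then minimizes $V_\tau = \tau(W_\tau + d)$ over a grid of $\tau$ for the single-constraint case $s = 1$.

```python
# Hypothetical illustration of equations (10)-(11) and Lemma 2.2 (not from the paper).
# Scalar reference system x_{k+1} = a*x_k + w_k, w_k ~ N(0,1), x_0 = 0, with a single
# uncertainty constraint (s = 1), running cost L(k,z) = c_L*z^2, Phi = 0, q_k(z) = c_q*z^2.
import numpy as np

rng = np.random.default_rng(0)
a, N, c_L, c_q, d = 0.5, 5, 0.05, 0.05, 0.1
M = 200_000                              # Monte Carlo sample size

def simulate_cost_terms():
    """Sample sum_k L(k,z_k) and sum_k q_k(z_k), k = 0..N, under the reference measure mu."""
    x = np.zeros(M)
    L_sum = np.zeros(M)
    q_sum = np.zeros(M)
    for _ in range(N + 1):
        L_sum += c_L * x**2
        q_sum += c_q * x**2
        x = a * x + rng.standard_normal(M)
    return L_sum, q_sum

L_sum, q_sum = simulate_cost_terms()

def W(tau):
    """Monte Carlo estimate of W_tau from Lemma 2.2 (single constraint, s = 1)."""
    return np.log(np.mean(np.exp((L_sum + tau * q_sum) / tau)))

# Outer convex optimisation (11): minimise V_tau = tau*(W_tau + d) over tau > 0.
taus = np.linspace(0.5, 20.0, 100)       # crude grid; tau must keep W_tau finite
V = np.array([t * (W(t) + d) for t in taus])
t_star = taus[np.argmin(V)]
print(f"worst-case cost bound ~= {V.min():.4f} at tau ~= {t_star:.2f}")
```

The grid search is only a stand-in for a proper one-dimensional convex minimization; as the limit above indicates, for large $\tau$ the quantity $\tau W_\tau$ approaches the nominal expected cost, so the worst case bound tightens toward the nominal cost plus $\tau d$.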

3 Minimax Controller Synthesis

In this section, we apply the approach developed in the previous section to the problem of synthesizing a controller which solves a minimax stochastic optimal control problem for a stochastic uncertain system with relative entropy constraints. The main result of this section shows that this problem can be converted into an equivalent problem of risk sensitive optimal control which can be solved via information state methods. The class of systems considered in this section will take a more concrete form than those considered in the previous section. As in the previous section, we will consider stochastic uncertain systems defined in terms of a known reference (nominal) system and a perturbed system.

Reference System

The reference system to be considered in this section is defined by the state equations

$$x_{k+1} = f(k, x_k, u_k) + g(k, x_k)w_k, \qquad y_{k+1} = h(k, x_k) + v_k \qquad (12)$$

where $x_k \in \mathbf{R}^n$ is the system state with initial condition $x_0$, $w_k \in \mathbf{R}^r$ is the process noise input, $v_k \in \mathbf{R}^l$ is the measurement noise input, $u_k \in \mathbf{R}^m$ is the control input and $y_k \in \mathbf{R}^l$ is the measured output. The functions $f(k, \cdot)$, $g(k, \cdot)$ and $h(k, \cdot)$ are assumed to be Borel measurable. The initial condition vector $x_0$, the process noise input sequence $w_{0,N}$ and the measurement noise sequence $v_{0,N}$ are assumed to be random variables with a specified joint probability measure $\mu(dw_{0,N}\,dv_{0,N}\,dx_0)$. This joint probability measure is assumed to have the following dependence structure:

$$\mu(dw_{0,N}\,dv_{0,N}\,dx_0) = \mu_{x_0}(dx_0)\,\mu_{0w}(dw_0 \mid x_0)\,\mu_{0v}(dv_0 \mid x_0)\,\mu_{1w}(dw_1 \mid x_0, w_0, v_0)\,\mu_{1v}(dv_1 \mid x_0, w_0, v_0)\,\mu_{2w}(dw_2 \mid x_0, w_0, w_1, v_0, v_1)\,\mu_{2v}(dv_2 \mid x_0, w_0, w_1, v_0, v_1)\cdots\mu_{Nw}(dw_N \mid x_0, w_0, \ldots, w_{N-1}, v_0, \ldots, v_{N-1})\,\mu_{Nv}(dv_N \mid x_0, w_0, \ldots, w_{N-1}, v_0, \ldots, v_{N-1}).$$

This implies the conditional independence of the system noise and observation noise.

Perturbed System

The corresponding perturbed system is also defined by the state equations

$$x_{k+1} = f(k, x_k, u_k) + g(k, x_k)\tilde w_k, \qquad z_k = b(k, x_k, u_k), \qquad y_{k+1} = h(k, x_k) + \tilde v_k \qquad (13)$$

where $z_k \in \mathbf{R}^q$ is the uncertainty output. In this case, $\tilde w_{0,N}$ and $\tilde v_{0,N}$ are perturbed noise sequences. Also, the functions $b(k, \cdot)$ are assumed to be Borel measurable. Furthermore, the uncertainty output vector $z_k$ is assumed to be partitioned as

$$z_k = \begin{bmatrix} z^1_k \\ z^2_k \\ \vdots \\ z^s_k \end{bmatrix} = \begin{bmatrix} b^1(k, x_k, u_k) \\ b^2(k, x_k, u_k) \\ \vdots \\ b^s(k, x_k, u_k) \end{bmatrix}. \qquad (14)$$

The joint probability measure of $\tilde w_{0,N}$, $\tilde v_{0,N}$ and $x_0$ is denoted $\nu(d\tilde w_{0,N}\,d\tilde v_{0,N}\,dx_0)$ and is contained in the set $\mathcal{P}$ of all probability measures on the variables $\tilde w_{0,N}$, $\tilde v_{0,N}$ and $x_0$. In the sequel, we will make use of the following decomposition of the measure $\nu$:

$$\nu(d\tilde w_{0,N}\,d\tilde v_{0,N}\,dx_0) = \nu_{x_0}(dx_0)\,\nu_0(d\tilde w_0\,d\tilde v_0 \mid x_0)\,\nu_1(d\tilde w_1\,d\tilde v_1 \mid x_0, \tilde w_0, \tilde v_0)\,\nu_2(d\tilde w_2\,d\tilde v_2 \mid x_0, \tilde w_0, \tilde w_1, \tilde v_0, \tilde v_1)\cdots\nu_N(d\tilde w_N\,d\tilde v_N \mid x_0, \tilde w_0, \ldots, \tilde w_{N-1}, \tilde v_0, \ldots, \tilde v_{N-1}).$$

It follows immediately from this definition that $\mathcal{P}$ is a convex set of probability measures.

Admissible Controllers

We consider causal partial information controllers of the form

$$u_k = K(k, y_{0,k+1}) \qquad (15)$$

where $u_k \in \mathbf{R}^m$ is the control input at time $k$ and $y_{0,k+1}$ is the output sequence over the time interval $\{0, 1, \ldots, k+1\}$. It is assumed that the operator $K(k, \cdot)$ defines a Borel measurable mapping $\mathbf{R}^{l(k+1)} \to \mathbf{R}^m$. The class of all such controllers is denoted $\Lambda$.

Associated with the stochastic uncertain system (12), (13) is the following set of non-negative valued Borel measurable functions:

$$q^j_k(\cdot) : \mathbf{R}^p \to \mathbf{R}, \qquad k = 0, 1, \ldots, N, \quad j = 1, 2, \ldots, s. \qquad (16)$$

These functions will determine the admissible perturbed noise processes in the system (13).

Relative Entropy Constraint

In order to complete the definition of the controlled stochastic uncertain system corresponding to the state equations (12), (13), we now specify the class of admissible perturbed noise sequences. The admissible perturbed noise processes will be random processes satisfying a Relative Entropy Constraint. To define this relative entropy constraint, first let $d_1 > 0, d_2 > 0, \ldots, d_s > 0$ be given positive constants.

Given any admissible controller of the form (15), a probability measure $\nu(\cdot) \in \mathcal{P}$ defines an admissible perturbed noise random process for the resulting closed loop system (13), (15) if

$$R(\nu_{x_0}(\cdot)\,\|\,\mu_{x_0}(\cdot)) - d_i + \mathbf{E}^\nu\left[\sum_{k=0}^{N}\left( R(\nu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1}, \tilde v_0, \ldots, \tilde v_{k-1})\,\|\,\mu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1}, \tilde v_0, \ldots, \tilde v_{k-1})) - q^i_k(z^i_k)\right)\right] \le 0 \qquad (17)$$

for all $i = 1, 2, \ldots, s$. Here

$$\mu_k(dw_k\,dv_k \mid x_0, w_0, \ldots, w_{k-1}, v_0, \ldots, v_{k-1}) = \mu_{kw}(dw_k \mid x_0, w_0, \ldots, w_{k-1}, v_0, \ldots, v_{k-1})\,\mu_{kv}(dv_k \mid x_0, w_0, \ldots, w_{k-1}, v_0, \ldots, v_{k-1}).$$

For a given controller $K$ of the form (15), the set of all probability measures $\nu \in \mathcal{P}$ defining admissible perturbed noise random processes is denoted $\Xi_K$. Note that for any controller $K \in \Lambda$, it follows from one of the properties of relative entropy and the assumed non-negativity of the functions $q^i_k(\cdot)$ that the set $\Xi_K$ will contain the probability measure $\mu(\cdot)$ corresponding to the reference system (12).

Remarks

To motivate the above uncertainty description and its connection to the Sum Quadratic Constraint (SQC) uncertainty description such as considered in [22], we now look at the special case in which the reference probability measure is Gaussian. That is,

$$\mu(dw_{0,N}\,dv_{0,N}\,dx_0) = \prod_{k=0}^{N}\theta(dw_k)\,\prod_{k=0}^{N}\eta(dv_k)\,\psi(dx_0) \qquad (18)$$

where

$$\theta(d\xi) = [(2\pi)^r\det(\Upsilon)]^{-\frac12}\exp\left[-\tfrac12\,\xi^T\Upsilon^{-1}\xi\right]d\xi,$$
$$\eta(d\xi) = [(2\pi)^l\det(\Omega)]^{-\frac12}\exp\left[-\tfrac12\,\xi^T\Omega^{-1}\xi\right]d\xi,$$
$$\psi(d\xi) = [(2\pi)^n\det(\bar\Sigma_0)]^{-\frac12}\exp\left[-\tfrac12\,(\xi - \bar x_0)^T\bar\Sigma_0^{-1}(\xi - \bar x_0)\right]d\xi.$$

Hence, the reference noise sequences $w_k$ and $v_k$ are independent zero mean white Gaussian noise sequences with noise covariance matrices $\Upsilon > 0$ and $\Omega > 0$ respectively. Also, the initial condition $x_0$ is a Gaussian random variable with mean $\bar x_0$ and covariance matrix $\bar\Sigma_0 > 0$.

Furthermore, we suppose that the function $q^i_k(z)$ is of the form $q^i_k(z) = \frac12\|z\|^2$. In this case, the relative entropy constraint will allow for a perturbed system of the form

$$x_{k+1} = f(k, x_k, u_k) + g(k, x_k)\left(\Delta_{1k}(x_k, u_k) + w_k\right), \qquad z_k = b(k, x_k, u_k), \qquad y_{k+1} = h(k, x_k) + \Delta_{2k}(x_k, u_k) + v_k \qquad (19)$$

where the initial condition is a Gaussian random variable with mean $\bar x_0 + \Delta_{x_0}$ and covariance matrix $\bar\Sigma_0$. Then $\tilde w_k = w_k + \Delta_{1k}$, $\tilde v_k = v_k + \Delta_{2k}$, and the relative entropy constraint will be satisfied provided the following sum quadratic constraint is satisfied:

$$\mathbf{E}\left[\frac12\,\Delta_{x_0}^T\bar\Sigma_0^{-1}\Delta_{x_0} + \frac12\sum_{k=0}^{N}\left(\Delta_{1k}^T\Upsilon^{-1}\Delta_{1k} + \Delta_{2k}^T\Omega^{-1}\Delta_{2k} - \|z^i_k\|^2\right)\right] - d_i \le 0$$

for all $i = 1, 2, \ldots, s$. Note however that, even in this special case of a Gaussian reference probability measure, the class of perturbed probability measures satisfying the relative entropy constraint (17) is larger than the class of perturbed probability measures defined by the above sum quadratic constraint.

Cost Functional

The cost functional to be considered in this section is of the form

$$J(x_{0,N+1}, u_{0,N}) = \Phi(x_{N+1}) + \sum_{k=0}^{N} L(k, x_k, u_k) \qquad (20)$$

where the functions $\Phi(\cdot)$ and $L(k, \cdot)$ are Borel measurable. This cost functional extends the cost functional (6) to the case of controlled stochastic uncertain systems. The above system and cost functional are required to satisfy the following assumptions.

Assumption 3.1 The functions $\Phi(\cdot)$ and $L(\cdot)$ satisfy $\Phi(x) \ge 0$ and $L(k, x, u) \ge 0$ for all $k = 0, 1, \ldots, N$, $x \in \mathbf{R}^n$ and $u \in \mathbf{R}^m$.

Assumption 3.2 For any admissible controller $K \in \Lambda$, the resulting closed loop system is such that

$$\sup_{\nu \in \mathcal{P}} \mathbf{E}^\nu J(x_{0,N+1}, u_{0,N}) = \infty. \qquad (21)$$

Note that Assumption 3.2 is a similar controllability and observability assumption to Assumption 2.1. Although condition (21) is required to hold for all admissible controllers, the fact that the measurement noise $v_k$ can be arbitrary means that this assumption is of a similar type to Assumption 2.1.

The minimax control problem under consideration in this section involves finding an admissible controller to minimize the worst case of the expectation of the cost functional (20). That is, we are concerned with the minimax control problem

$$\inf_{K \in \Lambda}\ \sup_{\nu \in \Xi_K} \mathbf{E}^\nu J(x_{0,N+1}, u_{0,N}). \qquad (22)$$

In the following theorem, we show that this constrained minimax problem can be replaced by a corresponding unconstrained stochastic game problem. This stochastic game problem is defined in terms of the following augmented cost functional:

$$J_\tau(x_{0,N+1}, u_{0,N}) = \Phi(x_{N+1}) + \sum_{k=0}^{N} L(k, x_k, u_k) - \sum_{i=1}^{s}\tau_i\left\{ R(\nu_{x_0}(\cdot)\,\|\,\mu_{x_0}(\cdot)) - d_i + \sum_{k=0}^{N}\left[ R(\nu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1}, \tilde v_0, \ldots, \tilde v_{k-1})\,\|\,\mu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1}, \tilde v_0, \ldots, \tilde v_{k-1})) - q^i_k(z^i_k)\right]\right\}$$

where $\tau_1 \ge 0, \tau_2 \ge 0, \ldots, \tau_s \ge 0$ are given constants. In this stochastic game problem, the maximizing player input is a probability measure $\nu(\cdot) \in \mathcal{P}$ and the minimizing player input $u_{0,k}$ is assumed to have a partial information structure of the form (15). We let $\tilde V_\tau$ denote the lower value in this game problem. That is,

$$\tilde V_\tau = \inf_{K \in \Lambda}\ \sup_{\nu \in \mathcal{P}} \mathbf{E}^\nu[J_\tau(x_{0,N+1}, u_{0,N})]. \qquad (23)$$

Also, we define a set $\tilde\Gamma \subset \mathbf{R}^s$ as

$$\tilde\Gamma = \{\tau = [\tau_1\ \tau_2\ \cdots\ \tau_s] \in \mathbf{R}^s : \tau_1 \ge 0, \tau_2 \ge 0, \ldots, \tau_s \ge 0 \ \text{and}\ \tilde V_\tau \text{ is finite}\}.$$

It follows from Assumption 3.2 that zero is not contained in the set $\tilde\Gamma$.

Theorem 3.1 Consider the stochastic uncertain system (12), (13), (17) with cost functional $J(x_{0,N+1}, u_{0,N})$. Then the following conclusions hold:

(i) For the minimax stochastic optimal control problem

$$\inf_{K \in \Lambda}\ \sup_{\nu \in \Xi_K} \mathbf{E}^\nu[J(x_{0,N+1}, u_{0,N})], \qquad (24)$$

the value of this optimal control problem is finite if and only if the set $\tilde\Gamma$ is non-empty.

(ii) If the set $\tilde\Gamma$ is non-empty, then

$$\inf_{K \in \Lambda}\ \sup_{\nu \in \Xi_K} \mathbf{E}^\nu[J(x_{0,N+1}, u_{0,N})] = \inf_{\tau \in \tilde\Gamma} \tilde V_\tau. \qquad (25)$$

Proof. Using Theorem 2.1, the proof of this result follows via identical steps to the proof of Theorem 3.1 of [8].

Remark

For any non-zero vector $\tau \ge 0$, it is straightforward to verify that the quantity $\tilde V_\tau$ can be rewritten as

$$\tilde V_\tau = \sum_{i=1}^{s}\tau_i\left(\tilde W_\tau + d_i\right)$$

where

$$\tilde W_\tau = \inf_{K \in \Lambda}\ \sup_{\nu \in \mathcal{P}} \mathbf{E}^\nu\left[\frac{\Phi(x_{N+1}) + \sum_{k=0}^{N} L(k, x_k, u_k) + \sum_{i=1}^{s}\tau_i\sum_{k=0}^{N} q^i_k(z^i_k)}{\sum_{i=1}^{s}\tau_i} - R(\nu_{x_0}(\cdot)\,\|\,\mu_{x_0}(\cdot)) - \sum_{k=0}^{N} R(\nu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1}, \tilde v_0, \ldots, \tilde v_{k-1})\,\|\,\mu_k(\cdot \mid x_0, \tilde w_0, \ldots, \tilde w_{k-1}, \tilde v_0, \ldots, \tilde v_{k-1}))\right].$$

Hence, it follows from Theorem 3.1 that if $\tilde\Gamma \neq \emptyset$, we can write

$$\inf_{K \in \Lambda}\ \sup_{\nu \in \Xi_K} \mathbf{E}^\nu[J(x_{0,N+1}, u_{0,N})] = \inf_{\tau \in \tilde\Gamma}\sum_{i=1}^{s}\tau_i\left(\tilde W_\tau + d_i\right). \qquad (26)$$

The following theorem shows that the partial information stochastic game defining the quantity $\tilde W_\tau$ can be replaced by an equivalent partial information risk sensitive optimal control problem.

Theorem 3.2

$$\tilde W_\tau = \inf_{K \in \Lambda}\ \log \mathbf{E}^\mu\left\{\exp\left[\frac{\Phi(x_{N+1}) + \sum_{k=0}^{N} L(k, x_k, u_k) + \sum_{i=1}^{s}\tau_i\sum_{k=0}^{N} q^i_k(z^i_k)}{\sum_{i=1}^{s}\tau_i}\right]\right\} \qquad (27)$$

where the probability measure $\mu(\cdot)$ is as defined for the reference system (12).

Proof. The proof of this theorem follows directly by applying Lemma 2.2 to the closed loop system formed by applying an arbitrary controller $K \in \Lambda$ to the uncertain system (12), (13), (17); see also [10].

Remark

Theorems 3.1 and 3.2 presented above enable us to convert problems of minimax optimal control for the class of stochastic uncertain systems under consideration into corresponding risk sensitive optimal control problems dependent on the vector of Lagrange multipliers $\tau \ge 0$. In the sequel, we consider the solution of these risk sensitive optimal control problems.

4 Solution to the Risk Sensitive Optimal Control Problems

In this section, we apply some existing results on risk sensitive optimal control given in [11] and [12] to solve the risk sensitive optimal control problem derived in the previous section.

4.1 State Feedback Case

We first consider the state feedback case using the results of [11]. In order to apply the results of [11], we assume that the stochastic uncertain system under consideration is such that the reference state equation (12) is of the form

$$x_{k+1} = f(k, x_k, u_k) + g(k, x_k)w_k \qquad (28)$$

and the perturbed state equation (13) is of the form

$$x_{k+1} = f(k, x_k, u_k) + g(k, x_k)\tilde w_k, \qquad z_k = b(k, x_k, u_k). \qquad (29)$$

For a given vector $\tau \ge 0$, we will solve the risk sensitive optimal control problem associated with the system (28) and the risk sensitive cost functional

$$J^{RS}_\tau = \log \mathbf{E}^\mu\left\{\exp\left[\frac{\Phi(x_{N+1}) + \sum_{k=0}^{N} L(k, x_k, u_k) + \sum_{i=1}^{s}\tau_i\sum_{k=0}^{N} q^i_k(z^i_k)}{\sum_{i=1}^{s}\tau_i}\right]\right\}. \qquad (30)$$

Assumptions

We assume that the reference noise probability measure $\mu(\cdot)$ is of the form

$$\mu(dw_{0,N}\,dx_0) = \prod_{k=0}^{N}\theta(dw_k)\,\psi(dx_0) \qquad (31)$$

where $\psi(dx_0) = \delta_{\bar x_0}(dx_0)$. That is, the initial condition is assumed to be known for the reference system. It follows from Theorem 3.2 that the solution to the risk sensitive control problem (28), (30), (31) yields the quantity $\tilde W_\tau$ in equation (27). We now use dynamic programming to solve this state feedback risk sensitive control problem.

Dynamic Programming Equation

In order to solve the risk sensitive control problem (28), (30), (31), we consider a function $Z_\tau(x; k)$ defined by the following dynamic programming equation:

$$Z_\tau(x; k) = \inf_{u}\ \log \mathbf{E}^\mu\left[\exp\left\{\frac{L(k, x, u) + \sum_{i=1}^{s}\tau_i\, q^i_k(z^i_k)}{\sum_{i=1}^{s}\tau_i} + Z_\tau(f(k, x, u) + g(k, x)w_k;\ k+1)\right\}\right],$$
$$Z_\tau(x; N+1) = \frac{\Phi(x)}{\sum_{i=1}^{s}\tau_i}. \qquad (32)$$

Applying standard dynamic programming methods to the risk sensitive control problem (28), (30), (31), we obtain the following proposition; e.g., see [11].

Proposition 4.1 Let the non-zero vector $\tau \ge 0$ be given and suppose $Z_\tau(x; k)$ is defined by the dynamic programming equation (32). Then $\tilde W_\tau$, the optimal value of the risk sensitive control problem (28), (30), (31), is given by

$$\tilde W_\tau = Z_\tau(\bar x_0; 0). \qquad (33)$$

Furthermore, if for each $k$ and $x \in \mathbf{R}^n$ the infimum in (32) is achieved at $u(x, k)$, then this defines the optimal state feedback controller in the risk sensitive control problem (28), (30), (31).

Combining the above proposition with Theorems 3.1 and 3.2, we can then solve the minimax optimal control problem (24) as follows. For each non-zero vector $\tau \ge 0$, we solve the dynamic programming equation (32) and use equation (33) to determine the corresponding value of $\tilde W_\tau$. Then, as in (26), we optimize with respect to the vector $\tau \ge 0$. Assuming that the infimum in (26) is achieved at a non-zero $\tau = \tau^*$, we then solve the dynamic programming equation (32) with this value of $\tau$. Then the optimal cost can be evaluated using formulas (33) and (26). Also, if the infimum in (32) is achieved at each $k$ and $x \in \mathbf{R}^n$, this defines the corresponding minimax stochastic optimal controller.
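The following Python sketch (added here as an illustration, not taken from [11] or from the paper) shows one crude way the backward recursion (32) and the evaluation (33) might be approximated for a scalar example with a single constraint; the dynamics, costs, grids and the Gauss-Hermite quadrature for the reference noise are all hypothetical choices.

```python
# Hypothetical numerical sketch of the dynamic programming recursion (32) for a scalar
# state-feedback example (s = 1).  Not from the paper: f, g, L, Phi, q and the grids are
# illustrative, and the state/noise discretisation is deliberately crude.
import numpy as np

tau = 5.0
N = 10
xs = np.linspace(-4.0, 4.0, 81)                    # state grid
us = np.linspace(-2.0, 2.0, 21)                    # control grid
ws, pw = np.polynomial.hermite_e.hermegauss(15)    # Gauss-Hermite nodes for w ~ N(0,1)
pw = pw / pw.sum()

f = lambda k, x, u: 0.9 * x + u                    # nominal dynamics f(k,x,u)
g = lambda k, x: 1.0                               # noise gain g(k,x)
L = lambda k, x, u: 0.1 * (x**2 + u**2)            # running cost
q = lambda k, z: 0.1 * z**2                        # uncertainty function q_k, z = b(k,x,u) = x
Phi = lambda x: 0.1 * x**2                         # terminal cost

Z = Phi(xs) / tau                                  # Z_tau(x; N+1) = Phi(x)/tau
for k in range(N, -1, -1):
    Znew = np.empty_like(Z)
    for i, x in enumerate(xs):
        best = np.inf
        for u in us:
            xnext = f(k, x, u) + g(k, x) * ws      # propagate through each noise node
            Znext = np.interp(xnext, xs, Z)        # crude interpolation of Z_tau(.;k+1)
            stage = (L(k, x, u) + tau * q(k, x)) / tau
            best = min(best, np.log(np.sum(pw * np.exp(stage + Znext))))
        Znew[i] = best
    Z = Znew

# W_tau = Z_tau(x0bar; 0) per equation (33); here x0bar = 0.
print("W_tau estimate:", np.interp(0.0, xs, Z))
```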

4.2 Partial Information Case

We now apply the results of [12] to solve the risk sensitive optimal control problem (27) in the partial information case. In order to apply the results of [12], we assume that the stochastic uncertain system under consideration is such that the reference state equation (12) is of the form

$$x_{k+1} = f(k, x_k, u_k) + g(k, x_k)w_k, \qquad y_{k+1} = h(k, x_k) + v_k. \qquad (34)$$

Also, the perturbed state equation (13) is assumed to be of the form

$$x_{k+1} = f(k, x_k, u_k) + g(k, x_k)\tilde w_k, \qquad z_k = b(k, x_k, u_k), \qquad y_{k+1} = h(k, x_k) + \tilde v_k. \qquad (35)$$

For a given non-zero vector $\tau \ge 0$, we will solve the partial information risk sensitive control problem associated with the system (34) and the risk sensitive cost functional (30).

Assumptions

We assume that the reference noise probability measure $\mu(\cdot)$ is of the form

$$\mu(dw_{0,N}\,dv_{0,N}\,dx_0) = \prod_{k=0}^{N}\theta(dw_k)\,\prod_{k=0}^{N}\eta(dv_k)\,\psi(dx_0) \qquad (36)$$

where

$$\theta(d\xi) = [(2\pi)^r\det(\Upsilon)]^{-\frac12}\exp\left[-\tfrac12\,\xi^T\Upsilon^{-1}\xi\right]d\xi,$$
$$\eta(d\xi) = [(2\pi)^l\det(\Omega)]^{-\frac12}\exp\left[-\tfrac12\,\xi^T\Omega^{-1}\xi\right]d\xi,$$
$$\psi(d\xi) = [(2\pi)^n\det(\bar\Sigma_0)]^{-\frac12}\exp\left[-\tfrac12\,(\xi - \bar x_0)^T\bar\Sigma_0^{-1}(\xi - \bar x_0)\right]d\xi.$$

That is, we assume that the reference noise probability measure is Gaussian. It follows from Theorem 3.2 that the solution to the risk sensitive control problem (34), (30), (36) yields the quantity $\tilde W_\tau$ in equation (27). We now use the information state approach of [12] to solve this partial information risk sensitive control problem. In order to apply the results of [12] to this partial information risk sensitive optimal control problem, we first introduce some definitions regarding the information state:

Notation

Using the results of [12], it follows that the solution to the partial information risk sensitive optimal control problem under consideration can be given in terms of an information state process $\sigma_k(z)$ defined recursively as follows:

$$\sigma_{k+1} = \Sigma(k, u_k, y_{k+1})\sigma_k, \qquad \sigma_0 = \psi \qquad (37)$$

where the operator $\Sigma(k, u, y)$ is defined as follows:

$$[\Sigma(k, u, y)\sigma](z)\,dz = \int_{\mathbf{R}^n}\zeta_k(dz; \xi)\,(2\pi)^{\frac n2}\exp\left(\frac{L(k, \xi, u) + \sum_{i=1}^{s}\tau_i\, q^i_k(b^i(k, \xi, u))}{\sum_{i=1}^{s}\tau_i}\right)\Psi(k, \xi, y)\,\sigma(\xi)\,d\xi$$

where

$$\zeta_k(dz; \xi) = P^\mu\left(f(k, \xi, u) + g(k, \xi)w \in dz\right)$$

and

$$\Psi(k, \xi, y) = \exp\left(-\tfrac12\,h(k, \xi)^T\Omega^{-1}h(k, \xi) + h(k, \xi)^T\Omega^{-1}y\right). \qquad (38)$$

Dynamic Programming Equations

As well as the above information state recursion, the approach of [12] also involves a dynamic programming equation. In order to present this dynamic programming equation, we must first define a change of measure for the system (34). This change of measure is defined in terms of an alternative reference system:

$$x_{k+1} = f(k, x_k, u_k) + g(k, x_k)w_k, \qquad y_{k+1} = v_k. \qquad (39)$$

Given any controller of the form (15), let $\alpha(dx_{0,N}, dy_{0,N})$ denote the probability measure on the sequences $x_{0,N}, y_{0,N}$ induced by the probability measure $\mu(\cdot)$ and the equations (34), (15). Also, let $\alpha^\dagger(dx_{0,N}, dy_{0,N})$ denote the probability measure on the sequences $x_{0,N}, y_{0,N}$ induced by the probability measure $\mu(\cdot)$ and the equations (39), (15). As in [12], the Radon-Nikodym derivative between these two probability measures is given by

$$\left.\frac{d\alpha}{d\alpha^\dagger}\right|_{\mathcal{F}_{N+1}} = \prod_{i=1}^{N+1}\Psi(i-1,\ x_{i-1},\ y_i)$$

where $\mathcal{F}_k$ denotes the complete filtration generated by $(x_{0,k}, y_{0,k})$.

Using this notation, we then define the quantity $S(\sigma, k)$ according to the following dynamic programming equation:

$$S(\sigma, k) = \inf_{u}\ \mathbf{E}^\dagger\left[S(\Sigma(k+1, u, y_{k+1})\sigma,\ k+1)\right],$$
$$S(\sigma, N+1) = \int_{\mathbf{R}^n}\sigma(x)\exp\Phi(x)\,dx \qquad (40)$$

where $\mathbf{E}^\dagger$ denotes expectation with respect to the probability measure $\alpha^\dagger(\cdot)$. Using the above notation, we are now in a position to present the solution to the partial information risk sensitive control problem (34), (30), (36) given in [12]. Note that the results of [12] are stated under the assumption that the functions in the system (34) and cost function (30) are bounded and uniformly continuous and that the controls are restricted to a bounded set. However, it is straightforward to verify that the proof of the result in [12] which leads to the following proposition does not require these assumptions.

Proposition 4.2 Consider the partial information risk sensitive optimal control problem (34), (30), (36) and suppose that the information state $\sigma_k$ is defined by the equation (37) and the function $S(\sigma, k)$ is defined by the dynamic programming equation (40). Then $\tilde W_\tau$, the optimal value in this risk sensitive control problem, is given by

$$\tilde W_\tau = S(\psi, 0).$$

Furthermore, suppose $u^*_k$ is an admissible controller such that for each $k$, $u^*_k = \bar u^*_k(\sigma_k)$ where $\bar u^*_k(\sigma)$ achieves the infimum in (40). Then this controller is an optimal controller for the partial information risk sensitive optimal control problem under consideration.

As in the state feedback case, we can use this result to solve the minimax optimal control problem (22) in the partial information case. This is achieved by optimizing over the non-zero vector $\tau \ge 0$ to find the infimum in (26). For the optimal value of $\tau$ (if it exists), the corresponding partial information minimax optimal controller is obtained as in the above proposition.

5 The Linear Quadratic Gaussian Case

In this section, we specialize the results of the previous section to the linear quadratic Gaussian case. Using the results of [13], we present the solution to the state feedback risk sensitive optimal control problem in terms of a Riccati difference equation.

Furthermore, using the results of [14] and [15], we present the solution to the partial information risk sensitive optimal control problem in terms of a pair of Riccati difference equations. These results then enable solutions to the corresponding minimax LQG problems to be presented.

5.1 State Feedback Case

In order to apply the results of [13], we assume that the stochastic uncertain system under consideration is such that the reference state equation is of the form

$$x_{k+1} = A_kx_k + B_ku_k + D_kw_k \qquad (41)$$

and the perturbed state equation is of the form

$$x_{k+1} = A_kx_k + B_ku_k + D_k\tilde w_k, \qquad z_k = E_{1k}x_k + E_{2k}u_k. \qquad (42)$$

Assumptions

We assume that the vector $z_k$ has been partitioned as in (14) and write

$$z^i_k = E^i_{1k}x_k + E^i_{2k}u_k$$

for $i = 1, 2, \ldots, s$. Furthermore, we assume that $(E^i_{1k})^TE^i_{2k} = 0$ for all $i, k$. Also, we assume that the functions $q^i_k(z)$ in the relative entropy constraint (4) are of the form

$$q^i_k(z) = \|z\|^2. \qquad (43)$$

It is assumed that the cost functional (20) is of the form

$$J(x_{0,N+1}, u_{0,N}) = \frac12\,x_{N+1}^TQ_{N+1}x_{N+1} + \frac12\sum_{k=0}^{N}\left(x_k^TQ_kx_k + u_k^TR_ku_k\right) \qquad (44)$$

where $Q_k \ge 0$ and $R_k > 0$ for all $k$. Furthermore, we assume that this system and cost functional satisfies Assumption 3.2.

For a given vector $\tau \ge 0$, we will solve the risk sensitive control problem associated with the system (41) and the risk sensitive cost functional

$$J^{RS}_\tau = \log \mathbf{E}^\mu\exp\left\{\frac{\frac12\,x_{N+1}^TQ_{N+1}x_{N+1} + \frac12\sum_{k=0}^{N} x_k^T\left[Q_k + 2\sum_{i=1}^{s}\tau_i(E^i_{1k})^TE^i_{1k}\right]x_k + \frac12\sum_{k=0}^{N} u_k^T\left[R_k + 2\sum_{i=1}^{s}\tau_i(E^i_{2k})^TE^i_{2k}\right]u_k}{\sum_{i=1}^{s}\tau_i}\right\}. \qquad (45)$$

We assume that the reference noise probability measure $\mu(\cdot)$ is a Gaussian probability measure of the form

$$\mu(dw_{0,N}\,dx_0) = \prod_{k=0}^{N}\theta_k(dw_k)\,\psi(dx_0) \qquad (46)$$

where

$$\theta_k(d\xi) = [(2\pi)^r\det(\Upsilon_k)]^{-\frac12}\exp\left[-\tfrac12\,\xi^T\Upsilon_k^{-1}\xi\right]d\xi, \qquad \psi(dx_0) = \delta_{\bar x_0}(dx_0).$$

Note that we have assumed that $x_0$ has a known value of $\bar x_0$ for the reference system.

Riccati Equations

The solution to the state feedback linear quadratic Gaussian risk sensitive control problem (41), (45), (46) is given in terms of the following Riccati difference equation:

$$P_k = Q_k + 2\sum_{i=1}^{s}\tau_i(E^i_{1k})^TE^i_{1k} + A_k^T\left[\tilde P_{k+1} - \tilde P_{k+1}B_k\left(R_k + 2\sum_{i=1}^{s}\tau_i(E^i_{2k})^TE^i_{2k} + B_k^T\tilde P_{k+1}B_k\right)^{-1}B_k^T\tilde P_{k+1}\right]A_k,$$
$$\tilde P_{k+1} = P_{k+1} + P_{k+1}D_k\left(\Upsilon_k^{-1} - D_k^TP_{k+1}D_k\right)^{-1}D_k^TP_{k+1},$$
$$P_{N+1} = Q_{N+1}. \qquad (47)$$

We will require that the solution to this Riccati equation satisfies the following conditions:

$$\Upsilon_k^{-1} - D_k^TP_{k+1}D_k > 0 \ \ \forall k, \qquad P_k > 0 \ \ \forall k. \qquad (48)$$

Also, the following recursive equation needs to be solved:

$$F_k = F_{k+1}\left(\frac{\det\left(\left[\Upsilon_k^{-1} - D_k^TP_{k+1}D_k\right]^{-1}\right)}{\det(\Upsilon_k)}\right), \qquad F_{N+1} = 1. \qquad (49)$$

Then, applying the result of [13] to the risk sensitive control problem (41), (45), (46), we obtain the following proposition.

Proposition 5.1 Let the non-zero vector $\tau \ge 0$ be given and suppose $P_k$, $\tilde P_k$ and $F_k$ are defined as above and satisfy conditions (48). Then $\tilde W_\tau$, the optimal value of the risk sensitive control problem (41), (45), (46), is given by

$$\tilde W_\tau = \log\left[F_0\exp\left(\bar x_0^TP_0\bar x_0\right)\right] = \bar x_0^TP_0\bar x_0 + \log F_0. \qquad (50)$$

Furthermore, the corresponding state feedback optimal control law is given by

$$u_k = -\left(R_k + 2\sum_{i=1}^{s}\tau_i(E^i_{2k})^TE^i_{2k} + B_k^T\tilde P_{k+1}B_k\right)^{-1}B_k^T\tilde P_{k+1}A_kx_k$$

for $k = 0, 1, \ldots, N$.

As in the previous section, we can use this result to solve the minimax optimal control problem (22) in the state feedback linear quadratic Gaussian case. This is achieved by optimizing over the vector $\tau \ge 0$ to find the infimum in (26). For the optimal value of $\tau$ (if it exists), the corresponding state feedback minimax optimal controller is obtained as in the above proposition.

Remark

The controller design algorithm defined by the above proposition and the optimization with respect to $\tau \ge 0$ in (26) is essentially the same as the controller design algorithm proposed in [8] in the linear state feedback case. This is in spite of the fact that a different uncertainty description is considered here. However, in the linear partial information case to follow, the results presented here are different to those of [8]. Indeed, in [8], no tractable solution could be obtained in the linear partial information case.
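As an illustration of the mechanics of Proposition 5.1 (added to this transcription; the matrices and $\tau$ below are hypothetical single-constraint choices, and only the positivity condition in (48) is checked numerically), the following Python sketch runs the backward recursion (47), the normalization recursion (49) as reconstructed above, and forms the state feedback gains.

```python
# Hypothetical sketch of the recursions (47)-(49) and the feedback law of Proposition 5.1
# for a single uncertainty constraint (s = 1).  The system data and tau are illustrative.
import numpy as np

n, N, tau = 2, 20, 1.0
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
D = np.array([[0.02], [0.05]])
E1 = np.array([[1.0, 0.0]])              # z_k = E1 x_k + E2 u_k with E1' E2 = 0
E2 = np.array([[0.0]])
Q = 0.1 * np.eye(n); QN = 0.1 * np.eye(n); R = np.array([[1.0]])
Ups = np.array([[1.0]])                  # noise covariance Upsilon_k (constant)
x0bar = np.array([1.0, 0.0])

Qbar = Q + 2.0 * tau * E1.T @ E1         # Q_k + 2*sum_i tau_i E1k'E1k
Rbar = R + 2.0 * tau * E2.T @ E2         # R_k + 2*sum_i tau_i E2k'E2k

P = QN                                   # P_{N+1} = Q_{N+1}
F = 1.0                                  # F_{N+1} = 1
gains = [None] * (N + 1)
for k in range(N, -1, -1):
    Mk = np.linalg.inv(Ups) - D.T @ P @ D           # must be > 0, condition (48)
    assert np.all(np.linalg.eigvalsh(Mk) > 0), "condition (48) violated for this data"
    Pt = P + P @ D @ np.linalg.inv(Mk) @ D.T @ P    # \tilde P_{k+1}
    K = np.linalg.solve(Rbar + B.T @ Pt @ B, B.T @ Pt @ A)
    gains[k] = K                                    # u_k = -K x_k (Proposition 5.1)
    P = Qbar + A.T @ (Pt - Pt @ B @ K) @ A          # recursion (47)
    F *= np.linalg.det(np.linalg.inv(Mk)) / np.linalg.det(Ups)   # recursion (49)

W_tau = x0bar @ P @ x0bar + np.log(F)               # value per (50), as reconstructed
print("first-stage gain K_0 =", gains[0])
print("W_tau =", W_tau)
```

In a full design the recursion would be re-run over a grid of $\tau$ and the resulting values combined as in (26); that outer loop is omitted here.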

5.2 Partial Information Case

We now apply the results of [14] (see also [15] for an equivalent result) to solve the risk sensitive optimal control problem (27) in the partial information case. In order to apply the results of [14], we assume that the stochastic uncertain system under consideration is such that the reference state equation (12) is of the form

$$x_{k+1} = Ax_k + Bu_k + Dw_k, \qquad y_{k+1} = Cx_k + v_k. \qquad (51)$$

Also, the perturbed state equation (13) is assumed to be of the form

$$x_{k+1} = Ax_k + Bu_k + D\tilde w_k, \qquad z_k = E_1x_k + E_2u_k, \qquad y_{k+1} = Cx_k + \tilde v_k. \qquad (52)$$

As in the state feedback case, we assume that the vector $z_k$ has been partitioned as in (14) and write

$$z^i_k = E^i_1x_k + E^i_2u_k$$

for $i = 1, 2, \ldots, s$. Furthermore, we assume that $(E^i_1)^TE^i_2 = 0$ for all $i$. Also, for this uncertain system, we assume that the uncertainty is described by the relative entropy constraint (17) where the functions $q^i_k(z)$ are of the form (43). We assume that the reference noise probability measure $\mu(\cdot)$ is a Gaussian probability measure of the form

$$\mu(dw_{0,N}\,dv_{0,N}\,dx_0) = \prod_{k=0}^{N}\theta(dw_k)\,\prod_{k=0}^{N}\eta(dv_k)\,\psi(dx_0) \qquad (53)$$

where

$$\theta(d\xi) = [(2\pi)^r\det(\Upsilon)]^{-\frac12}\exp\left[-\tfrac12\,\xi^T\Upsilon^{-1}\xi\right]d\xi,$$
$$\eta(d\xi) = [(2\pi)^l\det(\Omega)]^{-\frac12}\exp\left[-\tfrac12\,\xi^T\Omega^{-1}\xi\right]d\xi,$$
$$\psi(d\xi) = [(2\pi)^n\det(\bar\Sigma_0)]^{-\frac12}\exp\left[-\tfrac12\,(\xi - \bar x_0)^T\bar\Sigma_0^{-1}(\xi - \bar x_0)\right]d\xi.$$

Hence, the reference noise sequences $w_k$ and $v_k$ are independent zero mean white Gaussian noise sequences with noise covariance matrices $\Upsilon > 0$ and $\Omega > 0$ respectively. Also, the initial condition $x_0$ is a Gaussian random variable with mean $\bar x_0$ and covariance matrix $\bar\Sigma_0 > 0$. For a given vector $\tau \ge 0$, we will solve the partial information risk sensitive control problem associated with the system (51), the risk sensitive cost functional (45) and the noise process (53). In the cost function (45), it is assumed that $Q_k \equiv Q \ge 0$ and $R_k \equiv R > 0$ for all $k$.

Information State Equations

The solution to the partial information linear quadratic Gaussian risk sensitive control problem (51), (45), (53) is given in terms of the following Riccati difference equation, which is solved forward in time:

$$\Sigma_{k+1} = D\Upsilon D^T + A\left(\Sigma_k^{-1} + C^T\Omega^{-1}C - Q - 2\sum_{i=1}^{s}\tau_i(E^i_1)^TE^i_1\right)^{-1}A^T, \qquad \Sigma_0 = \bar\Sigma_0. \qquad (54)$$

The solution to this difference equation is required to satisfy the following conditions:

$$\Sigma_k^{-1} + C^T\Omega^{-1}C - Q - 2\sum_{i=1}^{s}\tau_i(E^i_1)^TE^i_1 > 0 \ \ \forall k, \qquad \Sigma_k > 0 \ \ \forall k. \qquad (55)$$

We also consider the following filter state equation:

$$\hat x_{k+1} = A\hat x_k + Bu_k + A\left(\Sigma_k^{-1} + C^T\Omega^{-1}C - Q - 2\sum_{i=1}^{s}\tau_i(E^i_1)^TE^i_1\right)^{-1}\left(C^T\Omega^{-1}\left[y_{k+1} - C\hat x_k\right] + \left[Q + 2\sum_{i=1}^{s}\tau_i(E^i_1)^TE^i_1\right]\hat x_k\right),$$
$$\hat x_0 = \bar x_0. \qquad (56)$$

The quantities $\Sigma_k$, $\hat x_k$ together with a normalizing constant $Z_k$ defined in [14] define the information state of the risk sensitive control problem (51), (45), (53).
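The following Python sketch (an added illustration; the system data, $\tau$ and the stand-in measurement sequence are hypothetical, and the control is simply held at zero since the optimal control law requires the second Riccati recursion of [14], which is not reproduced in this transcription) shows how the forward recursion (54) and the filter (56) can be propagated, checking condition (55) at each step.

```python
# Hypothetical sketch of the forward recursion (54) and the filter (56) for the partial
# information case with a single constraint (s = 1).  All data are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
n, N, tau = 2, 20, 1.0
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
D = np.array([[0.02], [0.05]])
C = np.array([[1.0, 0.0]])
E1 = np.array([[1.0, 0.0]])
Q = 0.1 * np.eye(n)
Ups = np.array([[1.0]]); Om = np.array([[0.04]])
Sigma0 = 0.1 * np.eye(n); x0bar = np.zeros(n)

W1 = Q + 2.0 * tau * E1.T @ E1            # Q + 2*sum_i tau_i E1'E1 (state weighting)

Sig = Sigma0.copy()
xhat = x0bar.copy()
u = np.zeros(1)                            # control held at zero in this illustration
for k in range(N):
    y_next = rng.standard_normal(1)        # stand-in for the measured output y_{k+1}
    Mk = np.linalg.inv(Sig) + C.T @ np.linalg.inv(Om) @ C - W1
    assert np.all(np.linalg.eigvalsh(Mk) > 0), "condition (55) violated for this data"
    Minv = np.linalg.inv(Mk)
    # filter update (56)
    innov = C.T @ np.linalg.inv(Om) @ (y_next - C @ xhat) + W1 @ xhat
    xhat = A @ xhat + B @ u + A @ Minv @ innov
    # forward Riccati update (54)
    Sig = D @ Ups @ D.T + A @ Minv @ A.T

print("Sigma_N =\n", Sig)
print("xhat_N =", xhat)
```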


CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Exercise Solutions to Real Analysis

Exercise Solutions to Real Analysis xercse Solutons to Real Analyss Note: References refer to H. L. Royden, Real Analyss xersze 1. Gven any set A any ɛ > 0, there s an open set O such that A O m O m A + ɛ. Soluton 1. If m A =, then there

More information

Computing Correlated Equilibria in Multi-Player Games

Computing Correlated Equilibria in Multi-Player Games Computng Correlated Equlbra n Mult-Player Games Chrstos H. Papadmtrou Presented by Zhanxang Huang December 7th, 2005 1 The Author Dr. Chrstos H. Papadmtrou CS professor at UC Berkley (taught at Harvard,

More information

On a direct solver for linear least squares problems

On a direct solver for linear least squares problems ISSN 2066-6594 Ann. Acad. Rom. Sc. Ser. Math. Appl. Vol. 8, No. 2/2016 On a drect solver for lnear least squares problems Constantn Popa Abstract The Null Space (NS) algorthm s a drect solver for lnear

More information

APPENDIX A Some Linear Algebra

APPENDIX A Some Linear Algebra APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

THE GUARANTEED COST CONTROL FOR UNCERTAIN LARGE SCALE INTERCONNECTED SYSTEMS

THE GUARANTEED COST CONTROL FOR UNCERTAIN LARGE SCALE INTERCONNECTED SYSTEMS Copyrght 22 IFAC 5th rennal World Congress, Barcelona, Span HE GUARANEED COS CONROL FOR UNCERAIN LARGE SCALE INERCONNECED SYSEMS Hroak Mukadan Yasuyuk akato Yoshyuk anaka Koch Mzukam Faculty of Informaton

More information

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,

More information

Assortment Optimization under MNL

Assortment Optimization under MNL Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.

More information

Bézier curves. Michael S. Floater. September 10, These notes provide an introduction to Bézier curves. i=0

Bézier curves. Michael S. Floater. September 10, These notes provide an introduction to Bézier curves. i=0 Bézer curves Mchael S. Floater September 1, 215 These notes provde an ntroducton to Bézer curves. 1 Bernsten polynomals Recall that a real polynomal of a real varable x R, wth degree n, s a functon of

More information

The Second Anti-Mathima on Game Theory

The Second Anti-Mathima on Game Theory The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player

More information

BOUNDEDNESS OF THE RIESZ TRANSFORM WITH MATRIX A 2 WEIGHTS

BOUNDEDNESS OF THE RIESZ TRANSFORM WITH MATRIX A 2 WEIGHTS BOUNDEDNESS OF THE IESZ TANSFOM WITH MATIX A WEIGHTS Introducton Let L = L ( n, be the functon space wth norm (ˆ f L = f(x C dx d < For a d d matrx valued functon W : wth W (x postve sem-defnte for all

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family IOSR Journal of Mathematcs IOSR-JM) ISSN: 2278-5728. Volume 3, Issue 3 Sep-Oct. 202), PP 44-48 www.osrjournals.org Usng T.O.M to Estmate Parameter of dstrbutons that have not Sngle Exponental Famly Jubran

More information

2.3 Nilpotent endomorphisms

2.3 Nilpotent endomorphisms s a block dagonal matrx, wth A Mat dm U (C) In fact, we can assume that B = B 1 B k, wth B an ordered bass of U, and that A = [f U ] B, where f U : U U s the restrcton of f to U 40 23 Nlpotent endomorphsms

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

Robust observed-state feedback design. for discrete-time systems rational in the uncertainties

Robust observed-state feedback design. for discrete-time systems rational in the uncertainties Robust observed-state feedback desgn for dscrete-tme systems ratonal n the uncertantes Dmtr Peaucelle Yosho Ebhara & Yohe Hosoe Semnar at Kolloquum Technsche Kybernetk, May 10, 016 Unversty of Stuttgart

More information

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009 College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:

More information

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0 MODULE 2 Topcs: Lnear ndependence, bass and dmenson We have seen that f n a set of vectors one vector s a lnear combnaton of the remanng vectors n the set then the span of the set s unchanged f that vector

More information

The Minimum Universal Cost Flow in an Infeasible Flow Network

The Minimum Universal Cost Flow in an Infeasible Flow Network Journal of Scences, Islamc Republc of Iran 17(2): 175-180 (2006) Unversty of Tehran, ISSN 1016-1104 http://jscencesutacr The Mnmum Unversal Cost Flow n an Infeasble Flow Network H Saleh Fathabad * M Bagheran

More information

Bezier curves. Michael S. Floater. August 25, These notes provide an introduction to Bezier curves. i=0

Bezier curves. Michael S. Floater. August 25, These notes provide an introduction to Bezier curves. i=0 Bezer curves Mchael S. Floater August 25, 211 These notes provde an ntroducton to Bezer curves. 1 Bernsten polynomals Recall that a real polynomal of a real varable x R, wth degree n, s a functon of the

More information

Maximizing the number of nonnegative subsets

Maximizing the number of nonnegative subsets Maxmzng the number of nonnegatve subsets Noga Alon Hao Huang December 1, 213 Abstract Gven a set of n real numbers, f the sum of elements of every subset of sze larger than k s negatve, what s the maxmum

More information

APPROXIMATE PRICES OF BASKET AND ASIAN OPTIONS DUPONT OLIVIER. Premia 14

APPROXIMATE PRICES OF BASKET AND ASIAN OPTIONS DUPONT OLIVIER. Premia 14 APPROXIMAE PRICES OF BASKE AND ASIAN OPIONS DUPON OLIVIER Prema 14 Contents Introducton 1 1. Framewor 1 1.1. Baset optons 1.. Asan optons. Computng the prce 3. Lower bound 3.1. Closed formula for the prce

More information

Canonical transformations

Canonical transformations Canoncal transformatons November 23, 2014 Recall that we have defned a symplectc transformaton to be any lnear transformaton M A B leavng the symplectc form nvarant, Ω AB M A CM B DΩ CD Coordnate transformatons,

More information

ECE559VV Project Report

ECE559VV Project Report ECE559VV Project Report (Supplementary Notes Loc Xuan Bu I. MAX SUM-RATE SCHEDULING: THE UPLINK CASE We have seen (n the presentaton that, for downlnk (broadcast channels, the strategy maxmzng the sum-rate

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

The Feynman path integral

The Feynman path integral The Feynman path ntegral Aprl 3, 205 Hesenberg and Schrödnger pctures The Schrödnger wave functon places the tme dependence of a physcal system n the state, ψ, t, where the state s a vector n Hlbert space

More information

Chapter 11: Simple Linear Regression and Correlation

Chapter 11: Simple Linear Regression and Correlation Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests

More information

Appendix B. The Finite Difference Scheme

Appendix B. The Finite Difference Scheme 140 APPENDIXES Appendx B. The Fnte Dfference Scheme In ths appendx we present numercal technques whch are used to approxmate solutons of system 3.1 3.3. A comprehensve treatment of theoretcal and mplementaton

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

4DVAR, according to the name, is a four-dimensional variational method.

4DVAR, according to the name, is a four-dimensional variational method. 4D-Varatonal Data Assmlaton (4D-Var) 4DVAR, accordng to the name, s a four-dmensonal varatonal method. 4D-Var s actually a drect generalzaton of 3D-Var to handle observatons that are dstrbuted n tme. The

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

Report on Image warping

Report on Image warping Report on Image warpng Xuan Ne, Dec. 20, 2004 Ths document summarzed the algorthms of our mage warpng soluton for further study, and there s a detaled descrpton about the mplementaton of these algorthms.

More information

Difference Equations

Difference Equations Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1

More information

PHYS 705: Classical Mechanics. Calculus of Variations II

PHYS 705: Classical Mechanics. Calculus of Variations II 1 PHYS 705: Classcal Mechancs Calculus of Varatons II 2 Calculus of Varatons: Generalzaton (no constrant yet) Suppose now that F depends on several dependent varables : We need to fnd such that has a statonary

More information

VARIATION OF CONSTANT SUM CONSTRAINT FOR INTEGER MODEL WITH NON UNIFORM VARIABLES

VARIATION OF CONSTANT SUM CONSTRAINT FOR INTEGER MODEL WITH NON UNIFORM VARIABLES VARIATION OF CONSTANT SUM CONSTRAINT FOR INTEGER MODEL WITH NON UNIFORM VARIABLES BÂRZĂ, Slvu Faculty of Mathematcs-Informatcs Spru Haret Unversty barza_slvu@yahoo.com Abstract Ths paper wants to contnue

More information

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2) 1/16 MATH 829: Introducton to Data Mnng and Analyss The EM algorthm (part 2) Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 20, 2016 Recall 2/16 We are gven ndependent observatons

More information

Online Appendix. t=1 (p t w)q t. Then the first order condition shows that

Online Appendix. t=1 (p t w)q t. Then the first order condition shows that Artcle forthcomng to ; manuscrpt no (Please, provde the manuscrpt number!) 1 Onlne Appendx Appendx E: Proofs Proof of Proposton 1 Frst we derve the equlbrum when the manufacturer does not vertcally ntegrate

More information

Linear Feature Engineering 11

Linear Feature Engineering 11 Lnear Feature Engneerng 11 2 Least-Squares 2.1 Smple least-squares Consder the followng dataset. We have a bunch of nputs x and correspondng outputs y. The partcular values n ths dataset are x y 0.23 0.19

More information

Estimation: Part 2. Chapter GREG estimation

Estimation: Part 2. Chapter GREG estimation Chapter 9 Estmaton: Part 2 9. GREG estmaton In Chapter 8, we have seen that the regresson estmator s an effcent estmator when there s a lnear relatonshp between y and x. In ths chapter, we generalzed the

More information

Salmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2

Salmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2 Salmon: Lectures on partal dfferental equatons 5. Classfcaton of second-order equatons There are general methods for classfyng hgher-order partal dfferental equatons. One s very general (applyng even to

More information

A Local Variational Problem of Second Order for a Class of Optimal Control Problems with Nonsmooth Objective Function

A Local Variational Problem of Second Order for a Class of Optimal Control Problems with Nonsmooth Objective Function A Local Varatonal Problem of Second Order for a Class of Optmal Control Problems wth Nonsmooth Objectve Functon Alexander P. Afanasev Insttute for Informaton Transmsson Problems, Russan Academy of Scences,

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 1, July 2013

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 1, July 2013 ISSN: 2277-375 Constructon of Trend Free Run Orders for Orthogonal rrays Usng Codes bstract: Sometmes when the expermental runs are carred out n a tme order sequence, the response can depend on the run

More information

Lagrange Multipliers Kernel Trick

Lagrange Multipliers Kernel Trick Lagrange Multplers Kernel Trck Ncholas Ruozz Unversty of Texas at Dallas Based roughly on the sldes of Davd Sontag General Optmzaton A mathematcal detour, we ll come back to SVMs soon! subject to: f x

More information

Dynamic Programming. Preview. Dynamic Programming. Dynamic Programming. Dynamic Programming (Example: Fibonacci Sequence)

Dynamic Programming. Preview. Dynamic Programming. Dynamic Programming. Dynamic Programming (Example: Fibonacci Sequence) /24/27 Prevew Fbonacc Sequence Longest Common Subsequence Dynamc programmng s a method for solvng complex problems by breakng them down nto smpler sub-problems. It s applcable to problems exhbtng the propertes

More information

CHAPTER III Neural Networks as Associative Memory

CHAPTER III Neural Networks as Associative Memory CHAPTER III Neural Networs as Assocatve Memory Introducton One of the prmary functons of the bran s assocatve memory. We assocate the faces wth names, letters wth sounds, or we can recognze the people

More information

NP-Completeness : Proofs

NP-Completeness : Proofs NP-Completeness : Proofs Proof Methods A method to show a decson problem Π NP-complete s as follows. (1) Show Π NP. (2) Choose an NP-complete problem Π. (3) Show Π Π. A method to show an optmzaton problem

More information

EEE 241: Linear Systems

EEE 241: Linear Systems EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they

More information

A new construction of 3-separable matrices via an improved decoding of Macula s construction

A new construction of 3-separable matrices via an improved decoding of Macula s construction Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula

More information

10-701/ Machine Learning, Fall 2005 Homework 3

10-701/ Machine Learning, Fall 2005 Homework 3 10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40

More information

1 Matrix representations of canonical matrices

1 Matrix representations of canonical matrices 1 Matrx representatons of canoncal matrces 2-d rotaton around the orgn: ( ) cos θ sn θ R 0 = sn θ cos θ 3-d rotaton around the x-axs: R x = 1 0 0 0 cos θ sn θ 0 sn θ cos θ 3-d rotaton around the y-axs:

More information

Appendix for Causal Interaction in Factorial Experiments: Application to Conjoint Analysis

Appendix for Causal Interaction in Factorial Experiments: Application to Conjoint Analysis A Appendx for Causal Interacton n Factoral Experments: Applcaton to Conjont Analyss Mathematcal Appendx: Proofs of Theorems A. Lemmas Below, we descrbe all the lemmas, whch are used to prove the man theorems

More information

CS 468 Lecture 16: Isometry Invariance and Spectral Techniques

CS 468 Lecture 16: Isometry Invariance and Spectral Techniques CS 468 Lecture 16: Isometry Invarance and Spectral Technques Justn Solomon Scrbe: Evan Gawlk Introducton. In geometry processng, t s often desrable to characterze the shape of an object n a manner that

More information

Finite Element Modelling of truss/cable structures

Finite Element Modelling of truss/cable structures Pet Schreurs Endhoven Unversty of echnology Department of Mechancal Engneerng Materals echnology November 3, 214 Fnte Element Modellng of truss/cable structures 1 Fnte Element Analyss of prestressed structures

More information

Week 5: Neural Networks

Week 5: Neural Networks Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple

More information

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information

Solutions Homework 4 March 5, 2018

Solutions Homework 4 March 5, 2018 1 Solutons Homework 4 March 5, 018 Soluton to Exercse 5.1.8: Let a IR be a translaton and c > 0 be a re-scalng. ˆb1 (cx + a) cx n + a (cx 1 + a) c x n x 1 cˆb 1 (x), whch shows ˆb 1 s locaton nvarant and

More information

Which Separator? Spring 1

Which Separator? Spring 1 Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal

More information

Lecture 21: Numerical methods for pricing American type derivatives

Lecture 21: Numerical methods for pricing American type derivatives Lecture 21: Numercal methods for prcng Amercan type dervatves Xaoguang Wang STAT 598W Aprl 10th, 2014 (STAT 598W) Lecture 21 1 / 26 Outlne 1 Fnte Dfference Method Explct Method Penalty Method (STAT 598W)

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Solutions to exam in SF1811 Optimization, Jan 14, 2015

Solutions to exam in SF1811 Optimization, Jan 14, 2015 Solutons to exam n SF8 Optmzaton, Jan 4, 25 3 3 O------O -4 \ / \ / The network: \/ where all lnks go from left to rght. /\ / \ / \ 6 O------O -5 2 4.(a) Let x = ( x 3, x 4, x 23, x 24 ) T, where the varable

More information

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso Supplement: Proofs and Techncal Detals for The Soluton Path of the Generalzed Lasso Ryan J. Tbshran Jonathan Taylor In ths document we gve supplementary detals to the paper The Soluton Path of the Generalzed

More information

General viscosity iterative method for a sequence of quasi-nonexpansive mappings

General viscosity iterative method for a sequence of quasi-nonexpansive mappings Avalable onlne at www.tjnsa.com J. Nonlnear Sc. Appl. 9 (2016), 5672 5682 Research Artcle General vscosty teratve method for a sequence of quas-nonexpansve mappngs Cuje Zhang, Ynan Wang College of Scence,

More information

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

Chapter Newton s Method

Chapter Newton s Method Chapter 9. Newton s Method After readng ths chapter, you should be able to:. Understand how Newton s method s dfferent from the Golden Secton Search method. Understand how Newton s method works 3. Solve

More information

Conjugacy and the Exponential Family

Conjugacy and the Exponential Family CS281B/Stat241B: Advanced Topcs n Learnng & Decson Makng Conjugacy and the Exponental Famly Lecturer: Mchael I. Jordan Scrbes: Bran Mlch 1 Conjugacy In the prevous lecture, we saw conjugate prors for the

More information

3.1 ML and Empirical Distribution

3.1 ML and Empirical Distribution 67577 Intro. to Machne Learnng Fall semester, 2008/9 Lecture 3: Maxmum Lkelhood/ Maxmum Entropy Dualty Lecturer: Amnon Shashua Scrbe: Amnon Shashua 1 In the prevous lecture we defned the prncple of Maxmum

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpenCourseWare http://ocw.mt.edu 6.854J / 18.415J Advanced Algorthms Fall 2008 For nformaton about ctng these materals or our Terms of Use, vst: http://ocw.mt.edu/terms. 18.415/6.854 Advanced Algorthms

More information