CONTROL SYSTEMS, ROBOTICS AND AUTOMATION Vol. XI Control of Stochastic Systems - P.R. Kumar

CONTROL OF STOCHASTIC SYSTEMS

P.R. Kumar
Department of Electrical and Computer Engineering, and Coordinated Science Laboratory, University of Illinois, Urbana-Champaign, USA.

Keywords: Markov chains, transition probabilities, controlled Markov chains, noisy observations, partially observed systems, linear stochastic systems, linear Gaussian systems, controlled autoregressive models, autoregressive moving average systems with exogenous inputs, cost functions, time-horizon, terminal cost, running cost, state feedback policy, Markov policy, optimal cost-to-go, dynamic programming equation, principle of optimality, linear quadratic Gaussian problem, equilibrium points, stability, countable Markov chains, steady-state, super-martingales, Lyapunov functions, stopping times, recurrence, positive recurrence, stochastic stability, estimation, Bayes Rule, Kalman filter, minimum mean-square error estimate, conditional mean, conditional covariance, posterior probability distribution, Bayesian and Non-Bayesian approaches to adaptive control, parameter vector, maximum likelihood, least squares estimate, prediction error estimate, system identification, consistency, separation theorem, certainty equivalence, information state, hyperstate, self-tuning regulators.

Contents

1. Introduction
2. Models of Stochastic Systems
3. Optimal Stochastic Control
4. Stability of Stochastic Systems
5. Estimation of Stochastic Systems
6. Identification and Parameter Estimation of Stochastic Systems
7. Control of Partially Observed Systems
8. Adaptive Control
Glossary
Bibliography
Biographical Sketch

Summary

We present an account of several topics (modeling, control, estimation, stability, identification and adaptive control) which arise in the study of the control of stochastic systems.

1. Introduction

A holistic treatment of the problem of control of stochastic systems encompasses the following topics:

(i) Models of stochastic systems
(ii) Optimal stochastic control
(iii) Stability of stochastic systems
(iv) Estimation of stochastic systems
(v) Identification of stochastic systems
(vi) Control of partially observed systems
(vii) Adaptive control

We present an outline of each of these topics which will enable the reader to obtain an integrated perspective of the field.

2. Models of Stochastic Systems

A discrete-time stochastic process $\{x(t)\}_{t=0}^{\infty}$ is a Markov chain if

$$p(x(t+1) \mid x(0), \ldots, x(t)) = p(x(t+1) \mid x(t)).$$

That is, the conditional distribution of the future state $x(t+1)$ depends on the past $(x(0), \ldots, x(t))$ only through the present state $x(t)$. Indeed this justifies the use of the name state. The Markov chain can then be described by its transition probabilities $p(x(t+1) \mid x(t))$.

Extending this notion, one can describe a controlled Markov chain by its controlled transition probabilities $p(x' \mid x, u)$, which describe the conditional probability of the next state $x(t+1)$ being $x'$ when the current state is $x(t) = x$ and an input $u(t) = u$ is applied.

If the state $x(t)$ is not observed, then it is common to model the observations $y(t)$ by the conditional probability distribution $p(y \mid x)$, which describes the probability distribution of the observation $y(t)$ when the state is $x(t) = x$. The system is then called a partially observed controlled Markov chain.

If the transition probabilities depend on the time, then one can describe the time-varying system by the pair of transition probabilities $p(x' \mid x, u, t)$ and $p(y \mid x, t)$.

A common deterministic noise-free state space model of a system in discrete-time is

$$\begin{aligned} x(t+1) &= f(x(t), u(t), t) \\ y(t) &= g(x(t), t), \end{aligned}$$

where $x(t)$ is the state of the system at time $t$, $u(t)$ is the input applied at time $t$, and $y(t)$ is the output at time $t$. The corresponding stochastic analog of the state space model is

$$\begin{aligned} x(t+1) &= f(x(t), u(t), w(t), t) \\ y(t) &= g(x(t), v(t)), \end{aligned}$$

where $w(t)$ is the noise entering the state equation, and $v(t)$ is the noise entering the observation equation. These noises are modeled as stochastic processes (see Models of Stochastic Systems). If $\{w(0), w(1), w(2), \ldots\}$ are mutually independent, then $x(t)$ is indeed the state of a controlled Markov chain. If further $\{v(0), v(1), v(2), \ldots\}$ are also mutually independent, and $w$, $v$ are independent of each other, then one has a partially observed controlled Markov chain written in the form of state and observation equations.

If $\{w(0), w(1), w(2), \ldots\}$ are not mutually independent, then one often models them as the output of a system driven by independent random variables $\{n(0), n(1), \ldots, m(0), m(1), \ldots\}$:

$$\begin{aligned} z(t+1) &= h(z(t), n(t)) \\ w(t) &= k(z(t), m(t)). \end{aligned}$$

In such situations, one can adjoin $z$ to $x$ and let $(x, z)$ serve as the state.

A special and important case of such a state space model is a linear stochastic system:

$$\begin{aligned} x(t+1) &= A(t)x(t) + B(t)u(t) + G(t)w(t) \\ y(t) &= C(t)x(t) + H(t)v(t), \end{aligned}$$

where $A(t)$, $B(t)$, $C(t)$, $G(t)$ and $H(t)$ are time-varying matrices of appropriate dimensions. A model that particularly lends itself to analysis is the one in which the noise processes $w(t)$ and $v(t)$ are jointly Gaussian stochastic processes (see Models of Stochastic Systems). It is then called a Linear-Gaussian model.

Instead of dealing with the state $x(t)$, one can directly model how the input influences the output, i.e., by an input-output model. The most common is the Controlled Autoregressive Moving Average (CARMA) model, also called the Autoregressive Moving Average with Exogenous Inputs (ARMAX) model:

$$y(t) + a_1 y(t-1) + \cdots + a_n y(t-n) = b_0 u(t) + b_1 u(t-1) + \cdots + b_n u(t-n) + w(t) + c_1 w(t-1) + \cdots + c_n w(t-n).$$

One can also consider the continuous time counterpart of the state-space model:

$$\begin{aligned} dx(t) &= f(x(t), u(t), t)\,dt + \sigma(x(t))\,dw(t) \\ dy(t) &= g(x(t))\,dt + dv(t). \end{aligned}$$

Here $w(t)$ and $v(t)$ are Brownian motion processes, and one has to interpret the above stochastic differential equations in the appropriate mathematical way. This requires a knowledge of Ito stochastic integrals and stochastic calculus.

3. Optimal Stochastic Control

Consider the case of a discrete-time stochastic system where the state $x(t)$ is directly observed. How should one choose the control input $\{u(t)\}$ to be applied to such a system? A common approach is to consider a cost function of the form

$$E\left[ h(x(T+1)) + \sum_{t=0}^{T} c(x(t), u(t)) \right],$$

and choose control inputs which minimize this expected cost. Above, $T$ is a time horizon, $h(x(T+1))$ is the terminal cost, and $c(x(t), u(t))$ is the running cost. One minimizes this cost over the set of history dependent strategies, where $u(t) = u(x(0), \ldots, x(t), t)$ is allowed to depend on the entire past of the observations and the current time. It can be shown that within this class of history dependent strategies one can restrict attention to strategies of the form $u(t) = u(x(t), t)$, where the input depends only on the current state and current time. Such a strategy is called a state feedback policy or a Markov policy.

If one defines the optimal remaining cost or optimal cost-to-go from a state $x$ at time $t$ by

$$V(x, t) := \min_{u(\cdot)} E\left[ h(x(T+1)) + \sum_{s=t}^{T} c(x(s), u(s)) \,\middle|\, x(t) = x \right],$$

then it can be shown that this function satisfies the following equation:

$$V(x, t) = \min_{u} \left\{ c(x, u) + \sum_{x'} p(x' \mid x, u)\, V(x', t+1) \right\},$$

with the terminal condition $V(x, T+1) = h(x)$.

Essentially the above equation says that the optimal cost from a state $x$ at time $t$ is obtained by considering different choices of an input $u$ to apply at time $t$. For each such potential input $u$, one determines the current cost $c(x, u)$ as well as the expected cost from the state reached at the next time instant. Then, one simply chooses the best input to apply at the present time as the one which minimizes the sum of the current cost plus the expected remaining cost. This equation is called the dynamic programming equation, and the logic leading to it is called the principle of optimality. It also follows that if for each $(x, t)$ one chooses the minimizing $u$, calling it $u^*(x, t)$, then $u^*(x, t)$ is the optimal policy. Thus the optimal policy can be chosen as a Markov or a state feedback policy. The dynamic programming approach can be extended to other models and situations, as shown in Dynamic Programming.

A particular special case of great interest in control is the so-called Linear-Quadratic-Gaussian (LQG) problem. For a linear system with independent white Gaussian noises $w$ and $v$,

$$\begin{aligned} x(t+1) &= A(t)x(t) + B(t)u(t) + G(t)w(t) \\ y(t) &= C(t)x(t) + H(t)v(t), \end{aligned}$$

one seeks to minimize a quadratic cost criterion:

$$E\left[ x^{\top}(T+1)\, S\, x(T+1) + \sum_{t=1}^{T} \left( x^{\top}(t) Q(t) x(t) + u^{\top}(t) R(t) u(t) \right) \right],$$

where $S \ge 0$, $Q(t) \ge 0$ and $R(t) > 0$. The cost-to-go function turns out to be a quadratic function of the state plus a deterministic term:

$$V(x, t) = x^{\top} S(t)\, x + \gamma(t).$$

By substituting this form in the dynamic programming equation, one can solve for $S(t)$ and $\gamma(t)$ in terms of $S(t+1)$ and $\gamma(t+1)$ (remember that dynamic programming solves the problem backwards in time). With the boundary conditions $S(T+1) = S$ and $\gamma(T+1) = 0$, one thus obtains recursions for $S(t)$ and $\gamma(t)$. From the minimizing argument in the dynamic programming equation one also determines that the optimal control law is of the form $u(t) = K(t)x(t)$, i.e., linear time varying feedback, with $K(t)$ expressible in terms of $S(\cdot)$.

The LQG problem thus admits a clean solution. The details of the solution are given in LQ-stochastic Control. Given that the quadratic cost function is a reasonable criterion, and given the widespread usage of linear models, this solution has proved to be eminently useful in control system design.

In many situations of interest, e.g., in adaptive control and self-tuning regulators (see Self-tuning Control), one wishes to work with input-output models with quadratic costs. This is dealt with in Minimum Variance Control.
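The backward recursion expressed by the dynamic programming equation of this section can be illustrated on a small finite controlled Markov chain. The following is only a minimal sketch: the two-state "good/bad" machine, the inputs "wait" and "repair", all probabilities and costs, and the function name `solve_dp` are hypothetical choices made for illustration and do not come from the text.

```python
# Finite-horizon dynamic programming for a small controlled Markov chain.
# All names and numbers here are hypothetical, chosen only for illustration.

def solve_dp(states, inputs, p, c, h, T):
    """Return the cost-to-go V[t][x] and a minimizing policy pi[t][x], t = 0..T.

    p[u][x][x_next] : controlled transition probability p(x' | x, u)
    c[x][u]         : running cost c(x, u)
    h[x]            : terminal cost, so that V(x, T+1) = h(x)
    """
    V = {T + 1: {x: h[x] for x in states}}  # terminal condition
    pi = {}
    for t in range(T, -1, -1):              # solve backwards in time
        V[t], pi[t] = {}, {}
        for x in states:
            best_u, best_cost = None, float("inf")
            for u in inputs:
                # current cost plus expected cost-to-go from the next state
                expected_next = sum(p[u][x][xn] * V[t + 1][xn] for xn in states)
                cost = c[x][u] + expected_next
                if cost < best_cost:
                    best_u, best_cost = u, cost
            V[t][x], pi[t][x] = best_cost, best_u
    return V, pi


states = [0, 1]                 # 0 = machine good, 1 = machine bad
inputs = ["wait", "repair"]
p = {
    "wait":   {0: {0: 0.9, 1: 0.1}, 1: {0: 0.0, 1: 1.0}},
    "repair": {0: {0: 1.0, 1: 0.0}, 1: {0: 1.0, 1: 0.0}},
}
c = {0: {"wait": 0.0, "repair": 1.0}, 1: {"wait": 2.0, "repair": 1.0}}
h = {0: 0.0, 1: 2.0}
T = 2

V, pi = solve_dp(states, inputs, p, c, h, T)
print(V[0], pi[0])
```

The returned policy depends only on the current state and the current time, which is the sense in which the optimal policy is a Markov or state feedback policy.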

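For the linear-quadratic case, the recursions for $S(t)$, $\gamma(t)$ and the gain $K(t)$ can be written out explicitly when everything is scalar. The sketch below rests on assumptions that are not in the text: a scalar, time-invariant system $x(t+1) = a\,x(t) + b\,u(t) + g\,w(t)$ with the state directly observed, zero-mean unit-variance independent noise $w(t)$, and constant weights $q$, $r$. Substituting $V(x,t+1) = S(t+1)x^2 + \gamma(t+1)$ into the dynamic programming equation and minimizing over $u$ gives the formulas used in the code. The function name `lq_recursion` is hypothetical.

```python
# Scalar linear-quadratic recursion obtained from the dynamic programming equation.
# Assumptions (not from the text): scalar time-invariant system, state observed,
# w(t) independent with zero mean and unit variance.

def lq_recursion(a, b, g, q, r, S_terminal, T):
    """Return dicts S[t], gamma[t] (t = 1..T+1) and gains K[t] (t = 1..T)."""
    S = {T + 1: S_terminal}      # boundary condition S(T+1) = S
    gamma = {T + 1: 0.0}         # boundary condition gamma(T+1) = 0
    K = {}
    for t in range(T, 0, -1):    # backwards in time, down to t = 1
        S_next = S[t + 1]
        denom = r + b * b * S_next
        K[t] = -(a * b * S_next) / denom                 # u(t) = K(t) x(t)
        S[t] = q + a * a * S_next - (a * b * S_next) ** 2 / denom
        gamma[t] = gamma[t + 1] + g * g * S_next         # noise contribution
    return S, gamma, K


S, gamma, K = lq_recursion(a=1.0, b=1.0, g=1.0, q=1.0, r=1.0, S_terminal=1.0, T=2)
print(S[1], gamma[1], K[1])
```

Note that the gains $K(t)$ do not depend on the noise gain $g$; the noise only enters the deterministic term $\gamma(t)$.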
Bibliography

B. Anderson and J.B. Moore, Optimal Filtering. Englewood Cliffs, NJ: Prentice-Hall, 1979. [Contains a comprehensive treatment of estimation for linear stochastic systems].

D.P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models. Englewood Cliffs, NJ: Prentice-Hall, 1987. [A comprehensive treatment of discrete-time dynamic programming].

G.C. Goodwin and K.S. Sin, Adaptive Filtering, Prediction and Control. Englewood Cliffs, NJ: Prentice-Hall, 1984. [Contains a treatment of discrete-time modeling of linear systems, as well as identification and adaptive control].

P.R. Kumar and P.P. Varaiya, Stochastic Systems: Estimation, Identification and Adaptive Control. Englewood Cliffs, NJ: Prentice-Hall, 1986. [Contains a concise treatment of several topics including modeling, control, estimation, identification, and adaptation].

H. Kushner, Introduction to Stochastic Control. Holt, Rinehart and Winston, 1971. [Contains a treatment of stochastic control as well as stochastic stability].

L. Ljung and T. Söderström, Theory and Practice of Recursive Identification. Cambridge, MA: MIT Press, 1983. [Contains a treatment of identification and parameter estimation for linear systems].

Biographical Sketch

P.R. Kumar is the Franklin W. Woeltge Professor of Electrical and Computer Engineering, and a Research Professor in the Coordinated Science Laboratory, at the University of Illinois, Urbana-Champaign. He was the recipient of the Donald P. Eckman Award of the American Automatic Control Council. He has presented plenary lectures at the SIAM Annual Meeting and the SIAM Control Conference in 2001, the IEEE Conference on Decision and Control in San Antonio, Texas, 1993, the SIAM Conference on Optimization in Chicago, 1992, the SIAM Annual Meeting at San Diego, 1994, the Brazilian Automatic Control Congress, and the Third Annual Semiconductor Manufacturing, Control and Optimization Workshop. He is co-author with Pravin Varaiya of the book, Stochastic Systems: Estimation, Identification and Adaptive Control.

He serves on the editorial boards of Communications in Information and Systems; Journal of Discrete Event Dynamic Systems; Mathematics of Control, Signals, and Systems; Mathematical Problems in Engineering: Problems, Theories and Applications; and in the past has served as Associate Editor at Large of IEEE Transactions on Automatic Control; Associate Editor of SIAM Journal on Control and Optimization; Systems and Control Letters; Journal of Adaptive Control and Signal Processing; and the IEEE Transactions on Automatic Control. He is a Fellow of IEEE. Professor Kumar's current research interests are in wireless networks, distributed real-time systems, wafer fabrication plants, and machine learning.