Robust Dynamic Programming for Discounted Infinite-Horizon Markov Decision Processes with Uncertain Stationary Transition Matrices

Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007)

Baohua Li
Department of Electrical Engineering, Arizona State University, Tempe, AZ
Email: BaohuaL@asueu

Jennie Si
Department of Electrical Engineering, Arizona State University, Tempe, AZ
Telephone: (480) Fax: (480) Email: S@asueu

Abstract: In this paper, finite-state, finite-action, discounted infinite-horizon-cost Markov decision processes (MDPs) with uncertain stationary transition matrices are discussed in the deterministic policy space. Uncertain stationary parametric transition matrices are clearly classified into independent and correlated cases. It is pointed out in this paper that the optimality criterion of uniform minimization of the maximum expected total discounted cost functions for all initial states, or robust uniform optimality criterion, is not appropriate for solving MDPs with correlated transition matrices. A new optimality criterion of minimizing the maximum quadratic total value function is proposed, which includes the previous criterion as a special case. Based on the new optimality criterion, robust policy iteration is developed to compute an optimal policy in the deterministic stationary policy space. Under some assumptions, the solution is guaranteed to be optimal or near-optimal in the deterministic policy space.

I. INTRODUCTION

Dynamic programming (DP) is a computational approach to finding an optimal policy by employing the principle of optimality introduced by Richard Bellman [1]. Given accurate knowledge of the transition probabilities, DP and approximate DP can solve Markov decision processes (MDPs) to obtain an optimal or near-optimal policy by algorithms such as value iteration, policy iteration, evolutionary policy iteration, and approximate policy iteration based on Monte Carlo simulation [2]-[5]. In practice, however, accurate transition probabilities are very difficult to obtain. Exact solutions via classic DP are then not attainable, and those algorithms are no longer applicable. Moreover, the estimated transition probabilities may be far away from the true values due to noise and other issues associated with the estimation process, and the estimation error may be large enough to cause significant deviation of the solutions from the optimal values [6]. Hence, the idea of set estimation for transition matrices with high confidence can be used to alleviate some of the deficit from inaccurate point estimation. With those uncertain transition matrices, robust dynamic programming is desired to address the issue of designing approximation methods with an appropriate robustness to extend the power of the Bellman equation.

Representative efforts in developing robust dynamic programming can be found in [6]-[10]. One commonly used optimality criterion for robust algorithms is to minimize the maximum value functions for all initial states, which is referred to as the robust uniform optimality criterion in this paper. Based on this optimality criterion, robust value iteration and robust policy iteration were proposed in [6] and [7], respectively, to obtain a deterministic, uniformly optimal policy. To deal with uncertain transition matrices, the notion of correlation in a transition matrix was introduced in [6] and [10] in the context of MDPs. The existing optimality criteria and associated robust algorithms were developed only for MDPs with independent transition matrices.

In this paper, finite-state, finite-action MDPs with discounted infinite-horizon cost are discussed. The optimal policy is constrained to the deterministic policy space. The transition matrices are assumed to be stationary, which is reasonable when systems are slowly varying. The contribution of this paper is as follows. First, mathematically clear and more tractable definitions of independent and correlated uncertain stationary transition matrices are provided. Using these newly formulated definitions, it is shown that the robust uniform optimality criterion is not applicable to MDPs with correlated transition matrices. Because of the correlation of uncertain transition matrices, for any given policy, there may be no optimal transition matrix such that the value functions of the policy reach their maximum uniformly for all initial states. This makes the comparison among policies meaningless. Even if such transition matrices exist for all policies, there may not be an optimal policy such that its maximum value functions reach their minimum uniformly for all initial states. Hence, a new optimality criterion of minimizing the maximum quadratic total value function is proposed. Based on this criterion, under a weak condition, there exists an optimal policy, which can be optimal non-uniformly in initial states. The previous criterion becomes a special case of the new criterion. Note that the existing robust value iteration and robust policy iteration cannot guarantee solutions for

MDPs with correlated transition matrices. A new robust policy iteration is developed to obtain an optimal policy in the stationary policy space. Under some assumptions, the solution is guaranteed to be optimal or near-optimal in the deterministic policy space.

The rest of the paper is organized as follows. In Section II, the optimality criterion of minimizing the maximum quadratic total value function is proposed, and theorems are developed to show that this criterion makes the robust uniform optimality criterion a special case and that, under some conditions, stationary optimal policies are optimal or near-optimal in the deterministic policy space. Based on this optimality criterion, robust policy iteration is developed in Section III. The paper concludes in Section IV.

II. OPTIMALITY CRITERION USING QUADRATIC TOTAL VALUE FUNCTIONS

In this section, an MDP with an uncertain parametric transition matrix is formulated. Based on this formulation, we present why an optimal solution may be sensitive to transition probabilities. After that, uncertain parametric transition matrices are classified into independent and correlated cases. Analysis demonstrates why an optimal policy may not exist under the robust uniform optimality criterion. Consequently, the optimality criterion using the quadratic total value function is proposed. It is proven that the robust uniform optimality criterion is a special case. Besides, two theorems and one corollary are provided to show that, under some assumptions, stationary optimal policies are optimal or near-optimal in the deterministic policy space.

Finite-state, finite-action, infinite-horizon MDPs with stationary transition matrices are described as follows. Let $T$ denote the discrete, infinite decision horizon, where $T = \{0, 1, 2, \ldots\}$. At each stage, the system occupies a state $i \in S$, where $S$ is the state space with $n$ states, denoted as $S = \{1, 2, \ldots, n\}$. With each state $i \in S$, a decision maker is allowed to choose an action $a$ deterministically from a finite state-dependent set of allowable actions, denoted by $A_i = \{1, 2, \ldots, m_i\}$. Let $M = \sum_{i=1}^{n} m_i$, and let $a_t$ be a function mapping states into actions with $a_t(i) \in A_i$ at the time $t$, i.e.,

$$a_t:\ (1, 2, \ldots, n)^{\top} \to (a_1, a_2, \ldots, a_n)^{\top} \in A_1 \times A_2 \times \cdots \times A_n. \tag{1}$$

Denote a policy by $\pi$,

$$\pi = (a_1, a_2, \ldots), \tag{2}$$

and let $\Pi$ represent the deterministic policy space. Denote a stationary policy by $\pi_s$,

$$\pi_s = (a, a, \ldots), \tag{3}$$

and let $\Pi_s$ represent the deterministic stationary policy space. Obviously, $\Pi_s \subset \Pi$.

Define the cost corresponding to state $i \in S$ and action $a \in A_i$ by $c(i, a)$. Assume that $c(i, a)$ is nonnegative. The costs are time discounted by a factor $\gamma$ ($0 < \gamma < 1$). The system starts from an initial state $i$. The states make Markov transitions according to stationary transition probabilities $p_{ij}^{a}$ from one state $i$ to another state $j$ under an action $a$. All transition probabilities constitute an $M \times n$ transition matrix

$$P = \big( (p_1^{1})^{\top}\ \cdots\ (p_i^{a})^{\top}\ \cdots\ (p_n^{m_n})^{\top} \big)^{\top}, \tag{4}$$

where the superscript $\top$ denotes the transpose and $p_i^{a}$ represents the transition probability row under the state $i \in S$ and the action $a \in A_i$.

In a more general setting, let each transition probability $p_{ij}^{a}$ be represented by a function of a parameter vector denoted as $\theta = \{\vartheta_1, \ldots, \vartheta_q\}$, i.e.,

$$p_{ij}^{a} = p_{ij}^{a}(\theta), \quad j \in S, \tag{5}$$

where $0 \le p_{ij}^{a} \le 1$ and $\sum_{j=1}^{n} p_{ij}^{a} = 1$. In [6]-[10], the parameters are specified as the transition probabilities themselves, that is,

$$\theta = \{p_{11}^{1}, \ldots, p_{1j}^{1}, \ldots, p_{1(n-1)}^{1}, \ldots, p_{i1}^{a}, \ldots, p_{ij}^{a}, \ldots, p_{i(n-1)}^{a}, \ldots, p_{n1}^{m_n}, \ldots, p_{nj}^{m_n}, \ldots, p_{n(n-1)}^{m_n}\}. \tag{6}$$

The transition probability row $p_i^{a}$ and the transition matrix $P$ can also be represented as functions of $\theta$, i.e., $p_i^{a} = p_i^{a}(\theta)$ and $P = P(\theta)$. Assume that the parameter vector $\theta$ varies in a known subset $\Theta \subseteq \mathbb{R}^{1 \times q}$. Then the transition matrix $P$ varies in the set $\mathcal{P} = \{P(\theta) : \theta \in \Theta\}$.

It is not too difficult to see that optimal value functions may be very sensitive to perturbation of the parameters in the transition matrix. Consider a stationary policy $\pi_s$ defined in (3). For any $P \in \mathcal{P}$,

$$v_P^{\pi_s} = (I - \gamma P^{\pi_s})^{-1} C^{\pi_s}, \tag{7}$$

where $v_P^{\pi_s}$ is the expected total discounted cost function under the policy $\pi_s$ and the transition matrix $P$, and $P^{\pi_s}$ and $C^{\pi_s}$ are the $n \times n$ transition matrix and the $n \times 1$ cost function vector corresponding to $\pi_s$, i.e.,

$$P^{\pi_s} = \begin{pmatrix} p_1^{a(1)} \\ \vdots \\ p_i^{a(i)} \\ \vdots \\ p_n^{a(n)} \end{pmatrix}, \qquad C^{\pi_s} = \begin{pmatrix} c(1, a(1)) \\ \vdots \\ c(i, a(i)) \\ \vdots \\ c(n, a(n)) \end{pmatrix}. \tag{8}$$

Obviously, $v_P^{\pi_s}$ is continuous in $P$ [11]. However, $P$ can be discontinuous in $\theta$ over $\Theta$, which may lead to value functions $v_{P(\theta)}^{\pi_s}$ that are discontinuous in $\theta$. Even if $P$ is continuous in $\theta$, and therefore $v_{P(\theta)}^{\pi_s}$ is continuous on $\Theta$, the function $v_{P(\theta)}^{\pi_s}$ still may not be smooth enough; that is, a small perturbation in $\theta$ may result in a relatively large variation in $v_{P(\theta)}^{\pi_s}$. Therefore there is a need to develop robust solutions under uncertainty.

We are now in a position to introduce the concepts of independence and correlation in $P$ and $P^{\pi_s}$.

Definition (Correlated transition matrix, Independent

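transition matrix): The transition matrix $P$ is correlated if

$$\mathcal{P} \subset \mathcal{P}_1^{1} \times \cdots \times \mathcal{P}_i^{a} \times \cdots \times \mathcal{P}_n^{m_n}, \tag{9}$$

where $\mathcal{P}_i^{a} = \{p_i^{a}(\theta) : \theta \in \Theta\}$. The transition matrix $P$ is independent if

$$\mathcal{P} = \mathcal{P}_1^{1} \times \cdots \times \mathcal{P}_i^{a} \times \cdots \times \mathcal{P}_n^{m_n}. \tag{10}$$

Definition (Correlated transition matrix for $\pi_s$, Independent transition matrix for $\pi_s$): The transition matrix $P^{\pi_s}$ is correlated if

$$\mathcal{P}^{\pi_s} \subset \mathcal{P}_1^{a(1)} \times \cdots \times \mathcal{P}_n^{a(n)}, \tag{11}$$

where $\mathcal{P}^{\pi_s} = \{P^{\pi_s}(\theta) : \theta \in \Theta\}$. The transition matrix $P^{\pi_s}$ is independent if

$$\mathcal{P}^{\pi_s} = \mathcal{P}_1^{a(1)} \times \cdots \times \mathcal{P}_n^{a(n)}. \tag{12}$$

(i) The set $\mathcal{P}_1^{1} \times \cdots \times \mathcal{P}_n^{m_n}$ is an $M$-dimensional hyper-rectangle. Matrix $P$ is correlated if $\mathcal{P}$ is a proper subset of this hyper-rectangle, and $P$ is independent if $\mathcal{P}$ is equal to the hyper-rectangle. An exact transition matrix is a special case of an independent transition matrix.

(ii) The set $\mathcal{P}_1^{a(1)} \times \cdots \times \mathcal{P}_n^{a(n)}$ is an $n$-dimensional hyper-rectangle. Matrix $P^{\pi_s}$ is correlated if $\mathcal{P}^{\pi_s}$ is a proper subset of this hyper-rectangle, and $P^{\pi_s}$ is independent if $\mathcal{P}^{\pi_s}$ is equal to the hyper-rectangle.

(iii) According to the above definitions, MDPs are classified into those with independent transition matrices and those with correlated transition matrices.

The optimality criterion of minimizing the maximum value function for any initial state is given as follows:

$$\min_{\pi \in \Pi} \max_{P \in \mathcal{P}} v_P^{\pi}(i), \quad \forall i \in S, \tag{13}$$

where $v_P^{\pi}(i)$ is the expected total discounted cost function under the policy $\pi \in \Pi$ and the transition matrix $P \in \mathcal{P}$ at the initial state $i$, i.e.,

$$v_P^{\pi}(i) = E_P^{\pi}\Big( \sum_{t=0}^{\infty} \gamma^{t} c(j_t, a_t(j_t)) \Big), \tag{14}$$

with $j_t$ the state at stage $t$ and $j_0 = i$. According to the optimality criterion given in (13), an optimal policy is defined as follows.

Definition (Optimal policy): A policy $\pi^*$ is optimal if there exists $P^* \in \mathcal{P}$ such that

$$v_{P^*}^{\pi^*}(i) = \max_{P \in \mathcal{P}} v_P^{\pi^*}(i) = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} v_P^{\pi}(i), \quad \forall i \in S, \tag{15}$$

where there exists $\theta^* \in \Theta$ such that $P^* = P(\theta^*)$.

For MDPs with independent transition matrices, the existence of a deterministic optimal policy has been proven [6], [7]. However, for MDPs with correlated transition matrices, an optimal policy does not always exist. The definition of an optimal policy given by (15) can be interpreted as the conditions for the existence of an optimal policy: (i) for any given $\pi \in \Pi$, there is at least one $P^* \in \mathcal{P}$ such that $v_{P^*}^{\pi}(i) = \max_{P \in \mathcal{P}} v_P^{\pi}(i)$ for any $i \in S$, where from here on $P^*$ depends on $\pi$; (ii) there is at least one $\pi^* \in \Pi$ such that $\max_{P \in \mathcal{P}} v_P^{\pi^*}(i) = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} v_P^{\pi}(i)$ for any $i \in S$. Such an optimal policy is uniformly optimal for initial states. For MDPs with correlated transition matrices, the above conditions are too strong to be satisfied, which can be shown in the following example.

Example: Consider a two-state, two-action, infinite-horizon MDP. Let the state space be $S = \{1, 2\}$. The action spaces at state 1 and state 2 are $A_1 = \{1, 2\}$ and $A_2 = \{1, 2\}$. According to (1) and (3), the four stationary policies in $\Pi_s$ are defined by $\pi_1, \pi_2, \pi_3, \pi_4$ as follows:

$$\pi_1 = \big( (1, 1)^{\top}, (1, 1)^{\top}, \ldots \big), \tag{16}$$
$$\pi_2 = \big( (1, 2)^{\top}, (1, 2)^{\top}, \ldots \big), \tag{17}$$
$$\pi_3 = \big( (2, 1)^{\top}, (2, 1)^{\top}, \ldots \big), \tag{18}$$
$$\pi_4 = \big( (2, 2)^{\top}, (2, 2)^{\top}, \ldots \big). \tag{19}$$

Let the transition matrix have the following formulation:

$$P = \begin{pmatrix} p_1^{1} \\ p_1^{2} \\ p_2^{1} \\ p_2^{2} \end{pmatrix} = \begin{pmatrix} \theta_1 & 1 - \theta_1 \\ \theta_2 & 1 - \theta_2 \\ 1 - \theta_1^{2} & \theta_1^{2} \\ 1 - \theta_3 & \theta_3 \end{pmatrix}, \tag{20}$$

and let the cost functions be

$$c(1, 1) = 1, \quad c(1, 2) = 2, \quad c(2, 1) = 3, \quad c(2, 2) = 4. \tag{21}$$

The discount factor $\gamma$ is chosen as 0.9. Let $\Omega = \{0, 0.2, 0.4, 0.6, 0.8, 1\}$, $\theta = \{\theta_1, \theta_2, \theta_3\}$ and $\Theta = \{\theta : \theta_1, \theta_2, \theta_3 \in \Omega\}$. The transition matrix $P$ is correlated, that is,

$$\mathcal{P} \subset \mathcal{P}_1^{1} \times \mathcal{P}_1^{2} \times \mathcal{P}_2^{1} \times \mathcal{P}_2^{2}, \tag{22}$$

where

$$\mathcal{P}_1^{1} = \mathcal{P}_1^{2} = \mathcal{P}_2^{2} = \{(0, 1), (0.2, 0.8), (0.4, 0.6), (0.6, 0.4), (0.8, 0.2), (1, 0)\}, \tag{23}$$
$$\mathcal{P}_2^{1} = \{(1, 0), (0.96, 0.04), (0.84, 0.16), (0.64, 0.36), (0.36, 0.64), (0, 1)\}. \tag{24}$$

Actually, by (8),

$$P^{\pi_2} = \begin{pmatrix} \theta_1 & 1 - \theta_1 \\ 1 - \theta_3 & \theta_3 \end{pmatrix}, \tag{25}$$
$$P^{\pi_3} = \begin{pmatrix} \theta_2 & 1 - \theta_2 \\ 1 - \theta_1^{2} & \theta_1^{2} \end{pmatrix}, \tag{26}$$
$$P^{\pi_4} = \begin{pmatrix} \theta_2 & 1 - \theta_2 \\ 1 - \theta_3 & \theta_3 \end{pmatrix}. \tag{27}$$

By (12), $P^{\pi_2}$, $P^{\pi_3}$ and $P^{\pi_4}$ are independent and the maximum value functions are reachable. However, for $\pi_1$,

$$P^{\pi_1} = \begin{pmatrix} \theta_1 & 1 - \theta_1 \\ 1 - \theta_1^{2} & \theta_1^{2} \end{pmatrix}. \tag{28}$$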

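By (11), it is correlated, and there is no $P \in \mathcal{P}$ such that the maximum value functions for all initial states are reachable. Thus, by the optimality criterion defined in (13), an optimal policy does not exist.

Thus, a new optimality criterion is needed, and its corresponding optimal policy also needs to be defined. For any fixed transition matrix $P \in \mathcal{P}$, the quadratic total value function for a policy $\pi$ is defined as follows:

$$\|v_P^{\pi}\|^2 = (v_P^{\pi})^{\top} v_P^{\pi}, \tag{29}$$

where, in terms of the value function defined in (14),

$$v_P^{\pi} = \big( v_P^{\pi}(1)\ \cdots\ v_P^{\pi}(i)\ \cdots\ v_P^{\pi}(n) \big)^{\top}. \tag{30}$$

The new optimality criterion is to minimize the maximum quadratic total value function, i.e.,

$$\min_{\pi \in \Pi} \max_{P \in \mathcal{P}} \|v_P^{\pi}\|^2. \tag{31}$$

Definition (Optimal policy): A policy $\pi^*$ is optimal if

$$\|v_{P^*}^{\pi^*}\|^2 = \max_{P \in \mathcal{P}} \|v_P^{\pi^*}\|^2 = \min_{\pi \in \Pi} \|v_{P^*}^{\pi}\|^2 = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} \|v_P^{\pi}\|^2. \tag{32}$$

(i) Based on the optimality criterion using the quadratic total value function, an optimal policy can be non-uniformly optimal in initial states.

(ii) The existence of such an optimal policy is guaranteed under a weak condition. For any $\pi$, the conditions under which the maximum value is reachable are easily satisfied, for example when the set $\mathcal{P}$ is compact and closed and the function $\|v_P^{\pi}\|^2$ is continuous over $\mathcal{P}$. Even if the maximum and minimum values are not reachable, max and min can be replaced by sup and inf, respectively, and a near-optimal policy can be easily proposed.

(iii) If the cost function is also parameterized in $\theta$, the optimality criterion can be modified using the quadratic value function with the cost function represented as $c(i, a, \theta)$.

(iv) If, for any policy $\pi$, the probability distribution of the initial states is non-uniform, or the value functions for different initial states have different weights, the optimality criterion can be modified using the weighted quadratic total value function $\|v_P^{\pi}\|_W^2 = (v_P^{\pi})^{\top} W v_P^{\pi}$.

Actually, the optimality criterion defined by (31) generalizes the uniform optimality criterion defined by (13), which is shown in the following Theorem 1.

Theorem 1: If an optimal policy exists with regard to the optimality criterion defined in (13), then this optimality criterion is equivalent to the one defined in (31). That is, if an optimal policy exists, a policy is optimal under the optimality criterion defined in (13) if and only if it is optimal under the optimality criterion defined in (31).

Proof: Since an optimal policy exists under the optimality criterion defined in (13), for any $\pi \in \Pi$, there exists $P^* \in \mathcal{P}$ such that

$$v_{P^*}^{\pi}(i) = \max_{P \in \mathcal{P}} v_P^{\pi}(i), \quad \forall i \in S, \tag{33}$$

and let $\pi_1$ be an optimal policy such that

$$v_{P^*}^{\pi_1}(i) = \max_{P \in \mathcal{P}} v_P^{\pi_1}(i) = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} v_P^{\pi}(i), \quad \forall i \in S. \tag{34}$$

Since $v_P^{\pi}(i) \ge 0$, by (33),

$$\sum_{i=1}^{n} \big( v_{P^*}^{\pi}(i) \big)^2 = \sum_{i=1}^{n} \big( \max_{P \in \mathcal{P}} v_P^{\pi}(i) \big)^2 = \max_{P \in \mathcal{P}} \|v_P^{\pi}\|^2, \tag{35}$$

and then by (34),

$$\min_{\pi \in \Pi} \sum_{i=1}^{n} \big( v_{P^*}^{\pi}(i) \big)^2 = \min_{\pi \in \Pi} \sum_{i=1}^{n} \big( \max_{P \in \mathcal{P}} v_P^{\pi}(i) \big)^2 = \|v_{P^*}^{\pi_1}\|^2. \tag{36}$$

($\Rightarrow$): By (35) and (36),

$$\max_{P \in \mathcal{P}} \|v_P^{\pi_1}\|^2 = \sum_{i=1}^{n} \big( v_{P^*}^{\pi_1}(i) \big)^2 = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} \sum_{i=1}^{n} \big( v_P^{\pi}(i) \big)^2. \tag{37}$$

Hence, $\pi_1$ is an optimal policy under the optimality criterion defined in (31).

($\Leftarrow$): Let $\pi_2$ be an optimal policy under the optimality criterion defined in (31). Then, for any $\pi \in \Pi$, there exists $Q \in \mathcal{P}$ such that

$$\|v_Q^{\pi}\|^2 = \sum_{i=1}^{n} \big( v_Q^{\pi}(i) \big)^2 = \max_{P \in \mathcal{P}} \sum_{i=1}^{n} \big( v_P^{\pi}(i) \big)^2, \tag{38}$$

and for $\pi_2$,

$$\|v_Q^{\pi_2}\|^2 = \sum_{i=1}^{n} \big( v_Q^{\pi_2}(i) \big)^2 = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} \sum_{i=1}^{n} \big( v_P^{\pi}(i) \big)^2. \tag{39}$$

By (35), for any $\pi \in \Pi$,

$$\|v_Q^{\pi}\|^2 = \sum_{i=1}^{n} \big( \max_{P \in \mathcal{P}} v_P^{\pi}(i) \big)^2, \tag{40}$$

and then by (36),

$$\|v_Q^{\pi_2}\|^2 = \min_{\pi \in \Pi} \sum_{i=1}^{n} \big( \max_{P \in \mathcal{P}} v_P^{\pi}(i) \big)^2. \tag{41}$$

Since $\big( v_Q^{\pi_2}(i) \big)^2 \le \big( \max_{P \in \mathcal{P}} v_P^{\pi_2}(i) \big)^2$ for every $i$, (40) gives

$$\big( v_Q^{\pi_2}(i) \big)^2 = \big( \max_{P \in \mathcal{P}} v_P^{\pi_2}(i) \big)^2, \tag{42}$$

and then

$$v_Q^{\pi_2}(i) = \max_{P \in \mathcal{P}} v_P^{\pi_2}(i). \tag{43}$$

Since $\big( \max_{P \in \mathcal{P}} v_P^{\pi_2}(i) \big)^2 \ge \min_{\pi \in \Pi} \big( \max_{P \in \mathcal{P}} v_P^{\pi}(i) \big)^2$ for every $i$, (41) gives

$$\big( v_Q^{\pi_2}(i) \big)^2 = \min_{\pi \in \Pi} \big( \max_{P \in \mathcal{P}} v_P^{\pi}(i) \big)^2, \tag{44}$$

and then

$$v_Q^{\pi_2}(i) = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} v_P^{\pi}(i). \tag{45}$$

Hence,

$$v_Q^{\pi_2}(i) = \max_{P \in \mathcal{P}} v_P^{\pi_2}(i) = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} v_P^{\pi}(i), \quad \forall i \in S. \tag{46}$$

That is to say, $\pi_2$ is an optimal policy under the optimality criterion defined in (13).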
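The two-state example and the two criteria, (13) and (31), can be checked numerically. The following is a minimal sketch, assuming the example data as read from (20), (21) and the grid $\Omega$; the names `rows`, `value` and `analyse` are hypothetical helpers introduced only for this illustration, and the brute-force enumeration over $\Theta$ is not the paper's algorithm.

```python
import itertools
import numpy as np

GAMMA = 0.9
OMEGA = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
# Costs c(i, a) from (21), keyed by (state, action).
COST = {(1, 1): 1.0, (1, 2): 2.0, (2, 1): 3.0, (2, 2): 4.0}


def rows(theta):
    """Transition rows p_i^a(theta) of (20), keyed by (state, action)."""
    t1, t2, t3 = theta
    return {
        (1, 1): (t1, 1 - t1),
        (1, 2): (t2, 1 - t2),
        (2, 1): (1 - t1 ** 2, t1 ** 2),
        (2, 2): (1 - t3, t3),
    }


def value(policy, theta):
    """Value vector of a stationary policy via (7): v = (I - gamma P)^-1 C."""
    keys = list(zip((1, 2), policy))
    table = rows(theta)
    P = np.array([table[k] for k in keys])
    C = np.array([COST[k] for k in keys])
    return np.linalg.solve(np.eye(2) - GAMMA * P, C)


def analyse(policy):
    """Return (uniform maximizer exists?, max over Theta of ||v||^2)."""
    vals = np.array([value(policy, th)
                     for th in itertools.product(OMEGA, repeat=3)])
    componentwise_max = vals.max(axis=0)
    # Criterion (13) needs one theta attaining the maximum for every state.
    uniform = bool(np.any(np.all(np.isclose(vals, componentwise_max), axis=1)))
    # Criterion (31) only needs the maximum of the quadratic total value.
    max_quadratic = float((vals ** 2).sum(axis=1).max())
    return uniform, max_quadratic


results = {pi: analyse(pi) for pi in itertools.product((1, 2), repeat=2)}
best = min(results, key=lambda pi: results[pi][1])
for pi, (uniform, q) in results.items():
    print(pi, "uniform maximizer:", uniform, "max ||v||^2:", round(q, 2))
print("minimizer of the maximum quadratic total value:", best)
```

Under these assumptions the policy with actions (1, 1), i.e. $\pi_1$, has no single $\theta$ that maximizes both components of its value function, yet its maximum quadratic total value is still well defined and can be compared with those of the other three policies.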

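Generally, an optimal policy under the optimality criterion using the quadratic total value function may be non-stationary. However, under some assumptions, stationary optimal policies are optimal or $\epsilon$-optimal in the policy space $\Pi$.

Theorem 2: If

$$\min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} \|v_P^{\pi}\|^2 = \max_{P \in \mathcal{P}} \min_{\pi \in \Pi_s} \|v_P^{\pi}\|^2, \tag{47}$$

then stationary optimal policies are optimal in the policy space $\Pi$.

Proof: For any fixed $P \in \mathcal{P}$, since $\min_{\pi \in \Pi_s} \|v_P^{\pi}\|^2 = \min_{\pi \in \Pi} \|v_P^{\pi}\|^2$, by (47),

$$\min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} \|v_P^{\pi}\|^2 = \max_{P \in \mathcal{P}} \min_{\pi \in \Pi_s} \|v_P^{\pi}\|^2 = \max_{P \in \mathcal{P}} \min_{\pi \in \Pi} \|v_P^{\pi}\|^2. \tag{48}$$

Since, by weak duality, $\max_{P \in \mathcal{P}} \min_{\pi \in \Pi} \|v_P^{\pi}\|^2 \le \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} \|v_P^{\pi}\|^2$,

$$\min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} \|v_P^{\pi}\|^2 \le \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} \|v_P^{\pi}\|^2. \tag{49}$$

Since $\Pi_s \subset \Pi$ implies $\min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} \|v_P^{\pi}\|^2 \ge \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} \|v_P^{\pi}\|^2$,

$$\min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} \|v_P^{\pi}\|^2 = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} \|v_P^{\pi}\|^2. \tag{50}$$

That is to say, stationary optimal policies are also optimal in $\Pi$.

Assume that $\Theta$ is a compact and closed subspace of the vector space $\mathbb{R}^{1 \times q}$ and that, for any $\pi \in \Pi$, the function $\|v_{P(\theta)}^{\pi}\|^2$ is continuous in $\theta$. Then, given an arbitrarily small constant $\epsilon > 0$, for any $\theta_0 \in \Theta$, there exists $\delta_{\theta_0}^{\pi} > 0$ such that

$$\big|\, \|v_{P(\theta)}^{\pi}\|^2 - \|v_{P(\theta_0)}^{\pi}\|^2 \big| < \epsilon, \quad \forall \theta \in B_{\theta_0}^{\pi} \cap \Theta, \tag{51}$$

where $B_{\theta_0}^{\pi} = \{\theta : \|\theta - \theta_0\| < \delta_{\theta_0}^{\pi}\}$. Since

$$\Theta \subseteq \bigcup_{\theta_0 \in \Theta} B_{\theta_0}^{\pi}, \tag{52}$$

by the Heine-Borel theorem, finitely many balls suffice to cover $\Theta$. Let those selected balls be $B_{\theta_1}^{\pi}, \ldots, B_{\theta_j}^{\pi}, \ldots, B_{\theta_r}^{\pi}$ and let $\tilde{\mathcal{P}}^{\pi} = \{P(\theta_j) : 1 \le j \le r\}$, where $\theta_j$ and $r$ depend on $\pi$. Then define

$$\tilde{\mathcal{P}} = \bigcup_{\pi \in \Pi} \tilde{\mathcal{P}}^{\pi}. \tag{53}$$

Theorem 3: With the above assumptions and notations, if

$$\min_{\pi \in \Pi_s} \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^{\pi}\|^2 = \max_{P(\theta) \in \tilde{\mathcal{P}}} \min_{\pi \in \Pi_s} \|v_{P(\theta)}^{\pi}\|^2, \tag{54}$$

then stationary optimal policies on $\tilde{\mathcal{P}}$ are $\epsilon$-optimal, and stationary optimal policies on $\mathcal{P}$ are also $\epsilon$-optimal, in the policy space $\Pi$.

Proof: Without loss of generality, for any $\pi \in \Pi$, let $P(\theta^*) \in \mathcal{P}$, $P(\theta_1) \in \tilde{\mathcal{P}}$ and $P(\theta_2) \in \tilde{\mathcal{P}}^{\pi}$ be such that

$$\|v_{P(\theta^*)}^{\pi}\|^2 = \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^{\pi}\|^2, \tag{55}$$
$$\|v_{P(\theta_1)}^{\pi}\|^2 = \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^{\pi}\|^2, \tag{56}$$
$$0 \le \|v_{P(\theta^*)}^{\pi}\|^2 - \|v_{P(\theta_2)}^{\pi}\|^2 < \epsilon. \tag{57}$$

Since $\|v_{P(\theta_2)}^{\pi}\|^2 \le \|v_{P(\theta_1)}^{\pi}\|^2 \le \|v_{P(\theta^*)}^{\pi}\|^2$,

$$0 \le \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^{\pi}\|^2 - \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^{\pi}\|^2 < \epsilon. \tag{58}$$

Then,

$$0 \le \min_{\pi \in \Pi} \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^{\pi}\|^2 - \min_{\pi \in \Pi} \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^{\pi}\|^2 \le \epsilon. \tag{59}$$

By (54) and Theorem 2,

$$\min_{\pi \in \Pi_s} \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^{\pi}\|^2 = \min_{\pi \in \Pi} \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^{\pi}\|^2. \tag{60}$$

Hence,

$$0 \le \min_{\pi \in \Pi} \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^{\pi}\|^2 - \min_{\pi \in \Pi_s} \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^{\pi}\|^2 \le \epsilon, \tag{61}$$

that is to say, stationary optimal policies on $\tilde{\mathcal{P}}$ are $\epsilon$-optimal in $\Pi$. Since

$$0 \le \min_{\pi \in \Pi_s} \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^{\pi}\|^2 - \min_{\pi \in \Pi_s} \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^{\pi}\|^2 \le \epsilon,$$

it follows that

$$0 \le \min_{\pi \in \Pi_s} \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^{\pi}\|^2 - \min_{\pi \in \Pi} \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^{\pi}\|^2$$
$$= \min_{\pi \in \Pi_s} \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^{\pi}\|^2 - \min_{\pi \in \Pi_s} \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^{\pi}\|^2 \tag{62}$$
$$\quad + \min_{\pi \in \Pi_s} \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^{\pi}\|^2 - \min_{\pi \in \Pi} \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^{\pi}\|^2 \le \epsilon. \tag{63}$$

Hence, stationary optimal policies on $\mathcal{P}$ are $\epsilon$-optimal in $\Pi$.

(i) If, for any fixed $\pi \in \Pi$, the function $\|v_{P(\theta)}^{\pi}\|^2$ is Lipschitz with a constant $L^{\pi}$ that depends on $\pi$, that is,

$$\big|\, \|v_{P(\theta_1)}^{\pi}\|^2 - \|v_{P(\theta_2)}^{\pi}\|^2 \big| \le L^{\pi} \|\theta_1 - \theta_2\|, \quad \forall \theta_1, \theta_2 \in \Theta, \tag{64}$$

then the parameter $\delta_{\theta_0}^{\pi}$ can be chosen as follows:

$$\delta_{\theta_0}^{\pi} = \delta^{\pi} = \frac{\epsilon}{L^{\pi} + 1}, \quad \forall \theta_0 \in \Theta. \tag{65}$$

That is to say, $\delta_{\theta_0}^{\pi}$ is independent of $\theta_0$.

(ii) If the function $\|v_{P(\theta)}^{\pi}\|^2$ is continuously differentiable with respect to $\theta$, it is also Lipschitz and the constant $L^{\pi}$ can be chosen as follows:

$$\infty > L^{\pi} \ge \max_{\theta \in \Theta} \left\| \frac{\partial \|v_{P(\theta)}^{\pi}\|^2}{\partial \theta} \right\|. \tag{66}$$

(iii) The smaller $L^{\pi}$ is, the larger $\delta^{\pi}$ can be, and then the smaller the cardinality of $\tilde{\mathcal{P}}^{\pi}$ is, which can make the condition (54) relatively easy to satisfy.

(iv) If, for some policies, the constants $L^{\pi}$ are equal, the same balls $B_{\theta_j}^{\pi}$ can be selected to cover $\Theta$ and their sets $\tilde{\mathcal{P}}^{\pi}$ are also equal, which may reduce the cardinality of $\tilde{\mathcal{P}}$ and make the condition (54) satisfied more easily.

When, for all $\pi \in \Pi$, the functions $\|v_{P(\theta)}^{\pi}\|^2$ are Lipschitz with a constant $L$ that is independent of $\pi$, then the parameters $\delta_{\theta_0}^{\pi}$ can be independent of not only $\theta_0$ but also $\pi$, that is,

$$\delta_{\theta_0}^{\pi} = \delta = \frac{\epsilon}{L + 1}, \quad \forall \theta_0 \in \Theta, \ \forall \pi \in \Pi, \tag{67}$$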

and consequently, the balls $B_{\theta_0}^{\pi}$ are independent of $\pi$, that is,

$$B_{\theta_0}^{\pi} = B_{\theta_0} = \{\theta : \|\theta - \theta_0\| < \delta\}, \quad \forall \pi \in \Pi. \tag{68}$$

Hence, for all $\pi$, the same balls $B_{\theta_j}$ can be selected to cover $\Theta$, and all $\tilde{\mathcal{P}}^{\pi}$ are equal. Thus, the set $\tilde{\mathcal{P}}$ becomes finite, that is, $\tilde{\mathcal{P}} = \mathcal{P}_f = \{P(\theta_j) : 1 \le j \le r\}$, where $\theta_j$ and $r$ are independent of $\pi$.

Corollary 1: With the above assumptions and notations, if

$$\min_{\pi \in \Pi_s} \max_{P(\theta) \in \mathcal{P}_f} \|v_{P(\theta)}^{\pi}\|^2 = \max_{P(\theta) \in \mathcal{P}_f} \min_{\pi \in \Pi_s} \|v_{P(\theta)}^{\pi}\|^2, \tag{69}$$

then stationary optimal policies on $\mathcal{P}_f$ are $\epsilon$-optimal, and stationary optimal policies on $\mathcal{P}$ are also $\epsilon$-optimal, in the policy space $\Pi$.

Proof: By (69) and Theorem 3, stationary optimal policies on $\mathcal{P}_f$ are $\epsilon$-optimal and stationary optimal policies on $\mathcal{P}$ are also $\epsilon$-optimal in $\Pi$.

(i) If the continuous derivatives of the functions $\|v_{P(\theta)}^{\pi}\|^2$ are uniformly bounded for all $\pi \in \Pi$, the Lipschitz constant $L$ can be chosen as follows:

$$\infty > L \ge \sup_{\pi \in \Pi} \max_{\theta \in \Theta} \left\| \frac{\partial \|v_{P(\theta)}^{\pi}\|^2}{\partial \theta} \right\|. \tag{70}$$

(ii) The smaller $L$ is, the larger $\delta$ can be, and consequently the smaller the cardinality of $\mathcal{P}_f$ is, which can make the condition (69) relatively easy to satisfy.

III. ROBUST POLICY ITERATION UNDER QUADRATIC TOTAL VALUE FUNCTION

Based on the optimality criterion of minimizing the maximum quadratic total value function given in (31), a robust policy iteration is developed to find a stationary optimal policy for MDPs. Even though this solution is generally suboptimal in the deterministic policy space $\Pi$, according to Theorems 2 and 3 and Corollary 1, this stationary optimal policy can be guaranteed to be optimal or $\epsilon$-optimal in $\Pi$ under the conditions stated there. In this section, we consider stationary policies. For simplicity, the subscript $s$, which indicates association with stationary policies, is dropped.

Policy iteration includes policy evaluation and policy improvement steps. In the policy evaluation step, for any policy $\pi = (a, a, \ldots)$, the maximum quadratic total value function is expressed as follows in terms of (7):

$$\max_{P \in \mathcal{P}} \|v_P^{\pi}\|^2 = \|v_{P^*}^{\pi}\|^2 = \max_{P \in \mathcal{P}} (C^{\pi})^{\top} \big( I - \gamma (P^{\pi})^{\top} \big)^{-1} (I - \gamma P^{\pi})^{-1} C^{\pi}. \tag{71}$$

The maximum value $\|v_{P^*}^{\pi}\|^2$ and the optimal $P^*$ are to be calculated. Approaches are available to compute these values. Actually, if the transition matrix for the policy $\pi$ is independent, an efficient iterative method given in Algorithm 1 can be used to compute $\|v_{P^*}^{\pi}\|^2$.

Algorithm 1 Iterative Algorithm for maximum quadratic total value function of a stationary policy with its independent transition matrix
1 select $v_0^{\pi} \in \mathbb{R}^{n \times 1}$ and set $k = 0$;
2 compute $v_{k+1}^{\pi}$ by
$$v_{k+1}^{\pi}(i) = c(i, a(i)) + \gamma \max_{p \in \mathcal{P}_i^{a(i)}} p\, v_k^{\pi}, \quad \forall i \in S; \tag{72}$$
3 terminate if $v_{k+1}^{\pi} = v_k^{\pi}$; otherwise, increment $k$ by 1 and go to 2;
4 output $\|v_k^{\pi}\|^2$ as $\|v_{P^*}^{\pi}\|^2$.

(i) Given a policy $\pi$, for the maximum point $P^*$, the transition probability rows for all states $i$ and the corresponding actions $a(i)$, denoted by $(p_i^{a(i)})^*$, are computed as follows:

$$(p_i^{a(i)})^* = \arg\max_{p \in \mathcal{P}_i^{a(i)}} \{ p\, v_{P^*}^{\pi} \}, \quad \forall i \in S, \tag{73}$$

and there are no special constraints for other rows.

(ii) A proof of convergence of the iterative algorithm under the total value function can be obtained similarly to that in [6], [7].

(iii) In principle, the initial value $v_0^{\pi}$ can be selected arbitrarily in $\mathbb{R}^{n \times 1}$. However, a carefully selected $v_0^{\pi}$ can accelerate the convergence of the iterative process. To see that, let

$$G_1 = \{ v : v(i) \le c(i, a(i)) + \gamma \max_{p \in \mathcal{P}_i^{a(i)}} p\, v \} \tag{74}$$

and

$$G_2 = \{ v : v(i) \ge c(i, a(i)) + \gamma \max_{p \in \mathcal{P}_i^{a(i)}} p\, v \}. \tag{75}$$

If $v_0^{\pi} \in G_1$, the sequence $\{v_k^{\pi}\}$ is non-decreasing. If $v_0^{\pi} \in G_2$, the sequence $\{v_k^{\pi}\}$ is non-increasing. Thus, the sequence $\{v_k^{\pi}\}$ converges to $v_{P^*}^{\pi}$ faster than those with $v_0^{\pi} \notin G_1 \cup G_2$.

For the policy improvement step, the policy can be improved easily for MDPs with independent transition matrices using robust policy iteration [7]. However, such an improvement method does not guarantee solutions for MDPs with correlated transition matrices. Hence, the policy elimination method is considered herein as a means of improving the policy. The policy elimination process entails eliminating some non-optimal policies during each iteration by means of a necessary condition for a stationary policy to be optimal. The inequality $\|v_{P_k^*}^{\mu}\|^2 \le \|v_{P_k^*}^{\pi_k}\|^2$ given in (77) is such a necessary condition for a stationary policy $\mu$ being optimal, where $\|v_{P_k^*}^{\mu}\|^2$ is the quadratic total value function of $\mu$ at $P_k^*$, $\|v_{P_k^*}^{\pi_k}\|^2$ is the maximum quadratic total value function of $\pi_k$, and $P_k^*$ is the corresponding optimal matrix. Robust policy iteration under the quadratic total value function is described in detail in Algorithm 2.

(i) Robust policy iteration under the quadratic total value function terminates in finitely many iterations when the cardinality of the set $\Pi_k$, denoted by $|\Pi_k|$ in Algorithm 2, is equal to one, since the cardinality of $\Pi_0$ is finite and $\Pi_{k+1} \subseteq \Pi_k$ ($k = 0, 1, 2, \ldots$). The sequence $\{\pi_k\}$ converges to a stationary optimal policy.

(ii) A good initial policy $\pi_0$, which has a relatively small total value function, can accelerate the convergence process.
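The policy evaluation iteration (72) and the elimination test (77) discussed in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Algorithm 2: the uncertainty sets are finite grids taken from the two-state example, a tolerance `tol` replaces the exact stopping test $v_{k+1}^{\pi} = v_k^{\pi}$, the maximization over $\mathcal{P}$ in the elimination loop is done by brute force over $\Theta$, and the bookkeeping of Algorithm 2 is simplified to "evaluate each surviving policy once". The function names `robust_evaluate`, `value`, `quad` and `eliminate` are hypothetical.

```python
import itertools
import numpy as np

GAMMA = 0.9
OMEGA = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
COST = {(1, 1): 1.0, (1, 2): 2.0, (2, 1): 3.0, (2, 2): 4.0}


def robust_evaluate(cost, row_sets, tol=1e-10, max_iter=10_000):
    """Iteration (72) for an independent transition matrix.

    cost[i] is c(i, a(i)); row_sets[i] is a finite list of candidate rows
    for state i. A tolerance stands in for the exact stopping test.
    """
    v = np.zeros(len(cost))
    for _ in range(max_iter):
        v_next = np.array([c + GAMMA * max(float(np.dot(p, v)) for p in rs)
                           for c, rs in zip(cost, row_sets)])
        if np.max(np.abs(v_next - v)) < tol:
            return v_next
        v = v_next
    raise RuntimeError("robust evaluation did not converge")


def value(policy, theta):
    """v = (I - gamma P)^-1 C for the two-state example at a fixed theta."""
    t1, t2, t3 = theta
    row = {(1, 1): (t1, 1 - t1), (1, 2): (t2, 1 - t2),
           (2, 1): (1 - t1 ** 2, t1 ** 2), (2, 2): (1 - t3, t3)}
    keys = list(zip((1, 2), policy))
    P = np.array([row[k] for k in keys])
    C = np.array([COST[k] for k in keys])
    return np.linalg.solve(np.eye(2) - GAMMA * P, C)


def quad(policy, theta):
    """Quadratic total value function (29) at a fixed theta."""
    v = value(policy, theta)
    return float(v @ v)


def eliminate(policies, thetas):
    """Simplified elimination loop built on the necessary condition (77)."""
    candidates, evaluated = list(policies), set()
    best_pi, best_score = None, float("inf")
    while True:
        pending = [pi for pi in candidates if pi not in evaluated]
        if not pending:
            return best_pi, best_score
        pi = pending[0]
        evaluated.add(pi)
        # Policy evaluation: worst-case theta by brute force over the grid.
        score, theta_star = max((quad(pi, th), th) for th in thetas)
        if score < best_score:
            best_pi, best_score = pi, score
        # Keep mu only if it does no worse than pi at pi's worst-case theta.
        candidates = [mu for mu in candidates if quad(mu, theta_star) <= score]


# Policy with actions (1, 2): its two rows vary independently over OMEGA.
row_sets = [[np.array([t, 1 - t]) for t in OMEGA],
            [np.array([1 - t, t]) for t in OMEGA]]
v_pi2 = robust_evaluate([1.0, 4.0], row_sets)

thetas = list(itertools.product(OMEGA, repeat=3))
policies = list(itertools.product((1, 2), repeat=2))
best_policy, best_score = eliminate(policies, thetas)
print(v_pi2, best_policy, best_score)
```

The loop only discards a policy when it is strictly worse than an already evaluated one, so the best evaluated policy it returns minimizes the maximum quadratic total value over the finite grid; it makes no claim about optimality outside that grid.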

Algorithm 2 Robust Policy Iteration Under Quadratic Total Value Function
1 Initialization: set $k = 0$, $\Pi_0 = \Pi_s$, $O_{-1} = +\infty$, $\{\pi_{-1}\} = \emptyset$, and select $\pi_0 \in \Pi_0$;
2 Policy evaluation: compute $\|v_{P_k^*}^{\pi_k}\|^2$ and $P_k^*$ such that
$$\|v_{P_k^*}^{\pi_k}\|^2 = \max_{P \in \mathcal{P}} \|v_P^{\pi_k}\|^2; \tag{76}$$
3 Policy improvement:
  if $|\Pi_k| > 1$
    (a) eliminate policies to obtain $\tilde{\Pi}_k$ such that
    $$\tilde{\Pi}_k = \{\mu \in \Pi_k : \|v_{P_k^*}^{\mu}\|^2 \le \|v_{P_k^*}^{\pi_k}\|^2\}; \tag{77}$$
    if $|\tilde{\Pi}_k| > 1$
      if $\|v_{P_k^*}^{\pi_k}\|^2 < O_{k-1}$
        (b) set $O_k = \|v_{P_k^*}^{\pi_k}\|^2$, $\tilde{\Pi}_k = \tilde{\Pi}_k \setminus \{\pi_{k-1}\}$;
        (c) if $|\tilde{\Pi}_k| = 1$, set $\Pi_{k+1} = \tilde{\Pi}_k = \{\pi_k\}$, $\pi_{k+1} = \pi_k$, $O_{k+1} = O_k = \|v_{P_k^*}^{\pi_k}\|^2$, increment $k$ by 1, and go to 4; otherwise, go to (d);
        (d) set $\Pi_{k+1} = \tilde{\Pi}_k$, select $\pi_{k+1} \in \Pi_{k+1} \setminus \{\pi_k\}$, increment $k$ by 1, and return to 2;
      else
        (e) if $|\tilde{\Pi}_k \setminus \{\pi_k\} \setminus \{\pi_{k-1}\}| \ge 1$, select $\tilde{\pi}_k \in \tilde{\Pi}_k \setminus \{\pi_k\} \setminus \{\pi_{k-1}\}$, set $\Pi_k = \tilde{\Pi}_k \setminus \{\pi_k\}$, $\pi_k = \tilde{\pi}_k$, and return to 2; otherwise, set $\Pi_k = \tilde{\Pi}_k \setminus \{\pi_k\} = \{\pi_{k-1}\}$, $\pi_k = \pi_{k-1}$, $O_k = O_{k-1} = \|v_{P_{k-1}^*}^{\pi_{k-1}}\|^2$;
    else
      (f) set $\Pi_{k+1} = \tilde{\Pi}_k = \{\pi_k\}$, $\pi_{k+1} = \pi_k$, $O_{k+1} = \|v_{P_k^*}^{\pi_k}\|^2$, increment $k$ by 1, and go to 4;
  else
    (g) go to 4;
4 Termination: output $\pi_k$ as the stationary optimal policy with the corresponding optimal transition matrix $P_k^*$ and maximum quadratic total value function $O_k$.

(iii) Robust policy iteration under the quadratic total value function can be extended to solving problems with parameterized cost functions and weighted quadratic total value functions.

(iv) Assume that the set $\Theta$ is compact and closed in $\mathbb{R}^{1 \times q}$. If, for any fixed $\pi$ and $P(\theta)$, the quadratic total value function can be computed within an arbitrarily small error bound, then Algorithm 2 can be extended to obtain a stationary $\epsilon$-optimal policy.

IV. CONCLUSION

A theoretical framework is proposed to study MDPs with uncertainties in the transition probability matrices. As a result, the paper concludes with a robust policy iteration procedure under a newly proposed quadratic total value function, which guarantees optimal or near-optimal solutions under the stated assumptions. Such a robust construct may be especially useful in practical applications when the amount of experimental data is too limited to obtain an accurate transition probability matrix, or when an accurate estimation of the transition probability matrix is unrealistic. The proposed framework is straightforward to apply in practical problems.

REFERENCES

[1] R. Bellman and R. Kalaba, Dynamic Programming and Modern Control Theory, New York: Academic Press, 1965.
[2] M. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley-Interscience, New York.
[3] Hyeong Soo Chang, Hong-Gi Lee, Michael C. Fu, and Steven I. Marcus, Evolutionary Policy Iteration for Solving Markov Decision Processes, IEEE Transactions on Automatic Control, vol. 50, no. 11, 2005.
[4] D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, Massachusetts, 1996.
[5] J. Si, A. G. Barto, W. B. Powell and D. Wunsch, Handbook of Learning and Approximate Dynamic Programming, Wiley-IEEE Press, 2004.
[6] A. Nilim and L. E. Ghaoui, Robust Control of Markov Decision Processes with Uncertain Transition Matrices, Operations Research, vol. 53, no. 5, 2005.
[7] J. K. Satia and R. E. Lave, Jr., Markov Decision Processes with Uncertain Transition Probabilities, Operations Research, vol. 21, 1973.
[8] C. C. White and H. K. Eldeib, Markov Decision Processes with Imprecise Transition Probabilities, Operations Research, vol. 42, 1994.
[9] R. Givan, S. Leach, and T. Dean, Bounded Parameter Markov Decision Processes, Artificial Intelligence, vol. 122, no. 1-2, 2000.
[10] S. Kalyanasundaram, E. K. Chong, and N. B. Shroff, Markov Decision Processes with Uncertain Transition Rates: Sensitivity and Robust Control, Proceedings of the 41st IEEE Conference on Decision and Control, vol. 4, 2002.
[11] M. Abbad and J. A. Filar, Perturbation and Stability Theory for Markov Control Problems, IEEE Transactions on Automatic Control, vol. 37, 1992.


Reinforcement learning

Reinforcement learning Renforcement learnng Nathanel Daw Gatsby Computatonal Neuroscence Unt daw @ gatsby.ucl.ac.uk http://www.gatsby.ucl.ac.uk/~daw Mostly adapted from Andrew Moore s tutorals, copyrght 2002, 2004 by Andrew

More information

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs

More information
