Robust Dynamic Programming for Discounted Infinite-Horizon Markov Decision Processes with Uncertain Stationary Transition Matrices
Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007)

Baohua Li, Department of Electrical Engineering, Arizona State University, Tempe, AZ. Email: Baohua.Li@asu.edu
Jennie Si, Department of Electrical Engineering, Arizona State University, Tempe, AZ. Telephone: (480) Fax: (480) Email: Si@asu.edu

Abstract: In this paper, finite-state, finite-action, discounted infinite-horizon-cost Markov decision processes (MDPs) with uncertain stationary transition matrices are discussed in the deterministic policy space. Uncertain stationary parametric transition matrices are clearly classified into independent and correlated cases. It is pointed out that the optimality criterion of uniform minimization of the maximum expected total discounted cost functions for all initial states, or robust uniform optimality criterion, is not appropriate for solving MDPs with correlated transition matrices. A new optimality criterion of minimizing the maximum quadratic total value function is proposed, which includes the previous criterion as a special case. Based on the new optimality criterion, robust policy iteration is developed to compute an optimal policy in the deterministic stationary policy space. Under some assumptions, the solution is guaranteed to be optimal or near-optimal in the deterministic policy space.

I. INTRODUCTION

Dynamic programming (DP) is a computational approach to finding an optimal policy by employing the principle of optimality introduced by Richard Bellman [1]. Given accurate knowledge of the transition probabilities, DP and approximate DP can solve Markov decision processes (MDPs) and obtain an optimal or near-optimal policy by algorithms such as value iteration, policy iteration, evolutionary policy iteration, and approximate policy iteration based on Monte Carlo simulation [2]-[5]. In practice, however, accurate transition probabilities are very difficult to obtain. Exact solutions via classic DP are then not attainable, and those algorithms are no longer applicable. Moreover, the estimated transition probabilities may be far away from the true values due to noise and other issues associated with the estimation process, and the estimation error may be large enough to cause a significant deviation of the solutions from the optimal values [6]. Hence, set estimation of the transition matrices with high confidence can be used to alleviate some of the deficit of inaccurate point estimation. With such uncertain transition matrices, robust dynamic programming is desired in order to design approximation methods with appropriate robustness and so extend the power of the Bellman equation.

Representative efforts in developing robust dynamic programming can be found in [6]-[10]. One commonly used optimality criterion for robust algorithms is to minimize the maximum value functions for all initial states, which is referred to as the robust uniform optimality criterion in this paper. Based on this criterion, robust value iteration and robust policy iteration were proposed in [6] and [7], respectively, to obtain a deterministic, uniformly optimal policy. To deal with uncertain transition matrices, the notion of correlation in a transition matrix was introduced in [6] and [10] in the context of MDPs. The existing optimality criteria and associated robust algorithms were developed only for MDPs with independent transition matrices.

In this paper, finite-state, finite-action MDPs with discounted infinite-horizon cost are discussed. The optimal policy is constrained to the deterministic policy space. The transition matrices are assumed to be stationary, which is reasonable when systems are slowly varying.

The contribution of this paper is as follows. First, mathematically clear and more tractable definitions of independent and correlated uncertain stationary transition matrices are provided. Using these definitions, it is shown that the robust uniform optimality criterion is not applicable to MDPs with correlated transition matrices. Because of the correlation of uncertain transition matrices, for a given policy there may be no transition matrix at which the value functions of the policy reach their maximum uniformly for all initial states. This makes the comparison among policies meaningless. Even if such transition matrices exist for all policies, there may not be an optimal policy whose maximum value functions reach their minimum uniformly for all initial states. Hence, a new optimality criterion of minimizing the maximum quadratic total value function is proposed. Based on this criterion, under a weak condition, there exists an optimal policy, which can be optimal non-uniformly in initial states. The previous criterion becomes a special case of the new criterion. Note that the existing robust value iteration and robust policy iteration cannot guarantee solutions for MDPs with correlated transition matrices. A new robust policy iteration is developed to obtain an optimal policy in the stationary policy space. Under some assumptions, the solution is guaranteed to be optimal or near-optimal in the deterministic policy space.

The rest of the paper is organized as follows. In Section II, the optimality criterion of minimizing the maximum quadratic total value function is proposed, and theorems are developed to show that this criterion makes the robust uniform optimality criterion a special case and that, under some conditions, stationary optimal policies are optimal or near-optimal in the deterministic policy space. Based on this optimality criterion, robust policy iteration is developed in Section III. The paper concludes in Section IV.

II. OPTIMALITY CRITERION USING QUADRATIC TOTAL VALUE FUNCTIONS

In this section, an MDP with an uncertain parametric transition matrix is formulated. Based on this formulation, we present why an optimal solution may be sensitive to the transition probabilities. After that, uncertain parametric transition matrices are classified into independent and correlated cases. Analysis demonstrates why an optimal policy may not exist under the robust uniform optimality criterion. Consequently, the optimality criterion using the quadratic total value function is proposed. It is proven that the robust uniform optimality criterion is a special case. In addition, two theorems and one corollary are provided to show that, under some assumptions, stationary optimal policies are optimal or near-optimal in the deterministic policy space.

Finite-state, finite-action, infinite-horizon MDPs with stationary transition matrices are described as follows. Let $T$ denote the discrete, infinite decision horizon, where $T = \{0, 1, 2, \ldots\}$. At each stage, the system occupies a state $i \in S$, where $S = \{1, 2, \ldots, n\}$ is the state space with $n$ states. At each state $i \in S$, a decision maker is allowed to choose an action $a$ deterministically from a finite state-dependent set of allowable actions, denoted by $A_i = \{1, 2, \ldots, m_i\}$. Let $M = \sum_{i=1}^{n} m_i$, and let $a_t$ be a function mapping states into actions with $a_t(i) \in A_i$ at time $t$, i.e.,

$$a_t : \begin{pmatrix} 1 & 2 & \cdots & n \\ a_t(1) & a_t(2) & \cdots & a_t(n) \end{pmatrix}, \qquad (a_t(1), \ldots, a_t(n)) \in A_1 \times A_2 \times \cdots \times A_n. \tag{1}$$

Denote a policy by $\pi$,

$$\pi = (a_1, a_2, \ldots), \tag{2}$$

and let $\Pi$ represent the deterministic policy space. Denote a stationary policy by $\pi_s$,

$$\pi_s = (a, a, \ldots), \tag{3}$$

and let $\Pi_s$ represent the deterministic stationary policy space. Obviously, $\Pi_s \subset \Pi$.

Define the cost corresponding to state $i \in S$ and action $a \in A_i$ by $c(i, a)$, and assume that $c(i, a)$ is nonnegative. The costs are discounted in time by a factor $\gamma$ ($0 < \gamma < 1$). The system starts from an initial state $i$. The states make Markov transitions according to stationary transition probabilities $p_{ij}^a$ from a state $i$ to a state $j$ under an action $a$. All transition probabilities constitute an $M \times n$ transition matrix $P$,

$$P = \big( (P_1^1)' \;\cdots\; (P_i^a)' \;\cdots\; (P_n^{m_n})' \big)', \tag{4}$$

where the superscript $'$ denotes the transpose and $P_i^a$ represents the transition probability row under the state $i \in S$ and the action $a \in A_i$.

In a more general setting, let each transition probability $p_{ij}^a$ be represented by a function of a parameter vector $\theta = \{\vartheta_1, \ldots, \vartheta_q\}$, i.e.,

$$p_{ij}^a = p_{ij}^a(\theta), \qquad i, j \in S, \tag{5}$$

where $0 \le p_{ij}^a \le 1$ and $\sum_{j=1}^{n} p_{ij}^a = 1$. In [6]-[10], the parameters are specified as the transition probabilities themselves, that is,

$$\theta = \{p_{11}^1, \ldots, p_{1j}^1, \ldots, p_{1(n-1)}^1, \ldots, p_{i1}^a, \ldots, p_{ij}^a, \ldots, p_{i(n-1)}^a, \ldots, p_{n1}^{m_n}, \ldots, p_{nj}^{m_n}, \ldots, p_{n(n-1)}^{m_n}\}. \tag{6}$$

The transition probability row $P_i^a$ and the transition matrix $P$ can also be represented as functions of $\theta$, i.e., $P_i^a = P_i^a(\theta)$ and $P = P(\theta)$. Assume that the parameter vector $\theta$ varies in a known subset $\Theta \subset R^{1 \times q}$. The transition matrix then varies in the set $\mathcal{P} = \{P(\theta) : \theta \in \Theta\}$.

It is not difficult to see that optimal value functions may be very sensitive to perturbations of the parameters in the transition matrix. Consider a stationary policy $\pi_s$ defined in (3). For any $P \in \mathcal{P}$,

$$v_P^{\pi_s} = (I - \gamma P^{\pi_s})^{-1} C^{\pi_s}, \tag{7}$$

where $v_P^{\pi_s}$ is the expected total discounted cost function under the policy $\pi_s$ and the transition matrix $P$, and $P^{\pi_s}$ and $C^{\pi_s}$ are the $n \times n$ transition matrix and the $n \times 1$ cost function vector corresponding to $\pi_s$, i.e.,

$$P^{\pi_s} = \begin{pmatrix} P_1^{a(1)} \\ \vdots \\ P_i^{a(i)} \\ \vdots \\ P_n^{a(n)} \end{pmatrix}, \qquad C^{\pi_s} = \begin{pmatrix} c(1, a(1)) \\ \vdots \\ c(i, a(i)) \\ \vdots \\ c(n, a(n)) \end{pmatrix}. \tag{8}$$

Obviously, $v_P^{\pi_s}$ is continuous in $P$ [11]. However, $P$ can be discontinuous in $\theta$ over $\Theta$, which may lead to value functions $v_{P(\theta)}^{\pi_s}$ that are discontinuous in $\theta$. Even if $P$ is continuous in $\theta$, and therefore $v_{P(\theta)}^{\pi_s}$ is continuous on $\Theta$, the function $v_{P(\theta)}^{\pi_s}$ still may not be smooth enough; that is, a small perturbation in $\theta$ may result in a relatively large variation in $v_{P(\theta)}^{\pi_s}$. Therefore there is a need to develop robust solutions under uncertainty.

We are now in a position to introduce the concepts of independence and correlation in $P$ and $P^{\pi_s}$.

Definition (Correlated transition matrix, Independent transition matrix): The transition matrix $P$ is correlated if

$$\mathcal{P} \subset \mathcal{P}_1^1 \times \cdots \times \mathcal{P}_i^a \times \cdots \times \mathcal{P}_n^{m_n}, \tag{9}$$

where $\mathcal{P}_i^a = \{P_i^a(\theta) : \theta \in \Theta\}$. The transition matrix $P$ is independent if

$$\mathcal{P} = \mathcal{P}_1^1 \times \cdots \times \mathcal{P}_i^a \times \cdots \times \mathcal{P}_n^{m_n}. \tag{10}$$

Definition (Correlated transition matrix for $\pi_s$, Independent transition matrix for $\pi_s$): The transition matrix $P^{\pi_s}$ is correlated if

$$\mathcal{P}^{\pi_s} \subset \mathcal{P}_1^{a(1)} \times \cdots \times \mathcal{P}_n^{a(n)}, \tag{11}$$

where $\mathcal{P}^{\pi_s} = \{P^{\pi_s}(\theta) : \theta \in \Theta\}$. The transition matrix $P^{\pi_s}$ is independent if

$$\mathcal{P}^{\pi_s} = \mathcal{P}_1^{a(1)} \times \cdots \times \mathcal{P}_n^{a(n)}. \tag{12}$$

(i) The set $\mathcal{P}_1^1 \times \cdots \times \mathcal{P}_i^a \times \cdots \times \mathcal{P}_n^{m_n}$ is an $M$-dimensional hyper-rectangle. The matrix $P$ is correlated if $\mathcal{P}$ is a proper subset of this hyper-rectangle, and $P$ is independent if $\mathcal{P}$ is equal to the hyper-rectangle. An exact transition matrix is a special case of an independent transition matrix.
(ii) The set $\mathcal{P}_1^{a(1)} \times \cdots \times \mathcal{P}_n^{a(n)}$ is an $n$-dimensional hyper-rectangle. The matrix $P^{\pi_s}$ is correlated if $\mathcal{P}^{\pi_s}$ is a proper subset of this hyper-rectangle, and $P^{\pi_s}$ is independent if $\mathcal{P}^{\pi_s}$ is equal to the hyper-rectangle.
(iii) According to the above definitions, MDPs are classified into those with independent transition matrices and those with correlated transition matrices.

The optimality criterion of minimizing the maximum value function for any initial state is given as follows:

$$\min_{\pi \in \Pi} \max_{P \in \mathcal{P}} v_P^\pi(i), \qquad \forall i \in S, \tag{13}$$

where $v_P^\pi(i)$ is the expected total discounted cost function under the policy $\pi \in \Pi$ and the transition matrix $P \in \mathcal{P}$ at the initial state $i$, i.e.,

$$v_P^\pi(i) = E_P^\pi \Big( \sum_{t=0}^{\infty} \gamma^t c(j_t, a_t) \,\Big|\, j_0 = i \Big), \tag{14}$$

with $j_t$ the state and $a_t$ the action at stage $t$. According to the optimality criterion given in (13), an optimal policy is defined as follows.

Definition (Optimal policy): A policy $\pi^*$ is optimal if there exists $P^* \in \mathcal{P}$ such that

$$v_{P^*}^{\pi^*}(i) = \max_{P \in \mathcal{P}} v_P^{\pi^*}(i) = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} v_P^\pi(i), \qquad \forall i \in S, \tag{15}$$

where there exists $\theta^* \in \Theta$ such that $P^* = P(\theta^*)$.

For MDPs with independent transition matrices, the existence of a deterministic optimal policy has been proven [6], [7]. However, for MDPs with correlated transition matrices, an optimal policy does not always exist. The definition of an optimal policy given by (15) can be interpreted as two conditions for the existence of an optimal policy: (i) for any given $\pi \in \Pi$, there is at least one $P^* \in \mathcal{P}$ such that $v_{P^*}^\pi(i) = \max_{P \in \mathcal{P}} v_P^\pi(i)$ for any $i \in S$, where from here on $P^*$ depends on $\pi$; (ii) there is at least one $\pi^* \in \Pi$ such that $v_{P^*}^{\pi^*}(i) = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} v_P^\pi(i)$ for any $i \in S$. Such an optimal policy is uniformly optimal for initial states. For MDPs with correlated transition matrices, the above conditions are too strong to be satisfied, which can be shown in the following example.

Example: Consider a two-state, two-action, infinite-horizon MDP. Let the state space be $S = \{1, 2\}$. The action spaces at state 1 and state 2 are $A_1 = \{1, 2\}$ and $A_2 = \{1, 2\}$. According to (1) and (3), the four stationary policies in $\Pi_s$ are

$$\pi_1 = \left( \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}, \ldots \right), \tag{16}$$

$$\pi_2 = \left( \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}, \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}, \ldots \right), \tag{17}$$

$$\pi_3 = \left( \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}, \ldots \right), \tag{18}$$

$$\pi_4 = \left( \begin{pmatrix} 1 & 2 \\ 2 & 2 \end{pmatrix}, \begin{pmatrix} 1 & 2 \\ 2 & 2 \end{pmatrix}, \ldots \right). \tag{19}$$

Let the transition matrix have the following formulation,

$$P = \begin{pmatrix} P_1^1 \\ P_1^2 \\ P_2^1 \\ P_2^2 \end{pmatrix} = \begin{pmatrix} \theta_1 & 1 - \theta_1 \\ \theta_2 & 1 - \theta_2 \\ 1 - \theta_1^2 & \theta_1^2 \\ 1 - \theta_3 & \theta_3 \end{pmatrix}, \tag{20}$$

and let the cost functions be

$$c(1, 1) = 1, \quad c(1, 2) = 2, \quad c(2, 1) = 3, \quad c(2, 2) = 4. \tag{21}$$

The discount factor $\gamma$ is chosen as 0.9. Let $\Omega = \{0, 0.2, 0.4, 0.6, 0.8, 1\}$, $\theta = \{\theta_1, \theta_2, \theta_3\}$ and $\Theta = \{\theta : \theta_1, \theta_2, \theta_3 \in \Omega\}$. The transition matrix is correlated, that is,

$$\mathcal{P} \subset \mathcal{P}_1^1 \times \mathcal{P}_1^2 \times \mathcal{P}_2^1 \times \mathcal{P}_2^2, \tag{22}$$

where

$$\mathcal{P}_1^1 = \mathcal{P}_1^2 = \mathcal{P}_2^2 = \{(0, 1), (0.2, 0.8), (0.4, 0.6), (0.6, 0.4), (0.8, 0.2), (1, 0)\}, \tag{23}$$

$$\mathcal{P}_2^1 = \{(1, 0), (0.96, 0.04), (0.84, 0.16), (0.64, 0.36), (0.36, 0.64), (0, 1)\}. \tag{24}$$

Actually, by (8),

$$P^{\pi_2} = \begin{pmatrix} \theta_1 & 1 - \theta_1 \\ 1 - \theta_3 & \theta_3 \end{pmatrix}, \tag{25}$$

$$P^{\pi_3} = \begin{pmatrix} \theta_2 & 1 - \theta_2 \\ 1 - \theta_1^2 & \theta_1^2 \end{pmatrix}, \tag{26}$$

$$P^{\pi_4} = \begin{pmatrix} \theta_2 & 1 - \theta_2 \\ 1 - \theta_3 & \theta_3 \end{pmatrix}. \tag{27}$$

By (12), $P^{\pi_2}$, $P^{\pi_3}$ and $P^{\pi_4}$ are independent, and the maximum value functions are reachable. However, for $\pi_1$,

$$P^{\pi_1} = \begin{pmatrix} \theta_1 & 1 - \theta_1 \\ 1 - \theta_1^2 & \theta_1^2 \end{pmatrix}. \tag{28}$$
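The value function (7) for policy $\pi_1$ of the example can be evaluated directly over the grid $\Omega$. The following is a minimal sketch, assuming NumPy and assuming that the first row of (28) reads $(\theta_1, 1 - \theta_1)$; the names `value_pi1`, `argmax_state1` and `argmax_state2` are illustrative and not part of the paper.

```python
import numpy as np

GAMMA = 0.9
OMEGA = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
COST_PI1 = np.array([1.0, 3.0])  # c(1, 1) and c(2, 1) from (21)


def value_pi1(theta1):
    """Solve (I - gamma * P) v = C, i.e. equation (7), for policy pi_1."""
    P = np.array([
        [theta1, 1.0 - theta1],            # assumed reading of row P_1^1
        [1.0 - theta1 ** 2, theta1 ** 2],  # row P_2^1
    ])
    return np.linalg.solve(np.eye(2) - GAMMA * P, COST_PI1)


values = {t: value_pi1(t) for t in OMEGA}
for t, v in values.items():
    print(f"theta1={t:.1f}  v(1)={v[0]:.2f}  v(2)={v[1]:.2f}")

# Parameter value that maximises the value function for each initial state.
argmax_state1 = max(OMEGA, key=lambda t: values[t][0])
argmax_state2 = max(OMEGA, key=lambda t: values[t][1])
print("maximiser for initial state 1:", argmax_state1)
print("maximiser for initial state 2:", argmax_state2)
```

Under the stated reading of the matrix, the two initial states are maximised at different values of $\theta_1$, so no single parameter value is worst for both.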
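By (11), $P^{\pi_1}$ is correlated, and there is no $P^*$ at which the maximum value functions for all initial states are reached. Thus, by the optimality criterion defined in (13), an optimal policy does not exist. A new optimality criterion is therefore needed, and its corresponding optimal policy also needs to be defined.

For any fixed transition matrix $P$, the quadratic total value function for a policy $\pi$ is defined as

$$\|v_P^\pi\|^2 = (v_P^\pi)' v_P^\pi, \tag{29}$$

where, in terms of the value function defined in (14),

$$v_P^\pi = \big( v_P^\pi(1) \;\cdots\; v_P^\pi(i) \;\cdots\; v_P^\pi(n) \big)'. \tag{30}$$

The new optimality criterion is to minimize the maximum quadratic total value function, i.e.,

$$\min_{\pi \in \Pi} \max_{P \in \mathcal{P}} \|v_P^\pi\|^2. \tag{31}$$

Definition (Optimal policy): A policy $\pi^*$ is optimal if

$$\|v_{P^*}^{\pi^*}\|^2 = \max_{P \in \mathcal{P}} \|v_P^{\pi^*}\|^2 = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} \|v_P^\pi\|^2 = \min_{\pi \in \Pi} \|v_{P^*}^\pi\|^2. \tag{32}$$

(i) Based on the optimality criterion using the quadratic total value function, an optimal policy can be non-uniformly optimal in initial states.
(ii) The existence of such an optimal policy is guaranteed under a weak condition. For any $\pi$, the condition under which the maximum value is reached is easily satisfied; for example, it holds when the set $\mathcal{P}$ is compact and closed and the function $\|v_P^\pi\|^2$ is continuous over $\mathcal{P}$. Even if the maximum and minimum values are not reachable, $\max$ and $\min$ can be replaced by $\sup$ and $\inf$, respectively, and a near-optimal policy can be easily proposed.
(iii) If the cost function is also parameterized in $\theta$, the optimality criterion can be modified using the quadratic value function with the cost function represented as $c(i, a, \theta)$.
(iv) If for any policy $\pi$ the probability distribution of the initial states is non-uniform, or the value functions for different initial states have different weights, the optimality criterion can be modified using the weighted quadratic total value function $\|v_P^\pi\|_W^2 = (v_P^\pi)' W v_P^\pi$.

Actually, the optimality criterion defined by (31) generalizes the uniform optimality criterion defined by (13), which is shown in the following Theorem 1.

Theorem 1: If an optimal policy exists with regard to the optimality criterion defined in (13), then this optimality criterion is equivalent to the one defined in (31). That is, if an optimal policy exists, a policy is optimal under the optimality criterion defined in (13) if and only if it is optimal under the optimality criterion defined in (31).

Proof: Since an optimal policy exists under the optimality criterion defined in (13), for any $\pi \in \Pi$ there exists $P^* \in \mathcal{P}$ such that

$$v_{P^*}^\pi(i) = \max_{P \in \mathcal{P}} v_P^\pi(i), \qquad \forall i \in S, \tag{33}$$

and let $\pi_1$ be an optimal policy such that

$$v_{P^*}^{\pi_1}(i) = \max_{P \in \mathcal{P}} v_P^{\pi_1}(i) = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} v_P^\pi(i), \qquad \forall i \in S. \tag{34}$$

Since $v_P^\pi(i) \ge 0$, by (33),

$$\sum_{i \in S} \Big( \max_{P \in \mathcal{P}} v_P^\pi(i) \Big)^2 = \sum_{i \in S} \big( v_{P^*}^\pi(i) \big)^2 = \max_{P \in \mathcal{P}} \|v_P^\pi\|^2, \tag{35}$$

and then by (34),

$$\min_{\pi \in \Pi} \sum_{i \in S} \Big( \max_{P \in \mathcal{P}} v_P^\pi(i) \Big)^2 = \sum_{i \in S} \Big( \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} v_P^\pi(i) \Big)^2 = \|v_{P^*}^{\pi_1}\|^2. \tag{36}$$

$\Rightarrow$: By (35) and (36),

$$\max_{P \in \mathcal{P}} \|v_P^{\pi_1}\|^2 = \sum_{i \in S} \big( v_{P^*}^{\pi_1}(i) \big)^2 = \min_{\pi \in \Pi} \sum_{i \in S} \Big( \max_{P \in \mathcal{P}} v_P^\pi(i) \Big)^2, \tag{37}$$

and by (35) the right-hand side equals $\min_{\pi \in \Pi} \max_{P \in \mathcal{P}} \|v_P^\pi\|^2$. Hence, $\pi_1$ is an optimal policy under the optimality criterion defined in (31).

$\Leftarrow$: Let $\pi_2$ be an optimal policy under the optimality criterion defined in (31). Then, for any $\pi \in \Pi$, there exists $P_Q \in \mathcal{P}$ such that

$$\|v_{P_Q}^\pi\|^2 = \sum_{i \in S} \big( v_{P_Q}^\pi(i) \big)^2 = \max_{P \in \mathcal{P}} \sum_{i \in S} \big( v_P^\pi(i) \big)^2, \tag{38}$$

and for $\pi_2$,

$$\|v_{P_Q}^{\pi_2}\|^2 = \sum_{i \in S} \big( v_{P_Q}^{\pi_2}(i) \big)^2 = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} \sum_{i \in S} \big( v_P^\pi(i) \big)^2. \tag{39}$$

By (35), for any $\pi \in \Pi$,

$$\|v_{P_Q}^\pi\|^2 = \sum_{i \in S} \Big( \max_{P \in \mathcal{P}} v_P^\pi(i) \Big)^2, \tag{40}$$

and then by (36),

$$\|v_{P_Q}^{\pi_2}\|^2 = \min_{\pi \in \Pi} \sum_{i \in S} \Big( \max_{P \in \mathcal{P}} v_P^\pi(i) \Big)^2 = \sum_{i \in S} \Big( \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} v_P^\pi(i) \Big)^2. \tag{41}$$

Since $\big( v_{P_Q}^{\pi_2}(i) \big)^2 \le \big( \max_{P \in \mathcal{P}} v_P^{\pi_2}(i) \big)^2$ for every $i \in S$ while the sums over $i$ are equal by (40),

$$\big( v_{P_Q}^{\pi_2}(i) \big)^2 = \Big( \max_{P \in \mathcal{P}} v_P^{\pi_2}(i) \Big)^2, \tag{42}$$

and then

$$v_{P_Q}^{\pi_2}(i) = \max_{P \in \mathcal{P}} v_P^{\pi_2}(i). \tag{43}$$

Since $\big( \max_{P \in \mathcal{P}} v_P^{\pi_2}(i) \big)^2 \ge \min_{\pi \in \Pi} \big( \max_{P \in \mathcal{P}} v_P^\pi(i) \big)^2$ for every $i \in S$ while the sums over $i$ are equal by (41),

$$\big( v_{P_Q}^{\pi_2}(i) \big)^2 = \min_{\pi \in \Pi} \Big( \max_{P \in \mathcal{P}} v_P^\pi(i) \Big)^2, \tag{44}$$

and then

$$v_{P_Q}^{\pi_2}(i) = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} v_P^\pi(i). \tag{45}$$

Hence,

$$v_{P_Q}^{\pi_2}(i) = \max_{P \in \mathcal{P}} v_P^{\pi_2}(i) = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} v_P^\pi(i), \qquad \forall i \in S. \tag{46}$$

That is to say, $\pi_2$ is an optimal policy under the optimality criterion defined in (13).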
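For a finite parameter set such as the one in the two-state example, criterion (31) can be evaluated by brute-force enumeration. The sketch below is only an illustration under stated assumptions: it uses NumPy, restricts attention to the four stationary policies, and assumes the row $P_1^1 = (\theta_1, 1 - \theta_1)$ as the reading of (20). The helper names (`row`, `value`, `analyse`) are hypothetical.

```python
import itertools
import numpy as np

GAMMA = 0.9
OMEGA = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
COST = {(1, 1): 1.0, (1, 2): 2.0, (2, 1): 3.0, (2, 2): 4.0}  # c(i, a) from (21)
POLICIES = {"pi1": (1, 1), "pi2": (1, 2), "pi3": (2, 1), "pi4": (2, 2)}


def row(i, a, theta):
    """Transition row P_i^a(theta); the (1, 1) row is an assumed reading of (20)."""
    t1, t2, t3 = theta
    rows = {
        (1, 1): (t1, 1 - t1),
        (1, 2): (t2, 1 - t2),
        (2, 1): (1 - t1 ** 2, t1 ** 2),
        (2, 2): (1 - t3, t3),
    }
    return rows[(i, a)]


def value(policy, theta):
    """Evaluate (7): v = (I - gamma * P^pi)^(-1) C^pi for a stationary policy."""
    P = np.array([row(i, policy[i - 1], theta) for i in (1, 2)])
    C = np.array([COST[(i, policy[i - 1])] for i in (1, 2)])
    return np.linalg.solve(np.eye(2) - GAMMA * P, C)


def analyse(policy):
    """Return per-state maxima, whether one theta attains them all, and max ||v||^2."""
    thetas = list(itertools.product(OMEGA, repeat=3))
    values = np.array([value(policy, th) for th in thetas])
    per_state_max = values.max(axis=0)
    # A uniform maximiser is a theta that maximises every component at once.
    uniform = bool(np.any(np.all(np.isclose(values, per_state_max), axis=1)))
    max_quadratic = float((values ** 2).sum(axis=1).max())
    return per_state_max, uniform, max_quadratic


results = {name: analyse(p) for name, p in POLICIES.items()}
best = min(results, key=lambda name: results[name][2])

for name, (vmax, uniform, quad) in results.items():
    print(name, np.round(vmax, 2), "uniform maximiser:", uniform,
          "max ||v||^2:", round(quad, 1))
print("policy minimising the maximum quadratic total value:", best)
```

The enumeration is exponential in the number of parameters and is meant only to make the criterion concrete on this small example; it is not the method developed in Section III.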
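Generally, an optimal policy under the optimality criterion using the quadratic total value function may be non-stationary. However, under some assumptions, stationary optimal policies are optimal or $\epsilon$-optimal in the policy space $\Pi$.

Theorem 2: If

$$\min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} \|v_P^\pi\|^2 = \max_{P \in \mathcal{P}} \min_{\pi \in \Pi_s} \|v_P^\pi\|^2, \tag{47}$$

stationary optimal policies are optimal in the policy space $\Pi$.

Proof: For any fixed $P \in \mathcal{P}$, since $\min_{\pi \in \Pi} \|v_P^\pi\|^2 = \min_{\pi \in \Pi_s} \|v_P^\pi\|^2$, by (47),

$$\min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} \|v_P^\pi\|^2 = \max_{P \in \mathcal{P}} \min_{\pi \in \Pi_s} \|v_P^\pi\|^2 = \max_{P \in \mathcal{P}} \min_{\pi \in \Pi} \|v_P^\pi\|^2. \tag{48}$$

Since by weak duality $\max_{P \in \mathcal{P}} \min_{\pi \in \Pi} \|v_P^\pi\|^2 \le \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} \|v_P^\pi\|^2$,

$$\min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} \|v_P^\pi\|^2 \le \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} \|v_P^\pi\|^2. \tag{49}$$

Since $\Pi_s \subset \Pi$ implies $\min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} \|v_P^\pi\|^2 \ge \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} \|v_P^\pi\|^2$,

$$\min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} \|v_P^\pi\|^2 = \min_{\pi \in \Pi} \max_{P \in \mathcal{P}} \|v_P^\pi\|^2. \tag{50}$$

That is to say, stationary optimal policies are also optimal in $\Pi$.

Assume that $\Theta$ is a compact and closed subspace of the vector space $R^{1 \times q}$ and that, for any $\pi \in \Pi$, the function $\|v_{P(\theta)}^\pi\|^2$ is continuous in $\theta$. Then, given an arbitrarily small constant $\epsilon > 0$, for any $\theta_0 \in \Theta$ there exists $\delta_{\theta_0}^\pi > 0$ such that

$$\Big| \|v_{P(\theta)}^\pi\|^2 - \|v_{P(\theta_0)}^\pi\|^2 \Big| < \epsilon, \qquad \forall \theta \in B_{\theta_0}^\pi \cap \Theta, \tag{51}$$

where $B_{\theta_0}^\pi = \{\theta : \|\theta - \theta_0\| < \delta_{\theta_0}^\pi\}$. Since

$$\bigcup_{\theta_0 \in \Theta} B_{\theta_0}^\pi \supset \Theta, \tag{52}$$

by the Heine-Borel theorem, finitely many balls suffice to cover $\Theta$. Let those selected balls be $B_{\theta_1}^\pi, \ldots, B_{\theta_j}^\pi, \ldots, B_{\theta_r}^\pi$ and let $\mathcal{P}^\pi = \{P(\theta_j) : 1 \le j \le r\}$, where $\theta_j$ and $r$ depend on $\pi$. Denote the union of these sets by $\tilde{\mathcal{P}}$, that is,

$$\tilde{\mathcal{P}} = \bigcup_{\pi \in \Pi} \mathcal{P}^\pi. \tag{53}$$

Theorem 3: With the above assumptions and notations, if

$$\min_{\pi \in \Pi_s} \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^\pi\|^2 = \max_{P(\theta) \in \tilde{\mathcal{P}}} \min_{\pi \in \Pi_s} \|v_{P(\theta)}^\pi\|^2, \tag{54}$$

then stationary optimal policies on $\tilde{\mathcal{P}}$ are $\epsilon$-optimal, and stationary optimal policies on $\mathcal{P}$ are also $\epsilon$-optimal, in the policy space $\Pi$.

Proof: Without loss of generality, for any $\pi \in \Pi$, let $P(\theta^*) \in \mathcal{P}$, $P(\theta_1) \in \tilde{\mathcal{P}}$ and $P(\theta_2) \in \mathcal{P}^\pi$ be such that

$$\|v_{P(\theta^*)}^\pi\|^2 = \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^\pi\|^2, \tag{55}$$

$$\|v_{P(\theta_1)}^\pi\|^2 = \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^\pi\|^2, \tag{56}$$

$$0 \le \|v_{P(\theta^*)}^\pi\|^2 - \|v_{P(\theta_2)}^\pi\|^2 < \epsilon. \tag{57}$$

Since $\|v_{P(\theta_2)}^\pi\|^2 \le \|v_{P(\theta_1)}^\pi\|^2 \le \|v_{P(\theta^*)}^\pi\|^2$,

$$0 \le \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^\pi\|^2 - \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^\pi\|^2 < \epsilon. \tag{58}$$

Then,

$$0 \le \min_{\pi \in \Pi} \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^\pi\|^2 - \min_{\pi \in \Pi} \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^\pi\|^2 \le \epsilon. \tag{59}$$

By (54) and Theorem 2,

$$\min_{\pi \in \Pi_s} \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^\pi\|^2 = \min_{\pi \in \Pi} \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^\pi\|^2. \tag{60}$$

Hence,

$$0 \le \min_{\pi \in \Pi} \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^\pi\|^2 - \min_{\pi \in \Pi_s} \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^\pi\|^2 \le \epsilon, \tag{61}$$

that is to say, stationary optimal policies on $\tilde{\mathcal{P}}$ are $\epsilon$-optimal in $\Pi$. Since, by (58) applied to $\pi \in \Pi_s$, $0 \le \min_{\pi \in \Pi_s} \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^\pi\|^2 - \min_{\pi \in \Pi_s} \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^\pi\|^2 \le \epsilon$,

$$0 \le \min_{\pi \in \Pi_s} \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^\pi\|^2 - \min_{\pi \in \Pi} \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^\pi\|^2 = \Big[ \min_{\pi \in \Pi_s} \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^\pi\|^2 - \min_{\pi \in \Pi_s} \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^\pi\|^2 \Big] \tag{62}$$

$$+ \Big[ \min_{\pi \in \Pi_s} \max_{P(\theta) \in \tilde{\mathcal{P}}} \|v_{P(\theta)}^\pi\|^2 - \min_{\pi \in \Pi} \max_{P(\theta) \in \mathcal{P}} \|v_{P(\theta)}^\pi\|^2 \Big] \le \epsilon, \tag{63}$$

where the first bracket is at most $\epsilon$ and the second bracket is nonpositive by (61). Hence, stationary optimal policies on $\mathcal{P}$ are $\epsilon$-optimal in $\Pi$.

(i) If for any fixed $\pi \in \Pi$ the function $\|v_{P(\theta)}^\pi\|^2$ is Lipschitz with a constant $L^\pi$ that depends on $\pi$, that is,

$$\Big| \|v_{P(\theta_1)}^\pi\|^2 - \|v_{P(\theta_2)}^\pi\|^2 \Big| \le L^\pi \|\theta_1 - \theta_2\|, \qquad \forall \theta_1, \theta_2 \in \Theta, \tag{64}$$

then the parameter $\delta_{\theta_0}^\pi$ can be chosen as

$$\delta_{\theta_0}^\pi = \delta^\pi = \frac{\epsilon}{L^\pi + 1}, \qquad \forall \theta_0 \in \Theta. \tag{65}$$

That is to say, $\delta_{\theta_0}^\pi$ is independent of $\theta_0$.
(ii) If the function $\|v_{P(\theta)}^\pi\|^2$ is continuously differentiable with respect to $\theta$, it is also Lipschitz, and the constant $L^\pi$ can be chosen as

$$\infty > L^\pi \ge \max_{\theta \in \Theta} \left\| \frac{\partial \|v_{P(\theta)}^\pi\|^2}{\partial \theta} \right\|. \tag{66}$$

(iii) The smaller $L^\pi$ is, the larger $\delta^\pi$ can be, and then the smaller the cardinality of $\mathcal{P}^\pi$ is, which can make the condition (54) relatively easy to satisfy.
(iv) If for some policies the constants $L^\pi$ are equal, the same balls $B_{\theta_j}^\pi$ can be selected to cover $\Theta$ and their sets $\mathcal{P}^\pi$ are also equal, which may reduce the cardinality of $\tilde{\mathcal{P}}$ and make the condition (54) easier to satisfy.

When for all $\pi \in \Pi$ the functions $\|v_{P(\theta)}^\pi\|^2$ are Lipschitz with a constant $L$ that is independent of $\pi$, the parameters $\delta_{\theta_0}^\pi$ can be independent not only of $\theta_0$ but also of $\pi$, that is,

$$\delta_{\theta_0}^\pi = \delta = \frac{\epsilon}{L + 1}, \qquad \forall \theta_0 \in \Theta, \; \forall \pi \in \Pi, \tag{67}$$

and consequently, the balls $B_{\theta_0}^\pi$ are independent of $\pi$, that is,

$$B_{\theta_0}^\pi = B_{\theta_0} = \{\theta : \|\theta - \theta_0\| < \delta\}, \qquad \forall \pi \in \Pi. \tag{68}$$

Hence, for all $\pi$, the same balls can be selected to cover $\Theta$ and all sets $\mathcal{P}^\pi$ are equal. Thus, the set $\tilde{\mathcal{P}}$ becomes finite, that is, $\tilde{\mathcal{P}} = \mathcal{P}_f = \{P(\theta_j) : 1 \le j \le r\}$, where $\theta_j$ and $r$ are independent of $\pi$.

Corollary 1: With the above assumptions and notations, if

$$\min_{\pi \in \Pi_s} \max_{P(\theta) \in \mathcal{P}_f} \|v_{P(\theta)}^\pi\|^2 = \max_{P(\theta) \in \mathcal{P}_f} \min_{\pi \in \Pi_s} \|v_{P(\theta)}^\pi\|^2, \tag{69}$$

then stationary optimal policies on $\mathcal{P}_f$ are $\epsilon$-optimal, and stationary optimal policies on $\mathcal{P}$ are also $\epsilon$-optimal, in the policy space $\Pi$.

Proof: By (69) and Theorem 3, stationary optimal policies on $\mathcal{P}_f$ are $\epsilon$-optimal, and stationary optimal policies on $\mathcal{P}$ are also $\epsilon$-optimal, in $\Pi$.

(i) If the continuous derivatives of the functions $\|v_{P(\theta)}^\pi\|^2$ are uniformly bounded for all $\pi \in \Pi$, the Lipschitz constant $L$ can be chosen as

$$\infty > L \ge \sup_{\pi \in \Pi} \max_{\theta \in \Theta} \left\| \frac{\partial \|v_{P(\theta)}^\pi\|^2}{\partial \theta} \right\|. \tag{70}$$

(ii) The smaller $L$ is, the larger $\delta$ can be, and consequently the smaller the cardinality of $\mathcal{P}_f$ is, which can make the condition (69) relatively easy to satisfy.

III. ROBUST POLICY ITERATION UNDER QUADRATIC TOTAL VALUE FUNCTION

Based on the optimality criterion of minimizing the maximum quadratic total value function given in (31), a robust policy iteration is developed to find a stationary optimal policy for MDPs. Even though this solution is generally suboptimal in the deterministic policy space $\Pi$, according to Theorems 2 and 3 and Corollary 1 this stationary optimal policy can be guaranteed to be optimal or $\epsilon$-optimal in $\Pi$ under their respective conditions. In this section, we consider stationary policies. For simplicity, the subscript $s$, which indicates association with stationary policies, is dropped.

Policy iteration includes a policy evaluation step and a policy improvement step. In the policy evaluation step, for any policy $\pi = (a, a, \ldots)$, the maximum quadratic total value function is expressed in terms of (7) as

$$\|v_{P^*}^\pi\|^2 = \max_{P \in \mathcal{P}} \|v_P^\pi\|^2 = \max_{P \in \mathcal{P}} (C^\pi)' \big( I - \gamma (P^\pi)' \big)^{-1} (I - \gamma P^\pi)^{-1} C^\pi. \tag{71}$$

The maximum value $\|v_{P^*}^\pi\|^2$ and the optimal $P^*$ are to be calculated. Approaches are available to compute these values. Actually, if the transition matrix for the policy $\pi$ is independent, the efficient iterative method given in Algorithm 1 can be used to compute $\|v_{P^*}^\pi\|^2$.

Algorithm 1 Iterative Algorithm for the maximum quadratic total value function of a stationary policy with its independent transition matrix
1. Select $v_0^\pi \in R^{n \times 1}$ and set $k = 0$;
2. Compute $v_{k+1}^\pi$ by
$$v_{k+1}^\pi(i) = c(i, a(i)) + \gamma \max_{P_i^{a(i)} \in \mathcal{P}_i^{a(i)}} P_i^{a(i)} v_k^\pi, \qquad \forall i \in S; \tag{72}$$
3. Terminate if $v_{k+1}^\pi = v_k^\pi$; otherwise, increment $k$ by 1 and go to 2;
4. Output $\|v_k^\pi\|^2$ as $\|v_{P^*}^\pi\|^2$.

(i) Given a policy $\pi$, for the maximum point $P^*$, the transition probability rows for all states $i$ and the corresponding actions $a(i)$, denoted by $\big( P_i^{a(i)} \big)^*$, are computed as follows,

$$\big( P_i^{a(i)} \big)^* = \arg\max_{P_i^{a(i)} \in \mathcal{P}_i^{a(i)}} \big\{ P_i^{a(i)} v_{P^*}^\pi \big\}, \qquad \forall i \in S, \tag{73}$$

and there are no special constraints on the other rows.
(ii) A proof of convergence of the iterative algorithm under the total value function can be obtained similarly to that in [6], [7].
(iii) In principle, the initial value $v_0^\pi$ can be selected arbitrarily in $R^{n \times 1}$. However, a carefully selected $v_0^\pi$ can accelerate the convergence of the iterative process. To see that, let

$$G_1 = \Big\{ v : v(i) \le c(i, a(i)) + \gamma \max_{P_i^{a(i)} \in \mathcal{P}_i^{a(i)}} P_i^{a(i)} v, \; \forall i \in S \Big\} \tag{74}$$

and

$$G_2 = \Big\{ v : v(i) \ge c(i, a(i)) + \gamma \max_{P_i^{a(i)} \in \mathcal{P}_i^{a(i)}} P_i^{a(i)} v, \; \forall i \in S \Big\}. \tag{75}$$

If $v_0^\pi \in G_1$, the sequence $\{v_k^\pi\}$ is non-decreasing. If $v_0^\pi \in G_2$, the sequence $\{v_k^\pi\}$ is non-increasing. Thus, the sequence $\{v_k^\pi\}$ converges to $v_{P^*}^\pi$ faster than sequences with $v_0^\pi \notin G_1 \cup G_2$.

For the policy improvement step, the policy can be improved easily for MDPs with independent transition matrices using robust policy iteration [7]. However, that improvement method does not guarantee solutions for MDPs with correlated transition matrices. Hence, a policy elimination method is considered herein as a means of improving the policy. The policy elimination process eliminates some non-optimal policies during each iteration by means of a necessary condition for a stationary policy to be optimal. The inequality $\|v_{P^*}^\mu\|^2 \le \|v_{P^*}^{\pi_k}\|^2$ given in (77) is such a necessary condition for a stationary policy $\mu$ to be optimal, where $\|v_{P^*}^\mu\|^2$ is the quadratic total value function of $\mu$ at $P^*$, $\|v_{P^*}^{\pi_k}\|^2$ is the maximum quadratic total value function of $\pi_k$, and $P^*$ is the corresponding optimal matrix. Robust policy iteration under the quadratic total value function is described in detail in Algorithm 2.

(i) Robust policy iteration under the quadratic total value function terminates in finitely many iterations, when the cardinality of the set $\Pi_k$, denoted by $|\Pi_k|$ in Algorithm 2, is equal to one, since the cardinality of $\Pi_0$ is finite and $\Pi_{k+1} \subset \Pi_k$ ($k = 0, 1, 2, \ldots$). The sequence $\{\pi_k\}$ converges to a stationary optimal policy.
(ii) A good initial policy $\pi_0$, which has a relatively small total value function, can accelerate the convergence process.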
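Returning to the policy evaluation step, recursion (72) of Algorithm 1 can be sketched for the case where each row uncertainty set is finite. This is a minimal illustration assuming NumPy; the function name `robust_evaluation`, the tolerance-based stopping test (used here in place of the exact equality test in step 3) and the iteration cap are assumptions of the sketch. The data are those of policy $\pi_2$ in the two-state example, whose transition matrix is independent, with the row $P_1^1 = (\theta_1, 1 - \theta_1)$ taken as the assumed reading of (20).

```python
import numpy as np


def robust_evaluation(cost, row_sets, gamma, tol=1e-10, max_iter=10_000):
    """Iterate v_{k+1}(i) = c(i) + gamma * max over rows in row_sets[i] of row @ v_k."""
    v = np.zeros(len(cost))  # v_0 = 0 lies in G_1 when all costs are nonnegative
    history = [v]
    for _ in range(max_iter):
        v_next = np.array([
            cost[i] + gamma * max(float(np.dot(r, v)) for r in row_sets[i])
            for i in range(len(cost))
        ])
        history.append(v_next)
        # A tolerance replaces the exact termination test of step 3.
        if np.max(np.abs(v_next - v)) < tol:
            return v_next, history
        v = v_next
    raise RuntimeError("no convergence within max_iter")


OMEGA = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
rows_state1 = [(t, 1.0 - t) for t in OMEGA]  # assumed row set for P_1^1
rows_state2 = [(1.0 - t, t) for t in OMEGA]  # row set for P_2^2
cost_pi2 = [1.0, 4.0]                        # c(1, 1) and c(2, 2)

v_star, history = robust_evaluation(cost_pi2, [rows_state1, rows_state2], gamma=0.9)
max_quadratic = float(v_star @ v_star)
print("worst-case value function:", np.round(v_star, 4))
print("maximum quadratic total value:", round(max_quadratic, 2))
```

Starting from the zero vector keeps the iterates non-decreasing, in line with the remark on the set $G_1$; the recursion is valid here only because the rows of $P^{\pi_2}$ vary independently.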
Algorithm 2 Robust Policy Iteration Under Quadratic Total Value Function
1. Initialization: set $k = 0$, $\Pi_0 = \Pi_s$, $O_{-1} = +\infty$, $\{\pi_{-1}\} = \emptyset$, and select $\pi_0 \in \Pi_0$;
2. Policy evaluation: compute $\|v_{P^*}^{\pi_k}\|^2$ and $P^*$ such that
$$\|v_{P^*}^{\pi_k}\|^2 = \max_{P \in \mathcal{P}} \|v_P^{\pi_k}\|^2; \tag{76}$$
3. Policy improvement:
   if $|\Pi_k| > 1$
      (a) eliminate policies to obtain $\tilde{\Pi}_k$ such that
      $$\tilde{\Pi}_k = \{\mu \in \Pi_k : \|v_{P^*}^\mu\|^2 \le \|v_{P^*}^{\pi_k}\|^2\}; \tag{77}$$
      if $|\tilde{\Pi}_k| > 1$
         if $\|v_{P^*}^{\pi_k}\|^2 < O_{k-1}$
            (b) set $O_k = \|v_{P^*}^{\pi_k}\|^2$, $\tilde{\Pi}_k = \tilde{\Pi}_k \setminus \{\pi_{k-1}\}$;
            (c) if $|\tilde{\Pi}_k| = 1$, set $\Pi_{k+1} = \tilde{\Pi}_k = \{\pi_k\}$, $\pi_{k+1} = \pi_k$, $O_{k+1} = O_k = \|v_{P^*}^{\pi_k}\|^2$, increment $k$ by 1, and go to 4; otherwise, go to (d);
            (d) set $\Pi_{k+1} = \tilde{\Pi}_k$, select $\pi_{k+1} \in \Pi_{k+1} \setminus \{\pi_k\}$, increment $k$ by 1, and return to 2;
         else
            (e) if $|\tilde{\Pi}_k \setminus \{\pi_k\} \setminus \{\pi_{k-1}\}| \ge 1$, select $\tilde{\pi}_k \in \tilde{\Pi}_k \setminus \{\pi_k\} \setminus \{\pi_{k-1}\}$, set $\Pi_k = \tilde{\Pi}_k \setminus \{\pi_k\}$, $\pi_k = \tilde{\pi}_k$, and return to 2; otherwise, set $\Pi_k = \tilde{\Pi}_k \setminus \{\pi_k\} = \{\pi_{k-1}\}$, $\pi_k = \pi_{k-1}$, $O_k = O_{k-1} = \|v_{P^*}^{\pi_{k-1}}\|^2$;
      else
         (f) set $\Pi_{k+1} = \tilde{\Pi}_k = \{\pi_k\}$, $\pi_{k+1} = \pi_k$, $O_{k+1} = \|v_{P^*}^{\pi_k}\|^2$, increment $k$ by 1, and go to 4;
   else
      (g) go to 4;
4. Termination: output $\pi_k$ as the stationary optimal policy, with the corresponding optimal transition matrix $P^*$ and the maximum quadratic total value function $O_k$.

(iii) Robust policy iteration under the quadratic total value function can be extended to problems with parameterized cost functions and weighted quadratic total value functions.
(iv) Assume that the set $\Theta$ is compact and closed in $R^{1 \times q}$. If, for any fixed $\pi$ and $P(\theta)$, the quadratic total value function can be computed within an arbitrarily small error bound, then Algorithm 2 can be extended to obtain a stationary $\epsilon$-optimal policy.

IV. CONCLUSION

A theoretical framework is proposed to study MDPs with uncertainties in the transition probability matrices. The paper concludes with a robust policy iteration procedure under a newly proposed quadratic total value function, which guarantees optimal or near-optimal solutions under the stated assumptions. Such a robust construct may be especially useful in practical applications where the amount of experimental data is too limited to obtain an accurate transition probability matrix, or where an accurate estimate of the transition probability matrix is unrealistic. The proposed framework is straightforward to apply to practical problems.

REFERENCES

[1] R. Bellman and R. Kalaba, Dynamic Programming and Modern Control Theory, New York: Academic Press, 1965.
[2] M. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley-Interscience, New York,
[3] Hyeong Soo Chang, Hong-Gi Lee, Michael C. Fu, and Steven I. Marcus, "Evolutionary Policy Iteration for Solving Markov Decision Processes," IEEE Transactions on Automatic Control, vol. 50, no. 11, 2005.
[4] D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, Massachusetts, 1996.
[5] J. Si, A. G. Barto, W. B. Powell and D. Wunsch, Handbook of Learning and Approximate Dynamic Programming, Wiley-IEEE Press, 2004.
[6] A. Nilim and L. El Ghaoui, "Robust Control of Markov Decision Processes with Uncertain Transition Matrices," Operations Research, vol. 53, no. 5, 2005.
[7] J. K. Satia and R. E. Lave, Jr., "Markov Decision Processes with Uncertain Transition Probabilities," Operations Research, vol. 21, 1973.
[8] C. C. White and H. K. Eldeib, "Markov Decision Processes with Imprecise Transition Probabilities," Operations Research, vol. 42, 1994.
[9] R. Givan, S. Leach, and T. Dean, "Bounded Parameter Markov Decision Processes," Artificial Intelligence, vol. 122, no. 1-2, 2000.
[10] S. Kalyanasundaram, E. K. Chong, and N. B. Shroff, "Markov Decision Processes with Uncertain Transition Rates: Sensitivity and Robust Control," Proceedings of the 41st IEEE Conference on Decision and Control, vol. 4, 2002.
[11] M. Abbad and J. A. Filar, "Perturbation and Stability Theory for Markov Control Problems," IEEE Transactions on Automatic Control, vol. 37, 1992.
More informationEEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming
EEL 6266 Power System Operaton and Control Chapter 3 Economc Dspatch Usng Dynamc Programmng Pecewse Lnear Cost Functons Common practce many utltes prefer to represent ther generator cost functons as sngle-
More informationVQ widely used in coding speech, image, and video
at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng
More informationLecture 14: Bandits with Budget Constraints
IEOR 8100-001: Learnng and Optmzaton for Sequental Decson Makng 03/07/16 Lecture 14: andts wth udget Constrants Instructor: Shpra Agrawal Scrbed by: Zhpeng Lu 1 Problem defnton In the regular Mult-armed
More informationDynamic Programming. Lecture 13 (5/31/2017)
Dynamc Programmng Lecture 13 (5/31/2017) - A Forest Thnnng Example - Projected yeld (m3/ha) at age 20 as functon of acton taken at age 10 Age 10 Begnnng Volume Resdual Ten-year Volume volume thnned volume
More informationCollege of Computer & Information Science Fall 2009 Northeastern University 20 October 2009
College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:
More informationFUZZY GOAL PROGRAMMING VS ORDINARY FUZZY PROGRAMMING APPROACH FOR MULTI OBJECTIVE PROGRAMMING PROBLEM
Internatonal Conference on Ceramcs, Bkaner, Inda Internatonal Journal of Modern Physcs: Conference Seres Vol. 22 (2013) 757 761 World Scentfc Publshng Company DOI: 10.1142/S2010194513010982 FUZZY GOAL
More informationNeuro-Adaptive Design - I:
Lecture 36 Neuro-Adaptve Desgn - I: A Robustfyng ool for Dynamc Inverson Desgn Dr. Radhakant Padh Asst. Professor Dept. of Aerospace Engneerng Indan Insttute of Scence - Bangalore Motvaton Perfect system
More informationSELECTED SOLUTIONS, SECTION (Weak duality) Prove that the primal and dual values p and d defined by equations (4.3.2) and (4.3.3) satisfy p d.
SELECTED SOLUTIONS, SECTION 4.3 1. Weak dualty Prove that the prmal and dual values p and d defned by equatons 4.3. and 4.3.3 satsfy p d. We consder an optmzaton problem of the form The Lagrangan for ths
More informationOn the Multicriteria Integer Network Flow Problem
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of
More informationPortfolios with Trading Constraints and Payout Restrictions
Portfolos wth Tradng Constrants and Payout Restrctons John R. Brge Northwestern Unversty (ont wor wth Chrs Donohue Xaodong Xu and Gongyun Zhao) 1 General Problem (Very) long-term nvestor (eample: unversty
More informationGeneralized Linear Methods
Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set
More informationApplication of Gravitational Search Algorithm for Optimal Reactive Power Dispatch Problem
Applcaton of Gravtatonal Search Algorthm for Optmal Reactve Power Dspatch Problem Serhat Duman Department of Electrcal Eucaton, Techncal Eucaton Faculty, Duzce Unversty, Duzce, 8620 TURKEY serhatuman@uzce.eu.tr
More informationLecture 9 Sept 29, 2017
Sketchng Algorthms for Bg Data Fall 2017 Prof. Jelan Nelson Lecture 9 Sept 29, 2017 Scrbe: Mtal Bafna 1 Fast JL transform Typcally we have some hgh-mensonal computatonal geometry problem, an we use JL
More informationAPPENDIX A Some Linear Algebra
APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,
More informationAn Asymptotically Efficient Simulation-Based Algorithm for Finite Horizon Stochastic Dynamic Programming
An Asymptotcally Effcent Smulaton-Based Algorthm for Fnte Horzon Stochastc Dynamc Programmng Hyeong Soo Chang, Mchael C. Fu, Jaqao Hu, and Steven I. Marcus Abstract We present a smulaton-based algorthm
More informationSIMPLIFIED MODEL-BASED OPTIMAL CONTROL OF VAV AIR- CONDITIONING SYSTEM
Nnth Internatonal IBPSA Conference Montréal, Canaa August 5-8, 2005 SIMPLIFIED MODEL-BASED OPTIMAL CONTROL OF VAV AIR- CONDITIONING SYSTEM Nabl Nassf, Stanslaw Kajl, an Robert Sabourn École e technologe
More informationVector Norms. Chapter 7 Iterative Techniques in Matrix Algebra. Cauchy-Bunyakovsky-Schwarz Inequality for Sums. Distances. Convergence.
Vector Norms Chapter 7 Iteratve Technques n Matrx Algebra Per-Olof Persson persson@berkeley.edu Department of Mathematcs Unversty of Calforna, Berkeley Math 128B Numercal Analyss Defnton A vector norm
More informationConvexity preserving interpolation by splines of arbitrary degree
Computer Scence Journal of Moldova, vol.18, no.1(52), 2010 Convexty preservng nterpolaton by splnes of arbtrary degree Igor Verlan Abstract In the present paper an algorthm of C 2 nterpolaton of dscrete
More informationHigh-Order Hamilton s Principle and the Hamilton s Principle of High-Order Lagrangian Function
Commun. Theor. Phys. Bejng, Chna 49 008 pp. 97 30 c Chnese Physcal Socety Vol. 49, No., February 15, 008 Hgh-Orer Hamlton s Prncple an the Hamlton s Prncple of Hgh-Orer Lagrangan Functon ZHAO Hong-Xa an
More informationResearch Article Global Sufficient Optimality Conditions for a Special Cubic Minimization Problem
Mathematcal Problems n Engneerng Volume 2012, Artcle ID 871741, 16 pages do:10.1155/2012/871741 Research Artcle Global Suffcent Optmalty Condtons for a Specal Cubc Mnmzaton Problem Xaome Zhang, 1 Yanjun
More informationMMA and GCMMA two methods for nonlinear optimization
MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons
More informationEstimation: Part 2. Chapter GREG estimation
Chapter 9 Estmaton: Part 2 9. GREG estmaton In Chapter 8, we have seen that the regresson estmator s an effcent estmator when there s a lnear relatonshp between y and x. In ths chapter, we generalzed the
More informationCHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE
CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng
More informationThe Expectation-Maximization Algorithm
The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.
More informationA Note on the Numerical Solution for Fredholm Integral Equation of the Second Kind with Cauchy kernel
Journal of Mathematcs an Statstcs 7 (): 68-7, ISS 49-3644 Scence Publcatons ote on the umercal Soluton for Freholm Integral Equaton of the Secon Kn wth Cauchy kernel M. bulkaw,.m.. k Long an Z.K. Eshkuvatov
More informationLectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix
Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could
More informationControl of Uncertain Bilinear Systems using Linear Controllers: Stability Region Estimation and Controller Design
Control of Uncertan Blnear Systems usng Lnear Controllers: Stablty Regon Estmaton Controller Desgn Shoudong Huang Department of Engneerng Australan Natonal Unversty Canberra, ACT 2, Australa shoudong.huang@anu.edu.au
More informationAnalytical classical dynamics
Analytcal classcal ynamcs by Youun Hu Insttute of plasma physcs, Chnese Acaemy of Scences Emal: yhu@pp.cas.cn Abstract These notes were ntally wrtten when I rea tzpatrck s book[] an were later revse to
More informationA MULTIDIMENSIONAL ANALOGUE OF THE RADEMACHER-GAUSSIAN TAIL COMPARISON
A MULTIDIMENSIONAL ANALOGUE OF THE RADEMACHER-GAUSSIAN TAIL COMPARISON PIOTR NAYAR AND TOMASZ TKOCZ Abstract We prove a menson-free tal comparson between the Euclean norms of sums of nepenent ranom vectors
More informationSingular Value Decomposition: Theory and Applications
Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real
More informationGlobal Sensitivity. Tuesday 20 th February, 2018
Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values
More informationSimultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals
Smultaneous Optmzaton of Berth Allocaton, Quay Crane Assgnment and Quay Crane Schedulng Problems n Contaner Termnals Necat Aras, Yavuz Türkoğulları, Z. Caner Taşkın, Kuban Altınel Abstract In ths work,
More informationThe Noether theorem. Elisabet Edvardsson. Analytical mechanics - FYGB08 January, 2016
The Noether theorem Elsabet Evarsson Analytcal mechancs - FYGB08 January, 2016 1 1 Introucton The Noether theorem concerns the connecton between a certan kn of symmetres an conservaton laws n physcs. It
More informationprinceton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg
prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there