arXiv: v2 [cs.LG] 9 Nov 2017


Reinforcement Learning under Model Mismatch

Aurko Roy¹*, Huan Xu², and Sebastian Pokutta²

¹ Google. Email: aurkor@google.com
² ISyE, Georgia Institute of Technology, Atlanta, GA, USA. Email: huan.xu@isye.gatech.edu
² ISyE, Georgia Institute of Technology, Atlanta, GA, USA. Email: sebastian.pokutta@isye.gatech.edu
* Work done while at Georgia Tech.

November 10, 2017

Abstract

We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation of it. We address this problem by extending the framework of robust MDPs of [2, 17, 13] to the model-free reinforcement learning setting, where we do not have access to the model parameters but can only sample states from it. We define robust versions of Q-learning, SARSA, and TD-learning and prove convergence to an approximately optimal robust policy and an approximate value function, respectively. We scale up the robust algorithms to large MDPs via function approximation and prove convergence under two different settings. We prove convergence of robust approximate policy iteration and robust approximate value iteration for linear architectures under mild assumptions. We also define a robust loss function, the mean squared robust projected Bellman error, and give stochastic gradient descent algorithms that are guaranteed to converge to a local minimum.

1 Introduction

Reinforcement learning is concerned with learning a good policy for sequential decision making problems modeled as a Markov Decision Process (MDP), via interacting with the environment [22, 20]. In this work we address the problem of reinforcement learning from a misspecified model. As a motivating example, consider the scenario where the problem of interest is not directly accessible, but instead the agent can interact with a simulator whose dynamics are reasonably close to the true problem. Another plausible application is when the parameters of the model may evolve over time but can still be reasonably approximated by an MDP. To address this problem we use the framework of robust MDPs, which was proposed by [2, 17, 13] to solve the planning problem under model misspecification.

The robust MDP framework considers a class of models and finds the robust optimal policy, which is the policy that performs best under the worst model. It was shown by [2, 17, 13] that the robust optimal policy satisfies the robust Bellman equation, which naturally leads to exact dynamic programming algorithms to find an optimal policy. However, this approach is model dependent and does not immediately generalize to the model-free case, where the parameters of the model are unknown. Essentially, reinforcement learning is a model-free framework for solving the Bellman equation using samples. Therefore, to learn policies from misspecified models, we develop sample-based methods to solve the robust Bellman equation. In particular, we develop robust versions of classical reinforcement learning algorithms such as Q-learning, SARSA, and TD-learning, and prove convergence to an approximately optimal policy under mild assumptions on the discount factor. We also show that the nominal versions of these iterative algorithms converge to policies that may be arbitrarily worse than the optimal policy.

We also scale up these robust algorithms to large-scale MDPs via function approximation, where we prove convergence under two different settings. Under a technical assumption similar to [6, 26], we show convergence of robust approximate policy iteration and value iteration algorithms for linear architectures. We also study function approximation with nonlinear architectures by defining an appropriate mean squared robust projected Bellman error (MSRPBE) loss function, which is a generalization of the mean squared projected Bellman error (MSPBE) loss function of [23, 24, 7]. We propose robust versions of stochastic gradient descent algorithms as in [23, 24, 7] and prove convergence to a local minimum under some assumptions for function approximation with arbitrary smooth functions.

Contribution. In summary, we make the following contributions:
1. We extend the robust MDP framework of [2, 17, 13] to the model-free reinforcement learning setting. We then define robust versions of Q-learning, SARSA, and TD-learning and prove convergence to an approximately optimal robust policy.
2. We also provide robust reinforcement learning algorithms for the function approximation case and prove convergence of robust approximate policy iteration and value iteration algorithms for linear architectures. We also define the MSRPBE loss function, which contains the robust optimal policy as a local minimum. We derive stochastic gradient descent algorithms to minimize this loss function and establish convergence to a local minimum in the case of function approximation by arbitrary smooth functions.
3. Finally, we demonstrate empirically the improvement in performance of the robust algorithms compared to their nominal counterparts.
For this we used various reinforcement learning test environments from the OpenAI benchmark [10], both to assess the improvement in performance and to ensure reproducibility and consistency of our results.

Related Work. Recently, several approaches have been proposed to address the degradation of model performance caused by parameter uncertainty in Markov Decision Processes (MDPs). A Bayesian approach was proposed by [21], which requires perfect knowledge of the prior distribution on transition matrices. Other probabilistic and risk-based settings were studied by [11, 28, 25], which propose various mechanisms to incorporate percentile risk into the model. A framework for robust MDPs was first proposed by [2, 17, 13]. These works consider the transition matrices to lie in some uncertainty set and propose a dynamic programming algorithm to solve the robust MDP. Recent work by [26] extended the robust MDP framework to the function approximation setting, where, under a technical assumption, the authors prove convergence to an optimal policy for linear architectures. Note that these algorithms for robust MDPs do not readily generalize to the model-free reinforcement learning setting, where the parameters of the environment are not explicitly known.

For reinforcement learning in the non-robust model-free setting, several iterative algorithms such as Q-learning, TD-learning, and SARSA are known to converge to an optimal policy under mild assumptions; see [5] for a survey. Robustness in reinforcement learning for MDPs was studied by [15], who introduced a robust learning framework for learning with disturbances. Similarly, [18] also studied learning in the presence of an adversary who might apply disturbances to the system. However, for the algorithms proposed in [15, 18] no theoretical guarantees are known, and there is only limited empirical evidence. Another recent work on robust reinforcement learning is [14], where the authors propose an online algorithm for a setting in which certain transitions are stochastic and the others are adversarial; the devised algorithm ensures low regret.

For the case of reinforcement learning with large MDPs using function approximation, theoretical guarantees for most TD-learning based algorithms are only known for linear architectures [3]. Recent work by [7] extended the results of [23, 24] and proved that a stochastic gradient descent algorithm minimizing the mean squared projected Bellman error (MSPBE) loss function converges to a local minimum, even for nonlinear architectures. However, these algorithms do not apply to robust MDPs; in this work we extend these algorithms to the robust setting.

2 Preliminaries

We consider an infinite horizon Markov Decision Process (MDP) [20] with a finite state space X of size n and a finite action space A of size m. At every time step t the agent is in a state $i \in X$ and can choose an action $a \in A$, incurring a cost $c_t(i,a)$. We make the standard assumption that future cost is discounted (see e.g., [22]), with a discount factor ϑ < 1 applied to future costs, i.e., $c_t(i,a) := \vartheta^t c(i,a)$, where $c(i,a)$ is a fixed constant independent of the time step t for $i \in X$ and $a \in A$. The states transition according to probability transition matrices $\{P^a\}_{a \in A}$, which depend only on the last action taken. A policy of the agent is a sequence $\pi = (a_0, a_1, \dots)$, where every $a_t(i)$ corresponds to an action in A if the system is in state i at time t. For every policy π we have a corresponding value function $v_\pi \in \mathbb{R}^n$, where $v_\pi(i)$ for a state $i \in X$ measures the expected cost of that state if the agent were to follow policy π. This can be expressed by the following recurrence relation:

$$ v_\pi(i) := c(i, a_0(i)) + \vartheta\, \mathbb{E}_{j \sim P_i^{a_0(i)}}\big[ v_\pi(j) \big]. \tag{1} $$

The goal is to devise algorithms to learn an optimal policy π* that minimizes the expected total cost:

Definition 2.1 (Optimal policy). Given an MDP with state space X, action space A, and transition matrices $P^a$, let Π be the strategy space of all possible policies. Then an optimal policy π* is one that minimizes the expected total cost, i.e., $\pi^* := \arg\min_{\pi \in \Pi} \mathbb{E}\big[ \sum_{t=0}^{\infty} \vartheta^t c(i_t, a_t(i_t)) \big]$.

In the robust case we assume, as in [17, 13], that the transition matrices $P^a$ are not fixed and may come from some uncertainty region $\mathcal{P}^a$, and may be chosen adversarially by nature in future runs of the model. In this setting, [17, 13] prove the following robust analogue of the Bellman recursion. A policy of nature is a sequence $\tau := (P_0, P_1, \dots)$
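where every $P_t \in \mathcal{P}$ corresponds to a transition probability matrix chosen from $\mathcal{P}$. Let $\mathcal{T}$ denote the set of all such policies of nature. In other words, a policy $\tau \in \mathcal{T}$ of nature is a sequence of transition matrices that may be played by it in response to the actions of the agent. For any set $\mathcal{P} \subseteq \mathbb{R}^n$ and vector $v \in \mathbb{R}^n$, let $\sigma_{\mathcal{P}}(v) := \sup\{ p^\top v \mid p \in \mathcal{P} \}$ be the support function of the set $\mathcal{P}$. For a state $i \in X$, let $\mathcal{P}_i^a$ be the projection of $\mathcal{P}^a$ onto the i-th row.

Theorem 2.2. We have the following perfect duality relation:

$$ \min_{\pi \in \Pi} \max_{\tau \in \mathcal{T}} \mathbb{E}^{\tau}\Big[ \sum_{t=0}^{\infty} \vartheta^t c(i_t, a_t(i_t)) \Big] = \max_{\tau \in \mathcal{T}} \min_{\pi \in \Pi} \mathbb{E}^{\tau}\Big[ \sum_{t=0}^{\infty} \vartheta^t c(i_t, a_t(i_t)) \Big]. \tag{2} $$

The optimal value function $v_{\pi^*}$ corresponding to the optimal policy π* satisfies

$$ v_{\pi^*}(i) = \min_{a \in A} \Big\{ c(i,a) + \vartheta\, \sigma_{\mathcal{P}_i^a}(v_{\pi^*}) \Big\}, \tag{3} $$

and π* can then be obtained in a greedy fashion, i.e., $a^*(i) \in \arg\min_{a \in A} \{ c(i,a) + \vartheta\, \sigma_{\mathcal{P}_i^a}(v) \}$.

The main shortcoming of this approach is that it does not generalize to the model-free case, where the transition probabilities are not explicitly known but rather the agent can only sample states according to these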

probabilities. In the absence of this knowledge, we cannot compute the support functions of the uncertainty sets $\mathcal{P}_i^a$. On the other hand, it is often easy to have a confidence region $U_i^a$, e.g., a ball or an ellipsoid, corresponding to every state-action pair $i \in X$, $a \in A$, that quantifies our uncertainty in the simulation, with the uncertainty set $\mathcal{P}_i^a$ being the confidence region $U_i^a$ centered around the unknown simulator probabilities. Formally, we define the uncertainty sets corresponding to every state-action pair in the following fashion.

[Figure 1: Example transition matrices shown within the probability simplex Δ_n, with uncertainty sets being ℓ2 balls of fixed radius. Axes: x, y, z.]

Definition 2.3 (Uncertainty sets). Corresponding to every state-action pair (i, a) we have a confidence region $U_i^a$, so that the uncertainty region $\mathcal{P}_i^a$ of the probability transition matrix corresponding to (i, a) is defined as

$$ \mathcal{P}_i^a := \{ x + p_i^a \mid x \in U_i^a \}, \tag{4} $$

where $p_i^a$ is the unknown state transition probability vector from the state $i \in X$ to every other state in X, given action a, during the simulation.

As a simple example, we have the ellipsoid $U_i^a := \{ x \mid x^\top A_i^a x \le 1,\ \sum_{j \in X} x_j = 0 \}$ for some n × n psd matrix $A_i^a$, with the uncertainty set $\mathcal{P}_i^a := \{ x + p_i^a \mid x \in U_i^a \}$, where $p_i^a$ is the unknown simulator state transition probability vector with which the agent transitioned to a new state during training. Note that while it may be easy to come up with good descriptions of the confidence region $U_i^a$, the approach of [17, 13] breaks down, since we have no knowledge of $p_i^a$ and merely observe the new state j sampled from this distribution. See Figure 1 for an illustration with the confidence regions being an ℓ2 ball of fixed radius r.

In the following sections we develop robust versions of Q-learning, SARSA, and TD-learning, which are guaranteed to converge to an approximately optimal policy that is robust with respect to this confidence region. The robust versions of these iterative algorithms involve an additional linear optimization step over the set $U_i^a$, which in the case of $U_i^a = \{ x : \|x\|_2 \le r \}$ simply amounts to adding the fixed-radius term $r\|v\|_2$ during every update.
In later sections we extend this to the function approximation case, where we study linear as well as nonlinear architectures; in the latter case we derive new stochastic gradient descent algorithms for computing approximately robust policies.
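The support-function step can be made concrete with a small example. Below is a minimal Python sketch, not the authors' code, of model-based robust value iteration for equation (3). It makes three assumptions:

- the nominal vectors $p_i^a$ are known, as in the planning setting of [17, 13];
- every state-action pair uses the ℓ2-ball region $\{x : \|x\|_2 \le r,\ \sum_j x_j = 0\}$, for which $\sigma_{\mathcal{P}_i^a}(v) = (p_i^a)^\top v + r\,\|v - \bar v \mathbf{1}\|_2$, where $\bar v$ is the mean of v;
- r is small enough that every $p_i^a + x$ stays inside Δ_n.

The helper names and the toy MDP are hypothetical.

```python
import math

def sigma_l2_ball(v, r):
    """Support function of {x : ||x||_2 <= r, sum_j x_j = 0} evaluated at v.

    The maximizer is r times the unit vector along the projection of v onto
    the hyperplane sum_j x_j = 0, so the value is r * ||v - mean(v)||_2.
    """
    mean = sum(v) / len(v)
    return r * math.sqrt(sum((vj - mean) ** 2 for vj in v))

def robust_value_iteration(P, c, discount, r, iters=500):
    """Iterate v(i) <- min_a { c(i,a) + discount * sigma_{P_i^a}(v) }, cf. equation (3).

    P[a][i] is the (assumed known) nominal next-state distribution p_i^a and
    c[i][a] the cost. With P_i^a = p_i^a + ball, sigma_{P_i^a}(v) equals
    p_i^a . v + sigma_l2_ball(v, r).
    """
    n, m = len(c), len(c[0])
    v = [0.0] * n
    for _ in range(iters):
        s = sigma_l2_ball(v, r)  # identical for all (i, a) because r is shared
        v = [min(c[i][a] + discount * (sum(p * vj for p, vj in zip(P[a][i], v)) + s)
                 for a in range(m))
             for i in range(n)]
    return v

# Hypothetical 3-state, 2-action MDP; every nominal probability is >= 0.2.
P = [[[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]],
     [[0.4, 0.4, 0.2], [0.2, 0.4, 0.4], [0.4, 0.2, 0.4]]]
c = [[1.0, 2.0], [0.0, 0.5], [2.0, 1.0]]
v_nominal = robust_value_iteration(P, c, discount=0.9, r=0.0)
v_robust = robust_value_iteration(P, c, discount=0.9, r=0.1)
```

Setting r = 0 recovers nominal value iteration. The support-function term is nonnegative, so the robust values dominate the nominal ones state by state.

With all entries at least 0.2 and r = 0.1, the set $p_i^a + \{x\}$ stays inside the simplex. The largest coordinate of a unit vector orthogonal to the all-ones vector in three dimensions is $\sqrt{2/3} \approx 0.82$, so no entry drops by more than about 0.082. This is the β = 0 regime of Remark 3.4.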

3 Robust exact dynamic programming algorithms

In this section we develop robust versions of exact dynamic programming algorithms such as Q-learning, SARSA, and TD-learning. These methods are suitable for small MDPs where the size n of the state space is not too large. Note that the confidence region $U_i^a$ must also be constrained so that $\mathcal{P}_i^a$ lies within the probability simplex Δ_n (see Figure 1). However, we do not have knowledge of the simulator probabilities $p_i^a$, so we do not know how far $p_i^a$ is from the boundary of Δ_n. The algorithms therefore compute the robust optimal policies using a proxy confidence region $\hat U_i^a$, in which we drop the requirement $\mathcal{P}_i^a \subseteq \Delta_n$. With a suitable choice of step lengths and discount factors, we can prove convergence to an approximately optimal $U_i^a$-robust policy. The approximation depends on the difference between the unconstrained proxy region $\hat U_i^a$ and the true confidence region $U_i^a$. Below we give specific examples of possible choices for simple confidence regions.

1. Ellipsoid: Let $\{A_i^a\}_{i,a}$ be a sequence of n × n psd matrices. Then we can define the confidence region as
$$ U_i^a := \Big\{ x \;\Big|\; x^\top A_i^a x \le 1,\ \sum_{j \in X} x_j = 0,\ -p_{ij}^a \le x_j \le 1 - p_{ij}^a\ \ \forall j \in X \Big\}. \tag{5} $$
Note that $U_i^a$ has some additional linear constraints so that the uncertainty set $\mathcal{P}_i^a := \{ p_i^a + x \mid x \in U_i^a \}$ lies inside Δ_n. Since we do not know $p_i^a$, we use the proxy confidence region $\hat U_i^a := \{ x \mid x^\top A_i^a x \le 1,\ \sum_{j \in X} x_j = 0 \}$. In particular, when $A_i^a = r^{-2} I_n$ for every $i \in X$, $a \in A$, this corresponds to a spherical confidence interval of [−r, r] in every direction. In other words, each (proxy) uncertainty set is an ℓ2 ball of radius r.
2. Parallelepiped: Let $\{B_i^a\}_{i,a}$ be a sequence of n × n invertible matrices. Then we can define the confidence region as
$$ U_i^a := \Big\{ x \;\Big|\; \|B_i^a x\|_1 \le 1,\ \sum_{j \in X} x_j = 0,\ -p_{ij}^a \le x_j \le 1 - p_{ij}^a\ \ \forall j \in X \Big\}. \tag{6} $$
As before, since we do not know $p_i^a$, we use as a proxy for $U_i^a$ the unconstrained parallelepiped $\hat U_i^a$, which drops the $-p_{ij}^a \le x_j \le 1 - p_{ij}^a$ constraints. In particular, if $B_i^a = D$ for a diagonal matrix D, then the proxy confidence region $\hat U_i^a$ is a weighted ℓ1 ball (an axis-aligned cross-polytope). If moreover every diagonal entry of D equals 1/r, then every (proxy) uncertainty set is an ℓ1 ball of radius r.
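For the two proxy families above, the extra linear optimization step has a closed form in simple special cases. The plain-Python sketch below uses hypothetical helper names and assumes a diagonal $A_i^a = \mathrm{diag}(a)$ for the ellipsoid and $B_i^a = r^{-1} I_n$ for the parallelepiped.

- Ellipsoid: the support function of $\{x : x^\top A x \le 1,\ \sum_j x_j = 0\}$ is $\sqrt{v^\top A^{-1} v - (\mathbf{1}^\top A^{-1} v)^2 / (\mathbf{1}^\top A^{-1} \mathbf{1})}$.
- ℓ1 ball: for the ℓ1 ball of radius r intersected with $\sum_j x_j = 0$, the extreme points are $\tfrac{r}{2}(e_k - e_l)$, so the support function is $\tfrac{r}{2}(\max_j v_j - \min_j v_j)$.

```python
import math

def sigma_diag_ellipsoid(v, a):
    """Support function of {x : sum_j a_j x_j^2 <= 1, sum_j x_j = 0} at v.

    Assumes A = diag(a) with every a_j > 0, so A^{-1} = diag(1 / a_j).
    Value: sqrt(v' A^{-1} v - (1' A^{-1} v)^2 / (1' A^{-1} 1)).
    """
    inv = [1.0 / aj for aj in a]
    quad = sum(w * vj * vj for w, vj in zip(inv, v))
    lin = sum(w * vj for w, vj in zip(inv, v))
    total = sum(inv)
    # Clamp tiny negative values caused by floating-point cancellation.
    return math.sqrt(max(quad - lin * lin / total, 0.0))

def sigma_l1_ball(v, r):
    """Support function of {x : ||x||_1 <= r, sum_j x_j = 0} at v.

    A feasible x splits into positive and negative parts of mass at most r/2
    each, so the maximum (r/2)(max v - min v) is attained at (r/2)(e_argmax - e_argmin).
    """
    return 0.5 * r * (max(v) - min(v))

v = [3.0, -1.0, 2.0]
r = 0.5
# With A = r^{-2} I the ellipsoid is the l2 ball of radius r: both give r * ||v - mean(v)||_2.
ball = r * math.sqrt(sum((vj - sum(v) / len(v)) ** 2 for vj in v))
print(sigma_diag_ellipsoid(v, [r ** -2] * len(v)), ball)
print(sigma_l1_ball(v, r))  # 0.5 * 0.5 * (3 - (-1)) = 1.0
```

The ellipsoid check illustrates that the first example with $A_i^a = r^{-2} I_n$ reduces to the ℓ2 ball of radius r.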
3.1 Robust Q-learning

Let us recall the notion of a Q-factor of a state-action pair (i, a) and a policy π, which in the non-robust setting is defined as

$$ Q(i,a) := c(i,a) + \vartheta\, \mathbb{E}_{j \sim p_i^a}\big[ v(j) \big], \tag{7} $$

where v is the value function of the policy π. In other words, the Q-factor represents the expected cost if we start at state i, use the action a, and follow the policy π subsequently. One may define the robust Q-factors in the same way, using the minimax characterization of Theorem 2.2. Let Q* denote the Q-factors of the optimal robust policy and let $v^* \in \mathbb{R}^n$ be its value function. Note that we may write the value function in terms of the Q-factors as $v^*(i) = \min_{a \in A} Q^*(i,a)$. From Theorem 2.2 we have the following expression for Q*:

$$ Q^*(i,a) = c(i,a) + \vartheta\, \sigma_{\mathcal{P}_i^a}(v^*) \tag{8} $$
$$ = c(i,a) + \vartheta\, \sigma_{U_i^a}(v^*) + \vartheta \sum_{j \in X} p_{ij}^a \min_{a' \in A} Q^*(j, a'), \tag{9} $$

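where equation (9) follows from Definition 2.3. For an estimate $Q_t$ of Q*, let $v_t \in \mathbb{R}^n$ be its value vector, i.e., $v_t(i) := \min_{a \in A} Q_t(i,a)$. The robust Q-iteration is defined as

$$ Q_t(i,a) := (1 - \gamma_t)\, Q_{t-1}(i,a) + \gamma_t \Big( c(i,a) + \vartheta\, \sigma_{\hat U_i^a}(v_{t-1}) + \vartheta \min_{a' \in A} Q_{t-1}(j, a') \Big), \tag{10} $$

where the state $j \in X$ is sampled by the simulator with the unknown transition probability $p_{ij}^a$. Note that the robust Q-iteration of equation (10) involves an additional linear optimization step to compute the support function $\sigma_{\hat U_i^a}(v_{t-1})$ of $v_{t-1}$ over the proxy confidence region $\hat U_i^a$. We will prove that iterating equation (10) converges to an approximately optimal policy. The following definition introduces the notion of an ε-optimal policy (see e.g., [5]). The error factor ε is also referred to as the amplification factor. In the definition we treat the Q-factors as an |X| × |A| matrix, so that its ℓ∞ norm is defined as usual.

Definition 3.1 (ε-optimal policy). A policy π with Q-factors Q' is ε-optimal with respect to the optimal policy π* with corresponding Q-factors Q* if

$$ \|Q' - Q^*\|_\infty \le \varepsilon\, \|Q^*\|_\infty. \tag{11} $$

The following simple lemma allows us to decompose the optimization of a linear function over the proxy uncertainty set $\hat{\mathcal{P}}_i^a$ in terms of linear optimization over $\mathcal{P}_i^a$, $U_i^a$, and $\hat U_i^a$.

Lemma 3.2. Let $v \in \mathbb{R}^n$ be any vector and let $\beta_i^a := \max_{y \in \hat U_i^a} \min_{x \in U_i^a} \|y - x\|_1$. Then we have $\sigma_{\hat{\mathcal{P}}_i^a}(v) \le \sigma_{\mathcal{P}_i^a}(v) + \beta_i^a \|v\|_\infty$.

Proof. Every point p in $\mathcal{P}_i^a$ is of the form $p_i^a + x$ for some $x \in U_i^a$, and every point q in $\hat{\mathcal{P}}_i^a$ is of the form $p_i^a + y$ for some $y \in \hat U_i^a$; this correspondence is one to one by definition. For any vector $v \in \mathbb{R}^n$ and any pair of points $p = p_i^a + x \in \mathcal{P}_i^a$ and $q = p_i^a + y \in \hat{\mathcal{P}}_i^a$ we have

$$ q^\top v = p^\top v + (q - p)^\top v \tag{12} $$
$$ \le \sup_{p' \in \mathcal{P}_i^a} p'^\top v + \big( (p_i^a + y) - (p_i^a + x) \big)^\top v \tag{13} $$
$$ = \sigma_{\mathcal{P}_i^a}(v) + (y - x)^\top v. \tag{14} $$

Since $x \in U_i^a$ was arbitrary, we may minimize over it:

$$ q^\top v \le \min_{x \in U_i^a} \big( \sigma_{\mathcal{P}_i^a}(v) + (y - x)^\top v \big) \tag{15} $$
$$ = \sigma_{\mathcal{P}_i^a}(v) + \min_{x \in U_i^a} (y - x)^\top v \tag{16} $$
$$ \le \sigma_{\mathcal{P}_i^a}(v) + \max_{y' \in \hat U_i^a} \min_{x \in U_i^a} (y' - x)^\top v \tag{17} $$
$$ \le \sigma_{\mathcal{P}_i^a}(v) + \max_{y' \in \hat U_i^a} \min_{x \in U_i^a} \|y' - x\|_1 \|v\|_\infty \tag{18} $$
$$ = \sigma_{\mathcal{P}_i^a}(v) + \beta_i^a \|v\|_\infty, \tag{19} $$

where (18) follows from Hölder's inequality. Equation (19) holds for every $q \in \hat{\mathcal{P}}_i^a$, so it also holds for the maximizer $\arg\max_{q \in \hat{\mathcal{P}}_i^a} q^\top v$. Therefore

$$ \sigma_{\hat{\mathcal{P}}_i^a}(v) \le \sigma_{\mathcal{P}_i^a}(v) + \beta_i^a \|v\|_\infty. \tag{20} $$

The following theorem proves that, under a suitable choice of step lengths γ_t and discount factor ϑ, the iteration of equation (10) converges to an ε-approximately optimal policy with respect to the confidence regions $U_i^a$.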

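Theorem 3.3. Let the step lengths γ_t of the Q-iteration algorithm be chosen such that $\sum_{t=0}^{\infty} \gamma_t = \infty$ and $\sum_{t=0}^{\infty} \gamma_t^2 < \infty$, and let the discount factor be ϑ < 1. Let $\beta_i^a$ be as in Lemma 3.2 and let $\beta := \max_{i \in X, a \in A} \beta_i^a$. If ϑ(1 + β) < 1, then with probability 1 the iteration of equation (10) converges to an ε-optimal policy, where $\varepsilon := \frac{\vartheta\beta}{1 - \vartheta(1+\beta)}$.

Proof. Let $\hat{\mathcal{P}}_i^a := \{ x + p_i^a \mid x \in \hat U_i^a \}$ be the proxy uncertainty set for state $i \in X$ and action $a \in A$. We denote the value function of Q by v. Let us define the following operator H mapping Q-factors to Q-factors:

$$ (HQ)(i,a) := c(i,a) + \vartheta\, \sigma_{\hat{\mathcal{P}}_i^a}(v). \tag{21} $$

We first show that a solution $\hat Q$ of the equation $HQ = Q$ is an ε-optimal policy as in Definition 3.1, i.e., $\|\hat Q - Q^*\|_\infty \le \varepsilon \|Q^*\|_\infty$. Let $\hat v$ denote the value function of $\hat Q$. Then

$$ \hat Q(i,a) - Q^*(i,a) = (H\hat Q)(i,a) - c(i,a) - \vartheta\, \sigma_{\mathcal{P}_i^a}(v^*) \tag{22} $$
$$ = \vartheta \big( \sigma_{\hat{\mathcal{P}}_i^a}(\hat v) - \sigma_{\mathcal{P}_i^a}(v^*) \big) \tag{23} $$
$$ \le \vartheta \Big( \max_{y \in \hat U_i^a} \min_{x \in U_i^a} \|y - x\|_1 \|\hat Q\|_\infty + \sigma_{\mathcal{P}_i^a}(\hat v) - \sigma_{\mathcal{P}_i^a}(v^*) \Big) \tag{24} $$
$$ \le \vartheta\beta \|\hat Q\|_\infty + \vartheta \big( \sigma_{\mathcal{P}_i^a}(\hat v) - \sigma_{\mathcal{P}_i^a}(v^*) \big) \tag{25} $$
$$ = \vartheta\beta \|\hat Q\|_\infty + \vartheta \Big( \max_{q \in \mathcal{P}_i^a} \sum_{j \in X} q_j \min_{a' \in A} \hat Q(j,a') - \max_{q \in \mathcal{P}_i^a} \sum_{j \in X} q_j \min_{a' \in A} Q^*(j,a') \Big) \tag{26} $$
$$ \le \vartheta\beta \|\hat Q\|_\infty + \vartheta \max_{q \in \mathcal{P}_i^a} \sum_{j \in X} q_j \Big( \min_{a' \in A} \hat Q(j,a') - \min_{a' \in A} Q^*(j,a') \Big) \tag{27} $$
$$ \le \vartheta\beta \|\hat Q\|_\infty + \vartheta \max_{q \in \mathcal{P}_i^a} \sum_{j \in X} q_j \max_{a' \in A} \big| \hat Q(j,a') - Q^*(j,a') \big| \tag{28} $$
$$ \le \vartheta\beta \|\hat Q\|_\infty + \vartheta \max_{q \in \mathcal{P}_i^a} \sum_{j \in X} q_j \|\hat Q - Q^*\|_\infty \tag{29} $$
$$ = \vartheta\beta \|\hat Q\|_\infty + \vartheta \|\hat Q - Q^*\|_\infty. \tag{30} $$

Here we used Lemma 3.2 (together with $\|\hat v\|_\infty \le \|\hat Q\|_\infty$) to derive equation (24), and $\mathcal{P}_i^a \subseteq \Delta_n$ to obtain equation (30). The same bound holds for $Q^*(i,a) - \hat Q(i,a)$, since $U_i^a \subseteq \hat U_i^a$ implies $\sigma_{\mathcal{P}_i^a} \le \sigma_{\hat{\mathcal{P}}_i^a}$. Equation (30) implies that $\|\hat Q - Q^*\|_\infty \le \frac{\vartheta\beta}{1 - \vartheta} \|\hat Q\|_\infty$. If $\|\hat Q\|_\infty \le \|Q^*\|_\infty$, we are done, since $\frac{\vartheta\beta}{1-\vartheta} \le \frac{\vartheta\beta}{1 - \vartheta(1+\beta)}$. Otherwise, assume that $\|\hat Q\|_\infty > \|Q^*\|_\infty$ and use the triangle inequality $\|\hat Q\|_\infty - \|Q^*\|_\infty \le \|\hat Q - Q^*\|_\infty$. Substituting $\|\hat Q\|_\infty \le \|Q^*\|_\infty + \|\hat Q - Q^*\|_\infty$ into equation (30) gives

$$ (1 - \vartheta - \vartheta\beta)\, \|\hat Q - Q^*\|_\infty \le \vartheta\beta\, \|Q^*\|_\infty, \tag{31} $$

from which it follows that $\|\hat Q - Q^*\|_\infty \le \varepsilon \|Q^*\|_\infty$ under the assumption ϑ(1 + β) < 1, as claimed. The Q-iteration of equation (10) can then be reformulated in terms of the operator H as

$$ Q_t(i,a) = (1 - \gamma_t)\, Q_{t-1}(i,a) + \gamma_t \big( (HQ_{t-1})(i,a) + \eta_t(i,a) \big), \tag{32} $$

where $\eta_t(i,a) := \vartheta \big( \min_{a' \in A} Q_{t-1}(j,a') - \mathbb{E}_{j \sim p_i^a}[ \min_{a' \in A} Q_{t-1}(j,a') ] \big)$. The expectation is over the states $j \in X$, with the transition probability from state i to state j given by $p_{ij}^a$. Note that this is an example of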

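a stochastic approximation algorithm as in [5], with noise parameter η_t. Let $F_t$ denote the history of the algorithm until time t. Note that $\mathbb{E}_{j \sim p_i^a}[ \eta_t(i,a) \mid F_t ] = 0$ by definition, and the variance is bounded by

$$ \mathbb{E}_{j \sim p_i^a}\big[ \eta_t^2(i,a) \mid F_t \big] \le K \Big( 1 + \max_{j \in X, a' \in A} Q_{t-1}^2(j,a') \Big) \tag{33} $$

for some constant K. Thus the noise term η_t satisfies the zero conditional mean and bounded variance assumption (Assumption 4.3 in [5]). Therefore, to argue that iterating equation (10) converges to the fixed point $\hat Q$ of H, it remains to show that H is a contraction mapping. We show that H is a contraction with respect to the infinity norm $\|\cdot\|_\infty$. Let Q and Q' be two different Q-vectors with value functions v and v'. If $U_i^a$ differs from the unconstrained proxy set $\hat U_i^a$ for some $i \in X$, $a \in A$, then we need the discount factor to satisfy ϑ(1 + β) < 1 in order to ensure convergence. Intuitively, the discount factor should be small enough that the estimation error caused by the difference between the sets $U_i^a$ and $\hat U_i^a$ converges to 0 over time. In this case we show contraction for the operator H as follows:

$$ |(HQ)(i,a) - (HQ')(i,a)| \le \vartheta \max_{q \in \hat{\mathcal{P}}_i^a} \Big| \sum_{j \in X} q_j \Big( \min_{a' \in A} Q(j,a') - \min_{a' \in A} Q'(j,a') \Big) \Big| \tag{34} $$
$$ \le \vartheta \max_{q \in \hat{\mathcal{P}}_i^a} \sum_{j \in X} q_j \max_{a' \in A} |Q(j,a') - Q'(j,a')| \tag{35} $$
$$ \le \vartheta \max_{y \in \hat U_i^a} \min_{x \in U_i^a} \|y - x\|_1 \|Q - Q'\|_\infty + \vartheta \max_{q \in \mathcal{P}_i^a} \sum_{j \in X} q_j \|Q - Q'\|_\infty \tag{36} $$
$$ \le \vartheta\beta \|Q - Q'\|_\infty + \vartheta \|Q - Q'\|_\infty \max_{q \in \mathcal{P}_i^a} \sum_{j \in X} q_j \tag{37} $$
$$ = \vartheta(\beta + 1) \|Q - Q'\|_\infty. \tag{38} $$

To derive equation (36) we used Lemma 3.2 with the vector $v(j) := \max_{a' \in A} |Q(j,a') - Q'(j,a')|$. We also used the fact that $\mathcal{P}_i^a \subseteq \Delta_n$ to conclude that $\max_{q \in \mathcal{P}_i^a} \sum_{j \in X} q_j = 1$. Therefore, if ϑ(1 + β) < 1, the operator H is an $\|\cdot\|_\infty$-norm contraction. Thus the robust Q-iteration of equation (10) converges to a solution of HQ = Q, which, as proved above, is an ε-approximately optimal policy for $\varepsilon = \frac{\vartheta\beta}{1 - \vartheta(1+\beta)}$.

Remark 3.4. If β = 0, then by Theorem 3.3 the robust Q-iterations converge to the exact optimal Q-factors, since ε = 0. Since $\beta := \max_{i \in X, a \in A} \beta_i^a = \max_{i \in X, a \in A} \max_{y \in \hat U_i^a} \min_{x \in U_i^a} \|y - x\|_1$, it follows that β = 0 iff $\hat U_i^a = U_i^a$ for every $i \in X$, $a \in A$. This happens when the confidence region is small enough that the simplex constraints $-p_{ij}^a \le x_j \le 1 - p_{ij}^a\ \forall j \in X$ in the description of $U_i^a$ become redundant for every $i \in X$, $a \in A$. Equivalently, every $p_i^a$ is far from the boundary of the simplex Δ_n compared to the size of the confidence region $U_i^a$ (see e.g., Figure 1).

Remark 3.5.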
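Note that simply using the nominal Q-iteration without the $\sigma_{\hat U_i^a}(v)$ term does not guarantee convergence to Q*. Indeed, the nominal Q-iterations converge to Q-factors Q' for which $\|Q' - Q^*\|_\infty$ may be arbitrarily large. This follows easily from observing that $|Q'(i,a) - Q^*(i,a)| = \sigma_{\hat U_i^a}(v)$, where v is the value function of Q*, and so

$$ \|Q' - Q^*\|_\infty = \max_{i \in X, a \in A} \sigma_{\hat U_i^a}(v), \tag{39} $$

which can be as high as $\|v\|_\infty = \|Q^*\|_\infty$. See Section 5 for an experimental demonstration of the difference between the policies learned by the robust and nominal algorithms.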
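Below is a minimal, sample-based sketch of the robust Q-iteration (10). It is a hypothetical illustration rather than the authors' implementation, and it assumes:

- the proxy region is the ℓ2 ball $\{x : \|x\|_2 \le r,\ \sum_j x_j = 0\}$ for every (i, a);
- all pairs are updated synchronously in each sweep;
- the step length is $\gamma_t = 1/(t+1)$, which satisfies the step-length conditions of Theorem 3.3.

The simulator is stood in for by sampling from hypothetical nominal vectors $p_i^a$; the update itself uses only the sampled next state j. Setting r = 0 gives the nominal Q-iteration of Remark 3.5.

```python
import random

def sigma_l2_ball(v, r):
    """Support function of the proxy region {x : ||x||_2 <= r, sum_j x_j = 0} at v."""
    mean = sum(v) / len(v)
    return r * sum((vj - mean) ** 2 for vj in v) ** 0.5

def sample_next_state(rng, probs):
    """Simulator stand-in: draw j with probability probs[j]."""
    u, acc = rng.random(), 0.0
    for j, pj in enumerate(probs):
        acc += pj
        if u < acc:
            return j
    return len(probs) - 1

def robust_q_learning(P, c, discount, r, sweeps=20000, seed=0):
    """Synchronous robust Q-iteration following equation (10).

    P[a][i] is used only to sample the next state; c[i][a] is the cost.
    Step length gamma_t = 1/(t+1): sum gamma_t = inf, sum gamma_t^2 < inf.
    """
    rng = random.Random(seed)
    n, m = len(c), len(c[0])
    Q = [[0.0] * m for _ in range(n)]
    for t in range(sweeps):
        gamma_t = 1.0 / (t + 1)
        v = [min(row) for row in Q]   # v_{t-1}(i) = min_a Q_{t-1}(i, a)
        s = sigma_l2_ball(v, r)       # sigma of the proxy ball at v_{t-1}; same for all (i, a)
        new_Q = [row[:] for row in Q]
        for i in range(n):
            for a in range(m):
                j = sample_next_state(rng, P[a][i])
                target = c[i][a] + discount * s + discount * v[j]
                new_Q[i][a] = (1 - gamma_t) * Q[i][a] + gamma_t * target
        Q = new_Q
    return Q

# Hypothetical 3-state, 2-action MDP used only for illustration.
P = [[[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]],
     [[0.4, 0.4, 0.2], [0.2, 0.4, 0.4], [0.4, 0.2, 0.4]]]
c = [[1.0, 2.0], [0.0, 0.5], [2.0, 1.0]]
Q_nominal = robust_q_learning(P, c, discount=0.5, r=0.0, sweeps=5000)
Q_robust = robust_q_learning(P, c, discount=0.5, r=0.1, sweeps=5000)
```

Both runs use the same seed, so they see the same sampled next states. Because the support-function term is nonnegative, the robust Q-factors then dominate the nominal ones entrywise.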

3.2 Robust SARSA

Recall that the update rule of SARSA is similar to the update rule of Q-learning, except that instead of choosing the action $a' = \arg\min_{a' \in A} Q_{t-1}(j,a')$, we choose the action a' as follows: with probability δ it is chosen uniformly at random from A, and with probability 1 − δ we take $a' = \arg\min_{a' \in A} Q_{t-1}(j,a')$. Therefore, it is easy to modify the robust Q-iteration of equation (10) to obtain the robust SARSA updates:

$$ Q_t(i,a) := (1 - \gamma_t)\, Q_{t-1}(i,a) + \gamma_t \Big( c(i,a) + \vartheta\, \sigma_{\hat U_i^a}(v_{t-1}) + \vartheta\, Q_{t-1}(j,a') \Big). \tag{40} $$

In the exact dynamic programming setting, robust SARSA has the same convergence guarantees as robust Q-learning, and these can be seen as a corollary of Theorem 3.3.

Corollary 3.6. Let the step lengths γ_t be chosen such that $\sum_{t=0}^{\infty} \gamma_t = \infty$ and $\sum_{t=0}^{\infty} \gamma_t^2 < \infty$, and let the discount factor be ϑ < 1. Let $\beta_i^a$ be as in Lemma 3.2 and let $\beta := \max_{i \in X, a \in A} \beta_i^a$. If ϑ(1 + β) < 1, then with probability 1 the iteration of equation (40) converges to an ε-optimal policy, where $\varepsilon := \frac{\vartheta\beta}{1 - \vartheta(1+\beta)}$. In particular, if $\beta_i^a = \beta = 0$, so that the proxy confidence regions $\hat U_i^a$ are the same as the true confidence regions $U_i^a$, then the iteration (40) converges to the true optimum Q*.

3.3 Robust TD-learning

Recall that TD-learning allows us to estimate the value function $v_\pi$ of a given policy π. In this section we generalize the TD-learning algorithm to the robust case. The main idea behind TD-learning in the non-robust setting is the following Bellman equation:

$$ v_\pi(i) := \mathbb{E}_{j \sim p_i^{\pi(i)}}\big[ c(i, \pi(i)) + \vartheta\, v_\pi(j) \big]. \tag{41} $$

Consider a trajectory of the agent $(i_0, i_1, \dots)$, where $i_m$ denotes the state of the agent at time step m. For time step m, define the temporal difference $d_m$ as

$$ d_m := c(i_m, \pi(i_m)) + \vartheta\, v_\pi(i_{m+1}) - v_\pi(i_m). \tag{42} $$

Let λ ∈ [0, 1). The recurrence relation for TD(λ) may be written in terms of the temporal difference $d_m$ as

$$ v_\pi(i_k) = \mathbb{E}\Big[ \sum_{m=k}^{\infty} (\vartheta\lambda)^{m-k} d_m \Big] + v_\pi(i_k). \tag{43} $$

The corresponding Robbins-Monro stochastic approximation algorithm with step size γ_t for equation (43) is

$$ v_{t+1}(i_k) := v_t(i_k) + \gamma_t \sum_{m=k}^{\infty} (\vartheta\lambda)^{m-k} d_m. \tag{44} $$

A more general variant of the TD(λ) iterations uses eligibility coefficients $z_m(i)$ for every state $i \in X$ in the update of equation (44), weighting each temporal difference $d_m$:

$$ v_{t+1}(i) := v_t(i) + \gamma_t \sum_{m=0}^{\infty} z_m(i)\, d_m. \tag{45} $$

Let $i_m$ denote the state of the simulator at time step m.
For the discounted case, there are two possibilities for the eligibility coefficients $z_m(i)$, leading to two different TD(λ) iterations:

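1. The every-visit TD(λ) method, where the eligibility coefficients are
$$ z_m(i) := \begin{cases} \vartheta\lambda\, z_{m-1}(i) + 1 & \text{if } i_m = i, \\ \vartheta\lambda\, z_{m-1}(i) & \text{if } i_m \ne i. \end{cases} $$
2. The restart TD(λ) method, where the eligibility coefficients are
$$ z_m(i) := \begin{cases} \vartheta\lambda\, z_{m-1}(i) & \text{if } i_m \ne i, \\ 1 & \text{if } i_m = i. \end{cases} $$

We make the following assumptions about the eligibility coefficients; they are sufficient for a proof of convergence.

Assumption 3.7. The eligibility coefficients $z_m(i)$ satisfy the following conditions:
1. $z_m(i) \ge 0$.
2. $z_{-1}(i) = 0$.
3. $z_m(i) \le \vartheta\, z_{m-1}(i)$ if $i_m \ne i$.
4. The weight $z_m(i)$ given to the temporal difference $d_m$ is chosen before this temporal difference is generated.

Note that the eligibility coefficients of both the every-visit and the restart TD(λ) iterations satisfy Assumption 3.7. In the robust setting, we are interested in estimating the robust value of a policy π, which by Theorem 2.2 we may express as

$$ v_\pi(i) := c(i, \pi(i)) + \vartheta \max_{p \in \mathcal{P}_i^{\pi(i)}} \mathbb{E}_{j \sim p}\big[ v_\pi(j) \big], \tag{46} $$

where the expectation is now computed over the probability vector p chosen adversarially from the uncertainty region $\mathcal{P}_i^{\pi(i)}$. As in Section 3.1, we may decompose $\max_{p \in \mathcal{P}_i^a} \mathbb{E}_{j \sim p}[v(j)] = \sigma_{\mathcal{P}_i^a}(v)$ as

$$ \max_{p \in \mathcal{P}_i^{\pi(i)}} \mathbb{E}_{j \sim p}\big[ v(j) \big] = \sigma_{U_i^{\pi(i)}}(v) + \mathbb{E}_{j \sim p_i^{\pi(i)}}\big[ v(j) \big], \tag{47} $$

where $p_i^{\pi(i)}$ is the transition probability of the agent during simulation. For the remainder of this section, we drop the subscript and simply write $\mathbb{E}$ for the expectation with respect to this transition probability $p_i^{\pi(i)}$. Define a simulation to be a trajectory $\{i_0, i_1, \dots, i_{N_t}\}$ of the agent, stopped according to a random stopping time $N_t$. Note that $N_t$ is a random variable for making stopping decisions that is not allowed to foresee the future. Let $F_t$ denote the history of the algorithm up to the point where the t-th simulation is about to commence. Let $v_t$ be the estimate of the value function at the start of the t-th simulation. Let $\{i_0, i_1, \dots, i_{N_t}\}$ be the trajectory of the agent during the t-th simulation, with $i_0 = i$. During training, we generate several simulations of the agent and update the estimate of the robust value function using the robust temporal difference $\tilde d_m$, defined as

$$ \tilde d_m := d_m + \vartheta\, \sigma_{\hat U_{i_m}^{\pi(i_m)}}(v_t) \tag{48} $$
$$ = c(i_m, \pi(i_m)) + \vartheta\, v_t(i_{m+1}) - v_t(i_m) + \vartheta\, \sigma_{\hat U_{i_m}^{\pi(i_m)}}(v_t), \tag{49} $$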

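where $d_m$ is the usual temporal difference defined as before,

$$ d_m := c(i_m, \pi(i_m)) + \vartheta\, v_t(i_{m+1}) - v_t(i_m). \tag{50} $$

The robust TD-update is the usual TD-update, except that we use the robust temporal difference computed over the proxy confidence region:

$$ v_{t+1}(i) := v_t(i) + \gamma_t \sum_{m=0}^{N_t - 1} z_m(i)\, \tilde d_m \tag{51} $$
$$ = v_t(i) + \gamma_t \sum_{m=0}^{N_t - 1} z_m(i) \Big( \vartheta\, \sigma_{\hat U_{i_m}^{\pi(i_m)}}(v_t) + d_m \Big). \tag{52} $$

We define an ε-approximate value function for a fixed policy π analogously to the ε-optimal Q-factors of Definition 3.1:

Definition 3.8 (ε-approximate value function). Given a policy π, we say that a vector $v' \in \mathbb{R}^n$ is an ε-approximation of $v_\pi$ if $\|v' - v_\pi\|_\infty \le \varepsilon \|v_\pi\|_\infty$.

The following theorem guarantees that, under Assumption 3.7, the robust TD iteration of equation (51) converges to an approximate value function for π.

Theorem 3.9. Let $\beta_i^a$ be as in Lemma 3.2 and let $\beta := \max_{i \in X, a \in A} \beta_i^a$. Let $\rho := \max_{i \in X} \sum_{m=0}^{\infty} z_m(i)$. If ϑ(1 + ρβ) < 1, then the robust TD-iterations of equation (51) converge to an ε-approximate value function, where $\varepsilon := \frac{\vartheta\beta}{1 - \vartheta(1+\rho\beta)}$. In particular, if $\beta_i^a = \beta = 0$, i.e., the proxy confidence region $\hat U_i^a$ is the same as the true confidence region $U_i^a$, then the convergence is exact, i.e., ε = 0. Note that in the special case of regular TD(λ) iterations, $\rho = \frac{\vartheta\lambda}{1 - \vartheta\lambda}$.

Proof. As in the proof of Theorem 3.3, let $\hat{\mathcal{P}}_i^a := \{ x + p_i^a \mid x \in \hat U_i^a \}$ be the proxy uncertainty set for state $i \in X$ and action $a \in A$. Let $I_t(i) := \{ m \mid i_m = i \}$ be the set of time indices at which the t-th simulation visits state i. We define $\delta_t(i) := \max_{q_m \in \mathcal{P}_{i_m}^{\pi(i_m)}} \mathbb{E}_{i_m \sim q_m}\big[ \sum_{m \in I_t(i)} z_m(i) \mid F_t \big]$, so that we may write the update of equation (51) as

$$ v_{t+1}(i) = \big( 1 - \gamma_t \delta_t(i) \big) v_t(i) + \gamma_t \delta_t(i) \left( \frac{\mathbb{E}\big[ \sum_{m=0}^{N_t-1} z_m(i)\, \tilde d_m \mid F_t \big]}{\delta_t(i)} + v_t(i) \right) \tag{53} $$
$$ \quad + \gamma_t \delta_t(i)\, \frac{\sum_{m=0}^{N_t-1} z_m(i)\, \tilde d_m - \mathbb{E}\big[ \sum_{m=0}^{N_t-1} z_m(i)\, \tilde d_m \mid F_t \big]}{\delta_t(i)}. \tag{54} $$

Let us define the operator $H_t : \mathbb{R}^n \to \mathbb{R}^n$ corresponding to the t-th simulation as

$$ (H_t v)(i) := \frac{\mathbb{E}\Big[ \sum_{m=0}^{N_t-1} z_m(i) \Big( c(i_m, \pi(i_m)) + \vartheta\, \sigma_{\hat U_{i_m}^{\pi(i_m)}}(v) + \vartheta\, v(i_{m+1}) - v(i_m) \Big) \,\Big|\, F_t \Big]}{\delta_t(i)} + v(i). \tag{55} $$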

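As in the proof of Theorem 3.3, we claim that a solution $\hat v$ of $H_t v = v$ must be an ε-approximation of $v_\pi$. Define the operator $H'_t$ with the proxy confidence regions replaced by the true ones, i.e.,

$$ (H'_t v)(i) := \frac{\mathbb{E}\Big[ \sum_{m=0}^{N_t-1} z_m(i) \Big( c(i_m, \pi(i_m)) + \vartheta\, \sigma_{U_{i_m}^{\pi(i_m)}}(v) + \vartheta\, v(i_{m+1}) - v(i_m) \Big) \,\Big|\, F_t \Big]}{\delta_t(i)} + v(i). \tag{56} $$

Note that $H'_t v_\pi = v_\pi$ for the robust value function $v_\pi$. Indeed, by Theorem 2.2, $c(i_m, \pi(i_m)) + \vartheta\, \sigma_{U_{i_m}^{\pi(i_m)}}(v_\pi) + \vartheta\, \mathbb{E}[v_\pi(i_{m+1})] - v_\pi(i_m) = 0$ for every $i_m \in X$. Finally, by Lemma 3.2 we have

$$ \sigma_{\hat U_{i_m}^{\pi(i_m)}}(v) + \mathbb{E}\big[ v(i_m) \big] \le \sigma_{U_{i_m}^{\pi(i_m)}}(v) + \mathbb{E}\big[ v(i_m) \big] + \beta \|v\|_\infty \tag{57} $$

for any vector v, where the expectation is over the state $i_m \sim p_{i_{m-1}}^{\pi(i_{m-1})}$. Thus, for any solution $\hat v$ of the equation $H_t v = v$, we have

$$ \|\hat v - v_\pi\|_\infty = \|H_t \hat v - v_\pi\|_\infty \tag{58} $$
$$ \le \|H'_t \hat v - v_\pi\|_\infty + \vartheta\beta \|\hat v\|_\infty \frac{\mathbb{E}\big[ \sum_{m=0}^{N_t-1} z_m(i) \big]}{\delta_t(i)} \tag{59} $$
$$ = \|H'_t \hat v - H'_t v_\pi\|_\infty + \vartheta\beta \|\hat v\|_\infty \frac{\mathbb{E}\big[ \sum_{m=0}^{N_t-1} z_m(i) \big]}{\delta_t(i)} \tag{60} $$
$$ \le \vartheta \|\hat v - v_\pi\|_\infty + \vartheta\rho\beta \|\hat v\|_\infty, \tag{61} $$

where equation (61) follows from equation (56). Therefore, as in the proof of Theorem 3.3, the solution of $H_t v = v$ is an ε-approximation of $v_\pi$ for $\varepsilon = \frac{\vartheta\beta}{1 - \vartheta(1+\rho\beta)}$ if ϑ(1 + ρβ) < 1. Applied to the iterates $v_t$, the operator gives $(H_t v_t)(i) = \frac{\mathbb{E}[ \sum_{m=0}^{N_t-1} z_m(i)\, \tilde d_{m,t} \mid F_t ]}{\delta_t(i)} + v_t(i)$. Hence the update of equation (51) is a stochastic approximation algorithm of the form $v_{t+1} = (1 - \gamma'_t) v_t + \gamma'_t (H_t v_t + \eta_t)$, where $\gamma'_t = \gamma_t \delta_t$ and $\eta_t$ is a zero-mean noise term defined as

$$ \eta_t(i) := \frac{\sum_{m=0}^{N_t-1} z_m(i)\, \tilde d_{m,t} - \mathbb{E}\big[ \sum_{m=0}^{N_t-1} z_m(i)\, \tilde d_{m,t} \mid F_t \big]}{\delta_t(i)}. \tag{62} $$

By Lemma 5.1 of [5], the new step sizes satisfy $\sum_{t=0}^{\infty} \gamma'_t = \infty$ and $\sum_{t=0}^{\infty} \gamma_t'^2 < \infty$ whenever the original step sizes satisfy $\sum_{t=0}^{\infty} \gamma_t = \infty$ and $\sum_{t=0}^{\infty} \gamma_t^2 < \infty$, since the conditions on the eligibility coefficients are unchanged. The noise term also satisfies the bounded variance condition of Lemma 5.2 of [5], since any $q \in \mathcal{P}_i^{\pi(i)}$ still specifies a distribution, as $\mathcal{P}_i^{\pi(i)} \subseteq \Delta_n$. Therefore, it remains to show that $H_t$ is a contraction with respect to the ℓ∞ norm on v. Let us define the operator $A_t$ as

$$ (A_t v)(i) := \frac{\mathbb{E}\Big[ \sum_{m=0}^{N_t-1} z_m(i) \Big( \vartheta\, \sigma_{\hat U_{i_m}^{\pi(i_m)}}(v) + \vartheta\, v(i_{m+1}) - v(i_m) \Big) \,\Big|\, F_t \Big]}{\delta_t(i)} + v(i) \tag{63} $$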

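and the expression $b_t(i) := \frac{\mathbb{E}[ \sum_{m=0}^{N_t-1} z_m(i)\, c(i_m, \pi(i_m)) \mid F_t ]}{\delta_t(i)}$, so that $H_t v = A_t v + b_t$. We will show that $\|A_t v\|_\infty \le \alpha \|v\|_\infty$ for some α < 1. The contraction of $H_t$ then follows, because for any vector $v \in \mathbb{R}^n$ and the ε-optimal value function $\hat v = H_t \hat v$ we have

$$ \|H_t v - \hat v\|_\infty = \|H_t v - H_t \hat v\|_\infty = \|A_t (v - \hat v)\|_\infty \le \alpha \|v - \hat v\|_\infty. \tag{64} $$

Let us now analyze the expression for $A_t$. We will show that

$$ \Big| \mathbb{E}\Big[ \sum_{m=0}^{N_t-1} z_m(i) \Big( \vartheta\, v(i_{m+1}) - v(i_m) + \vartheta\, \sigma_{\hat U_{i_m}^{\pi(i_m)}}(v) \Big) + \sum_{m \in I_t(i)} z_m(i)\, v(i) \,\Big|\, F_t \Big] \Big| \tag{65} $$
$$ \le \alpha \|v\|_\infty\, \mathbb{E}\Big[ \sum_{m \in I_t(i)} z_m(i) \,\Big|\, F_t \Big]. \tag{66} $$

We first replace the $\sigma_{\hat U_{i_m}^{\pi(i_m)}}$ term with $\sigma_{U_{i_m}^{\pi(i_m)}}$ using Lemma 3.2, which incurs a $\vartheta\rho\beta \|v\|_\infty$ penalty. Let us collect together the coefficients corresponding to $v(i_m)$ in the expression for the expectation:

$$ \mathbb{E}\Big[ \sum_{m=0}^{N_t-1} z_m(i) \Big( \vartheta\, v(i_{m+1}) - v(i_m) + \vartheta\, \sigma_{U_{i_m}^{\pi(i_m)}}(v) \Big) + \sum_{m \in I_t(i)} z_m(i)\, v(i) \,\Big|\, F_t \Big] + \vartheta\rho\beta \|v\|_\infty \tag{67} $$
$$ \le \max_{q_m \in \mathcal{P}_{i_m}^{\pi(i_m)}} \mathbb{E}_{i_m \sim q_m}\Big[ \sum_{m=0}^{N_t-1} z_m(i) \big( \vartheta\, v(i_{m+1}) - v(i_m) \big) + \sum_{m \in I_t(i)} z_m(i)\, v(i) \,\Big|\, F_t \Big] + \vartheta\rho\beta \|v\|_\infty \tag{68} $$
$$ = \max_{q_m \in \mathcal{P}_{i_m}^{\pi(i_m)}} \mathbb{E}_{i_m \sim q_m}\Big[ \sum_{m=0}^{N_t} \big( \vartheta\, z_{m-1}(i) - z_m(i) \big) v(i_m) + \sum_{m \in I_t(i)} z_m(i)\, v(i) \,\Big|\, F_t \Big] + \vartheta\rho\beta \|v\|_\infty. \tag{69} $$

We obtain inequality (68) by subsuming the $\sigma_{U_{i_m}^{\pi(i_m)}}$ term within the expectation, since $\mathcal{P}_{i_m}^{\pi(i_m)}$ is now part of the simplex Δ_n, and taking the worst possible distribution $q_m$. We also used the facts that $z_{-1}(i) = 0$ and $z_{N_t}(i) = 0$. Whenever $i_m \ne i$, the coefficient $\vartheta z_{m-1}(i) - z_m(i)$ of $v(i_m)$ is nonnegative. Whenever $i_m = i$, the combined coefficient $\vartheta z_{m-1}(i) - z_m(i) + z_m(i)$ is also nonnegative. Therefore, we may bound the right-hand side of equation (67) as

$$ \le \max_{q_m \in \mathcal{P}_{i_m}^{\pi(i_m)}} \mathbb{E}_{i_m \sim q_m}\Big[ \sum_{m=0}^{N_t} \big( \vartheta\, z_{m-1}(i) - z_m(i) \big) |v(i_m)| + \sum_{m \in I_t(i)} z_m(i)\, |v(i)| \,\Big|\, F_t \Big] + \vartheta\rho\beta \|v\|_\infty \tag{70} $$
$$ \le \max_{q_m \in \mathcal{P}_{i_m}^{\pi(i_m)}} \mathbb{E}_{i_m \sim q_m}\Big[ \sum_{m=0}^{N_t} \big( \vartheta\, z_{m-1}(i) - z_m(i) \big) \|v\|_\infty + \sum_{m \in I_t(i)} z_m(i)\, \|v\|_\infty \,\Big|\, F_t \Big] + \vartheta\rho\beta \|v\|_\infty. \tag{71} $$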

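Let us now collect the terms corresponding to a fixed $z_m(i)$:

$$ \max_{q_m} \mathbb{E}_{i_m \sim q_m}\Big[ \sum_{m=0}^{N_t} \big( \vartheta\, z_{m-1}(i) - z_m(i) \big) \|v\|_\infty + \sum_{m \in I_t(i)} z_m(i)\, \|v\|_\infty \,\Big|\, F_t \Big] + \vartheta\rho\beta \|v\|_\infty \tag{72} $$
$$ = \|v\|_\infty \max_{q_m} \mathbb{E}_{i_m \sim q_m}\Big[ \sum_{m=0}^{N_t-1} z_m(i)(\vartheta - 1) + \sum_{m \in I_t(i)} z_m(i) \,\Big|\, F_t \Big] + \vartheta\rho\beta \|v\|_\infty \tag{73} $$
$$ \le \|v\|_\infty \max_{q_m} \mathbb{E}_{i_m \sim q_m}\Big[ \sum_{m \in I_t(i)} z_m(i)(\vartheta - 1 + 1) \,\Big|\, F_t \Big] + \vartheta\rho\beta \|v\|_\infty \tag{74} $$
$$ \le \vartheta(1 + \rho\beta)\, \|v\|_\infty\, \mathbb{E}\Big[ \sum_{m \in I_t(i)} z_m(i) \,\Big|\, F_t \Big], \tag{75} $$

where equation (74) follows since ϑ < 1, so that the coefficient ϑ − 1 is negative and $\sum_{m=0}^{N_t-1} z_m(i) \ge \sum_{m \in I_t(i)} z_m(i)$. Therefore, setting α = ϑ(1 + ρβ), our claim follows under the assumption ϑ(1 + ρβ) < 1.

4 Robust Reinforcement Learning with function approximation

In Section 3 we derived robust versions of exact dynamic programming algorithms such as Q-learning, SARSA, and TD-learning. If the state space X of the MDP is large, it is prohibitive to maintain a lookup table entry for every state. A standard approach for large-scale MDPs is to use the approximate dynamic programming (ADP) framework [19]. In this setting, the problem is parametrized by a smaller dimensional vector $\theta \in \mathbb{R}^d$, where $d \ll n = |X|$.

The natural generalizations of the Q-learning, SARSA, and TD-learning algorithms of Section 3 go via the projected Bellman equation. There we project back onto the space of value functions that the model can represent, i.e., those spanned by the parameters $\theta \in \mathbb{R}^d$. Even in the non-robust setting, convergence of these algorithms is known only for linear architectures (see e.g., [3]). Recent work by [7] proposed stochastic gradient descent algorithms with convergence guarantees for smooth nonlinear function architectures, where the problem is framed in terms of minimizing a loss function. We give robust versions of both these approaches.

4.1 Robust approximations with linear architectures

In the approximate setting with linear architectures, we approximate the value function $v_\pi$ of a policy π by Φθ. Here $\theta \in \mathbb{R}^d$, and Φ is an n × d feature matrix whose row φ(j) is the feature vector of state $j \in X$. Let S be the span of the columns of Φ, i.e., $S := \{ \Phi\theta \mid \theta \in \mathbb{R}^d \}$ is the set of representable value functions. Define the operator $T^\pi : \mathbb{R}^n \to \mathbb{R}^n$ as

$$ (T^\pi v)(i) := c(i, \pi(i)) + \vartheta \sum_{j \in X} p_{ij}^{\pi(i)}\, v(j), \tag{76} $$

so that the true value function $v_\pi$ satisfies $T^\pi v_\pi = v_\pi$.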
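The following sketch illustrates equation (76) on a hypothetical two-state chain. It applies the non-robust operator $T^\pi$ for a fixed policy, recovers $v_\pi$ by repeated application, and checks on random vectors that $T^\pi$ is a ϑ-contraction in the ℓ∞ norm. The chain, the costs, and the function names are illustrative assumptions, not taken from the paper.

```python
import random

def bellman_operator(P_pi, c_pi, discount, v):
    """(T^pi v)(i) = c(i, pi(i)) + discount * sum_j p_ij^{pi(i)} v(j), as in equation (76).

    P_pi[i] is the next-state distribution under action pi(i); c_pi[i] = c(i, pi(i)).
    """
    return [c_pi[i] + discount * sum(p * vj for p, vj in zip(P_pi[i], v))
            for i in range(len(v))]

def policy_value(P_pi, c_pi, discount, iters=1000):
    """Approximate the fixed point v_pi = T^pi v_pi by iterating T^pi from zero."""
    v = [0.0] * len(c_pi)
    for _ in range(iters):
        v = bellman_operator(P_pi, c_pi, discount, v)
    return v

# Hypothetical two-state chain under a fixed policy pi.
P_pi = [[0.9, 0.1], [0.2, 0.8]]
c_pi = [1.0, 0.0]
v_pi = policy_value(P_pi, c_pi, discount=0.5)
# Solving (I - 0.5 * P_pi) v = c_pi by hand gives v_pi = (24/13, 4/13).

# Empirical check of the l-infinity contraction ||T v - T w|| <= discount * ||v - w||.
rng = random.Random(1)
for _ in range(100):
    v = [rng.uniform(-5, 5) for _ in range(2)]
    w = [rng.uniform(-5, 5) for _ in range(2)]
    Tv = bellman_operator(P_pi, c_pi, 0.5, v)
    Tw = bellman_operator(P_pi, c_pi, 0.5, w)
    gap = max(abs(x - y) for x, y in zip(Tv, Tw))
    assert gap <= 0.5 * max(abs(x - y) for x, y in zip(v, w)) + 1e-12
```

Repeated application is used here instead of a linear solve to keep the sketch dependency-free. For large n, the projected iteration discussed in this section replaces v with Φθ.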
A natural approach to estimating $v_\pi$, given a current estimate $\Phi\theta_t$, is to compute $T^\pi \Phi\theta_t$ and project it back onto S to obtain the next parameter $\theta_{t+1}$. The motivation behind such an iteration is that the true value function would be a fixed point of this operation if it belonged to the subspace S. This gives rise to the projected Bellman equation. The projection Π is typically taken with respect to a weighted Euclidean norm $\|\cdot\|_\xi$, i.e., $\|x\|_\xi = \big( \sum_{i \in X} \xi_i x_i^2 \big)^{1/2}$, where ξ is some probability distribution over the states X (see [3] for a survey). In the model-free case, where we do not have explicit knowledge of the transition probabilities, various methods such as LSTD(λ), LSPE(λ), and TD(λ) have been proposed (see e.g., [4, 9, 8, 16, 23, 24]). The key idea behind proving convergence for these methods is to show that the mapping $\Pi T^\pi$ is a contraction mapping

with respect to $\|\cdot\|_\xi$ for some distribution ξ over the states X. In the non-robust case the operator $T^\pi$ is linear and, as in Section 3, is a contraction in the ℓ∞ norm. However, its composition with a projection in such a weighted norm is not guaranteed to be a contraction. It is known that if ξ is the steady-state distribution of the policy π under evaluation, then $T^\pi$ is a contraction and Π is non-expansive in $\|\cdot\|_\xi$ [5, 3]. Hence, because of discounting, the mapping $\Pi T^\pi$ is a contraction. We generalize these methods to the robust setting via the robust Bellman operator $T^\pi$, defined as

$$ (T^\pi v)(i) := c(i, \pi(i)) + \vartheta\, \sigma_{\mathcal{P}_i^{\pi(i)}}(v). \tag{77} $$

Since we do not have access to the simulator probabilities $p_i^a$, we use a proxy set $\hat{\mathcal{P}}_i^a$ as in Section 3, and denote the proxy operator by $\hat T^\pi$. The iterative methods of the non-robust setting generalize via the robust operator $T^\pi$ and the robust projected Bellman equation $\Phi\theta = \Pi T^\pi \Phi\theta$. To show convergence, however, it is not clear how to choose the distribution ξ under which the projected operator $\Pi T^\pi$ is a contraction. Let ξ be the steady-state distribution of the exploration policy $\hat\pi$ of the MDP with transition probability matrix $P^{\hat\pi}$, i.e., the policy with which the agent chooses its actions during the simulation. We make the following assumption on the discount factor ϑ, as in [26].

Assumption 4.1. For every state $i \in X$ and action $a \in A$, there exists a constant α ∈ (0, 1) such that for any $p \in \mathcal{P}_i^a$ we have $\vartheta p_j \le \alpha P_{ij}^{\hat\pi}$ for every $j \in X$.

Assumption 4.1 might appear artificially restrictive; however, it is necessary to prove that $\Pi T^\pi$ is a contraction. [26] require this assumption for proving convergence of robust MDPs. A similar assumption is also required to prove convergence of the off-policy reinforcement learning methods of [6], where the states are sampled from an exploration policy $\hat\pi$ that is not necessarily the same as the policy π under evaluation. Note that in the robust setting all methods are necessarily off-policy, since the transition matrices are not fixed for a given policy.

The following lemma is a ξ-weighted Euclidean norm version of Lemma 3.2.

Lemma 4.2. Let $v \in \mathbb{R}^n$ be any vector and let $\beta_i^a := \max_{y \in \hat U_i^a} \min_{x \in U_i^a} \frac{\|y - x\|_\xi}{\xi_{\min}}$. Then we have

$$ \sigma_{\hat{\mathcal{P}}_i^a}(v) \le \sigma_{\mathcal{P}_i^a}(v) + \beta_i^a \|v\|_\xi, \tag{78} $$

where $\xi_{\min} := \min_{i \in X} \xi_i$.

Proof.
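Same as Lemma 3.2, except that we now apply the Cauchy–Schwarz inequality with respect to the weighted Euclidean norm $\|\cdot\|_\xi$ in the following manner, where $\Xi$ is the diagonal matrix with entries $\xi_i$:

$$a^\top b = a^\top\Xi\,(\Xi^{-1}b) \le \|a\|_\xi\,\|\Xi^{-1}b\|_\xi \le \frac{\|a\|_\xi\,\|b\|_\xi}{\xi_{\min}}. \tag{79}$$

The following theorem shows that the proxy robust Bellman operator is a contraction under reasonable assumptions on the discount factor $\vartheta$.

Theorem 4.3. Let $\beta_i^a$ be as in Lemma 4.2 and let $\beta := \max_{i\in X}\beta_i^{\pi(i)}$. If the discount factor $\vartheta$ satisfies Assumption 4.1 for some $\alpha$ and $\alpha^2 + \vartheta^2\beta^2 < \tfrac12$, then the operator $\hat{\mathbf{T}}^\pi$ is a contraction with respect to $\|\cdot\|_\xi$. In other words, for any two $\theta,\theta'\in\mathbb{R}^d$ we have

$$\|\hat{\mathbf{T}}^\pi(\Phi\theta) - \hat{\mathbf{T}}^\pi(\Phi\theta')\|_\xi^2 \le 2(\alpha^2 + \vartheta^2\beta^2)\,\|\Phi\theta - \Phi\theta'\|_\xi^2 < \|\Phi\theta - \Phi\theta'\|_\xi^2. \tag{80}$$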

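If $\beta_i^{\pi(i)} = \beta = 0$, so that $\hat{\mathcal{U}}^\pi = \mathcal{U}^\pi$, then we have a simpler contraction under the assumption that $\alpha < 1$, i.e.,

$$\|\hat{\mathbf{T}}^\pi(\Phi\theta) - \hat{\mathbf{T}}^\pi(\Phi\theta')\|_\xi \le \alpha\,\|\Phi\theta - \Phi\theta'\|_\xi < \|\Phi\theta - \Phi\theta'\|_\xi. \tag{81}$$

Proof. Consider two parameters $\theta$ and $\theta'$ in $\mathbb{R}^d$. Then we have

$$\begin{aligned}
\|\hat{\mathbf{T}}^\pi(\Phi\theta)-\hat{\mathbf{T}}^\pi(\Phi\theta')\|_\xi^2
&= \sum_{i\in X}\xi_i\big(\hat{\mathbf{T}}^\pi(\Phi\theta)(i)-\hat{\mathbf{T}}^\pi(\Phi\theta')(i)\big)^2 && (82)\\
&= \vartheta^2\sum_{i\in X}\xi_i\big(\sigma_{\hat{\mathcal{P}}_i^{\pi(i)}}(\Phi\theta)-\sigma_{\hat{\mathcal{P}}_i^{\pi(i)}}(\Phi\theta')\big)^2 && (83)\\
&= \vartheta^2\sum_{i\in X}\xi_i\Big(\sup_{q\in\hat{\mathcal{P}}_i^{\pi(i)}}q^\top\Phi\theta-\sup_{q\in\hat{\mathcal{P}}_i^{\pi(i)}}q^\top\Phi\theta'\Big)^2 && (84)\\
&\le \vartheta^2\sum_{i\in X}\xi_i\Big(\sup_{q\in\hat{\mathcal{P}}_i^{\pi(i)}}\big|q^\top(\Phi\theta-\Phi\theta')\big|\Big)^2 && (85)\\
&\le \vartheta^2\sum_{i\in X}\xi_i\Big(\sup_{q\in\mathcal{P}_i^{\pi(i)}}\big|q^\top(\Phi\theta-\Phi\theta')\big|+\beta\,\|\Phi\theta-\Phi\theta'\|_\xi\Big)^2 && (86)\\
&\le \sum_{i\in X}\xi_i\Big(\alpha\sum_{j\in X}P^{\hat\pi}_{ij}\big|\phi_j^\top\theta-\phi_j^\top\theta'\big|+\vartheta\beta\,\|\Phi\theta-\Phi\theta'\|_\xi\Big)^2 && (87)\\
&\le 2\alpha^2\sum_{i\in X}\xi_i\sum_{j\in X}P^{\hat\pi}_{ij}\big(\phi_j^\top\theta-\phi_j^\top\theta'\big)^2+2\vartheta^2\beta^2\,\|\Phi\theta-\Phi\theta'\|_\xi^2 && (88)\\
&= 2(\alpha^2+\vartheta^2\beta^2)\,\|\Phi\theta-\Phi\theta'\|_\xi^2, && (89)
\end{aligned}$$

where we used Lemma 4.2 and the definition of $\beta$ in line (86), Assumption 4.1 in line (87), the inequality $(a+b)^2\le 2a^2+2b^2$ together with Jensen's inequality $\big(\sum_j P^{\hat\pi}_{ij}x_j\big)^2\le\sum_j P^{\hat\pi}_{ij}x_j^2$ in line (88), and the fact that $\sum_{i\in X}\xi_i P^{\hat\pi}_{ij}=\xi_j$ in line (89). Note that if $\beta_i^\pi = \beta = 0$, so that the proxy confidence region is the same as the true confidence region, then we have the simpler upper bound $\|\hat{\mathbf{T}}^\pi(\Phi\theta)-\hat{\mathbf{T}}^\pi(\Phi\theta')\|_\xi^2 \le \alpha^2\|\Phi\theta-\Phi\theta'\|_\xi^2$ instead of $\|\hat{\mathbf{T}}^\pi(\Phi\theta)-\hat{\mathbf{T}}^\pi(\Phi\theta')\|_\xi^2 \le 2\alpha^2\|\Phi\theta-\Phi\theta'\|_\xi^2$, since there is no cross term in equation (87) in this case.

The following corollary shows that the proxy projected Bellman iteration converges to a solution that is not too far away from the true value function $v_\pi$.

Corollary 4.4. Let Assumption 4.1 hold and let $\beta$ be as in Theorem 4.3. Let $\tilde v_\pi$ be the fixed point of the projected Bellman equation for the proxy operator $\hat{\mathbf{T}}^\pi$, i.e., $\Pi\hat{\mathbf{T}}^\pi\tilde v_\pi = \tilde v_\pi$. Let $\hat v_\pi$ be the fixed point of the proxy operator $\hat{\mathbf{T}}^\pi$, i.e., $\hat{\mathbf{T}}^\pi\hat v_\pi = \hat v_\pi$. Let $v_\pi$ be the true value function of the policy $\pi$, i.e., $\mathbf{T}^\pi v_\pi = v_\pi$. Then the following holds:

$$\|\tilde v_\pi - v_\pi\|_\xi \le \frac{\vartheta\beta\,\|v_\pi\|_\xi + \|\Pi v_\pi - v_\pi\|_\xi}{1 - \sqrt{2(\alpha^2+\vartheta^2\beta^2)}}. \tag{90}$$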

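In particular, if $\beta_i^\pi = \beta = 0$, i.e., the proxy confidence region is actually the true confidence region, then the proxy projected Bellman equation has a solution satisfying $\|\tilde v_\pi - v_\pi\|_\xi \le \frac{\|\Pi v_\pi - v_\pi\|_\xi}{1-\alpha}$.

Proof. We have the following expression:

$$\begin{aligned}
\|\tilde v_\pi - v_\pi\|_\xi &\le \|\tilde v_\pi - \Pi v_\pi\|_\xi + \|\Pi v_\pi - v_\pi\|_\xi && (91)\\
&= \|\Pi\hat{\mathbf{T}}^\pi\tilde v_\pi - \Pi\mathbf{T}^\pi v_\pi\|_\xi + \|\Pi v_\pi - v_\pi\|_\xi && (92)\\
&\le \|\Pi\hat{\mathbf{T}}^\pi\tilde v_\pi - \Pi\hat{\mathbf{T}}^\pi v_\pi\|_\xi + \vartheta\beta\,\|v_\pi\|_\xi + \|\Pi v_\pi - v_\pi\|_\xi && (93)\\
&\le \|\hat{\mathbf{T}}^\pi\tilde v_\pi - \hat{\mathbf{T}}^\pi v_\pi\|_\xi + \vartheta\beta\,\|v_\pi\|_\xi + \|\Pi v_\pi - v_\pi\|_\xi && (94)\\
&\le \sqrt{2(\alpha^2+\vartheta^2\beta^2)}\,\|\tilde v_\pi - v_\pi\|_\xi + \vartheta\beta\,\|v_\pi\|_\xi + \|\Pi v_\pi - v_\pi\|_\xi, && (95)
\end{aligned}$$

where we used Lemma 4.2 to derive inequality (93), the non-expansiveness of $\Pi$ in (94), and Theorem 4.3 in (95) to bound $\|\hat{\mathbf{T}}^\pi\tilde v_\pi - \hat{\mathbf{T}}^\pi v_\pi\|_\xi \le \sqrt{2(\alpha^2+\vartheta^2\beta^2)}\,\|\tilde v_\pi - v_\pi\|_\xi$; rearranging gives (90). If $\beta_i^\pi = \beta = 0$, so that the proxy confidence regions are the same as the true confidence regions, then we have $\alpha$ instead of $\sqrt{2(\alpha^2+\vartheta^2\beta^2)}$ in the last equation, due to Theorem 4.3.

Theorem 4.3 guarantees that the robust projected Bellman iterations of the LSTD(λ), LSPE(λ), and TD(λ) methods converge, while Corollary 4.4 guarantees that the solution they converge to is not too far away from the true value function $v_\pi$. We refer the reader to [3] for more details on LSTD(λ) and LSPE(λ), since their proof of convergence is analogous to that of TD(λ).

4.2 Robust stochastic gradient descent algorithms

While the TD(λ)-learning algorithms with function approximation by linear architectures converge to the solution of the projected Bellman equation if the states are sampled according to the policy $\pi$, they are known to be unstable if the states are sampled in an off-policy manner, i.e., in the terminology of the previous section, $\hat\pi \ne \pi$. This issue was addressed by [23, 24], who proposed a stochastic gradient descent based TD(0) algorithm that converges for linear architectures in the off-policy setting. This was further extended by [7] to approximations using arbitrary smooth functions, with a proof of convergence to a local optimum. In this section we show how to extend these off-policy methods to the robust setting with uncertain transitions. Note that this is an alternative approach to the requirement of Assumption 4.1, since under this assumption all off-policy methods would also converge. The main idea of [24] is to devise stochastic gradient algorithms that minimize the following loss function, called the mean squared projected Bellman error (MSPBE) and also studied in [1, 12]: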
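$$\mathrm{MSPBE}(\theta) := \|v_\theta - \Pi T^\pi v_\theta\|_\xi^2. \tag{96}$$

Note that the loss function is 0 for any $\theta$ that satisfies the projected Bellman equation $\Phi\theta = \Pi T^\pi\Phi\theta$. Consider a linear architecture as in Section 4.1, where $v_\theta := \Phi\theta$. Let $i\in X$ be a random state chosen with distribution $\xi$. Denote $\phi_i$ by the shorthand $\phi$ and $\phi_{i'}$ by $\phi'$. Then it is easy to show that

$$\mathrm{MSPBE}(\theta) = \|v_\theta - \Pi T^\pi v_\theta\|_\xi^2 = \mathbb{E}[d\phi]^\top\,\mathbb{E}[\phi\phi^\top]^{-1}\,\mathbb{E}[d\phi], \tag{97}$$

where the expectation is over the random state $i$ and $d$ is the temporal difference error for the transition $(i,a,i')$, i.e., $d := c(i,a) + \vartheta\theta^\top\phi' - \theta^\top\phi$, where the action $a$ and the new state $i'$ are chosen according to the exploration policy $\hat\pi$. The negative gradient of the MSPBE function is

$$\begin{aligned}
-\tfrac12\nabla\mathrm{MSPBE}(\theta) &= \mathbb{E}\big[(\phi-\vartheta\phi')\phi^\top\big]\,w && (98)\\
&= \mathbb{E}[d\phi] - \vartheta\,\mathbb{E}[\phi'\phi^\top]\,w, && (99)
\end{aligned}$$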

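where $w = \mathbb{E}[\phi\phi^\top]^{-1}\mathbb{E}[d\phi]$. Both $d$ and $w$ depend on $\theta$. Since the expectation is hard to compute exactly, [24] introduce a set of weights $w_k$ whose purpose is to estimate $w$ for a fixed $\theta$. Let $d_k$ denote the temporal difference error for parameter $\theta_k$. The weights $w_k$ are then updated on a fast timescale as

$$w_{k+1} := w_k + \beta_k\,(d_k - \phi_k^\top w_k)\,\phi_k, \tag{100}$$

while the parameter $\theta_k$ is updated on a slower timescale in one of the following two ways:

$$\theta_{k+1} := \theta_k + \alpha_k\,(\phi_k - \vartheta\phi_k')(\phi_k^\top w_k) \qquad\text{(GTD2)} \tag{101}$$

$$\theta_{k+1} := \theta_k + \alpha_k d_k\phi_k - \vartheta\alpha_k\,\phi_k'(\phi_k^\top w_k) \qquad\text{(TDC)} \tag{102}$$

[7] extended this to the case of smooth nonlinear architectures, where the space $S := \{v_\theta \mid \theta\in\mathbb{R}^d\}$ spanned by all value functions $v_\theta$ is now a differentiable sub-manifold of $\mathbb{R}^n$ rather than a linear subspace. Projecting onto such nonlinear manifolds is a computationally hard problem; to get around this, [7] instead project onto the tangent plane at $\theta$, assuming the parameter $\theta$ changes very little in one step. This allows [7] to generalize the updates of equations (100) and (101) with an additional Hessian term $\nabla^2 v_\theta$, which vanishes if $v_\theta$ is linear in $\theta$. In the following sections we extend the stochastic gradient algorithms of [7, 23, 24] to the robust setting with uncertain transition matrices. Since the number $n$ of states is prohibitively large, we will make the simplifying assumption that $\mathcal{U}_i^a = \mathcal{U}$ and $\hat{\mathcal{U}}_i^a = \hat{\mathcal{U}}$ for the results of the following sections.

Robust stochastic gradient algorithms with linear architectures

In this section we extend the results of [24] to the robust setting, where we are interested in finding a solution to the robust projected Bellman equation $\Phi\theta = \Pi\mathbf{T}^\pi\Phi\theta$, where $\mathbf{T}^\pi$ is the robust Bellman operator of equation (77). Let $\hat{\mathbf{T}}^\pi$ denote the proxy robust Bellman operator using the proxy uncertainty set $\hat{\mathcal{U}}$ instead of $\mathcal{U}$. A natural generalization of [24] is to introduce the following loss function, which we call the mean squared robust projected Bellman error (MSRPBE):

$$\mathrm{MSRPBE}(\theta) := \|v_\theta - \Pi\hat{\mathbf{T}}^\pi v_\theta\|_\xi^2, \tag{103}$$

where the proxy robust Bellman operator $\hat{\mathbf{T}}^\pi$ is used. Note that $\hat{\mathbf{T}}^\pi$ is no longer truly linear in $\theta$ even for linear architectures $v_\theta = \Phi\theta$, since for every state $i$

$$\begin{aligned}
(\hat{\mathbf{T}}^\pi\Phi\theta)(i) &= c(i,\pi(i)) + \vartheta\,\sigma_{\hat{\mathcal{P}}_i^{\pi(i)}}(\Phi\theta) && (104)\\
&= c(i,\pi(i)) + \vartheta\,\theta^\top\Phi^\top p_i^\pi + \vartheta\sup_{q\in\Phi^\top\hat{\mathcal{U}}}q^\top\theta, && (105)
\end{aligned}$$

where $p_i^\pi$ is the simulator transition probability vector.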
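To make equations (103)–(105) concrete, the NumPy sketch below evaluates the proxy robust Bellman operator for a linear architecture and the MSRPBE loss on a small random chain. It assumes, purely for illustration, that $\hat{\mathcal U}$ is a Euclidean ball of radius $r$ centred at the origin, so that $\sup_{q\in\Phi^\top\hat{\mathcal U}} q^\top\theta = r\|\Phi\theta\|_2$; the function names, the radius, and the random instance are hypothetical, and the variable `vartheta` plays the role of $\vartheta$. It also checks numerically that the loss coincides with the TD-error form given in the text as equation (109).

```python
import numpy as np

def robust_bellman_linear(theta, Phi, c, P, vartheta, r):
    """Proxy robust Bellman operator (104)-(105) for v_theta = Phi @ theta.

    Hypothetical choice: U_hat is the Euclidean ball of radius r centred at 0,
    so sup_{q in Phi^T U_hat} q^T theta = r * ||Phi theta||_2 for every state.
    """
    v = Phi @ theta
    return c + vartheta * (P @ v) + vartheta * r * np.linalg.norm(v)

def xi_projection(Phi, xi):
    """Pi = Phi (Phi^T Xi Phi)^{-1} Phi^T Xi, the ||.||_xi-orthogonal projection onto span(Phi)."""
    Xi = np.diag(xi)
    return Phi @ np.linalg.solve(Phi.T @ Xi @ Phi, Phi.T @ Xi)

def msrpbe(theta, Phi, c, P, vartheta, r, xi):
    """Equation (103): ||v_theta - Pi T_hat v_theta||_xi^2."""
    v = Phi @ theta
    resid = v - xi_projection(Phi, xi) @ robust_bellman_linear(theta, Phi, c, P, vartheta, r)
    return float(resid @ (xi * resid))

def msrpbe_td_form(theta, Phi, c, P, vartheta, r, xi):
    """The same loss as E[d phi]^T E[phi phi^T]^{-1} E[d phi] with exact expectations:
    averaging the robust TD error over the next state gives (T_hat v - v)(i)."""
    v = Phi @ theta
    e_dphi = Phi.T @ (xi * (robust_bellman_linear(theta, Phi, c, P, vartheta, r) - v))
    e_phiphi = Phi.T @ (xi[:, None] * Phi)
    return float(e_dphi @ np.linalg.solve(e_phiphi, e_dphi))

# Small random instance (illustrative only).
rng = np.random.default_rng(0)
n, d = 6, 2
Phi = rng.normal(size=(n, d))
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)   # row i plays the role of the simulator vector p_i
c = rng.random(n)                   # c(i, pi(i))
xi = np.full(n, 1.0 / n)
vartheta, r = 0.9, 0.1
theta = np.array([0.5, -1.0])

print(msrpbe(theta, Phi, c, P, vartheta, r, xi),
      msrpbe_td_form(theta, Phi, c, P, vartheta, r, xi))  # the two values agree
```

The failure of linearity is easy to see in this sketch: $\hat{\mathbf T}^\pi(\Phi\theta) + \hat{\mathbf T}^\pi(-\Phi\theta) - 2c = 2\vartheta r\|\Phi\theta\|_2 \neq 0$, because the transition term cancels while the worst-case term does not.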
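However, under the assumption that $\hat{\mathcal{U}}$ is a nicely behaved set such as a ball or an ellipsoid, so that changing $\theta$ in a small neighborhood does not lead to jumps in $\sigma_{\Phi^\top\hat{\mathcal{U}}}(\theta)$, we may define the gradient $\nabla_\theta\hat{\mathbf{T}}^\pi\Phi\theta$ as

$$\begin{aligned}
\nabla_\theta(\hat{\mathbf{T}}^\pi\Phi\theta)(i) &:= \vartheta\,\Phi^\top p_i^\pi + \vartheta\arg\max_{q\in\Phi^\top\hat{\mathcal{U}}}q^\top\theta && (106)\\
&= \vartheta\arg\max_{q\in\Phi^\top\hat{\mathcal{P}}_i^{\pi(i)}}q^\top\theta. && (107)
\end{aligned}$$

Recall the robust temporal difference error $\hat d$ for state $i$ with respect to the proxy set $\hat{\mathcal{U}}$, as in equation (48):

$$\hat d := c(i,\pi(i)) + \vartheta\,v_\theta(i') + \vartheta\,\sigma_{\hat{\mathcal{U}}}(v_\theta) - v_\theta(i).$$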
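Under the same illustrative assumption ($\hat{\mathcal U}$ a Euclidean ball of radius $r$ centred at the origin, not a choice made by the authors), the maximiser in (106)–(107) has the closed form $r\,\Phi\theta/\|\Phi\theta\|_2$, so $\mu_{\hat{\mathcal U}}(\theta) = r\,\Phi^\top\Phi\theta/\|\Phi\theta\|_2$ whenever $\Phi\theta \ne 0$. The sketch below, with hypothetical helper names, computes the robust TD error for one sampled transition and its $\theta$-gradient $\vartheta\phi' + \vartheta\mu_{\hat{\mathcal U}}(\theta) - \phi$, and compares that gradient with central finite differences.

```python
import numpy as np

def sigma_ball(v, r):
    """Support function of the ball {y : ||y||_2 <= r}: max_y y^T v = r * ||v||_2."""
    return r * np.linalg.norm(v)

def mu_ball(theta, Phi, r):
    """mu_U(theta) = Phi^T argmax_{||y|| <= r} y^T Phi theta = r Phi^T Phi theta / ||Phi theta||.
    Only defined when Phi @ theta is nonzero (the maximiser is unique there)."""
    v = Phi @ theta
    y_star = r * v / np.linalg.norm(v)
    return Phi.T @ y_star

def robust_td_error(theta, Phi, i, i_next, cost, vartheta, r):
    """Robust TD error for one sampled transition (i, pi(i), i_next) with linear v_theta."""
    v = Phi @ theta
    return cost + vartheta * v[i_next] + vartheta * sigma_ball(v, r) - v[i]

def robust_td_error_grad(theta, Phi, i, i_next, vartheta, r):
    """Gradient of the robust TD error: vartheta * phi' + vartheta * mu_U(theta) - phi."""
    return vartheta * Phi[i_next] + vartheta * mu_ball(theta, Phi, r) - Phi[i]

rng = np.random.default_rng(2)
Phi = rng.normal(size=(5, 3))
theta = rng.normal(size=3)
grad = robust_td_error_grad(theta, Phi, i=1, i_next=3, vartheta=0.9, r=0.2)

# Central finite differences of the TD error (cost 0.7 drops out of the gradient).
eps = 1e-6
fd = np.array([
    (robust_td_error(theta + eps * e, Phi, 1, 3, 0.7, 0.9, 0.2)
     - robust_td_error(theta - eps * e, Phi, 1, 3, 0.7, 0.9, 0.2)) / (2 * eps)
    for e in np.eye(3)
])
print(np.max(np.abs(grad - fd)))
```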

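Under the assumption that $\mathbb{E}[\phi\phi^\top]$ is full rank, we may write the MSRPBE loss function in terms of the robust temporal difference errors $\hat d$ of equation (48), as in [24]:

$$\mathrm{MSRPBE}(\theta) = \mathbb{E}[\hat d\phi]^\top\,\mathbb{E}[\phi\phi^\top]^{-1}\,\mathbb{E}[\hat d\phi]. \tag{109}$$

Note that if $\mathbb{E}[\phi\phi^\top]$ is full rank, then $\mathrm{MSRPBE}(\theta) = 0$ if and only if $\mathbb{E}[\hat d\phi] = 0$, because of equation (109). Define

$$\mu_{\mathcal{P}}(\theta) := \nabla_\theta\max_{y\in\mathcal{P}}y^\top v_\theta = \nabla_\theta\max_{y\in\mathcal{P}}y^\top\Phi\theta = \Phi^\top\arg\max_{y\in\mathcal{P}}y^\top\Phi\theta = \arg\max_{y\in\Phi^\top\mathcal{P}}y^\top\theta \tag{110}$$

for any convex compact set $\mathcal{P}\subseteq\mathbb{R}^n$, so that the gradient of the MSRPBE loss function can be written as

$$\begin{aligned}
-\tfrac12\nabla\mathrm{MSRPBE}(\theta) &= \mathbb{E}\big[(\phi-\vartheta\mu_{\hat{\mathcal{U}}}(\theta)-\vartheta\phi')\phi^\top\big]\,\mathbb{E}[\phi\phi^\top]^{-1}\,\mathbb{E}[\hat d\phi] && (111)\\
&= \mathbb{E}\big[(\phi-\vartheta\mu_{\hat{\mathcal{U}}}(\theta)-\vartheta\phi')\phi^\top w\big] && (112)\\
&= \mathbb{E}[\hat d\phi] - \vartheta\,\mathbb{E}[\phi'\phi^\top w] - \vartheta\,\mathbb{E}[\mu_{\hat{\mathcal{U}}}(\theta)\phi^\top w], && (113)
\end{aligned}$$

where $w = \mathbb{E}[\phi\phi^\top]^{-1}\mathbb{E}[\hat d\phi]$ plays the same role as in equation (98) and [24]. Therefore, as in [24], we have an estimator $w_k$ of the weights $w$ for a fixed parameter $\theta_k$,

$$w_{k+1} := w_k + \beta_k\,(\hat d_k - \phi_k^\top w_k)\,\phi_k, \tag{114}$$

with the corresponding parameter $\theta_k$ being updated as

$$\theta_{k+1} := \theta_k + \alpha_k\,\big(\phi_k - \vartheta\mu_{\hat{\mathcal{U}}}(\theta_k) - \vartheta\phi_k'\big)(\phi_k^\top w_k) \qquad\text{(robust-GTD2)} \tag{115}$$

$$\theta_{k+1} := \theta_k + \alpha_k\hat d_k\phi_k - \vartheta\alpha_k\big(\phi_k' + \mu_{\hat{\mathcal{U}}}(\theta_k)\big)(\phi_k^\top w_k) \qquad\text{(robust-TDC)} \tag{116}$$

Run time analysis: Let $T_n(\mathcal{P})$ denote the time to optimize linear functions over the convex set $\mathcal{P}$ for some $\mathcal{P}\subseteq\mathbb{R}^n$. Note that the values $v_\theta(i) = \phi_i^\top\theta$ can be computed in $O(d)$ time. Thus the updates of robust-GTD2 and robust-TDC can be computed in $O(d + T_n(\hat{\mathcal{U}}))$ time. In particular, if the set $\hat{\mathcal{U}}$ is a simple set like an ellipsoid with associated matrix $A$, then the optimum value $\sigma_{\hat{\mathcal{U}}}(v_\theta)$ is simply $\sqrt{\theta^\top\Phi^\top A\Phi\theta}$, where $\Phi$ is the feature matrix. In this case we only need to compute $\Phi^\top A\Phi$ once and store it for future use. However, note that this still takes time polynomial in $n$, which is undesirable for $n \gg d$. In this case, we need to make the assumption that there are good rank-$d$ approximations to $\hat{\mathcal{U}}$, i.e., $A \approx BB^\top$ for some $n\times d$ matrix $B$. Thus the total run time for each update in this case is $O(d^2)$.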
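As a sketch of how the updates (114)–(116) and the ellipsoid run-time remark fit together, the class below assumes (hypothetically) that $\hat{\mathcal U} = \{Bu : \|u\|_2 \le 1\}$ for an $n\times k$ matrix $B$, i.e., an ellipsoid with $A = BB^\top$. Then $\sigma_{\hat{\mathcal U}}(\Phi\theta) = \|C^\top\theta\|_2$ with $C := \Phi^\top B$ computed once, and $\mu_{\hat{\mathcal U}}(\theta) = CC^\top\theta/\|C^\top\theta\|_2$, so each update costs $O(dk)$ after the one-off precomputation. The class name, step-size schedules, and random chain are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class RobustLinearGTD:
    """Sketch of robust-GTD2 / robust-TDC, equations (114)-(116).

    Hypothetical uncertainty set: U_hat = {B u : ||u||_2 <= 1} for an n x k
    matrix B (an ellipsoid with A = B B^T). Then sigma_U(Phi theta) = ||C^T theta||_2
    with C = Phi^T B computed once, so every update costs O(d k).
    """

    def __init__(self, Phi, B, vartheta, variant="gtd2"):
        self.C = Phi.T @ B            # one-off precomputation, d x k
        self.vartheta = vartheta
        self.variant = variant        # "gtd2" or "tdc"
        self.theta = np.zeros(Phi.shape[1])
        self.w = np.zeros(Phi.shape[1])

    def sigma(self, theta):
        return np.linalg.norm(self.C.T @ theta)

    def mu(self, theta):
        s = self.C.T @ theta
        norm = np.linalg.norm(s)
        # At C^T theta = 0 the maximiser is not unique; 0 is a valid subgradient there.
        return self.C @ s / norm if norm > 0 else np.zeros_like(theta)

    def update(self, phi, phi_next, cost, alpha, beta):
        th, w, vt = self.theta, self.w, self.vartheta
        d_hat = cost + vt * (phi_next @ th) + vt * self.sigma(th) - phi @ th  # robust TD error
        mu = self.mu(th)
        phi_w = phi @ w
        self.w = w + beta * (d_hat - phi_w) * phi                                   # (114)
        if self.variant == "gtd2":
            self.theta = th + alpha * (phi - vt * mu - vt * phi_next) * phi_w       # (115)
        else:
            self.theta = th + alpha * d_hat * phi - vt * alpha * (phi_next + mu) * phi_w  # (116)
        return d_hat

# Usage on a small random chain (illustrative numbers only).
rng = np.random.default_rng(1)
n, d = 8, 3
Phi = rng.normal(size=(n, d))
B = 0.05 * rng.normal(size=(n, d))       # rank-d factor, A = B B^T
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)
costs = rng.random(n)
agent = RobustLinearGTD(Phi, B, vartheta=0.9, variant="tdc")
i = 0
for k in range(1, 2001):
    i_next = rng.choice(n, p=P[i])
    agent.update(Phi[i], Phi[i_next], costs[i], alpha=0.01 / k**0.9, beta=0.1 / k**0.6)
    i = i_next
print(agent.theta)
```

The schedules $\alpha_k = 0.01\,k^{-0.9}$ and $\beta_k = 0.1\,k^{-0.6}$ are one choice with divergent sums, convergent squared sums, and $\alpha_k/\beta_k \to 0$, matching the step-size conditions stated in Theorem 4.6.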
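If the uncertainty set is spherically symmetric, i.e., a ball, then $\sigma_{\hat{\mathcal{U}}}(v_\theta)$ is simply $\|\Phi\theta\|_2$ scaled by the radius of the ball, and the robust temporal difference errors of equation (48) and the updates of equations (114) and (115) can be viewed simply as the regular updates of [23] with an added noise term.

Robust stochastic gradient algorithms with nonlinear architectures

In this section we generalize the results of the previous section and show how to extend the algorithms of equations (114) and (115) to the case when the value function $v_\theta$ is no longer a linear function of $\theta$. This also generalizes the results of [7] to the robust setting, with corresponding robust analogues of nonlinear GTD2 and nonlinear TDC, respectively. Let $\mathcal{M} := \{v_\theta \mid \theta\in\mathbb{R}^d\}$ be the manifold spanned by all possible value functions and let $P\mathcal{M}_\theta$ be the tangent plane of $\mathcal{M}$ at $\theta$. Let $T\mathcal{M}_\theta$ be the tangent space, i.e., the translation of $P\mathcal{M}_\theta$ to the origin. In other words, $T\mathcal{M}_\theta := \{\Phi_\theta u \mid u\in\mathbb{R}^d\}$, where $\Phi_\theta$ is an $n\times d$ matrix with entries $\Phi_\theta(i,j) := \frac{\partial}{\partial\theta_j}v_\theta(i)$. Let $\Pi_\theta$ denote the projection with respect to the weighted Euclidean norm $\|\cdot\|_\xi$ onto the space $T\mathcal{M}_\theta$, so that

$$\Pi_\theta = \Phi_\theta\big(\Phi_\theta^\top\Xi\Phi_\theta\big)^{-1}\Phi_\theta^\top\Xi, \tag{117}$$

where $\Xi$ is the $n\times n$ diagonal matrix with entries $\xi_i$ for $i\in X$, as in Section 4.1. The mean squared projected Bellman error (MSPBE) loss function considered by [7] can then be defined as

$$\mathrm{MSPBE}(\theta) = \|v_\theta - \Pi_\theta T^\pi v_\theta\|_\xi^2, \tag{118}$$

where we now project onto the tangent space $T\mathcal{M}_\theta$. The robust version of the MSPBE loss function, the mean squared robust projected Bellman error (MSRPBE) loss, can then be defined in terms of the robust Bellman operator over the proxy uncertainty set $\hat{\mathcal{U}}$:

$$\mathrm{MSRPBE}(\theta) = \|v_\theta - \Pi_\theta\hat{\mathbf{T}}^\pi v_\theta\|_\xi^2, \tag{119}$$

and, under the assumption that $\mathbb{E}[\nabla v_\theta(i)\nabla v_\theta(i)^\top]$ is non-singular, this may be expressed in terms of the robust temporal difference error $\hat d$ of equation (48), as in [7] and equation (109):

$$\mathrm{MSRPBE}(\theta) = \mathbb{E}[\hat d\,\nabla v_\theta(i)]^\top\,\mathbb{E}[\nabla v_\theta(i)\nabla v_\theta(i)^\top]^{-1}\,\mathbb{E}[\hat d\,\nabla v_\theta(i)], \tag{120}$$

where the expectation is over the states $i\in X$ drawn from the distribution $\xi$. Note that under the assumption that $\mathbb{E}[\nabla v_\theta(i)\nabla v_\theta(i)^\top]$ is non-singular, it follows from equation (120) that $\mathrm{MSRPBE}(\theta) = 0$ if and only if $\mathbb{E}[\hat d\,\nabla v_\theta(i)] = 0$. Since $v_\theta$ is no longer linear in $\theta$, we need to redefine the gradient $\mu$ of $\sigma$ for any convex, compact set $\mathcal{P}$ as

$$\mu_{\mathcal{P}}(\theta) := \nabla_\theta\max_{y\in\mathcal{P}}y^\top v_\theta = \Phi_\theta^\top\arg\max_{y\in\mathcal{P}}y^\top v_\theta, \tag{121}$$

where $\Phi_\theta := \nabla v_\theta$. The following lemma expresses the gradient $\nabla\mathrm{MSRPBE}(\theta)$ in terms of the robust temporal difference errors; see Theorem 1 of [7] for the non-robust version.

Lemma 4.5. Assume that $v_\theta(i)$ is twice differentiable with respect to $\theta$ for any $i\in X$ and that $W(\theta) := \mathbb{E}[\nabla v_\theta(i)\nabla v_\theta(i)^\top]$ is non-singular in a neighborhood of $\theta$. Let $\phi_i := \nabla v_\theta(i)$ and define, for any $u\in\mathbb{R}^d$,

$$h(\theta,u) := -\mathbb{E}\big[(\hat d - \phi^\top u)\,\nabla^2 v_\theta(i)\,u\big]. \tag{122}$$

Then the gradient of MSRPBE with respect to $\theta$ can be expressed as

$$-\tfrac12\nabla\mathrm{MSRPBE}(\theta) = \mathbb{E}\big[(\phi-\vartheta\mu_{\hat{\mathcal{U}}}(\theta)-\vartheta\phi')\phi^\top w\big] + h(\theta,w), \tag{123}$$

where $w = \mathbb{E}[\phi\phi^\top]^{-1}\mathbb{E}[\hat d\phi]$ as before.

Proof. The proof is similar to that of Theorem 1 of [7], using $\mu_{\hat{\mathcal{U}}}(\theta)$ as the gradient of $\sigma_{\hat{\mathcal{U}}}(v_\theta)$.

Lemma 4.5 leads us to the following robust analogues of nonlinear GTD2 and nonlinear TDC. The update of the weight estimators $w_k$ is the same as in equation (114):

$$w_{k+1} := w_k + \beta_k\,(\hat d_k - \phi_k^\top w_k)\,\phi_k, \tag{124}$$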

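with the parameters $\theta_k$ being updated on a slower timescale as

$$\theta_{k+1} := \Gamma\Big(\theta_k + \alpha_k\big\{(\phi_k - \vartheta\phi_k' - \vartheta\mu_{\hat{\mathcal{U}}}(\theta_k))(\phi_k^\top w_k) - h_k\big\}\Big) \qquad\text{(robust-nonlinear-GTD2)} \tag{125}$$

$$\theta_{k+1} := \Gamma\Big(\theta_k + \alpha_k\big\{\hat d_k\phi_k - \vartheta(\phi_k' + \mu_{\hat{\mathcal{U}}}(\theta_k))(\phi_k^\top w_k) - h_k\big\}\Big) \qquad\text{(robust-nonlinear-TDC)} \tag{126}$$

where $h_k := (\hat d_k - \phi_k^\top w_k)\,\nabla^2 v_{\theta_k}(i_k)\,w_k$ and $\Gamma$ is a projection into an appropriately chosen compact set $C$ with a smooth boundary, as in [7]. As in [7], the main aim of the projection is to prevent the parameters from diverging in the early stages of the algorithm due to the nonlinearities in the algorithm. In practice, if $C$ is large enough that it contains the set of all possible solutions $\{\theta \mid \mathbb{E}[\hat d\,\nabla v_\theta(i)] = 0\}$, then it is quite likely that no projections will happen. However, we require the projection for the convergence analysis of the robust-nonlinear-GTD2 and robust-nonlinear-TDC algorithms given below.

Run time analysis: Let $T_n(\mathcal{P})$ denote the time to optimize a linear function over the set $\mathcal{P}\subseteq\mathbb{R}^n$. Then the run time is $O(d + T_n(\hat{\mathcal{U}}))$. If $\hat{\mathcal{U}}$ is an ellipsoid with associated matrix $A$, then an approximate optimum may be computed by sampling if we have a rank-$d$ approximation to $A$, i.e., $A \approx BB^\top$ for some $n\times d$ matrix $B$. If $\hat{\mathcal{U}}$ is spherically symmetric, then $\sigma_{\hat{\mathcal{U}}}(v_\theta)$ is simply $\|v_\theta\|_2$ scaled by the radius of the ball, so that the updates of equations (124) and (125) may be viewed as the regular updates of [7] with an added noise term.

Convergence analysis

In this section we provide a convergence analysis for the robust-nonlinear-GTD2 and robust-nonlinear-TDC algorithms of equations (124)–(126). Note that this also proves convergence of the robust-GTD2 and robust-TDC algorithms of equations (114)–(116) as a special case. Given the set $C$, let $\mathcal{C}(C)$ denote the space of all continuous functions $C\to\mathbb{R}^d$. Define, as in [7], the operator $\hat\Gamma:\mathcal{C}(C)\to\mathcal{C}(\mathbb{R}^d)$ by

$$\hat\Gamma f(\theta) := \lim_{\varepsilon\to 0}\frac{\Gamma(\theta + \varepsilon f(\theta)) - \theta}{\varepsilon}. \tag{127}$$

Since $\Gamma(\theta) = \arg\min_{\theta'\in C}\|\theta' - \theta\|$ and the boundary of $C$ is smooth, it follows that $\hat\Gamma$ is well defined. Let $C^\circ$ denote the interior of $C$ and $\partial C$ its boundary, so that $\partial C = C\setminus C^\circ$. If $\theta\in C^\circ$, then $\hat\Gamma f(\theta) = f(\theta)$; otherwise, when $f(\theta)$ points out of $C$, $\hat\Gamma f(\theta)$ is the projection of $f(\theta)$ onto the tangent space of $\partial C$ at $\theta$. Consider the following ODE, as in [7]:

$$\dot\theta = \hat\Gamma\big(-\tfrac12\nabla\mathrm{MSRPBE}\big)(\theta), \qquad \theta(0)\in C, \tag{128}$$

and let $K$ be the set of all stable equilibria of equation (128). Note that the solution set $\{\theta \mid \mathbb{E}[\hat d\phi] = 0\} \subseteq K$.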
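For intuition about $\Gamma$ and $\hat\Gamma$ in (127), take $C$ to be a Euclidean ball of radius $R$, which has a smooth boundary; this particular $C$ is an assumption for illustration, since [7] only require a compact set with smooth boundary. Projection onto the ball is a rescaling, and $\hat\Gamma$ returns $f(\theta)$ unchanged in the interior or when $f(\theta)$ points into $C$ or along the boundary, and otherwise removes the outward normal component. The pure-Python sketch below (hypothetical helper names) compares this closed form with a finite-difference evaluation of the limit.

```python
import math

def proj_ball(theta, radius):
    """Gamma: Euclidean projection onto C = {theta : ||theta||_2 <= radius}."""
    norm = math.sqrt(sum(t * t for t in theta))
    if norm <= radius:
        return list(theta)
    return [t * radius / norm for t in theta]

def gamma_hat(theta, f_theta, radius):
    """Closed form of (127) for the ball, at a point theta in C.

    Interior, or boundary with f pointing inward/tangentially: return f unchanged.
    Boundary with f pointing outward: drop the outward normal component of f.
    """
    norm = math.sqrt(sum(t * t for t in theta))
    if norm < radius:
        return list(f_theta)
    normal = [t / norm for t in theta]               # outward unit normal at theta
    outward = sum(a * b for a, b in zip(f_theta, normal))
    if outward <= 0:
        return list(f_theta)
    return [a - outward * b for a, b in zip(f_theta, normal)]

def gamma_hat_numeric(theta, f_theta, radius, eps=1e-6):
    """Finite-difference approximation of the limit in (127)."""
    moved = proj_ball([t + eps * a for t, a in zip(theta, f_theta)], radius)
    return [(m - t) / eps for m, t in zip(moved, theta)]

cases = [
    ([0.2, 0.1], [3.0, -1.0]),   # interior point
    ([1.0, 0.0], [2.0, 1.0]),    # boundary point, outward direction
    ([1.0, 0.0], [-1.0, 0.5]),   # boundary point, inward direction
]
for theta, f in cases:
    print(gamma_hat(theta, f, 1.0), gamma_hat_numeric(theta, f, 1.0))
```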
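The following theorem shows that, under the assumption of Lipschitz continuous gradients and suitable assumptions on the step lengths $\alpha_k$ and $\beta_k$ and the uncertainty set $\hat{\mathcal{U}}$, the updates of equations (124) and (125) converge.

Theorem 4.6 (Convergence of robust-nonlinear-GTD2). Consider the robust nonlinear updates of equations (124) and (125) with step sizes that satisfy $\sum_{k=0}^\infty\alpha_k = \sum_{k=0}^\infty\beta_k = \infty$, $\sum_{k=0}^\infty\alpha_k^2 < \infty$, $\sum_{k=0}^\infty\beta_k^2 < \infty$, and $\alpha_k/\beta_k \to 0$ as $k\to\infty$. Assume that for every $\theta$ the matrix $\mathbb{E}[\phi_\theta\phi_\theta^\top]$ is non-singular. Also assume that the matrix $\Phi_\theta$ of gradients of the value function, defined as $\Phi_\theta := \nabla v_\theta$, is Lipschitz continuous with constant $L$, i.e., $\|\Phi_\theta - \Phi_{\theta'}\|_2 \le L\|\theta - \theta'\|_2$. Then with probability 1, $\theta_k \to K$ as $k\to\infty$.

Proof. The argument is similar to the proof of Theorem 2 in [7]. The only thing we need to verify is the Lipschitz continuity of the robust version $\hat g(\theta_k, w_k)$ of the function $g(\theta_k, w_k)$ of [7], defined as

$$\hat g(\theta_k, w_k) := \mathbb{E}\big[(\phi_k - \vartheta\phi_k' - \vartheta\mu_{\hat{\mathcal{U}}}(\theta_k))\,\phi_k^\top w_k - h_k \,\big|\, \theta_k, w_k\big],$$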

where $g(\theta_k, w_k)$ is defined as $g(\theta_k, w_k) := \mathbb{E}\big[(\phi_k - \vartheta\phi_k')\,\phi_k^\top w_k - h_k \,\big|\, \theta_k, w_k\big]$, and where $\phi_k'$ denotes the features of the state the simulator transitions to from state $i_k$. Thus we only need to verify the Lipschitz continuity of $\mu_{\hat{\mathcal{U}}}(\theta)$. Let $y := \arg\max_{y\in\hat{\mathcal{U}}}y^\top v_\theta$ and let $z := \arg\max_{z\in\hat{\mathcal{U}}}z^\top v_{\theta'}$. Then

$$\begin{aligned}
\|\mu_{\hat{\mathcal{U}}}(\theta) - \mu_{\hat{\mathcal{U}}}(\theta')\|_2 &= \|\Phi_\theta^\top y - \Phi_{\theta'}^\top z\|_2\\
&\le \|\Phi_\theta^\top y - \Phi_{\theta'}^\top y\|_2\\
&\le \|\Phi_\theta - \Phi_{\theta'}\|_2\,\|y\|_2\\
&\le \|\Phi_\theta - \Phi_{\theta'}\|_2\,\max_{y\in\hat{\mathcal{U}}}\|y\|_2\\
&\le L\,\max_{y\in\hat{\mathcal{U}}}\|y\|_2\,\|\theta - \theta'\|_2.
\end{aligned}$$

Therefore $\mu_{\hat{\mathcal{U}}}(\theta)$ is Lipschitz continuous with constant $L\max_{y\in\hat{\mathcal{U}}}\|y\|_2$.

Corollary 4.7. Under the same conditions as in Theorem 4.6, the robust-GTD2, robust-TDC, and robust-nonlinear-TDC algorithms satisfy, with probability 1, $\theta_k \to K$ as $k\to\infty$.

Figure 2: Performance of robust models with different sizes of confidence regions on two environments. Left: FrozenLake-v0. Right: Acrobot-v1.

5 Experiments

We implemented robust versions of Q-learning, SARSA, and TD(λ)-learning as described in Section 3 and evaluated their performance against the nominal algorithms using the OpenAI gym framework [10]. The environments considered for the exact dynamic programming algorithms are the text environments FrozenLake-v0, FrozenLake8x8-v0, Taxi-v2, Roulette-v0, and NChain-v0, as well as the control tasks CartPole-v0, CartPole-v1, and InvertedPendulum-v1, together with the continuous control tasks of MuJoCo [27]. To test the performance of the robust algorithms, we perturb the models slightly by choosing, with a small probability $p$, a random state after every action. The size of the confidence region $\hat{\mathcal{U}}$ for the robust model is chosen by 10-fold cross validation using line search. After the Q-table or the value functions are learned for the robust and the nominal algorithms, we evaluate their performance on the true environment. To compare the algorithms, we use both the cumulative reward and the tail distribution function (complementary cumulative distribution function), as in [26], which for every $x$ plots the probability that the algorithm earned a reward of at least $x$.
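The tail distribution used in this comparison can be computed from a list of per-episode rewards as sketched below. The perturbation helper encodes one possible reading of "choosing with a small probability p a random state after every action", namely resampling the next state uniformly at random with probability p; both helper names and the uniform choice are assumptions, not the authors' experimental code.

```python
import random

def empirical_tail(rewards, xs):
    """Empirical complementary CDF: for each x, the fraction of episodes with reward >= x."""
    n = len(rewards)
    return [sum(1 for r in rewards if r >= x) / n for x in xs]

def perturbed_next_state(next_state, n_states, p, rng):
    """Hypothetical reading of the perturbation: with probability p, replace the
    simulator's next state by a uniformly random state; otherwise keep it."""
    if rng.random() < p:
        return rng.randrange(n_states)
    return next_state

episode_rewards = [0, 1, 1, 2]
print(empirical_tail(episode_rewards, [0, 1, 2, 3]))   # [1.0, 0.75, 0.25, 0.0]

rng = random.Random(0)
print([perturbed_next_state(3, 10, 0.1, rng) for _ in range(10)])
```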

Figure 3: Tail distribution and cumulative rewards during the transient and stationary phase of robust vs. nominal Q-learning on FrozenLake8x8-v0 with p =

Figure 4: Tail distribution and cumulative rewards during the transient and stationary phase of robust vs. nominal Q-learning on FrozenLake8x8-v0 with p = 0.1.

Note that there is a tradeoff in the performance of the robust algorithms versus the nominal algorithms in terms of the value $p$. As the value of $p$ increases, we expect the robust algorithms to gain an edge over the nominal ones as long as $\hat{\mathcal{U}}$ is still within the simplex $\Delta_n$. Once we exceed the simplex $\Delta_n$, however, the robust algorithms decay in performance. This is due to the presence of the $\beta$ term in the convergence results, which is defined as

$$\beta := \max_{i\in X,\,a\in A}\;\max_{y\in\hat{\mathcal{U}}_i^a}\;\min_{x\in\mathcal{U}_i^a}\|y - x\|_1, \tag{135}$$

and which grows in proportion to how far the proxy confidence region $\hat{\mathcal{U}}$ lies outside $\Delta_n$. Note that while $\beta = 0$, the robust algorithms converge to the exact Q-factor and value function, while the nominal algorithm does not. However, since large values of $\beta$ also lead to suboptimal convergence, we also expect poor performance for too large confidence regions, i.e., large values of $p$.

Figure 2 depicts how the size of the confidence region affects the performance of the robust models. Note that the average score appears somewhat erratic as a function of the size of the uncertainty set; however, this is due to the small sample size used in the line search. See Figures 3–12 for a comparison of the best robust model and the nominal model.

6 Acknowledgments

The authors would like to thank Guy Tennenholtz and the anonymous reviewers for helping improve the presentation of the paper.

References

[1] András Antos, Csaba Szepesvári, and Rémi Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71(1):89–129.

Figure 5: Tail distribution and cumulative rewards during the transient and stationary phase of robust vs. nominal Q-learning on FrozenLake-v0 with p = 0.1.

Figure 6: Tail distribution and cumulative rewards during the transient and stationary phase of robust vs. nominal Q-learning on CartPole-v0 with p =

Figure 7: Tail distribution and cumulative rewards during the transient and stationary phase of robust vs. nominal Q-learning on CartPole-v0 with p =

Figure 8: Tail distribution and cumulative rewards during the transient and stationary phase of robust vs. nominal Q-learning on CartPole-v0 with p =

Figure 9: Tail distribution and cumulative rewards during the transient and stationary phase of robust vs. nominal Q-learning on CartPole-v1 with p = 0.1.

Figure 10: Tail distribution and cumulative rewards during the transient and stationary phase of robust vs. nominal Q-learning on CartPole-v1 with p = 0.3.

Figure 11: Tail distribution and cumulative rewards during the transient and stationary phase of robust vs. nominal Q-learning on Taxi-v2 with p = 0.1.

Figure 12: Tail distribution and cumulative rewards during the transient and stationary phase of robust vs. nominal Q-learning on InvertedPendulum-v1 with p =


More information

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS.

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS. THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS RADON ROSBOROUGH https://intuitiveexplntionscom/picrd-lindelof-theorem/ This document is proof of the existence-uniqueness theorem

More information

3/6/00. Reading Assignments. Outline. Hidden Markov Models: Explanation and Model Learning

3/6/00. Reading Assignments. Outline. Hidden Markov Models: Explanation and Model Learning 3/6/ Hdden Mrkov Models: Explnton nd Model Lernng Brn C. Wllms 6.4/6.43 Sesson 2 9/3/ courtesy of JPL copyrght Brn Wllms, 2 Brn C. Wllms, copyrght 2 Redng Assgnments AIMA (Russell nd Norvg) Ch 5.-.3, 2.3

More information

Jean Fernand Nguema LAMETA UFR Sciences Economiques Montpellier. Abstract

Jean Fernand Nguema LAMETA UFR Sciences Economiques Montpellier. Abstract Stochstc domnnce on optml portfolo wth one rsk less nd two rsky ssets Jen Fernnd Nguem LAMETA UFR Scences Economques Montpeller Abstrct The pper provdes restrctons on the nvestor's utlty functon whch re

More information

CONTEXTUAL MULTI-ARMED BANDIT ALGORITHMS FOR PERSONALIZED LEARNING ACTION SELECTION. Indu Manickam, Andrew S. Lan, and Richard G.

CONTEXTUAL MULTI-ARMED BANDIT ALGORITHMS FOR PERSONALIZED LEARNING ACTION SELECTION. Indu Manickam, Andrew S. Lan, and Richard G. CONTEXTUAL MULTI-ARMED BANDIT ALGORITHMS FOR PERSONALIZED LEARNING ACTION SELECTION Indu Mnckm, Andrew S. Ln, nd Rchrd G. Brnuk Rce Unversty ABSTRACT Optmzng the selecton of lernng resources nd prctce

More information

In this Chapter. Chap. 3 Markov chains and hidden Markov models. Probabilistic Models. Example: CpG Islands

In this Chapter. Chap. 3 Markov chains and hidden Markov models. Probabilistic Models. Example: CpG Islands In ths Chpter Chp. 3 Mrov chns nd hdden Mrov models Bontellgence bortory School of Computer Sc. & Eng. Seoul Ntonl Unversty Seoul 5-74, Kore The probblstc model for sequence nlyss HMM (hdden Mrov model)

More information

On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization

On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization On-lne Renforcement Lernng Usng Incrementl Kernel-Bsed Stochstc Fctorzton André M. S. Brreto School of Computer Scence McGll Unversty Montrel, Cnd msb@cs.mcgll.c Don Precup School of Computer Scence McGll

More information

Chapter 5 Supplemental Text Material R S T. ij i j ij ijk

Chapter 5 Supplemental Text Material R S T. ij i j ij ijk Chpter 5 Supplementl Text Mterl 5-. Expected Men Squres n the Two-fctor Fctorl Consder the two-fctor fxed effects model y = µ + τ + β + ( τβ) + ε k R S T =,,, =,,, k =,,, n gven s Equton (5-) n the textook.

More information

Mixed Type Duality for Multiobjective Variational Problems

Mixed Type Duality for Multiobjective Variational Problems Ž. ournl of Mthemtcl Anlyss nd Applctons 252, 571 586 2000 do:10.1006 m.2000.7000, vlle onlne t http: www.delrry.com on Mxed Type Dulty for Multoectve Vrtonl Prolems R. N. Mukheree nd Ch. Purnchndr Ro

More information

Research Article On the Upper Bounds of Eigenvalues for a Class of Systems of Ordinary Differential Equations with Higher Order

Research Article On the Upper Bounds of Eigenvalues for a Class of Systems of Ordinary Differential Equations with Higher Order Hndw Publshng Corporton Interntonl Journl of Dfferentl Equtons Volume 0, Artcle ID 7703, pges do:055/0/7703 Reserch Artcle On the Upper Bounds of Egenvlues for Clss of Systems of Ordnry Dfferentl Equtons

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

Strong Gravity and the BKL Conjecture

Strong Gravity and the BKL Conjecture Introducton Strong Grvty nd the BKL Conecture Dvd Slon Penn Stte October 16, 2007 Dvd Slon Strong Grvty nd the BKL Conecture Introducton Outlne The BKL Conecture Ashtekr Vrbles Ksner Sngulrty 1 Introducton

More information

Lecture notes. Fundamental inequalities: techniques and applications

Lecture notes. Fundamental inequalities: techniques and applications Lecture notes Fundmentl nequltes: technques nd pplctons Mnh Hong Duong Mthemtcs Insttute, Unversty of Wrwck Eml: m.h.duong@wrwck.c.uk Jnury 4, 07 Abstrct Inequltes re ubqutous n Mthemtcs (nd n rel lfe.

More information

Reinforcement Learning with a Gaussian Mixture Model

Reinforcement Learning with a Gaussian Mixture Model Renforcement Lernng wth Gussn Mxture Model Alejndro Agostn, Member, IEEE nd Enrc Cely Abstrct Recent pproches to Renforcement Lernng (RL) wth functon pproxmton nclude Neurl Ftted Q Iterton nd the use of

More information

Least squares. Václav Hlaváč. Czech Technical University in Prague

Least squares. Václav Hlaváč. Czech Technical University in Prague Lest squres Václv Hlváč Czech echncl Unversty n Prgue hlvc@fel.cvut.cz http://cmp.felk.cvut.cz/~hlvc Courtesy: Fred Pghn nd J.P. Lews, SIGGRAPH 2007 Course; Outlne 2 Lner regresson Geometry of lest-squres

More information

A Family of Multivariate Abel Series Distributions. of Order k

A Family of Multivariate Abel Series Distributions. of Order k Appled Mthemtcl Scences, Vol. 2, 2008, no. 45, 2239-2246 A Fmly of Multvrte Abel Seres Dstrbutons of Order k Rupk Gupt & Kshore K. Ds 2 Fculty of Scence & Technology, The Icf Unversty, Agrtl, Trpur, Ind

More information

Simultaneous estimation of rewards and dynamics from noisy expert demonstrations

Simultaneous estimation of rewards and dynamics from noisy expert demonstrations Smultneous estmton of rewrds nd dynmcs from nosy expert demonstrtons Mchel Hermn,2, Tobs Gndele, Jo rg Wgner, Felx Schmtt, nd Wolfrm Burgrd2 - Robert Bosch GmbH - 70442 Stuttgrt - Germny 2- Unversty of

More information

Lecture 36. Finite Element Methods

Lecture 36. Finite Element Methods CE 60: Numercl Methods Lecture 36 Fnte Element Methods Course Coordntor: Dr. Suresh A. Krth, Assocte Professor, Deprtment of Cvl Engneerng, IIT Guwht. In the lst clss, we dscussed on the ppromte methods

More information

1 Online Learning and Regret Minimization

1 Online Learning and Regret Minimization 2.997 Decision-Mking in Lrge-Scle Systems My 10 MIT, Spring 2004 Hndout #29 Lecture Note 24 1 Online Lerning nd Regret Minimiztion In this lecture, we consider the problem of sequentil decision mking in

More information

p-adic Egyptian Fractions

p-adic Egyptian Fractions p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction

More information

consider in the case of 1) internal resonance ω 2ω and 2) external resonance Ω ω and small damping

consider in the case of 1) internal resonance ω 2ω and 2) external resonance Ω ω and small damping consder n the cse o nternl resonnce nd externl resonnce Ω nd smll dmpng recll rom "Two_Degs_Frdm_.ppt" tht θ + μ θ + θ = θφ + cos Ω t + τ where = k α α nd φ + μ φ + φ = θ + cos Ω t where = α τ s constnt

More information

Pyramid Algorithms for Barycentric Rational Interpolation

Pyramid Algorithms for Barycentric Rational Interpolation Pyrmd Algorthms for Brycentrc Rtonl Interpolton K Hormnn Scott Schefer Astrct We present new perspectve on the Floter Hormnn nterpolnt. Ths nterpolnt s rtonl of degree (n, d), reproduces polynomls of degree

More information

Using Predictions in Online Optimization: Looking Forward with an Eye on the Past

Using Predictions in Online Optimization: Looking Forward with an Eye on the Past Usng Predctons n Onlne Optmzton: Lookng Forwrd wth n Eye on the Pst Nngjun Chen Jont work wth Joshu Comden, Zhenhu Lu, Anshul Gndh, nd Adm Wermn 1 Predctons re crucl for decson mkng 2 Predctons re crucl

More information

p (i.e., the set of all nonnegative real numbers). Similarly, Z will denote the set of all

p (i.e., the set of all nonnegative real numbers). Similarly, Z will denote the set of all th Prelmnry E 689 Lecture Notes by B. Yo 0. Prelmnry Notton themtcl Prelmnres It s ssumed tht the reder s fmlr wth the noton of set nd ts elementry oertons, nd wth some bsc logc oertors, e.g. x A : x s

More information

The Number of Rows which Equal Certain Row

The Number of Rows which Equal Certain Row Interntonl Journl of Algebr, Vol 5, 011, no 30, 1481-1488 he Number of Rows whch Equl Certn Row Ahmd Hbl Deprtment of mthemtcs Fcult of Scences Dmscus unverst Dmscus, Sr hblhmd1@gmlcom Abstrct Let be X

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

Proof that if Voting is Perfect in One Dimension, then the First. Eigenvector Extracted from the Double-Centered Transformed

Proof that if Voting is Perfect in One Dimension, then the First. Eigenvector Extracted from the Double-Centered Transformed Proof tht f Votng s Perfect n One Dmenson, then the Frst Egenvector Extrcted from the Doule-Centered Trnsformed Agreement Score Mtrx hs the Sme Rn Orderng s the True Dt Keth T Poole Unversty of Houston

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

STURM-LIOUVILLE BOUNDARY VALUE PROBLEMS

STURM-LIOUVILLE BOUNDARY VALUE PROBLEMS STURM-LIOUVILLE BOUNDARY VALUE PROBLEMS Throughout, we let [, b] be bounded intervl in R. C 2 ([, b]) denotes the spce of functions with derivtives of second order continuous up to the endpoints. Cc 2

More information

Bi-level models for OD matrix estimation

Bi-level models for OD matrix estimation TNK084 Trffc Theory seres Vol.4, number. My 2008 B-level models for OD mtrx estmton Hn Zhng, Quyng Meng Abstrct- Ths pper ntroduces two types of O/D mtrx estmton model: ME2 nd Grdent. ME2 s mxmum-entropy

More information

Online Stochastic Matching: New Algorithms with Better Bounds

Online Stochastic Matching: New Algorithms with Better Bounds Onlne Stochstc Mtchng: New Algorthms wth Better Bounds Ptrck Jllet Xn Lu My 202; revsed Jnury 203, June 203 Abstrct We consder vrnts of the onlne stochstc bprtte mtchng problem motvted by Internet dvertsng

More information

Scatterplot Smoothing

Scatterplot Smoothing 1 Sttstcs 540, Smoothng Sctterplot Smoothng Overvew Problem... The usul settng for sctterplot smoothng s the delzed regresson model y = f(x )+σɛ, ɛ N(0, 1), =1,...,n, where the observtons (x,y ) re ndependent.

More information

The Regulated and Riemann Integrals

The Regulated and Riemann Integrals Chpter 1 The Regulted nd Riemnn Integrls 1.1 Introduction We will consider severl different pproches to defining the definite integrl f(x) dx of function f(x). These definitions will ll ssign the sme vlue

More information

Study of Trapezoidal Fuzzy Linear System of Equations S. M. Bargir 1, *, M. S. Bapat 2, J. D. Yadav 3 1

Study of Trapezoidal Fuzzy Linear System of Equations S. M. Bargir 1, *, M. S. Bapat 2, J. D. Yadav 3 1 mercn Interntonl Journl of Reserch n cence Technology Engneerng & Mthemtcs vlble onlne t http://wwwsrnet IN (Prnt: 38-349 IN (Onlne: 38-3580 IN (CD-ROM: 38-369 IJRTEM s refereed ndexed peer-revewed multdscplnry

More information

Numerische Mathematik

Numerische Mathematik Numer. Mth. (2003) 95: 427 457 Dgtl Object Identfer (DOI) 10.1007/s00211-002-0429-6 Numersche Mthemtk Addtve Schwrz Methods for Ellptc Mortr Fnte Element Problems Petter E. Bjørstd 1,, Mksymln Dryj 2,,

More information

Model Fitting and Robust Regression Methods

Model Fitting and Robust Regression Methods Dertment o Comuter Engneerng Unverst o Clorn t Snt Cruz Model Fttng nd Robust Regresson Methods CMPE 64: Imge Anlss nd Comuter Vson H o Fttng lnes nd ellses to mge dt Dertment o Comuter Engneerng Unverst

More information

Fall 2012 Analysis of Experimental Measurements B. Eisenstein/rev. S. Errede. with respect to λ. 1. χ λ χ λ ( ) λ, and thus:

Fall 2012 Analysis of Experimental Measurements B. Eisenstein/rev. S. Errede. with respect to λ. 1. χ λ χ λ ( ) λ, and thus: More on χ nd errors : uppose tht we re fttng for sngle -prmeter, mnmzng: If we epnd The vlue χ ( ( ( ; ( wth respect to. χ n Tlor seres n the vcnt of ts mnmum vlue χ ( mn χ χ χ χ + + + mn mnmzes χ, nd

More information

Math 426: Probability Final Exam Practice

Math 426: Probability Final Exam Practice Mth 46: Probbility Finl Exm Prctice. Computtionl problems 4. Let T k (n) denote the number of prtitions of the set {,..., n} into k nonempty subsets, where k n. Argue tht T k (n) kt k (n ) + T k (n ) by

More information

523 P a g e. is measured through p. should be slower for lesser values of p and faster for greater values of p. If we set p*

523 P a g e. is measured through p. should be slower for lesser values of p and faster for greater values of p. If we set p* R. Smpth Kumr, R. Kruthk, R. Rdhkrshnn / Interntonl Journl of Engneerng Reserch nd Applctons (IJERA) ISSN: 48-96 www.jer.com Vol., Issue 4, July-August 0, pp.5-58 Constructon Of Mxed Smplng Plns Indexed

More information

Optimal Resource Allocation and Policy Formulation in Loosely-Coupled Markov Decision Processes

Optimal Resource Allocation and Policy Formulation in Loosely-Coupled Markov Decision Processes Optml Resource Allocton nd Polcy Formulton n Loosely-Coupled Mrkov Decson Processes Dmtr A. Dolgov nd Edmund H. Durfee Deprtment of Electrcl Engneerng nd Computer Scence Unversty of Mchgn Ann Arbor, MI

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

COMPLEX NUMBER & QUADRATIC EQUATION

COMPLEX NUMBER & QUADRATIC EQUATION MCQ COMPLEX NUMBER & QUADRATIC EQUATION Syllus : Comple numers s ordered prs of rels, Representton of comple numers n the form + nd ther representton n plne, Argnd dgrm, lger of comple numers, modulus

More information

ESCI 342 Atmospheric Dynamics I Lesson 1 Vectors and Vector Calculus

ESCI 342 Atmospheric Dynamics I Lesson 1 Vectors and Vector Calculus ESI 34 tmospherc Dnmcs I Lesson 1 Vectors nd Vector lculus Reference: Schum s Outlne Seres: Mthemtcl Hndbook of Formuls nd Tbles Suggested Redng: Mrtn Secton 1 OORDINTE SYSTEMS n orthonorml coordnte sstem

More information

90 S.S. Drgomr nd (t b)du(t) =u()(b ) u(t)dt: If we dd the bove two equltes, we get (.) u()(b ) u(t)dt = p(; t)du(t) where p(; t) := for ll ; t [; b]:

90 S.S. Drgomr nd (t b)du(t) =u()(b ) u(t)dt: If we dd the bove two equltes, we get (.) u()(b ) u(t)dt = p(; t)du(t) where p(; t) := for ll ; t [; b]: RGMIA Reserch Report Collecton, Vol., No. 1, 1999 http://sc.vu.edu.u/οrgm ON THE OSTROWSKI INTEGRAL INEQUALITY FOR LIPSCHITZIAN MAPPINGS AND APPLICATIONS S.S. Drgomr Abstrct. A generlzton of Ostrowsk's

More information

Advanced Machine Learning. An Ising model on 2-D image

Advanced Machine Learning. An Ising model on 2-D image Advnced Mchne Lernng Vrtonl Inference Erc ng Lecture 12, August 12, 2009 Redng: Erc ng Erc ng @ CMU, 2006-2009 1 An Isng model on 2-D mge odes encode hdden nformton ptchdentty. They receve locl nformton

More information

CHI-SQUARE DIVERGENCE AND MINIMIZATION PROBLEM

CHI-SQUARE DIVERGENCE AND MINIMIZATION PROBLEM CHI-SQUARE DIVERGENCE AND MINIMIZATION PROBLEM PRANESH KUMAR AND INDER JEET TANEJA Abstrct The mnmum dcrmnton nformton prncple for the Kullbck-Lebler cross-entropy well known n the lterture In th pper

More information

Administrivia CSE 190: Reinforcement Learning: An Introduction

Administrivia CSE 190: Reinforcement Learning: An Introduction Administrivi CSE 190: Reinforcement Lerning: An Introduction Any emil sent to me bout the course should hve CSE 190 in the subject line! Chpter 4: Dynmic Progrmming Acknowledgment: A good number of these

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

M/G/1/GD/ / System. ! Pollaczek-Khinchin (PK) Equation. ! Steady-state probabilities. ! Finding L, W q, W. ! π 0 = 1 ρ

M/G/1/GD/ / System. ! Pollaczek-Khinchin (PK) Equation. ! Steady-state probabilities. ! Finding L, W q, W. ! π 0 = 1 ρ M/G//GD/ / System! Pollcze-Khnchn (PK) Equton L q 2 2 λ σ s 2( + ρ ρ! Stedy-stte probbltes! π 0 ρ! Fndng L, q, ) 2 2 M/M/R/GD/K/K System! Drw the trnston dgrm! Derve the stedy-stte probbltes:! Fnd L,L

More information

Group-based active query selection. for rapid diagnosis in time-critical situations

Group-based active query selection. for rapid diagnosis in time-critical situations Group-bsed ctve query selecton for rpd dgnoss n tme-crtcl stutons *Gowthm Belll, Student Member, IEEE, Suresh K. Bhvnn, nd Clyton Scott, Member, IEEE Abstrct In pplctons such s ctve lernng or dsese/fult

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

MODIFIED CHOLESKY FACTORIZATIONS IN INTERIOR-POINT ALGORITHMS FOR LINEAR PROGRAMMING. STEPHEN J. WRIGHT y

MODIFIED CHOLESKY FACTORIZATIONS IN INTERIOR-POINT ALGORITHMS FOR LINEAR PROGRAMMING. STEPHEN J. WRIGHT y MODIFIED CHOLESKY FACTORIZATIONS IN INTERIOR-POINT ALGORITHMS FOR LINEAR PROGRAMMING STEPHEN J. WRIGHT y Abstrct. We nvestgte moded Cholesky lgorthm typcl of those used n most nterorpont codes for lner

More information

A New Algorithm Linear Programming

A New Algorithm Linear Programming A New Algorthm ner Progrmmng Dhnnjy P. ehendle Sr Prshurmhu College, Tlk Rod, Pune-400, Ind dhnnjy.p.mehendle@gml.com Astrct In ths pper we propose two types of new lgorthms for lner progrmmng. The frst

More information

Soft Set Theoretic Approach for Dimensionality Reduction 1

Soft Set Theoretic Approach for Dimensionality Reduction 1 Interntonl Journl of Dtbse Theory nd pplcton Vol No June 00 Soft Set Theoretc pproch for Dmensonlty Reducton Tutut Herwn Rozd Ghzl Mustf Mt Ders Deprtment of Mthemtcs Educton nversts hmd Dhln Yogykrt Indones

More information

Reinforcement learning II

Reinforcement learning II CS 1675 Introduction to Mchine Lerning Lecture 26 Reinforcement lerning II Milos Huskrecht milos@cs.pitt.edu 5329 Sennott Squre Reinforcement lerning Bsics: Input x Lerner Output Reinforcement r Critic

More information