Factorized Decision Forecasting via Combining Value-based and Reward-based Estimation

Size: px
Start display at page:

Download "Factorized Decision Forecasting via Combining Value-based and Reward-based Estimation"

Transcription

1 Fcorized Decision Forecsing vi Combining Vlue-bsed nd Rewrd-bsed Esimion Brin D. Ziebr Crnegie Mellon Universiy Pisburgh, PA Absrc A powerful recen perspecive for predicing sequenil decisions lerns he prmeers of decision problems h produce observed behvior s (ner) opiml soluions. Under his perspecive, behvior is explined in erms of uiliies, which cn ofen be defined s funcions of se nd cion feures o enble generlizion cross decision sks. Two pproches hve been proposed from his perspecive: esime feure-bsed rewrd funcion nd recursively compue vlues from i, or direcly esime feure-bsed vlue funcion. In his work, we invesige he combinion of hese wo pproches ino single lerning sk using direced informion heory nd he principle of mximum enropy. This enbles uncovering which ype of esime is mos pproprie in erms of predicive ccurcy nd/or compuionl benefi for differen porions of he decision spce. I. INTRODUCTION For mny sks, humn-genered decisions re he bes exmples of inelligen behvior vilble nd re vluble source for rining mchines o behve inelligenly. Erly pproches for lerning o imie inelligen behvior direcly esime observed policies s behvior cloning sk. Perhps he mos successful exmple is ALVINN 20, he uonomous vehicle rined o conrol he driving wheel s seering ngle using neurl nework wih forwrd-mouned video feeds s inpu. This reduces decision predicion o clssificion sk. Though successful in keeping he vehicle sfely on he rodwy, his model of decision mking s solely he response o immedie sensor d is limied in is cpbiliies. For exmple, i would no be successful in choosing sequence of rodwys leding o priculr new desinion. In oher words, i is incpble of higher-level decision mking, like sequenil plnning. More recen echniques for lerning nd predicing sequenil decisions hve conneced he lerning sk o decision heory nd conrol heory frmeworks. In hese frmeworks, n insnneous rewrd is received by king priculr cion in priculr se nd i is reled o he expeced cion vlue of cumulive fuure rewrds for ech secion combinion ccording o he Bellmn equion 3. Inverse opiml conrol 13, 4, 19, 1 emps o find rewrds nd vlues h explin observed behvior, following Klmn s erly quesion: When is liner conrol sysem opiml 13? Unforunely, he quesion is ill-posed; mny choices for he rewrd funcion will mke observed sequences of decisions opiml, including degenercies h mke ll decision sequences eqully good 19, bu re uninformive. A number of echniques hve been developed h resolve he mbiguiies of his originl quesion. Mos hve focused on esiming he rewrd funcion bsed on feures of he se nd cion 5, 1, 22, 2, 21, 18, 25, 27. However, recen pproch hs employed his sme perspecive o lern he vlue funcion bsed on se nd cion feures 15. Inverse opiml conrol pproches hve been successfully employed for vehicle 29 nd roboic nvigion pplicions 30, 23, s well s cogniive science models 2. In his pper, we invesige new pproch for combining rewrd-bsed nd vlue-bsed decision lerning. We ssume h he desirbiliy of ech se s cions moives observed behvior eiher s vlue ( funcion of se-cion or se feures h is he ulime moivor of cions) or s cos ( combinion of direc funcion of se-cion feures nd he influence of fuure expeced feures in he decision process i.e., fuure vlue). Porions of he se spce h re vlue-bsed effecively fcorize he decision forecsing sk, since decision leding o hose ses do no consider he fuure beyond he vlue-bsed cions of hose ses. This fcorizion cn drmiclly improve he compuionl efficiency of inverse opiml conrol while reining mny of he dvnges of non-myopic resoning over smller cos-bsed porions of he decision spce. We formule he combinion of rewrd-bsed nd vluebsed inverse opiml conrol s mximum cusl enropy opimizion 27. We hen invesige prmeer nd srucure lerning sks, s well s predicion sks, under he resuling model. We show h given he se-bsed fcorizion of he decision problem, rewrd nd vlue prmeers cn be efficienly lerned s convex opimizion. We hen pose he problem of deciding wheher ech se influences decisions s vlue-bsed or s rewrd-bsed fcor from Byesin perspecive. We presen srucure lerning lgorihms for obining he poserior belief of vlue-bsed versus rewrd-bsed influence for ech se from demonsred behvior nd known vlue nd rewrd weighs. To ccomplish his, we inroduce echnique bsed on Mrkov chin Mone Crlo simulion 8. Combining hese wo procedures, we inroduce n expecion-mximizion 6 pproch for lerning he combined influence ype of ses nd he corresponding rewrd nd vlue prmeers.

2 II. BACKGROUND AND RELATED WORK We begin by reviewing decision mking frmeworks, opiml decision crieri, nd inverse opiml conrol lerning echniques. Our combined vlue-bsed nd rewrd-bsed mximum cusl enropy inverse opiml conrol pproch builds upon hese conceps. A. Decision Processes nd Opiml Conrol Mrkov decision processes (Definiion 1) provide flexible represenion of sequenil decision mking. Definiion 1: A Mrkov decision process (MDP) is uple, M MDP = (S, A, P (s s, ), R(s, )), of: A se of ses (s S); A se of cions ( A) ssocied wih ses; Acion-dependen se rnsiion dynmics probbiliy disribuions (P (s s, )) specifying nex se (s ); nd A rewrd funcion (R(s, ) R). A ech imesep, he se (S ) is genered from he rnsiion probbiliy disribuion (bsed on he previous se S 1 nd previous cion A 1 ). The se is known o he decision mker before he nex cion (A ) is seleced. The disribuion from which decisions re drwn is he (sochsic) policy, P (A S) or π(a S). The sndrd problem of ineres, given n MDP, is o find he opiml policy which mximizes he cumulive expeced rewrd: π(a S) = rgmx E P (A,S) R(, s ) π. π(a S) The Bellmn equion, π(s) = rgmx Q(, s) (1) Q(, s) = R(, s) + E P (S s,) V (s ) (2) V (s) = mx Q(, s), (3) defines fixed poin for his opiml policy. The recurrences of Equion 2 nd Equion 3 cn be ierively pplied o compue he se-cion vlues (lso known s he rewrdo-go), Q(, s), nd he se vlues, V (s), vi he vlue ierion lgorihm 3. These vlues qunify he cumulive fuure expeced rewrd received by he opiml policy from invoking priculr cion in priculr se or from priculr se, respecively. Under his opimliy crieri, deerminisic policy cn lwys be obined h provides he opiml vlues compued vi he Bellmn equion. Ofen, he h rewrd obined in he MDP is discouned by fcor of γ (0 γ < 1) o ensure fs convergence when pplying he Bellmn equion. However, his cn be equivlenly represened by scling ll rnsiion probbiliies P (s, s) by fcor of γ nd ermining fer ech cion wih probbiliy (1 γ). Thus, we do no explicily consider he discouned rewrd seing in ou formulion. In prcice, compuing he Bellmn equion in decision spces wih lrge numbers of ses nd cions cn be compuionlly burdensome. One pproch o me his complexiy is o inelligenly order he se updes of he Bellmn equion h re pplied nd o employ heurisics h bound he vlue funcions, limiing he size of he decision spce needed o be considered. This pproch is fmous for is use in plnning problems nd is known s he A* serch lgorihm 10. The vlue-bsed inverse opiml conrol porion of our pproch cn be viewed s providing reled compuionl benefis. B. Inverse Opiml Conrol Inverse opiml conrol (IOC) invesiges he problem of deermining wh rewrd funcion bes explins demonsred behvior sequences of ses s = (s 1,..., s T ) S nd cions = ( 1,..., T ) A. Typiclly, he prmeers of liner rewrd funcions, R θ (, s) = θ f,s, (4) which re defined in erms of rel-vlued feure vecors, f,s R K, re lerned. A broder view of inverse opiml conrol llows ny of he underlying Bellmn vribles of Equions 1 3 o be esimed. This leds o hree ypes of esimes: Acion vlue esimion, Q(, s) = φ Q f,s, from which he policy cn be obined vi Equion 1. Se vlue esimion, V (s) = φ V f s, from which he policy cn be obined vi Equion 1 nd Equion 2. Rewrd esimion, R(, s) = θ f s,, from which he policy cn be recursively obined vi Equions 1-3. Unforunely, he nïve problem formulion find rewrd prmeers θ or vlue prmeers φ Q or φ V h mke ll demonsred behvior opiml is ill-posed; mny choices of hese prmeers, including degenercies will chieve his objecive. Consider he zero weigh vecor, θ = φ Q = φ V = 0. I mkes ll decision choices, including demonsred decision sequences, hve n equl vlue of zero. While his sisfies he nïve formulion, i is compleely uninformive soluion. Erly pproches employed heurisics o selec he more meningful rewrd funcion weigh soluions o he inverse opiml conrol problem 19. However, he choice of heurisic is no priculrly well-jusified nd ofen he degenere soluion is he only vlid soluion in his problem formulion when demonsred sequences re no perfecly predicble. A key insigh of Abbeel & Ng 1 is h o ensure h n esimed policy mches he performnce of demonsred (smple from ) policy on decision mker s unknown rewrd funcion, he expeced feures mus mch. Following lineriy, E ˆP f,s = E P f,s θ R K, θ E ˆP f,s = θ E P f,s θ R K, E ˆP R θ (, s ) =E P R θ (, s ),

3 we cn see h his is indeed he cse. Unforunely, mching feures in expecion does no resolve ll of he mbiguiies in he inverse opiml conrol problem; when demonsred decision sequences re no consisenly explined s he opiml resul of single choice of rewrd prmeers, θ, hen here re mny sochsic policies h mch feure couns. This corresponds o he siuion in which demonsred decision sequences re no perfecly predicble nd degenere soluions re he only soluions o he nïve inverse opiml conrol formulion. When lerning from humn-genere decision sequences, his is ofen he cse no necessrily becuse people re nonopiml, bu lso becuse MDP nd se-cion feures re ofen simplificion of he decision sk nd is influences. Abbeel & Ng 1 mix se of deerminisic policies h re he opiml policies for sequence of rewrd prmeers, {θ i }: T T E P (S,A) f s, π θi (A S) E P (S,A) f s,. i =1 =1 However, here re mny oher wys o find such feuremching policy (mixure), nd mny do no provide good predicive performnce. Indeed, very simple exmples hve been shown o hve infinie log-loss under he Abbeel & Ng pproch 28. Oher pproches o inverse opiml conrol employ gmeheoreic prmeer selecion crieri for obining rewrd funcion prmeers 25, mximum mrgin echniques 22, or Byesin poserior disribuions of rewrd prmeers 5. A Bolzmnn cion disribuion pproch 2, 21, 18 is similr o he mximum enropy echniques we use in his pper. I employs disribuion over cions bsed on he se-cion vlue, P ( s) e Q θ(,s), (5) where he Q(, s) cion vlues re compued from he Bellmn equion (Equion 1) wih liner rewrd funcion (Equion 4). Unforunely, he d likelihood under his model is no convex, so opimizion echniques for finding rewrd weighs θ re subjec o locl opim. Addiionlly, he policy wih he lrges expeced rewrd does no necessrily hve he lrges probbiliy in his model 28. C. Rewrd-Bsed Mximum Cusl Enropy IOC The principle of mximum enropy 12 prescribes probbiliy disribuions h re he mos uncerin subjec o consrins mching mesured properies of d. This provides robus predicive log-loss minimizion gurnees 9. I hs been recenly exended o seings wih inercion nd feedbck, mking i pplicble o inverse opiml conrol problems 27. Definiion 2: Rewrd-bsed mximum cusl enropy inverse opiml conrol 27 is defined by he following opimizion: mx H(A S) (6) such h: E P (S,A) f s, = E P (S,A) f s,, s S, A, P (s, ) = P ( s) P (s T T 1 ) s S, A, P ( s) 0 nd s S, P ( s) = 1. where he cuslly condiioned enropy 14 from he Mrko- Mssey heory of direced informion 16, 17, H(A S) = E P (A,S) log P ( s), (7) is bsed on he cuslly condiioned probbiliy disribuion, P ( s) = T P ( s 1:, 1: 1 ), (8) =1 nd is emporl complemen, P (s T T 1 ) = T P (s s 1: 1, 1: 1 ). (9) =1 These disribuions crucilly differ from he sndrd condiionl probbiliy, P ( s) = T P ( s 1:T, 1: 1 ), (10) =1 in h ech cion is condiioned only on previously vilble se informion, s 1: nd no fuure se informion, s +1:T. This difference is refleced in Mrkov decision processes where ech se is only reveled o he decision mker fer previous cions re invoked nd policies h condiion on fuure ses cnno be execued. Togeher, he wo cuslly condiioned disribuions form he join disribuion: P (, s) = P ( s)p (s T T 1 ), which is explicily enforced s consrin in Equion 6. By mximizing his cusl enropy mesure (Equion 6), his pproch provides robus log-loss esime for he policy P (A S) 27. The soluion o his opimizion (Definiion 2) fcors ino sochsic policy wih cions disribued ccording o P ( s) e Qsof θ (,s), where Q sof (, s) is recursively defined s follows: θ Q sof θ (, s) = R θ (, s) + E P (S s,) V sof θ (s) (11) Vθ sof (s) = sofmx Q sof θ (, s), wih rewrd funcion R θ (, s) = θ f s, nd sofmx x f(x) = log x ef(x). This recursive relionship cn be viewed s smooh relxion of he Bellmn equion (Equion 1). The vlue updes of Equion 11, like he Bellmn equion, cn be ierively pplied vi dynmic progrmming o obin he sochsic policy of he model. This disribuion is equivlen o mximum likelihood esimion under mulivrie exreme-vlue noise disribuion for rewrd vlues in n MDP 24, bu cn be pplied more brodly o seings in which pproprie

4 noise erms re difficul o specify (e.g., coninuous conrol wih liner dynmics nd qudric coss 27). The rewrd funcion prmeers cn be lerned using sndrd grdien-bsed opimizion echniques. The grdien is simply he difference beween empiricl nd expeced feures: θ log L(θ P (S, A)) = E P (S,A) f s, E Pθ (S,A) f s,. Due o he convexiy of he log-likelihood funcion, log L(θ P (S, A)) = A,S P (A, S) log P θ (A S), rewrd prmeers converging o globl opim re obined using e.g., he grdien scen lgorihm. D. Vlue-bsed mximum enropy inverse opiml conrol An lerne pproch lerns vlue funcion rher hn rewrd funcion 15. This pproch cn lso be posed s mximum enropy esimion sk nd is closely reled o logisic regression. Definiion 3: Acion vlue-bsed mximum enropy inverse opiml conrol is defined by he following opimizion: mx H(A S) (12) such h: E P (S,A) f s, = E P (S,A) f s, A, s S P ( s) 0 nd s S, A P ( s) = 1, where he condiionl enropy is: H(A S) = E P log P (A S). The condiionl probbiliy disribuion of cions is of he form: P ( s) e Q(,s) where Q(, s) = φ Q f s,. Noe h he disribuion is over single se nd cion pir. Acion disribuions from se-bsed vlues re lso possible s resul of mximum enropy formulion. Definiion 4: Se vlue-bsed mximum enropy inverse opiml conrol 15 is defined by he following opimizion: mx H(A S) (13) such h: E P (S,S,A) f s = E P (S,S,A) f s A, s S P ( s) 0 nd s S, A P ( s) = 1, where S is he nex se experienced fer employing cion A in se S, nd is disribued ccording o he known decision process dynmics P (S S, A). Boh esimes cn similrly be obined using sndrd grdien-bsed opimizion echniques h void locl opim s consequence of convexiy. A key dvnge of he vlue-bsed esimes is h he policy cn be direcly obined wihou requiring poenilly compuionlly expensive dynmic progrmming sk (Equion 11). However, lerned vlue funcions cn ypiclly be less generlly pplied o e.g., differences in he gol erminl se of he MDP, for which he rewrd-bsed pproch is more pproprie. E. Inverse Opiml Heurisic Conrol In our previous work, we combined rewrd-bsed, IOCsyle lerning wih insnneous influence lerning, similr o he vlue-bsed perspecive, by ugmening lerned rewrd funcions hving long-erm influence wih lerned vlue funcions hving only insnneous influence. Acions under his model re disribued ccording o: P ( s) e Qsof θ (,s)+qins φ (,s), (14) where he se-cion vlues, Q sof θ (, s), re recursively compued from he dynmic progrm of Equion 11 nd he insnneous vlue is liner funcion of se-cion feures, Q ins φ (, s) = φ f s,, nd is no recursively reled o he se-cion vlues Q sof θ (s, ) ssocied wih he lerned rewrd funcion. Richer feures cn be employed wihin he insnneous vlue funcion (e.g., chrcerisics of he previous se-cion sequence) hn cn be efficienly incorpored in he rewrd funcion. While his pproch improves upon he predicive cpbiliies of rewrd-bsed mximum cusl enropy inverse opiml conrol 27, mny of he niceies of he mximum enropy formulion re los in his formulion. Firs nd foremos, lerning rewrd nd vlue funcion prmeers θ nd φ is no longer convex opimizion sk. Therefore, prcicl lgorihms for lerning model prmeers re suscepible o locl opim. Addiionlly, i does no improve upon he compuionl complexiy of he underlying mximum cusl enropy pproch i employs by combining he long-erm nd insnneous rewrds in his mnner. In conrs, he pproch we inroduce in his pper permis ech se s cions o influence behvior s rewrd or s vlue, bu no simulneously combinion of boh. This reins he convexiy properies of he mximum enropy formulion. I lso provides compuionl benefis, s he Bellmn-like inference procedure is effecively resriced o smller porions of he decision spce. III. A UNIFYING STATISTICAL ESTIMATION FRAMEWORK FOR REWARD-BASED AND VALUE-BASED INVERSE OPTIMAL CONTROL We leverge he recenly developed mximum cusl enropy frmework 27 o pose se-dependen combinion of rewrd-bsed nd vlue-bsed inverse opiml conrol under unified esimion sk. We hen inroduce lgorihms for inference nd lerning in he resuling model. A. A Unified Esimion Formulion We combine vlue-bsed nd rewrd-bsed inverse opiml conrol by priioning he ses ino hree subses: one of rewrd-bsed esimes, S R, one of cion vlue-bsed esimes, S Q, nd one of se vlue-bsed esimes, S V. Ech se s S hs ype, denoed ype(s), indicing o which of hese ses i belongs. We denoe he se of ypes for ll ses s ype(s). Using hese subses of ses, we

5 view decision sequences s shorer subsequences ccording o Definiion 5 1. Definiion 5: A vlue-se-segmened behvior sequence is rnsformion of sequence of ses nd cions ino se of smller subsequences in which only he finl se of ech sub-sequence cn be vlue-bsed se. For exmple, priculr se-cion sequence, (s, x, s b, y, s c, x, s d, z, s, y, s b ), wih cion vluebsed se S Q = {s b } nd se vlue-bsed se S V = {s d } is segmened ino sequences: (s, x, s b, y ), (s c, x, s d, z ), nd (s, y, s b ). Our view of demonsred decision sequences nd disribuions over decision sequences considers only hese segmened subsequences. We mke use of he empiricl disribuion of he iniil sub-sequence se, Psubseq (S 1 ), which, in our exmple, would be: P subseq (S 1 = s ) = 2 3 nd P subseq (S 1 = s c ) = 1 3. Definiion 6: Hybrid rewrd nd vlue mximum cusl enropy inverse opiml conrol given rewrd-bsed ses S R nd vlue-bsed ses S V nd ll se-cion sequences segmened ccording o Definiion 5 hs cions wih disribuions obined from he following opimizion: {P (A S)} = rgmx H(A S) (15) {P (A S)} T 1 T 1 such h: E P (S,A) f s, = E P (S,A) f s, =1 =1 E P (S,A) ISQ (s T )f st, T = E P (S,A) ISQ (s T )f st, T E P (S,A) I SV (s T ) P (s s T, T ) f s s S = E P (S,A) I SV (s T ) s S P (s s T, T ) f s s S, A, P (s, ) = P ( s) P (s T T 1 ) s S, A P ( s) 0, s S A P ( s) = 1 nd s 1 S, P (s 1 ) = P subseq (s 1 ), wih indicor funcion I A () = 1 if A nd 0 oherwise. We noe h he cusl enropy is concve nd h he join sequence disribuion P (S, A) is liner funcion of unknown cuslly condiioned vribles P (A S). This enbles he opimizion of Definiion 6 o be expressed s convex progrm. B. Dul Form nd Inference Procedure Though he convex progrm of Definiion 6 cn be direcly opimized, he dul progrm enbles beer undersnding of he model nd suppors more compc opimizion procedures. Theorem 1: The hybrid IOC opimizion of Definiion 6 cn be formuled s sofened version of he Bellmn 1 For simpliciy, we ssume h he finl se of ech originl sequence is vlue se. However, his ssumpion cn be relxed wih only ddiionl noionl complexiy. equion 3 vi convex duliy: θ f s, + E P (S Q hyb s,) V hyb (s ) if s S R (, s) = φ Q f s, if s S Q s S P (s s, ) φ V f s if s S V (16) V hyb (s) = sofmx Q hyb (, s), wih sofmx x f(x) = log x ef(x), he rewrd funcion prmeerized by rewrd weighs θ, nd he vlue funcions prmeerized by vlue weighs φ = {φ Q, φ V }. A srigh-forwrd procedure, Algorihm 1, follows from hese sofmx Bellmn-like updes. For simpliciy, we ssume in our noion h he decision ime-horizon is infinie nd h he corresponding policies re ime-invrin. Exensions o ime-vrying, fixed horizon decision seings re srigh-forwrd, bu noionlly more cumbersome. Algorihm 1 Rewrd/vlue-bsed IOC inference procedure Require: MDP M MDP, rewrd prmeers θ, cion vlue prmeers φ Q, se vlue prmeers φ V, cion vluebsed se S Q, se vlue-bsed se S V, nd rewrd-bsed se S R. Ensure: Se-cion vlues Q (, s) nd se vlues V (s) ccording o Equion 16 1: for ll s S Q do 2: for ll A do 3: Q (, s) φ Q f s, 4: end for 5: end for 6: for ll s S V do 7: for ll A do 8: Q (, s) φ V f s, 9: end for 10: end for 11: while no converged do 12: for ll s S R do 13: V (s) sofmx Q(, s) 14: end for 15: for ll s S R do 16: for ll A do 17: Q (, s) θ f s, + E P (S s,)v (s ) s, 18: end for 19: end for 20: end while The compuionl benefi of replcing some rewrd-bsed ses wih vlue-bsed ses cn be undersood from considering he longes phs of he decision spce. In he wors cse, vlue ierion requires O( S 2 A ) ime qudric in he number of ses o obin he opiml infinie imehorizon policy. Inuiively, his is becuse he bes policy could correspond o ph of lengh O( S ) nd propging vlues over ech se of h ph requires O( S A ) ime vi vlue ierion. Replcing rewrd-bsed ses wih vluebsed ses so h here re no phs consising enirely

6 of rewrd-bsed ses of lengh bounded below O( S 1 2 ) or O(log S ) reduces he vlue ierion o O( S 3 2 A ) or O( S A log S ) ime. The inference procedure (Algorihm 1) cn be generlized o seings in which here is belief bou he vluebsed se se, P (S V ). Nïvely, he resuling policy cn be obined for ech belief combinion for vlue se ses: P ( s) = P (S V )P ( s, S V ). (17) S V S However, more efficien lgorihm is obined when beliefs in individul se ypes, P (ype(s)) re independen beween ses (Theorem 2) nd we ssume h decisions re mde under his sme unceriny. Theorem 2: When se ype beliefs re independen, P (ype(s)) = s S P (ype(s)), he se-cion vlue of Equion 16 cn be re-wrien s: Q (, s) =P (s S R ) (θ f s, + EV (s ) s, ), + P (s S Q ) φ Qf s, + P (s S V ) s S P (s s, ) φ V f s (18) wih he inerpreion h ech visi o se, smple from P (ype(s)) is drwn indicing wheher he se is reed s rewrd-bsed influence, n cion vlue-bsed influence, or se vlue-bsed influence. C. Rewrd nd Vlue Weigh Lerning from Known Se Ses We now invesige he sk of lerning rewrd weighs θ nd vlue weighs φ = {φ Q, φ V } given he rewrd se se. Our objecive is o mximize he log likelihood: {θ, φ Q, φ V } = rgmx θ, φ Q, φ V log L(θ, φ Q, φ V P (, s)) = rgmx P (, s) log P ( s). θ, φ Q, φ Q A, s S The grdien of his opimizion wih respec o rewrd prmeers nd vlue prmeers is: θ log L(θ, φ Q, φ V P (S, A)) = (19) T 1 T 1 E P (S,A) f s, E P (S,A) f s, =1 =1 φq log L(θ, φ Q, φ V P (S, A)) = (20) E P (S,A) IsT S Q f st, T EP (S,A) IsT S Q f st, T φq log L(θ, φ Q, φ V P (S, A)) = (21) E P (S,A) P (s s T, T ) f s E P (S,A) I st S V s S I st S V P (s s T, T ) f s. s S Sndrd grdien-bsed opimizion echniques cn be employed o find prmeers h converge o globl opimum. Algorihm 2 Rewrd/vlue-bsed IOC rewrd/vlue prmeer lerning procedure Require: MDP M MDP, demonsred sequences P (S, A), iniil rewrd prmeers θ 0, iniil cion vlue prmeers φ Q,0, iniil se vlue prmeers φ V,0, se ypes ype(s). Ensure: Rewrd weigh esimes ˆθ nd vlue weigh esimes ˆφ h re (pproximely) opiml mximum likelihood esimes 1: ˆθ θ0, ˆφ Q φ Q,0, ˆφ V φ V,0 2: while ˆθ, ˆφ Q, nd ˆφ V no (pproximely) converged do 3: Compue Qˆθ, ˆφ(, s) nd Vˆθ, ˆφ(s) given ype(s) vi Algorihm 1 4: Compue policy P ( s) = e Qˆθ, ˆφ(,s) Vˆθ, ˆφ(s) 5: Compue grdien θ log L(θ, φ P (S, A)) θ=ˆθ vi Equion 19 6: Compue grdien vi Equion 20 φq log L(θ, φ P (S, A)) φq = ˆφ Q 7: Compue grdien vi Equion 21 φv log L(θ, φ P (S, A)) φv = ˆφ V 8: ˆθ ˆθ + η θ log L(θ, φ P (S, A)) 9: ˆφQ ˆφ Q + η φq log L(θ, φ P (S, A)) 10: ˆφV ˆφ V + η φv log L(θ, φ P (S, A)) 11: end while One such procedure is shown s Algorihm 2. The lerning re prmeers ( η re chosen o slowly decy o 0 re of e.g., Θ 1 ). D. Inferring he Influence Types of Ses Inferring which ses re vlue-bsed influences of decision mking nd which re rewrd-bsed influences is more difficul sk. Nïvely, considering he powerse of ses is compuionlly burdensome due o he Θ(3 S ) subse choices. We drw upon pproches h hve been successfully employed for Byesin srucure lerning 11, which is similrly difficul sk of deermining how join disribuion should fcor ino produc of condiionl probbiliies. Specificlly, Mrkov chin Mone Crlo (MCMC) simulion 8, hs been employed for Byesin nework srucure lerning 7 nd provides generl pproch for ddressing he influence ype inference seing of his pper s well. We re ineresed in obining poserior disribuion of se ypes given vilble evidence. Here, we represen h evidence s π, he demonsred policy. Using Byes rule, he poserior disribuion is: P (ype(s) π, θ, φ) P ( π ype(s), θ, φ)p (ype(s)). (22) We rely on key propery of he rewrd/vlue-bsed IOC disribuion (Theorem 3) reling he log likelihood o rewrd nd sofened se vlues. Theorem 3: The log probbiliy of policy under he hybrid rewrd-bsed/vlue-bsed inverse opiml conrol p-

7 proch is reled o he policy s expeced rewrds s follows: log P ( π ype(s), θ, φ) = (23) T 1 E P (S,A) θ f s, + φ f st, T V hyb (s 1), π =1 where V hyb is defined ccording o Theorem 1. Using sndrd Mrkov chin Mone Crlo simulion echniques, n pproximion o he poserior disribuion of rewrd/vlue se ype given demonsred policies is obined using Algorihm 3. Algorihm 3 Poserior rewrd/vlue se MCMC procedure Require: MDP M MDP, rewrd prmeers θ, cion vlue prmeers φ Q, se vlue prmeers φ V, demonsred policy π(a S), burn-in size N b, smple size N s Ensure: ˆP (ype(s)) bsed on smples from Byesin poserior. 1: s S, se ype(s) = Acion vlue 2: 1 3: while (N b + N s ) do 4: Smple i {1,..., S } uniformly rndom 5: Se ype(s i ) = Rewrd 6: Compue P R = P ( π ype(s), θ, φ) P (ype(s)) vi Equion 23 7: Se ype(s i ) = Acion vlue 8: Compue P Q = P ( π ype(s), θ, φ) P (ype(s)) vi Equion 23 9: Se ype(s i ) = Se vlue 10: Compue P V = P ( π ype(s), θ, φ) P (ype(s)) vi Equion 23 11: Smple ype(s i ) from he disribuion: ( Mulinomil P R P R +P Q +P V, P Q P R +P Q +P V, ) P V P R +P Q +P V 12: if > N b hen 13: Add smple ype(s) o disribuion ˆP (ype(s)) 14: end if 15: : end while Algorihm 3 ssumes h he rewrd nd vlue weighs, θ, φ Q, nd φ V re known. Ofen his is no he cse. Insed, we would like o esime boh hose rewrd nd vlue weighs s well s he ype of ech se. Algorihm 4 employs n expecion-mximizion pproch 6 o obin boh se ypes nd prmeer weighs. I obins he mximum likelihood weighs for priculr se of se beliefs in line 3 (he mximizion sep) nd clcules he expecion of se ypes given hose weigh esimes in line 4 (he expecion sep). When boh he expecion nd mximizion seps re exc, his procedure is gurneed o converge o locl opim of he likelihood funcion of rewrd/vlue weighs nd se ypes. In his seing, such gurnee cn be provided when he Mrkov chin Mone Crlo procedure of Algorihm 3 mixes well nd he rue disribuion of se ypes fcors independenly (s ssumed in line 5 of Algorihm 4 for compuionl efficiency benefis). Algorihm 4 Rewrd/vlue se nd prmeer EM procedure Require: MDP M MDP, demonsred sequences P (S, A) / policy π Ensure: Se se esime ˆP (ype(s)) nd rewrd/vlue prmeer esimes ˆθ, ˆφ h re (pproximely) locl opim. 1: s S, le P (ype(s)) be disribued uniformly rndom 2: while ype(s), ˆθ, nd ˆφ no converged do 3: Obin rewrd/vlue weighs ˆθ, ˆφ given ˆP (ype(s)) nd P (S, A) vi Algorihm 2 using Equion 18 4: Obin ˆP (ype(s)) given π nd prmeers ˆθ, ˆφ vi Algorihm 3 5: Approxime ˆP (ype(s)) s n independen disribuion wih ˆP (ype(s i )) s he mrginl disribuion of ˆP (ype(s)) 6: end while IV. DISCUSSION In his pper, we hve combined rewrd-bsed inverse opiml conrol wih vlue-bsed inverse opiml conrol s sisicl esimion sk using he principle of mximum cusl enropy. We llow ech se o influence behvior s eiher rewrd, n cion vlue, or se vlue. This llows compuionlly dvngeous vlue-bsed esimes o be employed in some porions of he decision spce while reining he sequenil resoning of cos-bsed esimes in oher porions of he se spce. When he ype of ech se is known, lerning rewrd/vlue weighs is ccomplished by convex opimizion. When se ypes re unknown, Mrkov chin Mone Crlo nd Expecion-Mximizion pproches cn be pplied wih weker gurnees. One of he grees benefis of inverse opiml conrol is he biliy o rnsfer lerned rewrd/vlue weighs o differen decision processes h re chrcerized by rewrd/vlue feures from he sme feure spce. This benefi is seemingly los by he hybrid formulion of inverse opiml conrol in his pper becuse he belief in he ype of ech se is specific o he decision process of he demonsred decision sequences nd does no rnsfer. Esiming se ypes bsed on vilble (nd rnsferble) informion, such s se-cion feures, is n imporn fuure direcion for enbling his pproch o he rnsfer seing. REFERENCES 1 P. Abbeel nd A. Y. Ng. Appreniceship lerning vi inverse reinforcemen lerning. In Proc. Inernionl Conference on Mchine Lerning, pges 1 8, C. Bker, J. Tenenbum, nd R. Sxe. Gol inference s inverse plnning. In Proceedings of he 29h nnul meeing of he cogniive science sociey, R. Bellmn. A Mrkovin decision process. Journl of Mhemics nd Mechnics, 6: , S. Boyd, L. El Ghoui, E. Feron, nd V. Blkrishnn. Liner mrix inequliies in sysem nd conrol heory. SIAM, 15, U. Chjewsk, D. Koller, nd D. Ormonei. Lerning n gen s uiliy funcion by observing behvior. In Inernionl Conference on Mchine Lerning, pges 35 42, 2001.

8 6 A. P. Dempser, N. M. Lird, nd D. B. Rubin. Mximum likelihood from incomplee d vi he EM lgorihm. Journl of he Royl Sisicl Sociey. Series B (Mehodologicl), 39(1):1 38, N. Friedmn nd D. Koller. Being Byesin bou nework srucure. Byesin pproch o srucure discovery in Byesin neworks. Mchine lerning, 50(1):95 125, W. Gilks, S. Richrdson, nd D. Spiegelhler. Mrkov chin Mone Crlo in prcice. Chpmn & Hll/CRC, P. D. Grünwld nd A. P. Dwid. Gme heory, mximum enropy, minimum discrepncy, nd robus Byesin decision heory. Annls of Sisics, 32: , P. Hr, N. Nilsson, nd B. Rphel. A forml bsis for he heurisic deerminion of minimum cos phs. IEEE Trnscions on Sysems Science nd Cyberneics, 4: , D. Heckermn, D. Geiger, nd D. Chickering. Lerning Byesin neworks: The combinion of knowledge nd sisicl d. Mchine lerning, 20(3): , E. T. Jynes. Informion heory nd sisicl mechnics. Physicl Review, 106: , R. Klmn. When is liner conrol sysem opiml? Trns. ASME, J. Bsic Engrg., 86:51 60, G. Krmer. Direced Informion for Chnnels wih Feedbck. PhD hesis, Swiss Federl Insiue of Technology (ETH) Zurich, D. Krishnmurhy nd E. Todorov. Inverse opiml conrol wih linerly-solvble MDPs. In Proc. Inernionl Conference on Mchine Lerning, pges , H. Mrko. The bidirecionl communicion heory generlizion of informion heory. In IEEE Trnscions on Communicions, pges , J. L. Mssey. Cusliy, feedbck nd direced informion. In Proc. IEEE Inernionl Symposium on Informion Theory nd Is Applicions, pges 27 30, G. Neu nd C. Szepesvári. Appreniceship lerning using inverse reinforcemen lerning nd grdien mehods. In Proc. UAI, pges , A. Y. Ng nd S. Russell. Algorihms for inverse reinforcemen lerning. In Proc. Inernionl Conference on Mchine Lerning, pges , D. Pomerleu. ALVINN: An uonomous lnd vehicle in neurl nework. Advnces in Neurl Informion Processing Sysems I, 1:305, D. Rmchndrn nd E. Amir. Byesin inverse reinforcemen lerning. In Proc. Inernionl Join Conference on Arificil Inelligence, pges , N. Rliff, J. A. Bgnell, nd M. Zinkevich. Mximum mrgin plnning. In Proc. Inernionl Conference on Mchine Lerning, pges , N. Rliff, D. Brdley, J. Bgnell, nd J. Chesnu. Boosing srucured predicion for imiion lerning. In In Advnces in Neurl Informion Processing Sysems (NIPS) 19, J. Rus. Mximum likelihood esimion of discree conrol processes. SIAM Journl on Conrol nd Opimizion, 26:1006, U. Syed nd R. Schpire. A gme-heoreic pproch o ppreniceship lerning. Advnces in Neurl Informion Processing Sysems, 20: , B. D. Ziebr. Modeling Purposeful Adpive Behvior wih he Principle of Mximum Cusl Enropy. PhD hesis, Mchine Lerning Deprmen, Crnegie Mellon Universiy, Dec B. D. Ziebr, J. A. Bgnell, nd A. K. Dey. Modeling inercion vi he principle of mximum cusl enropy. In Proc. Inernionl Conference on Mchine Lerning, pges , B. D. Ziebr, A. Ms, J. A. Bgnell, nd A. K. Dey. Mximum enropy inverse reinforcemen lerning. In Proc. Conference on Arificil Inelligence (AAAI), pges , B. D. Ziebr, A. Ms, A. K. Dey, nd J. A. Bgnell. Nvige like cbbie: Probbilisic resoning from observed conex-wre behvior. In Proc. Inernionl Conference on Ubiquious Compuing, B. D. Ziebr, N. Rliff, G. Gllgher, C. Merz, K. Peerson, J. A. Bgnell, M. Heber, A. K. Dey, nd S. Srinivs. Plnning-bsed predicion for pedesrins. In Proc. of he Inernionl Conference on Inelligen Robos nd Sysems, APPENDIX Theorem 1: The hybrid IOC opimizion of Definiion 6 cn be formuled s sofened version of he Bellmn equion 3 vi convex duliy: Q hyb (, s) = V hyb θ f s, + E P (S s,) φ Q f s, s S P (s s, ) φ V f s (s) = sofmx Q hyb (, s), V hyb (s ) if s S R if s S Q if s S V (24) wih sofmx x f(x) = log x ef(x), he rewrd funcion prmeerized by rewrd weighs θ, nd he vlue funcions prmeerized by vlue weighs φ = {φ Q, φ V }. Proof: Afer seing feures over he enire sequence s F(s, ) = T 1 =1 f s, I SQ (s T ) f st I SV (s T ) s S P (s s T, T ) f s, (25) his is corollry of Theorem Theorem 2: When se ype beliefs re independen, P (ype(s)) = s S P (ype(s)), he se-cion vlue of Equion 16 cn be re-wrien s: Q (, s) =P (s S R ) (θ f s, + EV (s ) s, ), + P (s S Q ) φ Qf s, + P (s S V ) s S P (s s, ) φ V f s (26) wih he inerpreion h ech visi o se, smple from P (ype(s)) is drwn indicing wheher he se is reed s rewrd-bsed influence, n cion vlue-bsed influence, or se vlue-bsed influence. Proof: (skech) The siuion where se ype is disribued ccording o n independen belief disribuion cn be represened wih n ugmened se of dynmics over he join of ses nd se ypes: P expnd (s, ype(s ) s, ) = P (s s, )P (ype(s )), (27) where P (s s, ) is he originl dynmics nd P (ype(s )) is he belief disribuion over ypes. Following Theorem 1 in his expnded se spce complees he proof. Theorem 3: The log probbiliy of policy under he hybrid rewrd-bsed/vlue-bsed inverse opiml conrol pproch is reled o he policy s expeced rewrds s follows: log P ( π ype(s), θ, φ) = (28) T 1 E P (S,A) θ f s, + φ f st, T V hyb (s 1), π =1 where V hyb is defined ccording o Theorem 1. Proof: This is corollry of Theorem wih feures defined ccording o Equion 25.

e t dt e t dt = lim e t dt T (1 e T ) = 1

e t dt e t dt = lim e t dt T (1 e T ) = 1 Improper Inegrls There re wo ypes of improper inegrls - hose wih infinie limis of inegrion, nd hose wih inegrnds h pproch some poin wihin he limis of inegrion. Firs we will consider inegrls wih infinie

More information

Chapter 2: Evaluative Feedback

Chapter 2: Evaluative Feedback Chper 2: Evluive Feedbck Evluing cions vs. insrucing by giving correc cions Pure evluive feedbck depends olly on he cion ken. Pure insrucive feedbck depends no ll on he cion ken. Supervised lerning is

More information

Minimum Squared Error

Minimum Squared Error Minimum Squred Error LDF: Minimum Squred-Error Procedures Ide: conver o esier nd eer undersood prolem Percepron y i > 0 for ll smples y i solve sysem of liner inequliies MSE procedure y i i for ll smples

More information

Minimum Squared Error

Minimum Squared Error Minimum Squred Error LDF: Minimum Squred-Error Procedures Ide: conver o esier nd eer undersood prolem Percepron y i > for ll smples y i solve sysem of liner inequliies MSE procedure y i = i for ll smples

More information

Reinforcement Learning. Markov Decision Processes

Reinforcement Learning. Markov Decision Processes einforcemen Lerning Mrkov Decision rocesses Mnfred Huber 2014 1 equenil Decision Mking N-rmed bi problems re no good wy o model sequenil decision problem Only dels wih sic decision sequences Could be miiged

More information

A Kalman filtering simulation

A Kalman filtering simulation A Klmn filering simulion The performnce of Klmn filering hs been esed on he bsis of wo differen dynmicl models, ssuming eiher moion wih consn elociy or wih consn ccelerion. The former is epeced o beer

More information

September 20 Homework Solutions

September 20 Homework Solutions College of Engineering nd Compuer Science Mechnicl Engineering Deprmen Mechnicl Engineering A Seminr in Engineering Anlysis Fll 7 Number 66 Insrucor: Lrry Creo Sepember Homework Soluions Find he specrum

More information

Contraction Mapping Principle Approach to Differential Equations

Contraction Mapping Principle Approach to Differential Equations epl Journl of Science echnology 0 (009) 49-53 Conrcion pping Principle pproch o Differenil Equions Bishnu P. Dhungn Deprmen of hemics, hendr Rn Cmpus ribhuvn Universiy, Khmu epl bsrc Using n eension of

More information

4.8 Improper Integrals

4.8 Improper Integrals 4.8 Improper Inegrls Well you ve mde i hrough ll he inegrion echniques. Congrs! Unforunely for us, we sill need o cover one more inegrl. They re clled Improper Inegrls. A his poin, we ve only del wih inegrls

More information

5.1-The Initial-Value Problems For Ordinary Differential Equations

5.1-The Initial-Value Problems For Ordinary Differential Equations 5.-The Iniil-Vlue Problems For Ordinry Differenil Equions Consider solving iniil-vlue problems for ordinry differenil equions: (*) y f, y, b, y. If we know he generl soluion y of he ordinry differenil

More information

3. Renewal Limit Theorems

3. Renewal Limit Theorems Virul Lborories > 14. Renewl Processes > 1 2 3 3. Renewl Limi Theorems In he inroducion o renewl processes, we noed h he rrivl ime process nd he couning process re inverses, in sens The rrivl ime process

More information

EXISTENCE AND UNIQUENESS OF SOLUTIONS FOR A SECOND-ORDER ITERATIVE BOUNDARY-VALUE PROBLEM

EXISTENCE AND UNIQUENESS OF SOLUTIONS FOR A SECOND-ORDER ITERATIVE BOUNDARY-VALUE PROBLEM Elecronic Journl of Differenil Equions, Vol. 208 (208), No. 50, pp. 6. ISSN: 072-669. URL: hp://ejde.mh.xse.edu or hp://ejde.mh.un.edu EXISTENCE AND UNIQUENESS OF SOLUTIONS FOR A SECOND-ORDER ITERATIVE

More information

A new model for limit order book dynamics

A new model for limit order book dynamics Anewmodelforlimiorderbookdynmics JeffreyR.Russell UniversiyofChicgo,GrdueSchoolofBusiness TejinKim UniversiyofChicgo,DeprmenofSisics Absrc:Thispperproposesnewmodelforlimiorderbookdynmics.Thelimiorderbookconsiss

More information

S Radio transmission and network access Exercise 1-2

S Radio transmission and network access Exercise 1-2 S-7.330 Rdio rnsmission nd nework ccess Exercise 1 - P1 In four-symbol digil sysem wih eqully probble symbols he pulses in he figure re used in rnsmission over AWGN-chnnel. s () s () s () s () 1 3 4 )

More information

Making Complex Decisions Markov Decision Processes. Making Complex Decisions: Markov Decision Problem

Making Complex Decisions Markov Decision Processes. Making Complex Decisions: Markov Decision Problem Mking Comple Decisions Mrkov Decision Processes Vsn Honvr Bioinformics nd Compuionl Biology Progrm Cener for Compuionl Inelligence, Lerning, & Discovery honvr@cs.ise.edu www.cs.ise.edu/~honvr/ www.cild.ise.edu/

More information

Procedia Computer Science

Procedia Computer Science Procedi Compuer Science 00 (0) 000 000 Procedi Compuer Science www.elsevier.com/loce/procedi The Third Informion Sysems Inernionl Conference The Exisence of Polynomil Soluion of he Nonliner Dynmicl Sysems

More information

0 for t < 0 1 for t > 0

0 for t < 0 1 for t > 0 8.0 Sep nd del funcions Auhor: Jeremy Orloff The uni Sep Funcion We define he uni sep funcion by u() = 0 for < 0 for > 0 I is clled he uni sep funcion becuse i kes uni sep = 0. I is someimes clled he Heviside

More information

1 jordan.mcd Eigenvalue-eigenvector approach to solving first order ODEs. -- Jordan normal (canonical) form. Instructor: Nam Sun Wang

1 jordan.mcd Eigenvalue-eigenvector approach to solving first order ODEs. -- Jordan normal (canonical) form. Instructor: Nam Sun Wang jordnmcd Eigenvlue-eigenvecor pproch o solving firs order ODEs -- ordn norml (cnonicl) form Insrucor: Nm Sun Wng Consider he following se of coupled firs order ODEs d d x x 5 x x d d x d d x x x 5 x x

More information

A Time Truncated Improved Group Sampling Plans for Rayleigh and Log - Logistic Distributions

A Time Truncated Improved Group Sampling Plans for Rayleigh and Log - Logistic Distributions ISSNOnline : 39-8753 ISSN Prin : 347-67 An ISO 397: 7 Cerified Orgnizion Vol. 5, Issue 5, My 6 A Time Trunced Improved Group Smpling Plns for Ryleigh nd og - ogisic Disribuions P.Kvipriy, A.R. Sudmni Rmswmy

More information

Probability, Estimators, and Stationarity

Probability, Estimators, and Stationarity Chper Probbiliy, Esimors, nd Sionriy Consider signl genered by dynmicl process, R, R. Considering s funcion of ime, we re opering in he ime domin. A fundmenl wy o chrcerize he dynmics using he ime domin

More information

The solution is often represented as a vector: 2xI + 4X2 + 2X3 + 4X4 + 2X5 = 4 2xI + 4X2 + 3X3 + 3X4 + 3X5 = 4. 3xI + 6X2 + 6X3 + 3X4 + 6X5 = 6.

The solution is often represented as a vector: 2xI + 4X2 + 2X3 + 4X4 + 2X5 = 4 2xI + 4X2 + 3X3 + 3X4 + 3X5 = 4. 3xI + 6X2 + 6X3 + 3X4 + 6X5 = 6. [~ o o :- o o ill] i 1. Mrices, Vecors, nd Guss-Jordn Eliminion 1 x y = = - z= The soluion is ofen represened s vecor: n his exmple, he process of eliminion works very smoohly. We cn elimine ll enries

More information

A new model for solving fuzzy linear fractional programming problem with ranking function

A new model for solving fuzzy linear fractional programming problem with ranking function J. ppl. Res. Ind. Eng. Vol. 4 No. 07 89 96 Journl of pplied Reserch on Indusril Engineering www.journl-prie.com new model for solving fuzzy liner frcionl progrmming prolem wih rning funcion Spn Kumr Ds

More information

Outline. Expectation Propagation in Practice. EP in a nutshell. Extensions to EP. EP algorithm Examples: Tom Minka CMU Statistics

Outline. Expectation Propagation in Practice. EP in a nutshell. Extensions to EP. EP algorithm Examples: Tom Minka CMU Statistics Eecion Progion in Prcice Tom Mink CMU Sisics Join work wih Yun Qi nd John Lffery EP lgorihm Emles: Ouline Trcking dynmic sysem Signl deecion in fding chnnels Documen modeling Bolzmnn mchines Eensions o

More information

3D Transformations. Computer Graphics COMP 770 (236) Spring Instructor: Brandon Lloyd 1/26/07 1

3D Transformations. Computer Graphics COMP 770 (236) Spring Instructor: Brandon Lloyd 1/26/07 1 D Trnsformions Compuer Grphics COMP 770 (6) Spring 007 Insrucor: Brndon Lloyd /6/07 Geomery Geomeric eniies, such s poins in spce, exis wihou numers. Coordines re nming scheme. The sme poin cn e descried

More information

f t f a f x dx By Lin McMullin f x dx= f b f a. 2

f t f a f x dx By Lin McMullin f x dx= f b f a. 2 Accumulion: Thoughs On () By Lin McMullin f f f d = + The gols of he AP* Clculus progrm include he semen, Sudens should undersnd he definie inegrl s he ne ccumulion of chnge. 1 The Topicl Ouline includes

More information

The Finite Element Method for the Analysis of Non-Linear and Dynamic Systems

The Finite Element Method for the Analysis of Non-Linear and Dynamic Systems Swiss Federl Insiue of Pge 1 The Finie Elemen Mehod for he Anlysis of Non-Liner nd Dynmic Sysems Prof. Dr. Michel Hvbro Fber Dr. Nebojs Mojsilovic Swiss Federl Insiue of ETH Zurich, Swizerlnd Mehod of

More information

MATH 124 AND 125 FINAL EXAM REVIEW PACKET (Revised spring 2008)

MATH 124 AND 125 FINAL EXAM REVIEW PACKET (Revised spring 2008) MATH 14 AND 15 FINAL EXAM REVIEW PACKET (Revised spring 8) The following quesions cn be used s review for Mh 14/ 15 These quesions re no cul smples of quesions h will pper on he finl em, bu hey will provide

More information

Convergence of Singular Integral Operators in Weighted Lebesgue Spaces

Convergence of Singular Integral Operators in Weighted Lebesgue Spaces EUROPEAN JOURNAL OF PURE AND APPLIED MATHEMATICS Vol. 10, No. 2, 2017, 335-347 ISSN 1307-5543 www.ejpm.com Published by New York Business Globl Convergence of Singulr Inegrl Operors in Weighed Lebesgue

More information

1.0 Electrical Systems

1.0 Electrical Systems . Elecricl Sysems The ypes of dynmicl sysems we will e sudying cn e modeled in erms of lgeric equions, differenil equions, or inegrl equions. We will egin y looking fmilir mhemicl models of idel resisors,

More information

REAL ANALYSIS I HOMEWORK 3. Chapter 1

REAL ANALYSIS I HOMEWORK 3. Chapter 1 REAL ANALYSIS I HOMEWORK 3 CİHAN BAHRAN The quesions re from Sein nd Shkrchi s e. Chper 1 18. Prove he following sserion: Every mesurble funcion is he limi.e. of sequence of coninuous funcions. We firs

More information

Some Inequalities variations on a common theme Lecture I, UL 2007

Some Inequalities variations on a common theme Lecture I, UL 2007 Some Inequliies vriions on common heme Lecure I, UL 2007 Finbrr Hollnd, Deprmen of Mhemics, Universiy College Cork, fhollnd@uccie; July 2, 2007 Three Problems Problem Assume i, b i, c i, i =, 2, 3 re rel

More information

Neural assembly binding in linguistic representation

Neural assembly binding in linguistic representation Neurl ssembly binding in linguisic represenion Frnk vn der Velde & Mrc de Kmps Cogniive Psychology Uni, Universiy of Leiden, Wssenrseweg 52, 2333 AK Leiden, The Neherlnds, vdvelde@fsw.leidenuniv.nl Absrc.

More information

22.615, MHD Theory of Fusion Systems Prof. Freidberg Lecture 9: The High Beta Tokamak

22.615, MHD Theory of Fusion Systems Prof. Freidberg Lecture 9: The High Beta Tokamak .65, MHD Theory of Fusion Sysems Prof. Freidberg Lecure 9: The High e Tokmk Summry of he Properies of n Ohmic Tokmk. Advnges:. good euilibrium (smll shif) b. good sbiliy ( ) c. good confinemen ( τ nr )

More information

( ) ( ) ( ) ( ) ( ) ( y )

( ) ( ) ( ) ( ) ( ) ( y ) 8. Lengh of Plne Curve The mos fmous heorem in ll of mhemics is he Pyhgoren Theorem. I s formulion s he disnce formul is used o find he lenghs of line segmens in he coordine plne. In his secion you ll

More information

Motion. Part 2: Constant Acceleration. Acceleration. October Lab Physics. Ms. Levine 1. Acceleration. Acceleration. Units for Acceleration.

Motion. Part 2: Constant Acceleration. Acceleration. October Lab Physics. Ms. Levine 1. Acceleration. Acceleration. Units for Acceleration. Moion Accelerion Pr : Consn Accelerion Accelerion Accelerion Accelerion is he re of chnge of velociy. = v - vo = Δv Δ ccelerion = = v - vo chnge of velociy elpsed ime Accelerion is vecor, lhough in one-dimensionl

More information

ENGR 1990 Engineering Mathematics The Integral of a Function as a Function

ENGR 1990 Engineering Mathematics The Integral of a Function as a Function ENGR 1990 Engineering Mhemics The Inegrl of Funcion s Funcion Previously, we lerned how o esime he inegrl of funcion f( ) over some inervl y dding he res of finie se of rpezoids h represen he re under

More information

Bayesian Inference for Static Traffic Network Flows with Mobile Sensor Data

Bayesian Inference for Static Traffic Network Flows with Mobile Sensor Data Proceedings of he 51 s Hwii Inernionl Conference on Sysem Sciences 218 Byesin Inference for Sic Trffic Nework Flows wih Mobile Sensor D Zhen Tn Cornell Universiy z78@cornell.edu H.Oliver Go Cornell Universiy

More information

Optimality of Myopic Policy for a Class of Monotone Affine Restless Multi-Armed Bandit

Optimality of Myopic Policy for a Class of Monotone Affine Restless Multi-Armed Bandit Univeriy of Souhern Cliforni Opimliy of Myopic Policy for Cl of Monoone Affine Rele Muli-Armed Bndi Pri Mnourifrd USC Tr Jvidi UCSD Bhkr Krihnmchri USC Dec 0, 202 Univeriy of Souhern Cliforni Inroducion

More information

Green s Functions and Comparison Theorems for Differential Equations on Measure Chains

Green s Functions and Comparison Theorems for Differential Equations on Measure Chains Green s Funcions nd Comprison Theorems for Differenil Equions on Mesure Chins Lynn Erbe nd Alln Peerson Deprmen of Mhemics nd Sisics, Universiy of Nebrsk-Lincoln Lincoln,NE 68588-0323 lerbe@@mh.unl.edu

More information

T-Match: Matching Techniques For Driving Yagi-Uda Antennas: T-Match. 2a s. Z in. (Sections 9.5 & 9.7 of Balanis)

T-Match: Matching Techniques For Driving Yagi-Uda Antennas: T-Match. 2a s. Z in. (Sections 9.5 & 9.7 of Balanis) 3/0/018 _mch.doc Pge 1 of 6 T-Mch: Mching Techniques For Driving Ygi-Ud Anenns: T-Mch (Secions 9.5 & 9.7 of Blnis) l s l / l / in The T-Mch is shun-mching echnique h cn be used o feed he driven elemen

More information

TIMELINESS, ACCURACY, AND RELEVANCE IN DYNAMIC INCENTIVE CONTRACTS

TIMELINESS, ACCURACY, AND RELEVANCE IN DYNAMIC INCENTIVE CONTRACTS TIMELINESS, ACCURACY, AND RELEVANCE IN DYNAMIC INCENTIVE CONTRACTS by Peer O. Chrisensen Universiy of Souhern Denmrk Odense, Denmrk Gerld A. Felhm Universiy of Briish Columbi Vncouver, Cnd Chrisin Hofmnn

More information

A Simple Method to Solve Quartic Equations. Key words: Polynomials, Quartics, Equations of the Fourth Degree INTRODUCTION

A Simple Method to Solve Quartic Equations. Key words: Polynomials, Quartics, Equations of the Fourth Degree INTRODUCTION Ausrlin Journl of Bsic nd Applied Sciences, 6(6): -6, 0 ISSN 99-878 A Simple Mehod o Solve Quric Equions Amir Fhi, Poo Mobdersn, Rhim Fhi Deprmen of Elecricl Engineering, Urmi brnch, Islmic Ad Universi,

More information

Removing Redundancy and Inconsistency in Memory- Based Collaborative Filtering

Removing Redundancy and Inconsistency in Memory- Based Collaborative Filtering Removing Redundncy nd Inconsisency in Memory- Bsed Collborive Filering Ki Yu Siemens AG, Corpore Technology & Universiy of Munich, Germny ki.yu.exernl@mchp.siemens. de Xiowei Xu Informion Science Deprmen

More information

Reinforcement Learning

Reinforcement Learning Reiforceme Corol lerig Corol polices h choose opiml cios Q lerig Covergece Chper 13 Reiforceme 1 Corol Cosider lerig o choose cios, e.g., Robo lerig o dock o bery chrger o choose cios o opimize fcory oupu

More information

Physics 2A HW #3 Solutions

Physics 2A HW #3 Solutions Chper 3 Focus on Conceps: 3, 4, 6, 9 Problems: 9, 9, 3, 41, 66, 7, 75, 77 Phsics A HW #3 Soluions Focus On Conceps 3-3 (c) The ccelerion due o grvi is he sme for boh blls, despie he fc h he hve differen

More information

Tax Audit and Vertical Externalities

Tax Audit and Vertical Externalities T Audi nd Vericl Eernliies Hidey Ko Misuyoshi Yngihr Ngoy Keizi Universiy Ngoy Universiy 1. Inroducion The vericl fiscl eernliies rise when he differen levels of governmens, such s he federl nd se governmens,

More information

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions Muli-Period Sochasic Models: Opimali of (s, S) Polic for -Convex Objecive Funcions Consider a seing similar o he N-sage newsvendor problem excep ha now here is a fixed re-ordering cos (> 0) for each (re-)order.

More information

INVESTIGATION OF REINFORCEMENT LEARNING FOR BUILDING THERMAL MASS CONTROL

INVESTIGATION OF REINFORCEMENT LEARNING FOR BUILDING THERMAL MASS CONTROL INVESTIGATION OF REINFORCEMENT LEARNING FOR BUILDING THERMAL MASS CONTROL Simeng Liu nd Gregor P. Henze, Ph.D., P.E. Universiy of Nebrsk Lincoln, Archiecurl Engineering 1110 Souh 67 h Sree, Peer Kiewi

More information

LAPLACE TRANSFORM OVERCOMING PRINCIPLE DRAWBACKS IN APPLICATION OF THE VARIATIONAL ITERATION METHOD TO FRACTIONAL HEAT EQUATIONS

LAPLACE TRANSFORM OVERCOMING PRINCIPLE DRAWBACKS IN APPLICATION OF THE VARIATIONAL ITERATION METHOD TO FRACTIONAL HEAT EQUATIONS Wu, G.-.: Lplce Trnsform Overcoming Principle Drwbcks in Applicion... THERMAL SIENE: Yer 22, Vol. 6, No. 4, pp. 257-26 257 Open forum LAPLAE TRANSFORM OVEROMING PRINIPLE DRAWBAKS IN APPLIATION OF THE VARIATIONAL

More information

On Source and Channel Codes for Multiple Inputs and Outputs: Does Multiple Description Meet Space Time? 1

On Source and Channel Codes for Multiple Inputs and Outputs: Does Multiple Description Meet Space Time? 1 On Source nd Chnnel Codes for Muliple Inpus nd Oupus: oes Muliple escripion Mee Spce Time? Michelle Effros Rlf Koeer 3 Andre J. Goldsmih 4 Muriel Médrd 5 ep. of Elecricl Eng., 36-93, Cliforni Insiue of

More information

An integral having either an infinite limit of integration or an unbounded integrand is called improper. Here are two examples.

An integral having either an infinite limit of integration or an unbounded integrand is called improper. Here are two examples. Improper Inegrls To his poin we hve only considered inegrls f(x) wih he is of inegrion nd b finie nd he inegrnd f(x) bounded (nd in fc coninuous excep possibly for finiely mny jump disconinuiies) An inegrl

More information

Machine Learning Sequential Data Markov Chains Hidden Markov Models State space models

Machine Learning Sequential Data Markov Chains Hidden Markov Models State space models Mchine Lerning Sequenil D Mrov Chins Hidden Mrov Models Se sce models Lesson Sequenil D Consider sysem which cn occuy one of N discree ses or cegories. : se ime Discree {s s 2 s } or Coninue R d Sequenil

More information

Reinforcement learning

Reinforcement learning CS 75 Mchine Lening Lecue b einfocemen lening Milos Huskech milos@cs.pi.edu 539 Senno Sque einfocemen lening We wn o len conol policy: : X A We see emples of bu oupus e no given Insed of we ge feedbck

More information

GENERALIZATION OF SOME INEQUALITIES VIA RIEMANN-LIOUVILLE FRACTIONAL CALCULUS

GENERALIZATION OF SOME INEQUALITIES VIA RIEMANN-LIOUVILLE FRACTIONAL CALCULUS - TAMKANG JOURNAL OF MATHEMATICS Volume 5, Number, 7-5, June doi:5556/jkjm555 Avilble online hp://journlsmhkueduw/ - - - GENERALIZATION OF SOME INEQUALITIES VIA RIEMANN-LIOUVILLE FRACTIONAL CALCULUS MARCELA

More information

Transforms II - Wavelets Preliminary version please report errors, typos, and suggestions for improvements

Transforms II - Wavelets Preliminary version please report errors, typos, and suggestions for improvements EECS 3 Digil Signl Processing Universiy of Cliforni, Berkeley: Fll 007 Gspr November 4, 007 Trnsforms II - Wveles Preliminry version plese repor errors, ypos, nd suggesions for improvemens We follow n

More information

Estimation of Markov Regime-Switching Regression Models with Endogenous Switching

Estimation of Markov Regime-Switching Regression Models with Endogenous Switching Esimion of Mrkov Regime-Swiching Regression Models wih Endogenous Swiching Chng-Jin Kim Kore Universiy nd Universiy of Wshingon Jeremy Piger Federl Reserve Bnk of S. Louis Richrd Srz Universiy of Wshingon

More information

Efficient Optimal Learning for Contextual Bandits

Efficient Optimal Learning for Contextual Bandits fficien Opiml Lerning for Conexul Bndis Miroslv Dudik mdudik@yhoo-inccom Dniel Hsu djhsu@rcirugersedu Syen Kle skle@yhoo-inccom Nikos Krmpzikis nk@cscornelledu John Lngford jl@yhoo-inccom Lev Reyzin lreyzin@ccgechedu

More information

HUI-HSIUNG KUO, ANUWAT SAE-TANG, AND BENEDYKT SZOZDA

HUI-HSIUNG KUO, ANUWAT SAE-TANG, AND BENEDYKT SZOZDA Communicions on Sochsic Anlysis Vol 6, No 4 2012 603-614 Serils Publicions wwwserilspublicionscom THE ITÔ FORMULA FOR A NEW STOCHASTIC INTEGRAL HUI-HSIUNG KUO, ANUWAT SAE-TANG, AND BENEDYKT SZOZDA Absrc

More information

Temperature Rise of the Earth

Temperature Rise of the Earth Avilble online www.sciencedirec.com ScienceDirec Procedi - Socil nd Behviorl Scien ce s 88 ( 2013 ) 220 224 Socil nd Behviorl Sciences Symposium, 4 h Inernionl Science, Socil Science, Engineering nd Energy

More information

Some basic notation and terminology. Deterministic Finite Automata. COMP218: Decision, Computation and Language Note 1

Some basic notation and terminology. Deterministic Finite Automata. COMP218: Decision, Computation and Language Note 1 COMP28: Decision, Compuion nd Lnguge Noe These noes re inended minly s supplemen o he lecures nd exooks; hey will e useful for reminders ou noion nd erminology. Some sic noion nd erminology An lphe is

More information

ON NEW INEQUALITIES OF SIMPSON S TYPE FOR FUNCTIONS WHOSE SECOND DERIVATIVES ABSOLUTE VALUES ARE CONVEX

ON NEW INEQUALITIES OF SIMPSON S TYPE FOR FUNCTIONS WHOSE SECOND DERIVATIVES ABSOLUTE VALUES ARE CONVEX Journl of Applied Mhemics, Sisics nd Informics JAMSI), 9 ), No. ON NEW INEQUALITIES OF SIMPSON S TYPE FOR FUNCTIONS WHOSE SECOND DERIVATIVES ABSOLUTE VALUES ARE CONVEX MEHMET ZEKI SARIKAYA, ERHAN. SET

More information

CONTINUOUS DYNAMIC NETWORK LOADING MODELS 1

CONTINUOUS DYNAMIC NETWORK LOADING MODELS 1 CONTINUOUS DYNAMIC NETWORK LOADING MODELS 1 Ricrdo Grcí*, Mª Luz López*, Alejndro Niño**, nd Doroeo Versegui* * Deprmeno de Memáics. Universidd de Csill-L Mnch ** Progrm de Invesigción en Tránsio y Trnspore.

More information

Compulsory Flow Q-Learning: an RL algorithm for robot navigation based on partial-policy and macro-states

Compulsory Flow Q-Learning: an RL algorithm for robot navigation based on partial-policy and macro-states Journl of he Brzilin Compuer Sociey, 2009; 5(3):65-75. ISSN 004-6500 Compulsory Flow Q-Lerning: n RL lgorihm for robo nvigion bsed on pril-policy nd mcro-ses Vldinei Freire d Silv*, Ann Helen Reli Cos

More information

Systems Variables and Structural Controllability: An Inverted Pendulum Case

Systems Variables and Structural Controllability: An Inverted Pendulum Case Reserch Journl of Applied Sciences, Engineering nd echnology 6(: 46-4, 3 ISSN: 4-7459; e-issn: 4-7467 Mxwell Scienific Orgniion, 3 Submied: Jnury 5, 3 Acceped: Mrch 7, 3 Published: November, 3 Sysems Vribles

More information

PHYSICS 1210 Exam 1 University of Wyoming 14 February points

PHYSICS 1210 Exam 1 University of Wyoming 14 February points PHYSICS 1210 Em 1 Uniersiy of Wyoming 14 Februry 2013 150 poins This es is open-noe nd closed-book. Clculors re permied bu compuers re no. No collborion, consulion, or communicion wih oher people (oher

More information

Inventory Management Models with Variable Holding Cost and Salvage value

Inventory Management Models with Variable Holding Cost and Salvage value OSR Journl of Business nd Mngemen OSR-JBM e-ssn: -X p-ssn: 9-. Volume ssue Jul. - Aug. PP - www.iosrjournls.org nvenory Mngemen Models wi Vrile Holding os nd Slvge vlue R.Mon R.Venkeswrlu Memics Dep ollege

More information

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017 Two Popular Bayesian Esimaors: Paricle and Kalman Filers McGill COMP 765 Sep 14 h, 2017 1 1 1, dx x Bel x u x P x z P Recall: Bayes Filers,,,,,,, 1 1 1 1 u z u x P u z u x z P Bayes z = observaion u =

More information

ANSWERS TO EVEN NUMBERED EXERCISES IN CHAPTER 2

ANSWERS TO EVEN NUMBERED EXERCISES IN CHAPTER 2 ANSWERS TO EVEN NUMBERED EXERCISES IN CHAPTER Seion Eerise -: Coninuiy of he uiliy funion Le λ ( ) be he monooni uiliy funion defined in he proof of eisene of uiliy funion If his funion is oninuous y hen

More information

1. Find a basis for the row space of each of the following matrices. Your basis should consist of rows of the original matrix.

1. Find a basis for the row space of each of the following matrices. Your basis should consist of rows of the original matrix. Mh 7 Exm - Prcice Prolem Solions. Find sis for he row spce of ech of he following mrices. Yor sis shold consis of rows of he originl mrix. 4 () 7 7 8 () Since we wn sis for he row spce consising of rows

More information

Georey E. Hinton. University oftoronto. Technical Report CRG-TR February 22, Abstract

Georey E. Hinton. University oftoronto.   Technical Report CRG-TR February 22, Abstract Parameer Esimaion for Linear Dynamical Sysems Zoubin Ghahramani Georey E. Hinon Deparmen of Compuer Science Universiy oftorono 6 King's College Road Torono, Canada M5S A4 Email: zoubin@cs.orono.edu Technical

More information

Average & instantaneous velocity and acceleration Motion with constant acceleration

Average & instantaneous velocity and acceleration Motion with constant acceleration Physics 7: Lecure Reminders Discussion nd Lb secions sr meeing ne week Fill ou Pink dd/drop form if you need o swich o differen secion h is FULL. Do i TODAY. Homework Ch. : 5, 7,, 3,, nd 6 Ch.: 6,, 3 Submission

More information

Nonlinear System Modelling: How to Estimate the. Highest Significant Order

Nonlinear System Modelling: How to Estimate the. Highest Significant Order IEEE Insrumenion nd Mesuremen Technology Conference nchorge,, US, - My Nonliner Sysem Modelling: ow o Esime he ighes Significn Order Neophyos Chirs, Ceri Evns nd Dvid Rees, Michel Solomou School of Elecronics,

More information

A 1.3 m 2.5 m 2.8 m. x = m m = 8400 m. y = 4900 m 3200 m = 1700 m

A 1.3 m 2.5 m 2.8 m. x = m m = 8400 m. y = 4900 m 3200 m = 1700 m PHYS : Soluions o Chper 3 Home Work. SSM REASONING The displcemen is ecor drwn from he iniil posiion o he finl posiion. The mgniude of he displcemen is he shores disnce beween he posiions. Noe h i is onl

More information

A LIMIT-POINT CRITERION FOR A SECOND-ORDER LINEAR DIFFERENTIAL OPERATOR IAN KNOWLES

A LIMIT-POINT CRITERION FOR A SECOND-ORDER LINEAR DIFFERENTIAL OPERATOR IAN KNOWLES A LIMIT-POINT CRITERION FOR A SECOND-ORDER LINEAR DIFFERENTIAL OPERATOR j IAN KNOWLES 1. Inroducion Consider he forml differenil operor T defined by el, (1) where he funcion q{) is rel-vlued nd loclly

More information

Chapter Direct Method of Interpolation

Chapter Direct Method of Interpolation Chper 5. Direc Mehod of Inerpolion Afer reding his chper, you should be ble o:. pply he direc mehod of inerpolion,. sole problems using he direc mehod of inerpolion, nd. use he direc mehod inerpolns o

More information

ME 141. Engineering Mechanics

ME 141. Engineering Mechanics ME 141 Engineeing Mechnics Lecue 13: Kinemics of igid bodies hmd Shhedi Shkil Lecue, ep. of Mechnicl Engg, UET E-mil: sshkil@me.bue.c.bd, shkil6791@gmil.com Websie: eche.bue.c.bd/sshkil Couesy: Veco Mechnics

More information

1. Introduction. 1 b b

1. Introduction. 1 b b Journl of Mhemicl Inequliies Volume, Number 3 (007), 45 436 SOME IMPROVEMENTS OF GRÜSS TYPE INEQUALITY N. ELEZOVIĆ, LJ. MARANGUNIĆ AND J. PEČARIĆ (communiced b A. Čižmešij) Absrc. In his pper some inequliies

More information

Journal of Mathematical Analysis and Applications. Two normality criteria and the converse of the Bloch principle

Journal of Mathematical Analysis and Applications. Two normality criteria and the converse of the Bloch principle J. Mh. Anl. Appl. 353 009) 43 48 Conens liss vilble ScienceDirec Journl of Mhemicl Anlysis nd Applicions www.elsevier.com/loce/jm Two normliy crieri nd he converse of he Bloch principle K.S. Chrk, J. Rieppo

More information

Distributed Quickest Detection of Cyber-Attacks in Smart Grid

Distributed Quickest Detection of Cyber-Attacks in Smart Grid 1 Disribued Quickes Deecion of Cyber-Acks in Smr Grid Mehme Necip Kur, Ysin Yılmz, Member, IEEE, nd Xiodong Wng, Fellow, IEEE Absrc In his pper, online deecion of flse d injecion FDI) cks nd denil of service

More information

Sequential Importance Resampling (SIR) Particle Filter

Sequential Importance Resampling (SIR) Particle Filter Paricle Filers++ Pieer Abbeel UC Berkeley EECS Many slides adaped from Thrun, Burgard and Fox, Probabilisic Roboics 1. Algorihm paricle_filer( S -1, u, z ): 2. Sequenial Imporance Resampling (SIR) Paricle

More information

A NUMERICAL SOLUTION OF THE URYSOHN-TYPE FREDHOLM INTEGRAL EQUATIONS

A NUMERICAL SOLUTION OF THE URYSOHN-TYPE FREDHOLM INTEGRAL EQUATIONS A NUMERICAL SOLUTION OF THE URYSOHN-TYPE FREDHOLM INTEGRAL EQUATIONS A JAFARIAN 1,, SA MEASOOMY 1,b, ALIREZA K GOLMANKHANEH 1,c, D BALEANU 2,3,4 1 Deprmen of Mhemics, Urmi Brnch, Islmic Azd Universiy,

More information

USING ITERATIVE LINEAR REGRESSION MODEL TO TIME SERIES MODELS

USING ITERATIVE LINEAR REGRESSION MODEL TO TIME SERIES MODELS Elecronic Journl of Applied Sisicl Anlysis EJASA (202), Elecron. J. App. S. Anl., Vol. 5, Issue 2, 37 50 e-issn 2070-5948, DOI 0.285/i20705948v5n2p37 202 Universià del Sleno hp://sib-ese.unile.i/index.php/ejs/index

More information

One Practical Algorithm for Both Stochastic and Adversarial Bandits

One Practical Algorithm for Both Stochastic and Adversarial Bandits One Prcicl Algorihm for Boh Sochsic nd Adversril Bndis Full Version Including Appendices Yevgeny Seldin Queenslnd Universiy of Technology, Brisbne, Ausrli Aleksndrs Slivkins Microsof Reserch, New York

More information

Honours Introductory Maths Course 2011 Integration, Differential and Difference Equations

Honours Introductory Maths Course 2011 Integration, Differential and Difference Equations Honours Inroducory Mhs Course 0 Inegrion, Differenil nd Difference Equions Reding: Ching Chper 4 Noe: These noes do no fully cover he meril in Ching, u re men o supplemen your reding in Ching. Thus fr

More information

INTEGRALS. Exercise 1. Let f : [a, b] R be bounded, and let P and Q be partitions of [a, b]. Prove that if P Q then U(P ) U(Q) and L(P ) L(Q).

INTEGRALS. Exercise 1. Let f : [a, b] R be bounded, and let P and Q be partitions of [a, b]. Prove that if P Q then U(P ) U(Q) and L(P ) L(Q). INTEGRALS JOHN QUIGG Eercise. Le f : [, b] R be bounded, nd le P nd Q be priions of [, b]. Prove h if P Q hen U(P ) U(Q) nd L(P ) L(Q). Soluion: Le P = {,..., n }. Since Q is obined from P by dding finiely

More information

Fault-Tolerant Guaranteed Cost Control of Uncertain Networked Control Systems with Time-varying Delay

Fault-Tolerant Guaranteed Cost Control of Uncertain Networked Control Systems with Time-varying Delay IJCSNS Inernionl Journl of Compuer Science nd Nework Securiy VOL.9 No.6 June 9 3 Ful-olern Gurneed Cos Conrol of Uncerin Neworked Conrol Sysems wih ime-vrying Dely Guo Yi-nn Zhng Qin-ying Chin Universiy

More information

SOME USEFUL MATHEMATICS

SOME USEFUL MATHEMATICS SOME USEFU MAHEMAICS SOME USEFU MAHEMAICS I is esy o mesure n preic he behvior of n elecricl circui h conins only c volges n currens. However, mos useful elecricl signls h crry informion vry wih ime. Since

More information

Asymptotic relationship between trajectories of nominal and uncertain nonlinear systems on time scales

Asymptotic relationship between trajectories of nominal and uncertain nonlinear systems on time scales Asympoic relionship beween rjecories of nominl nd uncerin nonliner sysems on ime scles Fim Zohr Tousser 1,2, Michel Defoor 1, Boudekhil Chfi 2 nd Mohmed Djemï 1 Absrc This pper sudies he relionship beween

More information

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle Chaper 2 Newonian Mechanics Single Paricle In his Chaper we will review wha Newon s laws of mechanics ell us abou he moion of a single paricle. Newon s laws are only valid in suiable reference frames,

More information

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon 3..3 INRODUCION O DYNAMIC OPIMIZAION: DISCREE IME PROBLEMS A. he Hamilonian and Firs-Order Condiions in a Finie ime Horizon Define a new funcion, he Hamilonian funcion, H. H he change in he oal value of

More information

Hermite-Hadamard-Fejér type inequalities for convex functions via fractional integrals

Hermite-Hadamard-Fejér type inequalities for convex functions via fractional integrals Sud. Univ. Beş-Bolyi Mh. 6(5, No. 3, 355 366 Hermie-Hdmrd-Fejér ype inequliies for convex funcions vi frcionl inegrls İmd İşcn Asrc. In his pper, firsly we hve eslished Hermie Hdmrd-Fejér inequliy for

More information

Introduction to LoggerPro

Introduction to LoggerPro Inroducion o LoggerPro Sr/Sop collecion Define zero Se d collecion prmeers Auoscle D Browser Open file Sensor seup window To sr d collecion, click he green Collec buon on he ool br. There is dely of second

More information

Solutions to Problems from Chapter 2

Solutions to Problems from Chapter 2 Soluions o Problems rom Chper Problem. The signls u() :5sgn(), u () :5sgn(), nd u h () :5sgn() re ploed respecively in Figures.,b,c. Noe h u h () :5sgn() :5; 8 including, bu u () :5sgn() is undeined..5

More information

Solutions for Nonlinear Partial Differential Equations By Tan-Cot Method

Solutions for Nonlinear Partial Differential Equations By Tan-Cot Method IOSR Journl of Mhemics (IOSR-JM) e-issn: 78-578. Volume 5, Issue 3 (Jn. - Feb. 13), PP 6-11 Soluions for Nonliner Pril Differenil Equions By Tn-Co Mehod Mhmood Jwd Abdul Rsool Abu Al-Sheer Al -Rfidin Universiy

More information

Aircraft safety modeling for time-limited dispatch

Aircraft safety modeling for time-limited dispatch Loughborough Universiy Insiuionl Reposiory Aircrf sfey modeling for ime-limied dispch This iem ws submied o Loughborough Universiy's Insiuionl Reposiory by he/n uhor. Ciion: PRESCOTT, D.R. nd ANDREWS,

More information

Aho-Corasick Automata

Aho-Corasick Automata Aho-Corsick Auom Sring D Srucures Over he nex few dys, we're going o be exploring d srucures specificlly designed for sring processing. These d srucures nd heir vrins re frequenly used in prcice Looking

More information

An introduction to the theory of SDDP algorithm

An introduction to the theory of SDDP algorithm An inroducion o he heory of SDDP algorihm V. Leclère (ENPC) Augus 1, 2014 V. Leclère Inroducion o SDDP Augus 1, 2014 1 / 21 Inroducion Large scale sochasic problem are hard o solve. Two ways of aacking

More information

IX.2 THE FOURIER TRANSFORM

IX.2 THE FOURIER TRANSFORM Chper IX The Inegrl Trnsform Mehods IX. The Fourier Trnsform November, 7 7 IX. THE FOURIER TRANSFORM IX.. The Fourier Trnsform Definiion 7 IX.. Properies 73 IX..3 Emples 74 IX..4 Soluion of ODE 76 IX..5

More information

A Forward-Looking Nash Game and Its Application to Achieving Pareto-Efficient Optimization

A Forward-Looking Nash Game and Its Application to Achieving Pareto-Efficient Optimization Applied Mhemics 013 4 1609-1615 Pulished Online Decemer 013 (hp://wwwscirporg/journl/m) hp://ddoiorg/10436/m0134118 A Forwrd-Looing Nsh Gme nd Is Applicion o Achieving Preo-Efficien Opimizion Jie Ren 1

More information

Bipartite Matching. Matching. Bipartite Matching. Maxflow Formulation

Bipartite Matching. Matching. Bipartite Matching. Maxflow Formulation Mching Inpu: undireced grph G = (V, E). Biprie Mching Inpu: undireced, biprie grph G = (, E).. Mching Ern Myr, Hrld äcke Biprie Mching Inpu: undireced, biprie grph G = (, E). Mflow Formulion Inpu: undireced,

More information