Efficient Optimal Learning for Contextual Bandits


Miroslav Dudik, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, Tong Zhang

Abstract

We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with an optimal regret. Our algorithm uses a cost sensitive classification learner as an oracle and has a running time $\mathrm{polylog}(N)$, where $N$ is the number of classification rules among which the oracle might choose. This is exponentially faster than all previous algorithms that achieve optimal regret in this setting. Our formulation also enables us to create an algorithm with regret that is additive rather than multiplicative in feedback delay as in all previous work.

1 INTRODUCTION

The contextual bandit setting consists of the following loop repeated indefinitely:

1. The world presents context information as features $x$.
2. The learning algorithm chooses an action $a$ from $K$ possible actions.
3. The world presents a reward $r$ for the action.

The key difference between the contextual bandit setting and standard supervised learning is that only the reward of the chosen action is revealed. For example, after always choosing the same action several times in a row, the feedback given provides almost no basis to prefer the chosen action over another action. In essence, the contextual bandit setting captures the difficulty of exploration while avoiding the difficulty of credit assignment as in more general reinforcement learning settings.

The contextual bandit setting is a half-way point between standard supervised learning and full-scale reinforcement learning where it appears possible to construct algorithms with convergence rate guarantees similar to supervised learning. Many natural settings satisfy this half-way point, motivating the investigation of contextual bandit learning. For example, the problem of choosing interesting news articles or ads for users by internet companies can be naturally modeled as a contextual bandit setting. In the medical domain where discrete treatments are tested before approval, the process of deciding which patients are eligible for a treatment takes contexts into account. More generally, we can imagine that in a future with personalized medicine, new treatments are
essentially equivalent to new actions in a contextual bandit setting.

In the i.i.d. setting, the world draws a pair $(x, \vec{r})$ consisting of a context and a reward vector from some unknown distribution $D$, revealing $x$ in Step 1, but only the reward $r(a)$ of the chosen action $a$ in Step 3. Given a set of policies $\Pi = \{\pi : X \to A\}$, the goal is to create an algorithm for Step 2 which competes with the set of policies. We measure our success by comparing the algorithm's cumulative reward to the expected cumulative reward of the best policy in the set. The difference of the two is called regret.

All existing algorithms for this setting either achieve a suboptimal regret (Langford and Zhang, 2007) or require computation linear in the number of policies (Auer et al., 2002b; Beygelzimer et al., 2011). In unstructured policy spaces, this computational complexity is the best one can hope for. On the other hand, in the case where the rewards of all actions are revealed, the problem is equivalent to cost-sensitive classification, and we know of algorithms to efficiently search the space of policies (classification rules) such as cost-sensitive logistic regression and support vector machines. In these cases, the space of classification rules is exponential in the number of features, but these problems can be efficiently solved using convex optimization.

Our goal here is to efficiently solve the contextual bandit problems for similarly large policy spaces. We do this by reducing the contextual bandit problem to cost-sensitive classification. Given a supervised cost-sensitive learning algorithm as an oracle (Beygelzimer et al., 2009), our algorithm runs in time only $\mathrm{polylog}(N)$ while achieving regret $O(\sqrt{TK \ln N})$, where $N$ is the number of possible policies (classification rules), $K$ is the number of actions (classes), and $T$ is the number of time steps. This efficiency is achieved in a modular way, so any future improvement in cost-sensitive learning immediately applies here.

1.1 PREVIOUS WORK AND MOTIVATION

All previous regret-optimal approaches are measure based: they work by updating a measure over policies, an operation which is linear in the number of policies. In contrast, regret guarantees scale only logarithmically in the number of policies. If not for the computational bottleneck, these regret guarantees imply that we could dramatically increase performance in contextual bandit settings using more expressive policies. We overcome the computational bottleneck using an algorithm which works by creating cost-sensitive classification instances and calling an oracle to choose optimal policies. Actions are chosen based on the policies returned by the oracle rather than according to a measure over all policies. This is reminiscent of AdaBoost (Freund and Schapire, 1997), which creates weighted binary classification instances and calls a weak learner oracle to obtain classification rules. These classification rules are then combined into a final classifier with boosted accuracy. Similarly as AdaBoost converts a weak learner into a strong learner, our approach converts a cost-sensitive classification learner into an algorithm that solves the contextual bandit problem.

In a more difficult version of contextual bandits, an adversary chooses $(x, \vec{r})$ given knowledge of the learning algorithm (but not any random numbers). All known regret-optimal solutions in the adversarial setting are variants of the EXP4 algorithm (Auer et al., 2002b). EXP4 achieves the same regret rate as our algorithm: $O(\sqrt{KT \ln N})$, where $T$ is the number of time steps, $K$ is the
number of actions available in each time step, and $N$ is the number of policies.

Why not use EXP4 in the i.i.d. setting? For example, it is known that the algorithm can be modified to succeed with high probability (Beygelzimer et al., 2011), and also for VC classes when the adversary is constrained to i.i.d. sampling. There are two central benefits that we hope to realize by directly assuming i.i.d. contexts and reward vectors.

1. Computational Tractability. Even when the reward vector is fully known, adversarial regrets scale as $O(\sqrt{\ln N})$ while computation scales as $O(N)$ in general. One attempt to get around this is the follow-the-perturbed-leader algorithm (Kalai and Vempala, 2005) which provides a computationally tractable solution in certain special-case structures. This algorithm has no mechanism for efficient application to arbitrary policy spaces, even given an efficient cost-sensitive classification oracle. An efficient cost-sensitive classification oracle has been shown effective in transductive settings (Kakade and Kalai, 2005). Aside from the drawback of requiring a transductive setting, the regret achieved there is substantially worse than for EXP4.

2. Improved Rates. When the world is not completely adversarial, it is possible to achieve substantially lower regrets than are possible with algorithms optimized for the adversarial setting. For example, in supervised learning, it is possible to obtain regrets scaling as $O(\log T)$ with a problem dependent constant (Bartlett et al., 2007). When the feedback is delayed by $\tau$ rounds, lower bounds imply that the regret in the adversarial setting increases by a multiplicative $\sqrt{\tau}$ while in the i.i.d. setting, it is possible to achieve an additive regret of $\tau$ (Langford et al., 2009).

In a direct i.i.d. setting, the previous-best approach using a cost-sensitive classification oracle was given by $\epsilon$-greedy and epoch greedy algorithms (Langford and Zhang, 2007) which have a regret scaling as $O(T^{2/3})$ in the worst case.

There have also been many special-case analyses. For example, theory of context-free setting is well understood (Lai and Robbins, 1985; Auer et al., 2002a; Even-Dar et al., 2006). Similarly, good algorithms exist when rewards are linear functions of features (Auer, 2002) or actions lie in a continuous space with the reward function sampled according to a Gaussian process (Srinivas et al., 2010).

1.2 WHAT WE PROVE

In Section 3 we state
the PolicyElimination algorithm, and prove the following regret bound for it.

Theorem 4. For all distributions $D$ over $(x, \vec{r})$ with $K$ actions, for all sets of $N$ policies $\Pi$, with probability at least $1 - \delta$, the regret of PolicyElimination (Algorithm 1) over $T$ rounds is at most
$$16 \sqrt{2TK \ln \frac{4T^2 N}{\delta}}.$$

This result can be extended to deal with VC classes, as well as other special cases. It forms the simplest method we have of exhibiting the new analysis.

The new key element of this algorithm is identification of a distribution over actions which simultaneously achieves small expected regret and allows estimating value of every policy with small variance. The existence of such a distribution is shown nonconstructively by a minimax argument.

PolicyElimination is computationally intractable and also requires exact knowledge of the context distribution (but not the reward distribution!). We show how to address these issues in Section 4 using an algorithm we call RandomizedUCB. Namely, we prove the following theorem.

Theorem 5. For all distributions $D$ over $(x, \vec{r})$ with $K$ actions, for all sets of $N$ policies $\Pi$, with probability at least $1 - \delta$, the regret of RandomizedUCB (Algorithm 2) over $T$ rounds is at most
$$O\left(\sqrt{TK \log(TN/\delta)} + K \log(NK/\delta)\right).$$

RandomizedUCB's analysis is substantially more complex, with a key subroutine being an application of the ellipsoid algorithm with a cost-sensitive classification oracle (described in Section 5). RandomizedUCB does not assume knowledge of the context distribution, and instead works with the history of contexts it has observed. Modifying the proof for this empirical distribution requires a covering argument over the distributions over policies which uses the probabilistic method. The net result is an algorithm with a similar top-level analysis as PolicyElimination, but with the running time only poly-logarithmic in the number of policies given a cost-sensitive classification oracle.

Theorem. In each time step $t$, RandomizedUCB makes at most $O(\mathrm{poly}(t, K, \log(1/\delta), \log N))$ calls to cost-sensitive classification oracle, and requires additional $O(\mathrm{poly}(t, K, \log N))$ processing time.

Apart from a tractable algorithm, our analysis can be used to derive tighter regrets than would be possible in adversarial setting. For example, in Section 6, we consider a common setting where reward feedback is delayed by $\tau$ rounds. A straightforward modification of PolicyElimination yields a regret with an additive term proportional to $\tau$ compared with the delay-free setting. Namely, we prove the following.

Theorem 12. For all
distributions $D$ over $(x, \vec{r})$ with $K$ actions, for all sets of $N$ policies $\Pi$, and all delay intervals $\tau$, with probability at least $1 - \delta$, the regret of DelayedPE (Algorithm 3) is at most
$$16 \sqrt{2K \ln \frac{4T^2 N}{\delta}} \left(\tau + \sqrt{T}\right).$$

We start next with precise settings and definitions.

2 SETTING AND DEFINITIONS

2.1 THE SETTING

Let $A$ be the set of $K$ actions, let $X$ be the domain of contexts $x$, and let $D$ be an arbitrary joint distribution on $(x, \vec{r})$. We denote the marginal distribution of $D$ over $X$ by $D_X$.

We denote $\Pi$ to be a finite set of policies $\{\pi : X \to A\}$, where each policy $\pi$, given a context $x_t$ in round $t$, chooses the action $\pi(x_t)$. The cardinality of $\Pi$ is denoted by $N$. Let $\vec{r}_t \in [0, 1]^K$ be the vector of rewards, where $r_t(a)$ is the reward of action $a$ on round $t$.

In the i.i.d. setting, on each round $t = 1 \ldots T$, the world chooses $(x_t, \vec{r}_t)$ i.i.d. according to $D$ and reveals $x_t$ to the learner. The learner, having access to $\Pi$, chooses action $a_t \in \{1, \ldots, K\}$. Then the world reveals reward $r_t(a_t)$ (which we call $r_t$ for short) to the learner, and the interaction proceeds to the next round.

We consider two modes of accessing the set of policies $\Pi$. The first option is through the enumeration of all policies. This is impractical in general, but suffices for the illustrative purpose of our first algorithm. The second option is an oracle access, through an argmax oracle, corresponding to a cost-sensitive learner:

Definition 1. For a set of policies $\Pi$, an argmax oracle (AMO for short) is an algorithm which, for any sequence $\{(x_{t'}, \vec{r}_{t'})\}_{t' = 1 \ldots t}$, $x_{t'} \in X$, $\vec{r}_{t'} \in \mathbb{R}^K$, computes
$$\arg\max_{\pi \in \Pi} \sum_{t' = 1 \ldots t} r_{t'}(\pi(x_{t'})).$$

The reason why the above can be viewed as a cost-sensitive classification oracle is that vectors of rewards $\vec{r}_{t'}$ can be interpreted as negative costs and hence the policy returned by AMO is the optimal cost-sensitive classifier on the given data.

2.2 EXPECTED AND EMPIRICAL REWARDS

Let the expected instantaneous reward of a policy $\pi \in \Pi$ be denoted by
$$\eta_D(\pi) = \mathbb{E}_{(x, \vec{r}) \sim D}\left[r(\pi(x))\right].$$
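The argmax oracle of Definition 1 can be illustrated for the enumeration access mode with a brute-force sketch. This is only an illustration under stated assumptions, not the oracle the paper has in mind for large policy classes: the name `argmax_oracle` is hypothetical, policies are assumed to be plain Python callables, and actions are indexed from 0 rather than from 1.

```python
from typing import Callable, Hashable, Sequence

# Assumption: a policy maps a context to an action index in {0, ..., K-1}.
Policy = Callable[[Hashable], int]


def argmax_oracle(policies: Sequence[Policy],
                  contexts: Sequence[Hashable],
                  rewards: Sequence[Sequence[float]]) -> Policy:
    """Brute-force AMO: return a policy maximizing sum_t rewards[t][pi(contexts[t])].

    Runs in time linear in the number of policies, which is exactly the cost
    a real cost-sensitive learner is meant to avoid.
    """
    def total_reward(pi: Policy) -> float:
        return sum(r[pi(x)] for x, r in zip(contexts, rewards))

    return max(policies, key=total_reward)
```

For instance, with two constant policies and a parity policy on integer contexts, the oracle returns whichever policy collects the largest summed reward on the supplied sequence.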

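The best policy $\pi_{\max} \in \Pi$ is that which maximizes $\eta_D(\pi)$. More formally,
$$\pi_{\max} = \arg\max_{\pi \in \Pi} \eta_D(\pi).$$

We define $h_t$ to be the history at time $t$ that the learner has seen. Specifically
$$h_t = \bigcup_{t' = 1 \ldots t} (x_{t'}, a_{t'}, r_{t'}, p_{t'}),$$
where $p_{t'}$ is the probability of the algorithm choosing action $a_{t'}$ at time $t'$. Note that $a_{t'}$ and $p_{t'}$ are produced by the learner while $x_{t'}$, $r_{t'}$ are produced by nature. We write $x \sim h$ to denote choosing $x$ uniformly at random from the $x$'s in history $h$.

Using the history of past actions and probabilities with which they were taken, we can form an unbiased estimate of the policy value for any $\pi \in \Pi$:
$$\eta_t(\pi) = \frac{1}{t} \sum_{(x, a, r, p) \in h_t} \frac{r \, I(\pi(x) = a)}{p}.$$
The unbiasedness follows, because
$$\mathbb{E}_{a \sim p}\left[\frac{r(a) \, I(\pi(x) = a)}{p(a)}\right] = \sum_{a} p(a) \frac{r(a) \, I(\pi(x) = a)}{p(a)} = r(\pi(x)).$$
The empirically best policy at time $t$ is denoted
$$\pi_t = \arg\max_{\pi \in \Pi} \eta_t(\pi).$$

2.3 REGRET

The goal of this work is to obtain a learner that has small regret relative to the expected performance of $\pi_{\max}$ over $T$ rounds, which is
$$\sum_{t = 1 \ldots T} \left(\eta_D(\pi_{\max}) - r_t\right). \qquad (2.1)$$
We say that the regret of the learner over $T$ rounds is bounded by $\epsilon$ with probability at least $1 - \delta$, if
$$\Pr\left[\sum_{t = 1 \ldots T} \left(\eta_D(\pi_{\max}) - r_t\right) \le \epsilon\right] \ge 1 - \delta,$$
where the probability is taken with respect to the random pairs $(x_t, \vec{r}_t) \sim D$ for $t = 1 \ldots T$, as well as any internal randomness used by the learner.

We can also define notions of regret and empirical regret for policies $\pi$. For all $\pi \in \Pi$, let
$$\Delta_D(\pi) = \eta_D(\pi_{\max}) - \eta_D(\pi), \qquad \Delta_t(\pi) = \eta_t(\pi_t) - \eta_t(\pi).$$

Our algorithms work by choosing distributions over policies, which in turn then induce distributions over actions. For any distribution $P$ over policies $\Pi$, let $W_P(x, a)$ denote the induced conditional distribution over actions $a$ given the context $x$:
$$W_P(x, a) = \sum_{\pi \in \Pi : \pi(x) = a} P(\pi). \qquad (2.2)$$
In general, we shall use $W$, $W'$ and $Z$ as conditional probability distributions over the actions $A$ given contexts $X$, i.e., $W : X \times A \to [0, 1]$ such that $W(x, \cdot)$ is a probability distribution over $A$ (and similarly for $W'$ and $Z$). We shall think of $W'$ as a smoothed version of $W$ with a minimum action probability of $\mu$ (to be defined by the algorithm), such that
$$W'(x, a) = (1 - K\mu) W(x, a) + \mu.$$

Conditional distributions such as $W$ (and $W'$, $Z$, etc.) correspond to randomized policies. We define notions of true and empirical value and regret for them as follows:
$$\eta_D(W) = \mathbb{E}_{(x, \vec{r}) \sim D}\left[\vec{r} \cdot W(x)\right],$$
$$\eta_t(W) = \frac{1}{t} \sum_{(x, a, r, p) \in h_t} \frac{r \, W(x, a)}{p},$$
$$\Delta_D(W) = \eta_D(\pi_{\max}) - \eta_D(W),$$
$$\Delta_t(W) = \eta_t(\pi_t) - \eta_t(W).$$

3 POLICY ELIMINATION

The basic ideas behind our approach are demonstrated in our first algorithm: PolicyElimination (Algorithm 1). The key step is Step 1, which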
finds a distribution over policies which induces low variance in the estimate of the value of all policies. Below we use a minimax theorem to show that such a distribution always exists. How to find this distribution is not specified here, but in Section 5 we develop a method based on the ellipsoid algorithm. Step 2 then projects this distribution onto a distribution over actions and applies smoothing. Finally, Step 5 eliminates the policies that have been determined to be suboptimal (with high probability).

ALGORITHM ANALYSIS

We analyze PolicyElimination in several steps. First, we prove the existence of $P_t$ in Step 1, provided that $\Pi_{t-1}$ is non-empty. We recast the feasibility problem in Step 1 as a game between two players: Prover, who is trying to produce $P_t$, and Falsifier, who is trying to find $\pi$ violating the constraints. We give more power to Falsifier and allow him to choose a distribution over $\pi$ (i.e., a randomized policy) which would violate the constraints.
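The projection-and-smoothing step and the elimination step described above can be sketched for an explicitly enumerated policy set. This is a minimal sketch under assumptions, not the paper's implementation: the function names are hypothetical, actions are indexed from 0, the history is a list of `(x, a, r, p)` tuples as in Section 2.2, and the hard part (choosing the distribution `P` in Step 1) is taken as given.

```python
def induced_action_distribution(P, policies, x, K):
    """W_P(x, a): total weight of the policies that choose action a on context x."""
    W = [0.0] * K
    for weight, pi in zip(P, policies):
        W[pi(x)] += weight
    return W


def smooth(W, mu):
    """W'(a) = (1 - K*mu) * W(a) + mu, so every action has probability >= mu."""
    K = len(W)
    return [(1.0 - K * mu) * w + mu for w in W]


def ips_value(pi, history):
    """eta_t(pi) = (1/t) * sum over (x, a, r, p) in history of r * I(pi(x) = a) / p."""
    return sum(r / p for (x, a, r, p) in history if pi(x) == a) / len(history)


def eliminate(policies, history, b_t):
    """Keep the policies whose estimated value is within 2*b_t of the best estimate."""
    values = [ips_value(pi, history) for pi in policies]
    best = max(values)
    return [pi for pi, v in zip(policies, values) if v >= best - 2.0 * b_t]
```

Because the smoothed probabilities are bounded below by `mu`, each importance weight `r / p` is at most `1 / mu`, which is the range bound used later in the analysis.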

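Algorithm 1 PolicyElimination($\Pi$, $\delta$, $K$, $D_X$)

Let $\Pi_0 = \Pi$ and history $h_0 = \emptyset$
Define: $\delta_t = \delta / (4 N t^2)$
Define: $b_t = 2 \sqrt{2K \ln(1/\delta_t) / t}$
Define: $\mu_t = \min\left\{ \frac{1}{2K}, \sqrt{\frac{\ln(1/\delta_t)}{2Kt}} \right\}$
For each timestep $t = 1 \ldots T$, observe $x_t$ and do:
1. Choose distribution $P_t$ over $\Pi_{t-1}$ s.t. $\forall \pi \in \Pi_{t-1}$:
$$\mathbb{E}_{x \sim D_X}\left[\frac{1}{(1 - K\mu_t) W_{P_t}(x, \pi(x)) + \mu_t}\right] \le 2K$$
2. Let $W'_t(a) = (1 - K\mu_t) W_{P_t}(x_t, a) + \mu_t$ for all $a \in A$
3. Choose $a_t \sim W'_t$
4. Observe reward $r_t$
5. Let $\Pi_t = \left\{ \pi \in \Pi_{t-1} : \eta_t(\pi) \ge \max_{\pi' \in \Pi_{t-1}} \eta_t(\pi') - 2 b_t \right\}$
6. Let $h_t = h_{t-1} \cup (x_t, a_t, r_t, W'_t(a_t))$

Note that any policy $\pi$ corresponds to a point in the space of randomized policies (viewed as functions $X \times A \to [0, 1]$), with $\pi(x, a) = I(\pi(x) = a)$. For any distribution $P$ over policies in $\Pi_{t-1}$, the induced randomized policy $W_P$ then corresponds to a point in the convex hull of $\Pi_{t-1}$. Denoting the convex hull of $\Pi_{t-1}$ by $C$, Prover's choice by $W$ and Falsifier's choice by $Z$, the feasibility of Step 1 follows by the following lemma:

Lemma 1. Let $C$ be a compact and convex set of randomized policies. Let $\mu \in (0, 1/K)$ and for any $W \in C$, $W'(x, a) = (1 - K\mu) W(x, a) + \mu$. Then for all distributions $D$,
$$\min_{W \in C} \max_{Z \in C} \; \mathbb{E}_{x \sim D_X} \mathbb{E}_{a \sim Z(x, \cdot)}\left[\frac{1}{W'(x, a)}\right] \le \frac{K}{1 - K\mu}.$$

Proof. Let $f(W, Z) = \mathbb{E}_{x \sim D_X} \mathbb{E}_{a \sim Z(x, \cdot)}[1 / W'(x, a)]$ denote the inner expression of the minimax problem. Note that $f(W, Z)$ is:

- everywhere defined: Since $W'(x, a) \ge \mu$, we obtain that $1 / W'(x, a) \in (0, 1/\mu]$, hence the expectations are defined for all $W$ and $Z$.
- linear in $Z$: Linearity follows from rewriting $f(W, Z)$ as
$$f(W, Z) = \mathbb{E}_{x \sim D_X}\left[\sum_{a \in A} \frac{Z(x, a)}{W'(x, a)}\right].$$
- convex in $W$: Note that $1 / W'(x, a)$ is convex in $W(x, a)$ by convexity of $1 / (c_1 w + c_2)$ in $w \ge 0$, for $c_1 \ge 0$, $c_2 > 0$. Convexity of $f(W, Z)$ in $W$ then follows by taking expectations over $x$ and $a$.

Hence, by Theorem 14 (in Appendix B), min and max can be reversed without affecting the value:
$$\min_{W \in C} \max_{Z \in C} f(W, Z) = \max_{Z \in C} \min_{W \in C} f(W, Z).$$
The right-hand side can be further upper-bounded by $\max_{Z \in C} f(Z, Z)$, which is upper-bounded by
$$f(Z, Z) = \mathbb{E}_{x \sim D_X}\left[\sum_{a \in A} \frac{Z(x, a)}{Z'(x, a)}\right] \le \mathbb{E}_{x \sim D_X}\left[\sum_{a \in A : Z(x, a) > 0} \frac{Z(x, a)}{(1 - K\mu) Z(x, a)}\right] \le \frac{K}{1 - K\mu}.$$

Corollary 2. The set of distributions satisfying constraints of Step 1 is non-empty.

Given the existence of $P_t$, we will see below that the constraints in Step 1 ensure low variance of the policy value estimator $\eta_t(\pi)$ for all $\pi \in \Pi_{t-1}$. The small variance is used to ensure accuracy of policy elimination in Step 5 as quantified in the following lemma:

Lemma 3. With probability at least $1 - \delta$, for all $t$:
1. $\pi_{\max} \in \Pi_t$ (i.e., $\Pi_t$ is non-empty)
2. $\eta_D(\pi_{\max}) - \eta_D(\pi) \le 4 b_t$ for all $\pi \in \Pi_t$

Proof. We will show that for any policy $\pi \in \Pi_{t-1}$, the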
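probability that $\eta_t(\pi)$ deviates from $\eta_D(\pi)$ by more than $b_t$ is at most $2\delta_t$. Taking the union bound over all policies and all time steps we find that with probability at least $1 - \delta$,
$$|\eta_t(\pi) - \eta_D(\pi)| \le b_t \qquad (3.1)$$
for all $t$ and all $\pi \in \Pi_{t-1}$. Then:

1. By the triangle inequality, in each time step, $\eta_t(\pi) \le \eta_t(\pi_{\max}) + 2 b_t$ for all $\pi \in \Pi_{t-1}$, yielding the first part of the lemma.
2. Also by the triangle inequality, if $\eta_D(\pi) < \eta_D(\pi_{\max}) - 4 b_t$ for $\pi \in \Pi_{t-1}$, then $\eta_t(\pi) < \eta_t(\pi_{\max}) - 2 b_t$. Hence the policy $\pi$ is eliminated in Step 5, yielding the second part of the lemma.

It remains to show Eq. (3.1). We fix the policy $\pi \in \Pi$ and time $t$, and show that the deviation bound is violated with probability at most $2\delta_t$. Our argument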

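rests on Freedman's inequality (see Theorem 13 in Appendix A). Let
$$y_t = \frac{r_t \, I(\pi(x_t) = a_t)}{W'_t(a_t)},$$
i.e., $\eta_t(\pi) = \left(\sum_{t' = 1}^{t} y_{t'}\right) / t$. Let $\mathbb{E}_t$ denote the conditional expectation $\mathbb{E}[\,\cdot \mid h_{t-1}]$. To use Freedman's inequality, we need to bound the range of $y_t$ and its conditional second moment $\mathbb{E}_t[y_t^2]$.

Since $r_t \in [0, 1]$ and $W'_t(a_t) \ge \mu_t$, we have the bound
$$0 \le y_t \le 1/\mu_t =: R_t.$$
Next,
$$\mathbb{E}_t[y_t^2] = \mathbb{E}_{(x_t, \vec{r}_t) \sim D} \, \mathbb{E}_{a_t \sim W'_t}\left[y_t^2\right] = \mathbb{E}_{(x_t, \vec{r}_t) \sim D} \, \mathbb{E}_{a_t \sim W'_t}\left[\frac{r_t(a_t)^2 \, I(\pi(x_t) = a_t)}{W'_t(a_t)^2}\right]$$
$$\le \mathbb{E}_{(x_t, \vec{r}_t) \sim D}\left[\frac{W'_t(\pi(x_t))}{W'_t(\pi(x_t))^2}\right] \qquad (3.2)$$
$$= \mathbb{E}_{x_t \sim D_X}\left[\frac{1}{W'_t(\pi(x_t))}\right] \le 2K, \qquad (3.3)$$
where Eq. (3.2) follows by boundedness of $r_t$ and Eq. (3.3) follows from the constraints in Step 1. Hence,
$$\sum_{t' = 1 \ldots t} \mathbb{E}_{t'}[y_{t'}^2] \le 2Kt =: V_t.$$

Since $(\ln t)/t$ is decreasing for $t \ge 3$, we obtain that $\mu_t$ is non-increasing (by separately analyzing $t = 1$, $t = 2$, $t \ge 3$). Let $t_0$ be the first $t$ such that $\mu_t < 1/(2K)$. Note that $b_t \ge 4K\mu_t$, so for $t < t_0$, we have $b_t \ge 2$ and $\Pi_t = \Pi$. Hence, the deviation bound holds for $t < t_0$.

Let $t \ge t_0$. For $t' \le t$, by the monotonicity of $\mu_t$,
$$R_{t'} = 1/\mu_{t'} \le 1/\mu_t = \sqrt{\frac{2Kt}{\ln(1/\delta_t)}} = \sqrt{\frac{V_t}{\ln(1/\delta_t)}}.$$
Hence, the assumptions of Theorem 13 are satisfied, and
$$\Pr\left[|\eta_t(\pi) - \eta_D(\pi)| \ge b_t\right] \le 2\delta_t.$$
The union bound over $\pi$ and $t$ yields Eq. (3.1).

This immediately implies that the cumulative regret is bounded by
$$\sum_{t = 1 \ldots T} \left(\eta_D(\pi_{\max}) - r_t\right) \le 8 \sqrt{2K \ln \frac{4NT^2}{\delta}} \sum_{t = 1}^{T} \frac{1}{\sqrt{t}} \le 16 \sqrt{2TK \ln \frac{4T^2 N}{\delta}} \qquad (3.4)$$
and gives us the following theorem.

Theorem 4. For all distributions $D$ over $(x, \vec{r})$ with $K$ actions, for all sets of $N$ policies $\Pi$, with probability at least $1 - \delta$, the regret of PolicyElimination (Algorithm 1) over $T$ rounds is at most
$$16 \sqrt{2TK \ln \frac{4T^2 N}{\delta}}.$$

Algorithm 2 RandomizedUCB($\Pi$, $\delta$, $K$)

Let $h_0 = \emptyset$ be the initial history.
Define the following quantities:
$$C_t = 2 \log\left(\frac{Nt}{\delta}\right) \quad \text{and} \quad \mu_t = \min\left\{ \frac{1}{2K}, \sqrt{\frac{C_t}{2Kt}} \right\}.$$
For each timestep $t = 1 \ldots T$, observe $x_t$ and do:
1. Let $P_t$ be a distribution over $\Pi$ that approximately solves the optimization problem
$$\min_P \sum_{\pi \in \Pi} P(\pi) \Delta_{t-1}(\pi)$$
s.t. for all distributions $Q$ over $\Pi$:
$$\mathbb{E}_{\pi \sim Q}\left[\frac{1}{t-1} \sum_{i=1}^{t-1} \frac{1}{(1 - K\mu_t) W_P(x_i, \pi(x_i)) + \mu_t}\right] \le \max\left\{ 4K, \frac{(t-1) \Delta_{t-1}(W_Q)^2}{180 \, C_{t-1}} \right\} \qquad (4.1)$$
so that the objective value at $P_t$ is within $\varepsilon_{\mathrm{opt}, t} = O(\sqrt{K C_t / t})$ of the optimal value, and so that each constraint is satisfied with slack $\le K$.
2. Let $W'_t$ be the distribution over $A$ given by $W'_t(a) = (1 - K\mu_t) W_{P_t}(x_t, a) + \mu_t$ for all $a \in A$.
3. Choose $a_t \sim W'_t$.
4. Observe reward $r_t$.
5. Let $h_t = h_{t-1} \cup (x_t, a_t, r_t, W'_t(a_t))$.

4 THE RANDOMIZED UCB ALGORITHM

PolicyElimination is the simplest exhibition of the minimax argument, but it has some drawbacks:

1. The algorithm keeps explicit track of the space of good policies (like a version space), which is difficult to implement efficiently in general.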

2. If the optimal policy is mistakenly eliminated by chance, the algorithm can never recover.
3. The algorithm requires perfect knowledge of the distribution $D_X$ over contexts.

These difficulties are addressed by RandomizedUCB (or RUCB for short), an algorithm which we present and analyze in this section. Our approach is reminiscent of the UCB algorithm (Auer et al., 2002a), developed for context-free setting, which keeps an upper-confidence bound on the expected reward for each action. However, instead of choosing the highest upper confidence bound, we randomize over choices according to the value of their empirical performance. The algorithm has the following properties:

1. The optimization step required by the algorithm always considers the full set of policies (i.e., explicit tracking of the set of good policies is avoided), and thus it can be efficiently implemented using an argmax oracle. We discuss this further in Section 5.
2. Suboptimal policies are implicitly used with decreasing frequency by using a non-uniform variance constraint that depends on a policy's estimated regret. A consequence of this is a bound on the value of the optimization, stated in Lemma 7 below.
3. Instead of $D_X$, the algorithm uses the history of previously seen contexts. The effect of this approximation is quantified in Theorem 6 below.

The regret of RandomizedUCB is the following:

Theorem 5. For all distributions $D$ over $(x, \vec{r})$ with $K$ actions, for all sets of $N$ policies $\Pi$, with probability at least $1 - \delta$, the regret of RandomizedUCB (Algorithm 2) over $T$ rounds is at most
$$O\left(\sqrt{TK \log(TN/\delta)} + K \log(NK/\delta)\right).$$

The proof is given in Appendix D.4. Here, we present an overview of the analysis.

4.1 EMPIRICAL VARIANCE ESTIMATES

A key technical prerequisite for the regret analysis is the accuracy of the empirical variance estimates. For a distribution $P$ over policies $\Pi$ and a particular policy $\pi \in \Pi$, define
$$V_{P, \pi, t} = \mathbb{E}_{x \sim D_X}\left[\frac{1}{(1 - K\mu_t) W_P(x, \pi(x)) + \mu_t}\right],$$
$$\widehat{V}_{P, \pi, t} = \frac{1}{t-1} \sum_{i=1}^{t-1} \frac{1}{(1 - K\mu_t) W_P(x_i, \pi(x_i)) + \mu_t}.$$
The first quantity $V_{P, \pi, t}$ is a bound on the variance incurred by an importance-weighted estimate of reward in round $t$ using the action distribution induced by $P$, and the second quantity $\widehat{V}_{P, \pi, t}$ is an empirical estimate of $V_{P, \pi, t}$ using the finite sample $\{x_1, \ldots, x_{t-1}\} \subseteq X$ drawn from $D_X$. We show that for all distributions $P$ and all $\pi \in \Pi$, $\widehat{V}_{P, \pi, t}$ is
close to $V_{P, \pi, t}$ with high probability.

Theorem 6. For any $\epsilon \in (0, 1)$, with probability at least $1 - \delta$,
$$V_{P, \pi, t} \le (1 + \epsilon) \cdot \widehat{V}_{P, \pi, t} + \frac{7500}{\epsilon^3} \cdot K$$
for all distributions $P$ over $\Pi$, all $\pi \in \Pi$, and all $t \ge 16 K \log(8KN/\delta)$.

The proof appears in Appendix C.

4.2 REGRET ANALYSIS

Central to the analysis is the following lemma that bounds the value of the optimization in each round. It is a direct corollary of Lemma 24 in Appendix D.4.

Lemma 7. If $\mathrm{OPT}_t$ is the value of the optimization problem (4.1) in round $t$, then
$$\mathrm{OPT}_t \le O\left(\sqrt{\frac{K C_t}{t}}\right) = O\left(\sqrt{\frac{K \log(Nt/\delta)}{t}}\right).$$

This lemma implies that the algorithm is always able to select a distribution over the policies that focuses mostly on the policies with low estimated regret. Moreover, the variance constraints ensure that good policies never appear too bad, and that only bad policies are allowed to incur high variance in their reward estimates. Hence, minimizing the objective in (4.1) is an effective surrogate for minimizing regret.

The bulk of the analysis consists of analyzing the variance of the importance-weighted reward estimates $\eta_t(\pi)$, and showing how they relate to their actual expected rewards $\eta_D(\pi)$. The details are deferred to Appendix D.

5 USING AN ARGMAX ORACLE

In this section, we show how to solve the optimization problem (4.1) using the argmax oracle (AMO) for our set of policies. Namely, we describe an algorithm running in polynomial time independent of the number of policies (or rather, dependent only on $\log N$, the representation size of a policy), which makes queries to AMO to compute a distribution over policies suitable for the optimization step of Algorithm 2.

This algorithm relies on the ellipsoid method. The ellipsoid method is a general technique for solving convex programs equipped with a separation oracle. A separation oracle is defined as follows:

Definition 2. Let $S$ be a convex set in $\mathbb{R}^n$. A separation oracle for $S$ is an algorithm that, given a point $x \in \mathbb{R}^n$, either declares correctly that $x \in S$, or produces a hyperplane $H$ such that $x$ and $S$ are on opposite sides of $H$.

We do not describe the ellipsoid algorithm here (since it is standard), but only spell out its key properties in the following lemma. For a point $x \in \mathbb{R}^n$ and $r \ge 0$, we use the notation $B(x, r)$ to denote the $\ell_2$ ball of radius $r$ centered at $x$.

Lemma 8. Suppose we are required to decide whether a convex set $S \subseteq \mathbb{R}^n$ is empty or not. We are given a separation oracle for $S$ and two numbers $R$ and $r$, such that $S \subseteq B(0, R)$ and if $S$ is non-empty, then there is a point $x^\star$ such that $S \supseteq B(x^\star, r)$. The ellipsoid algorithm decides correctly if $S$ is empty or not, by executing at most $O(n^2 \log(R/r))$ iterations, each involving one call to the separation oracle and additional $O(n^2)$ processing time.

We now write a convex program whose solution is the required distribution, and show how to solve it using the ellipsoid method by giving a separation oracle for its feasible set using AMO.

Fix a time period $t$. Let $X_{t-1}$ be the set of all contexts seen so far, i.e. $X_{t-1} = \{x_1, x_2, \ldots, x_{t-1}\}$. We embed all policies $\pi \in \Pi$ in $\mathbb{R}^{(t-1)K}$, with coordinates identified with $(x, a) \in X_{t-1} \times A$. With abuse of notation, a policy $\pi$ is represented by the vector $\pi$ with coordinate $\pi(x, a) = 1$ if $\pi(x) = a$ and $0$ otherwise. Let $C$ be the convex hull of all policy vectors $\pi$. Recall that a distribution $P$ over policies corresponds to a point inside $C$, i.e., $W_P(x, a) = \sum_{\pi : \pi(x) = a} P(\pi)$, and that $W'(x, a) = (1 - \mu_t K) W(x, a) + \mu_t$, where $\mu_t$ is as defined in Algorithm 2. Also define $\beta_t = \frac{t-1}{180 \, C_{t-1}}$. In the following, we use the notation $x \sim h_{t-1}$ to denote a context drawn uniformly at random from $X_{t-1}$.

Consider the following convex program:
$$\min s \quad \text{s.t.}$$
$$\Delta_{t-1}(W) \le s \qquad (5.1)$$
$$W \in C \qquad (5.2)$$
$$\forall Z \in C : \; \mathbb{E}_{x \sim h_{t-1}}\left[\sum_{a} \frac{Z(x, a)}{W'(x, a)}\right] \le \max\{4K, \beta_t \Delta_{t-1}(Z)^2\} \qquad (5.3)$$

We claim that this program is equivalent to the RUCB optimization problem (4.1), up to finding an explicit distribution over policies which corresponds to the optimal solution. This can be seen as follows. Since we require $W \in C$, it can be interpreted as being equal to $W_P$ for some distribution over policies $P$. The
constraints (5.3) are equivalent to (4.1) by substitution $Z = W_Q$.

The above convex program can be solved by performing a binary search over $s$ and testing feasibility of the constraints. For a fixed value of $s$, the feasibility problem defined by (5.1)-(5.3) is denoted by $\mathcal{A}$.

We now give a sketch of how we construct a separation oracle for the feasible region of $\mathcal{A}$. The details of the algorithm are a bit complicated due to the fact that we need to ensure that the feasible region, when non-empty, has a non-negligible volume (recall the requirements of Lemma 8). This necessitates having a small error in satisfying the constraints of the program. We leave the details to Appendix E. Modulo these details, the construction of the separation oracle essentially implies that we can solve $\mathcal{A}$.

Before giving the construction of the separation oracle, we first show that AMO allows us to do linear optimization over $C$ efficiently:

Lemma 9. Given a vector $w \in \mathbb{R}^{(t-1)K}$, we can compute $\arg\max_{Z \in C} w \cdot Z$ using one invocation of AMO.

Proof. The sequence for AMO consists of $x_{t'} \in X_{t-1}$ and $r_{t'}(a) = w(x_{t'}, a)$. The lemma now follows since $w \cdot \pi = \sum_{x \in X_{t-1}} w(x, \pi(x))$.

We need another simple technical lemma which explains how to get a separating hyperplane for violations of convex constraints:

Lemma 10. For $x \in \mathbb{R}^n$, let $f(x)$ be a convex function of $x$, and consider the convex set $K$ defined by $K = \{x : f(x) \le 0\}$. Suppose we have a point $y$ such that $f(y) > 0$. Let $\nabla f(y)$ be a subgradient of $f$ at $y$. Then the hyperplane $f(y) + \nabla f(y) \cdot (x - y) = 0$ separates $y$ from $K$.

Proof. Let $g(x) = f(y) + \nabla f(y) \cdot (x - y)$. By the convexity of $f$, we have $f(x) \ge g(x)$ for all $x$. Thus, for any $x \in K$, we have $g(x) \le f(x) \le 0$. Since $g(y) = f(y) > 0$, we conclude that $g(x) = 0$ separates $y$ from $K$.

Now given a candidate point $W$, a separation oracle can be constructed as follows. We check whether $W$ satisfies the constraints of $\mathcal{A}$. If any constraint is violated, then we find a hyperplane separating $W$ from all points satisfying the constraint.
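Lemmas 9 and 10 can be illustrated with a small sketch. It is not the paper's implementation: the function names are hypothetical, the oracle is simulated by enumerating an explicit list of policies (a real AMO would be a cost-sensitive learner), the vector $w$ is assumed to be stored as a dictionary keyed by (context, action) pairs with distinct contexts, and actions are indexed from 0.

```python
def linear_argmax_over_hull(w, contexts, policies, K):
    """Lemma 9: maximize w . Z over the convex hull of the policy vectors.

    A linear objective is maximized at a vertex, so it suffices to feed the
    oracle the reward vectors r_x(a) = w[(x, a)] and return the best policy.
    """
    rewards = [[w[(x, a)] for a in range(K)] for x in contexts]

    def total_reward(pi):
        # Stand-in for one AMO call: brute-force enumeration.
        return sum(r[pi(x)] for x, r in zip(contexts, rewards))

    return max(policies, key=total_reward)


def separating_cut(f, subgradient, y):
    """Lemma 10: return g(x) = f(y) + subgradient(y) . (x - y).

    If f is convex and f(y) > 0, then g(y) > 0 while g(x) <= 0 for every x
    with f(x) <= 0, so the hyperplane g(x) = 0 separates y from {f <= 0}.
    """
    fy = f(y)
    grad = subgradient(y)

    def g(x):
        return fy + sum(gi * (xi - yi) for gi, xi, yi in zip(grad, x, y))

    return g
```

As a usage example, for the unit disc $f(x) = x_1^2 + x_2^2 - 1$ and the outside point $y = (2, 0)$, the cut is $g(x) = 4x_1 - 5$, which is positive at $y$ and non-positive on the whole disc.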

1. First, for constraint (5.1), note that $\eta_{t-1}(W)$ is linear in $W$, and so we can compute $\max_\pi \eta_{t-1}(\pi)$ via AMO as in Lemma 9. We can then compute $\eta_{t-1}(W)$ and check if the constraint is satisfied. If not, then the constraint, being linear, automatically yields a separating hyperplane.

2. Next, we consider constraint (5.2). To check if $W \in C$, we use the perceptron algorithm. We shift the origin to $W$, and run the perceptron algorithm with all points $\pi \in \Pi$ being positive examples. The perceptron algorithm aims to find a hyperplane putting all policies $\pi \in \Pi$ on one side. In each iteration of the perceptron algorithm, we have a candidate hyperplane (specified by its normal vector), and then if there is a policy $\pi$ that is on the wrong side of the hyperplane, we can find it by running a linear optimization over $C$ in the negative normal vector direction as in Lemma 9. If $W \notin C$, then in a bounded number of iterations (depending on the distance of $W$ from $C$, and the maximum magnitude $\|\pi\|_2$) we obtain a separating hyperplane. In passing we also note that if $W \in C$, the same technique allows us to explicitly compute an approximate convex combination of policies in $\Pi$ that yields $W$. This is done by running the perceptron algorithm as before and stopping after the bound on the number of iterations has been reached. Then we collect all the policies we have found in the run of the perceptron algorithm, and we are guaranteed that $W$ is close in distance to their convex hull. We can then find the closest point in the convex hull of these policies by solving a simple quadratic program.

3. Finally, we consider constraint (5.3). We rewrite $\eta_{t-1}(W)$ as $\eta_{t-1}(W) = w \cdot W$, where $w(x_{t'}, a)$ is proportional to $r_{t'} \, I(a = a_{t'}) / W'_{t'}(a_{t'})$, with the normalization taken from the definition of $\eta_{t-1}$. Thus, $\Delta_{t-1}(Z) = v - w \cdot Z$, where $v = \max_\pi \eta_{t-1}(\pi) = \max_\pi w \cdot \pi$, which can be computed by using AMO once. Next, using the candidate point $W$, compute the vector $u$ defined as $u(x, a) = \frac{n_x / (t-1)}{W'(x, a)}$, where $n_x$ is the number of times $x$ appears in $h_{t-1}$, so that
$$\mathbb{E}_{x \sim h_{t-1}}\left[\sum_{a} \frac{Z(x, a)}{W'(x, a)}\right] = u \cdot Z.$$
Now, the problem reduces to finding a policy $Z \in C$ which violates the constraint
$$u \cdot Z \le \max\{4K, \beta_t (w \cdot Z - v)^2\}.$$
Define $f(Z) = \max\{4K, \beta_t (w \cdot Z - v)^2\} - u \cdot Z$. Note that $f$ is a convex function of $Z$. Finding a point $Z$ that violates the above constraint is equivalent to solving the following convex program:
$$f(Z) \le 0 \qquad (5.4)$$
$$Z \in C \qquad (5.5)$$
To do this, we again apply the ellipsoid method. For this, we need a separation oracle for the program. A separation oracle
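for the constraints (5.5) can be constructed as in Step 2 above. For the constraints (5.4), if the candidate solution $Z$ has $f(Z) > 0$, then we can construct a separating hyperplane as in Lemma 10.

Suppose that after solving the program, we get a point $Z \in C$ such that $f(Z) \le 0$, i.e. $W$ violates the constraint (5.3) for $Z$. Then since constraint (5.3) is convex in $W$, we can construct a separating hyperplane as in Lemma 10. This completes the description of the separation oracle.

Working out the details carefully yields the following theorem, proved in Appendix E:

Theorem 11. There is an iterative algorithm with $O(t^5 K^4 \log^2(tK/\delta))$ iterations, each involving one call to AMO and $O(t^2 K^2)$ processing time, that either declares correctly that $\mathcal{A}$ is infeasible or outputs a distribution $P$ over policies in $\Pi$ such that $W_P$ satisfies
$$\forall Z \in C : \; \mathbb{E}_{x \sim h_{t-1}}\left[\sum_{a} \frac{Z(x, a)}{W'_P(x, a)}\right] \le \max\{4K, \beta_t \Delta_{t-1}(Z)^2\} + 5\epsilon,$$
$$\Delta_{t-1}(W_P) \le s + 2\gamma,$$
where $\epsilon = \frac{8\delta}{\mu_t^2}$ and $\gamma = \frac{\delta}{\mu_t}$.

6 DELAYED FEEDBACK

In a delayed feedback setting, we observe rewards with a $\tau$ step delay according to:

1. The world presents features $x_t$.
2. The learning algorithm chooses an action $a_t \in \{1, \ldots, K\}$.
3. The world presents a reward $r_{t-\tau}$ for the action $a_{t-\tau}$ given the features $x_{t-\tau}$.

We deal with delay by suitably modifying Algorithm 1 to incorporate the delay $\tau$, giving Algorithm 3.

Now we can prove the following theorem, which shows the delay has an additive effect on regret.

Theorem 12. For all distributions $D$ over $(x, \vec{r})$ with $K$ actions, for all sets of $N$ policies $\Pi$, and all delay intervals $\tau$, with probability at least $1 - \delta$, the regret of DelayedPE (Algorithm 3) is at most
$$16 \sqrt{2K \ln \frac{4T^2 N}{\delta}} \left(\tau + \sqrt{T}\right).$$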
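To see the additive effect of the delay numerically, the bounds of Theorem 4 and Theorem 12 can be evaluated side by side. This is only a sketch for plugging in numbers; the function names are hypothetical and the constants are taken directly from the two theorem statements.

```python
import math


def policy_elimination_bound(T, K, N, delta):
    """Theorem 4: 16 * sqrt(2 T K ln(4 T^2 N / delta))."""
    return 16.0 * math.sqrt(2.0 * T * K * math.log(4.0 * T ** 2 * N / delta))


def delayed_pe_bound(T, K, N, delta, tau):
    """Theorem 12: 16 * sqrt(2 K ln(4 T^2 N / delta)) * (tau + sqrt(T))."""
    scale = 16.0 * math.sqrt(2.0 * K * math.log(4.0 * T ** 2 * N / delta))
    return scale * (tau + math.sqrt(T))
```

With `tau = 0` the two bounds coincide, and each additional round of delay adds the same fixed amount `16 * sqrt(2 K ln(4 T^2 N / delta))` to the bound, rather than multiplying it.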

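Algorithm 3 DelayedPE($\Pi$, $\delta$, $K$, $D_X$, $\tau$)

Let $\Pi_0 = \Pi$ and history $h_0 = \emptyset$
Define: $\delta_t = \delta / (4 N t^2)$ and $b_t = 2 \sqrt{2K \ln(1/\delta_t) / t}$
Define: $\mu_t = \min\left\{ \frac{1}{2K}, \sqrt{\frac{\ln(1/\delta_t)}{2Kt}} \right\}$
For each timestep $t = 1 \ldots T$, observe $x_t$ and do:
1. Let $t' = \max(t - \tau, 1)$.
2. Choose distribution $P_t$ over $\Pi_{t-1}$ s.t. $\forall \pi \in \Pi_{t-1}$:
$$\mathbb{E}_{x \sim D_X}\left[\frac{1}{(1 - K\mu_{t'}) W_{P_t}(x, \pi(x)) + \mu_{t'}}\right] \le 2K$$
3. $\forall a \in A$, let $W'_t(a) = (1 - K\mu_{t'}) W_{P_t}(x_t, a) + \mu_{t'}$
4. Choose $a_t \sim W'_t$
5. Observe reward $r_t$
6. Let $\Pi_t = \left\{ \pi \in \Pi_{t-1} : \eta_h(\pi) \ge \max_{\pi' \in \Pi_{t-1}} \eta_h(\pi') - 2 b_{t'} \right\}$
7. Let $h_t = h_{t-1} \cup (x_t, a_t, r_t, W'_t(a_t))$

Proof. Essentially as Theorem 4. The variance bound is unchanged because it depends only on the context distribution. Thus, it suffices to replace $\sum_{t=1}^{T} \frac{1}{\sqrt{t}}$ with
$$\tau + \sum_{t = \tau + 1}^{T + \tau} \frac{1}{\sqrt{t - \tau}} = \tau + \sum_{t = 1}^{T} \frac{1}{\sqrt{t}}$$
in Eq. (3.4).

Acknowledgements

We thank Alina Beygelzimer, who helped in several formative discussions.

References

Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3, 2002.

Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3), 2002a.

Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal of Computing, 32(1):48-77, 2002b.

P. L. Bartlett, E. Hazan, and A. Rakhlin. Adaptive online gradient descent. In NIPS, 2007.

Alina Beygelzimer, John Langford, and Pradeep Ravikumar. Error correcting tournaments. In ALT, 2009.

Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, and Robert E. Schapire. Contextual bandit algorithms with supervised learning guarantees. In AISTATS, 2011.

Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. Journal of Machine Learning Research, 7:1079-1105, 2006.

David A. Freedman. On tail probabilities for martingales. Annals of Probability, 3:100-118, 1975.

Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997.

Sham M. Kakade and Adam Kalai. From batch to transductive online learning. In NIPS, 2005.

Adam Tauman Kalai and Santosh Vempala. Efficient algorithms for online decision problems. J. Comput. Syst. Sci., 71(3):291-307, 2005.

Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985.

J. Langford, A. Smola, and M. Zinkevich. Slow learners are fast. In NIPS, 2009.

John Langford and Tong Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. In NIPS, 2007.

Maurice Sion. On general minimax theorems. Pacific J. Math., 8:171-176, 1958.

Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In ICML, 2010.

A Concentration Inequality

The following is an immediate corollary of Theorem 1 of (Beygelzimer et al., 2011). It can be viewed as a version of Freedman's Inequality (Freedman, 1975). Let $y_1, \ldots, y_T$ be a sequence of real-valued random variables. Let $\mathbb{E}_t$ denote the conditional expectation $\mathbb{E}[\,\cdot \mid y_1, \ldots, y_{t-1}]$ and $\mathbb{V}_t$ the conditional variance.

Theorem 13 (Freedman-style Inequality). Let $V, R \in \mathbb{R}$ such that $\sum_{t=1}^{T} \mathbb{V}_t[y_t] \le V$, and for all $t$, $y_t - \mathbb{E}_t[y_t] \le R$. Then for any $\delta > 0$ such that $R \le \sqrt{V / \ln(2/\delta)}$, with probability at least $1 - \delta$,
$$\left| \sum_{t=1}^{T} y_t - \sum_{t=1}^{T} \mathbb{E}_t[y_t] \right| \le 2 \sqrt{V \ln(2/\delta)}.$$

B Minimax Theorem

The following is a continuous version of Sion's Minimax Theorem (Sion, 1958, Theorem 3.4).

Theorem 14. Let $\mathcal{W}$ and $\mathcal{Z}$ be compact and convex sets, and $f : \mathcal{W} \times \mathcal{Z} \to \mathbb{R}$ a function which for all $Z \in \mathcal{Z}$ is convex and continuous in $W$ and for all $W \in \mathcal{W}$ is concave and continuous in $Z$. Then
$$\min_{W \in \mathcal{W}} \max_{Z \in \mathcal{Z}} f(W, Z) = \max_{Z \in \mathcal{Z}} \min_{W \in \mathcal{W}} f(W, Z).$$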

11 C mpiricl Vrince Bounds nd hen pplying he AM/GM inequliy In his secion we prove Theorem 6 We firs show uniform convergence for cerin clss of policy disribuions Lemm 5, nd rgue h ech disribuion P is close o some disribuion P from his clss, in he sense h V P,π, is close o V P,π, nd V P,π, is close o V P,π, Lemm 6 Togeher, hey imply he min uniform convergence resul in Theorem 6 For ech posiive ineger m, le Sprsem be he se of disribuions P over Π h cn be wrien s P π = m Iπ = π i m i= ie, he verge of m del funcions for some π,, π m Π In our nlysis, we pproxime n rbirry disribuion P over Π by disribuion P Sprsem chosen rndomly by independenly drwing π,, π m P ; we denoe his process by P P m Lemm 5 Fix posiive inegers m, m 2, Wih probbiliy les over he rndom smples x, x 2, from D X, Lemm 6 Fix ny γ 0,, nd ny x X For ny disribuion P over Π nd ny π Π, if hen P P m m = 6 γ 2, µ Kµ W P x, πx + µ Kµ W P x, πx + µ γ Kµ W P x, πx + µ V P,π, + λ V P,π, λ m + log N + log 2 2 µ for ll λ > 0, ll, ll π Π, nd ll disribuions P Sprsem Proof Le Z P,π, x = Kµ W P x, πx + µ so V P,π, = x D X Z P,π, x nd V P,π, = i= Z P,π, x i Also le ε = log Sprsem N2 2 / µ = m + log N + log 2 2 µ We pply Bernsein s inequliy nd union bounds over P Sprsem, π Π, nd so h wih probbiliy les, V P,π, V P,π, + 2V P,π, ε + 2/3ε ll, ll π Π, nd ll disribuions P Sprsem The conclusion follows by solving he qudric inequliy for V P,π, o ge V P,π, V P,π, + 2 V P,π, ε + 5ε This implies h for ll disribuions P over Π nd ny π Π, here exiss P Sprsem such h for ny λ > 0, V P,π, V P,π, + + λ V P,π, V P,π, γv P,π, + + λ V P,π, Proof We rndomly drw P P m, wih P π m m i= Iπ = π i, nd hen define z = π Π P π Iπ x = πx nd ẑ = π Π P π Iπ x = πx We hve z = π P Iπ x = πx nd ẑ = m m i= Iπ ix = πx In oher words, ẑ is he verge of m independen Bernoulli rndom vribles, ech wih men z Thus, P P mẑ z 2 = z z/m nd Pr P P mẑ z/2 exp mz/8 by Chernoff =

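bound. We have
\begin{align*}
\mathbb{E}_{\tilde P \sim P^m} \left| \frac{1}{(1 - K\mu_t) \hat z + \mu_t} - \frac{1}{(1 - K\mu_t) z + \mu_t} \right|
&= \mathbb{E}_{\tilde P \sim P^m} \frac{(1 - K\mu_t)\, |\hat z - z|}{((1 - K\mu_t) \hat z + \mu_t)((1 - K\mu_t) z + \mu_t)} \\
&\le \mathbb{E}_{\tilde P \sim P^m} \frac{(1 - K\mu_t)\, |\hat z - z|\, I(\hat z \ge 0.5 z)}{0.5\, ((1 - K\mu_t) z + \mu_t)^2} + \mathbb{E}_{\tilde P \sim P^m} \frac{(1 - K\mu_t)\, |\hat z - z|\, I(\hat z < 0.5 z)}{\mu_t\, ((1 - K\mu_t) z + \mu_t)} \\
&\le \frac{(1 - K\mu_t) \sqrt{\mathbb{E}_{\tilde P \sim P^m} |\hat z - z|^2}}{0.5\, ((1 - K\mu_t) z + \mu_t)^2} + \frac{(1 - K\mu_t)\, z \Pr_{\tilde P \sim P^m}(\hat z < 0.5 z)}{\mu_t\, ((1 - K\mu_t) z + \mu_t)} \\
&\le \frac{(1 - K\mu_t) \sqrt{z/m}}{0.5 \cdot 2 \sqrt{(1 - K\mu_t) z \mu_t}\, ((1 - K\mu_t) z + \mu_t)} + \frac{(1 - K\mu_t)\, z \exp(-mz/8)}{\mu_t\, ((1 - K\mu_t) z + \mu_t)} \\
&= \frac{\gamma \sqrt{(1 - K\mu_t) z/m}}{\sqrt{z \cdot 6/m}\, ((1 - K\mu_t) z + \mu_t)} + \frac{(1 - K\mu_t)\, \gamma^2\, mz \exp(-mz/8)}{6\, ((1 - K\mu_t) z + \mu_t)},
\end{align*}
where the step that replaces $\mathbb{E}|\hat z - z|$ by $\sqrt{\mathbb{E}|\hat z - z|^2}$ follows from Jensen's inequality, and the step after it uses the AM/GM inequality in the denominator of the first term and the previous observations in the numerators. The final expression simplifies to the first desired displayed inequality by observing that $mz \exp(-mz/8) \le 3$ for all $mz \ge 0$ (the maximum is achieved at $mz = 8$).

The second displayed inequality follows from the following facts:
\begin{align*}
\mathbb{E}_{\tilde P \sim P^m} \bigl[V_{P,\pi,t} - V_{\tilde P,\pi,t}\bigr] &\le \gamma V_{P,\pi,t}, \\
\mathbb{E}_{\tilde P \sim P^m} \bigl[(1 + \lambda)(\hat V_{\tilde P,\pi,t} - \hat V_{P,\pi,t})\bigr] &\le \gamma (1 + \lambda) \hat V_{P,\pi,t}.
\end{align*}
Both inequalities follow from the first displayed bound of the lemma, by taking expectation with respect to the true (and empirical) distributions over $x$. The desired bound follows by adding the above two inequalities, which implies that the bound holds in expectation, and hence the existence of $\tilde P$ for which the bound holds.

Now, we can prove Theorem 6.

Proof of Theorem 6. Let $m_t := 6/(\lambda^2 \mu_t)$ for some $\lambda \in (0, 1/5]$ to be determined and condition on the probability at least $1 - \delta$ event from Lemma 15 that
\[
V_{\tilde P,\pi,t} - (1 + \lambda) \hat V_{\tilde P,\pi,t} \le K \Bigl(5 + \frac{1}{\lambda}\Bigr) \cdot \frac{(m_t + 1) \log N + \log(2t^2/\delta)}{K \mu_t t}
\]
for all $t \ge 2$, all $\tilde P \in \mathrm{Sparse}[m_t]$, and all $\pi \in \Pi$. Using the definitions of $m_t$ and $\mu_t$, the second term is at most $(40/\lambda^2)(1 + 1/\lambda) K$ for all $t \ge 16K \log(8KN/\delta)$: the key here is that for $t \ge 16K \log(8KN/\delta)$, we have $\mu_t = \sqrt{\log(Nt/\delta)/(Kt)} \le 1/(2K)$ and therefore
\[
\frac{m_t \log N}{K \mu_t t} \le \frac{6}{\lambda^2} \quad \text{and} \quad \frac{\log N + \log(2t^2/\delta)}{K \mu_t t} \le 2.
\]
Now fix $t \ge 16K \log(8KN/\delta)$, $\pi \in \Pi$, and a distribution $P$ over $\Pi$. Let $\tilde P \in \mathrm{Sparse}[m_t]$ be the distribution guaranteed by Lemma 16 with $\gamma = \lambda$ satisfying
\[
V_{P,\pi,t} \le \frac{V_{\tilde P,\pi,t} - (1 + \lambda) \hat V_{\tilde P,\pi,t} + (1 + \lambda)^2 \hat V_{P,\pi,t}}{1 - \lambda}.
\]
Substituting the previous bound for $V_{\tilde P,\pi,t} - (1 + \lambda) \hat V_{\tilde P,\pi,t}$ gives
\[
V_{P,\pi,t} \le \frac{1}{1 - \lambda} \left( \frac{40}{\lambda^2} (1 + 1/\lambda) K + (1 + \lambda)^2 \hat V_{P,\pi,t} \right).
\]
This can be bounded as $(1 + \epsilon) \hat V_{P,\pi,t} + (7500/\epsilon^3) K$ by setting $\lambda = \epsilon/5$.

D Analysis of RandomizedUCB

D.1 Preliminaries

First, we define the following constants. $\epsilon \in (0, 1)$ is a fixed constant, and $\rho := 7500/\epsilon^3$ is the factor that appears in the bound (3) from Theorem 6.
\[
\theta := \frac{\rho + 1}{1 - (1 + \epsilon)/2} = \frac{2}{1 - \epsilon} \Bigl(1 + \frac{7500}{\epsilon^3}\Bigr) \ge 5
\]
is a constant central to Lemma 21, which bounds the variance of the optimal policy's estimated rewards. Recall the algorithm-specific quantities
\[
C_t := 2 \log\Bigl(\frac{Nt}{\delta}\Bigr), \qquad \mu_t := \min\Bigl\{ \frac{1}{2K},\ \sqrt{\frac{C_t}{2Kt}} \Bigr\}.
\]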
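The schedule for $C_t$ and $\mu_t$ is easy to tabulate numerically. The sketch below computes both quantities, together with the first round at which the square-root branch of $\mu_t$ becomes active and the round threshold $16K \log(8KN/\delta)$ used by Theorem 6. The function names, the brute-force search, and the rounding up of the threshold are illustrative assumptions; the constants are taken as they read in this appendix.

```python
import math


def c_t(t, n_policies, delta):
    """C_t = 2 * log(N * t / delta)."""
    return 2.0 * math.log(n_policies * t / delta)


def decaying_rate(t, n_actions, n_policies, delta):
    """The square-root branch sqrt(C_t / (2 * K * t)) of mu_t."""
    return math.sqrt(c_t(t, n_policies, delta) / (2.0 * n_actions * t))


def mu_t(t, n_actions, n_policies, delta):
    """mu_t = min{1 / (2K), sqrt(C_t / (2 K t))}."""
    return min(1.0 / (2.0 * n_actions),
               decaying_rate(t, n_actions, n_policies, delta))


def first_decay_round(n_actions, n_policies, delta, t_max=10**6):
    """First round in which mu_t equals the square-root branch.

    Brute-force search; t_max is a hypothetical safety cap.
    """
    cap = 1.0 / (2.0 * n_actions)
    for t in range(1, t_max + 1):
        if decaying_rate(t, n_actions, n_policies, delta) <= cap:
            return t
    raise ValueError("no such round found below t_max")


def theorem6_round(n_actions, n_policies, delta):
    """Threshold 16 K log(8 K N / delta), rounded up (an assumption)."""
    return math.ceil(
        16 * n_actions * math.log(8 * n_actions * n_policies / delta))


if __name__ == "__main__":
    K, N, delta = 2, 10, 0.1
    print(first_decay_round(K, N, delta), theorem6_round(K, N, delta))
    print([round(mu_t(t, K, N, delta), 4) for t in (1, 50, 100, 500)])
```

For small $t$ the uniform floor $1/(2K)$ dominates, so exploration is maximal early on and decays roughly like $\sqrt{\log(t)/t}$ afterwards.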

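It can be checked that $\mu_t$ is non-increasing. We define the following time indices:

$t_0$ is the first round $t$ in which $\mu_t = \sqrt{C_t/(2Kt)}$. Note that $8K \le t_0 \le 8K \log(NK/\delta)$.

$t_1 := 16K \log(8KN/\delta)$ is the round given by Theorem 6 such that, with probability at least $1 - \delta$,
\[
\mathbb{E}_{x \sim D_X} \left[ \frac{1}{W'_t(\pi(x))} \right] \le (1 + \epsilon)\, \hat{\mathbb{E}}_{x \sim h_{t-1}} \left[ \frac{1}{W'_{P_t,\mu_t}(x, \pi(x))} \right] + \rho K \tag{D.1}
\]
for all $\pi \in \Pi$ and all $t \ge t_1$, where $W'_{P,\mu}(x, \cdot)$ is the distribution over $A$ given by
\[
W'_{P,\mu}(x, a) := (1 - K\mu)\, W_P(x, a) + \mu,
\]
and the notation $\hat{\mathbb{E}}_{x \sim h_{t-1}}$ denotes expectation with respect to the empirical (uniform) distribution over $x_1, \dots, x_{t-1}$.

The following lemma shows the effect of allowing slack in the optimization constraints.

Lemma 17. If $P$ satisfies the constraints of the optimization problem (4) with slack $K$ for each distribution $Q$ over $\Pi$, i.e.,
\[
\mathbb{E}_{\pi \sim Q}\, \hat{\mathbb{E}}_{x \sim h_{t-1}} \left[ \frac{1}{(1 - K\mu_t) W_P(x, \pi(x)) + \mu_t} \right] \le \max\left\{ 4K,\ \frac{(t-1)\, \Delta_{t-1}(W_Q)^2}{180\, C_{t-1}} \right\} + K
\]
for all $Q$, then $P$ satisfies
\[
\mathbb{E}_{\pi \sim Q}\, \hat{\mathbb{E}}_{x \sim h_{t-1}} \left[ \frac{1}{(1 - K\mu_t) W_P(x, \pi(x)) + \mu_t} \right] \le \max\left\{ 5K,\ \frac{(t-1)\, \Delta_{t-1}(W_Q)^2}{144\, C_{t-1}} \right\}
\]
for all $Q$.

Proof. Let $b := \max\{4K,\ (t-1) \Delta_{t-1}(W_Q)^2 / (180\, C_{t-1})\}$. Note that $b/4 \ge K$. Hence $b + K \le 5b/4$, which gives the stated bound.

Note that the allowance of slack $K$ is somewhat arbitrary; any $O(K)$ slack is tolerable provided that other constants are adjusted appropriately.

D.2 Deviation Bound for $\eta_t(\pi)$

For any policy $\pi \in \Pi$, define, for $t \le t_0$,
\[
V_t(\pi) := K,
\]
and for $t > t_0$,
\[
V_t(\pi) := K + \mathbb{E}_{x \sim D_X} \left[ \frac{1}{W'_t(\pi(x))} \right].
\]
The $V_t(\pi)$ bounds the variances of the terms in $\eta_t(\pi)$.

Lemma 18. Assume the bound in (D.1) holds for all $\pi \in \Pi$ and $t \ge t_1$. For all $\pi \in \Pi$:

1. If $t \le t_1$, then $K \le V_t(\pi) \le 4K$.

2. If $t > t_1$, then
\[
V_t(\pi) \le (1 + \epsilon)\, \hat{\mathbb{E}}_{x \sim h_{t-1}} \left[ \frac{1}{(1 - K\mu_t) W_{P_t}(x, \pi(x)) + \mu_t} \right] + (\rho + 1) K.
\]

Proof. For the first claim, note that if $t < t_0$, then $V_t(\pi) = K$, and if $t_0 \le t < t_1$, then
\[
\mu_t = \sqrt{\frac{\log(Nt/\delta)}{Kt}} \ge \sqrt{\frac{\log(N t_0/\delta)}{16 K^2 \log(8KN/\delta)}} \ge \frac{1}{4K};
\]
so $W'_t \ge \mu_t \ge 1/(4K)$. For the second claim, pick any $t > t_1$, and note that by definition of $t_1$, for any $\pi \in \Pi$ we have
\[
\mathbb{E}_{x \sim D_X} \left[ \frac{1}{W'_t(\pi(x))} \right] \le (1 + \epsilon)\, \hat{\mathbb{E}}_{x \sim h_{t-1}} \left[ \frac{1}{(1 - K\mu_t) W_{P_t}(x, \pi(x)) + \mu_t} \right] + \rho K.
\]
The stated bound on $V_t(\pi)$ now follows from its definition.

Let $V_{\max,t}(\pi) := \max\{V_\tau(\pi),\ \tau = 1, 2, \dots, t\}$. The following lemma gives a deviation bound for $\eta_t(\pi)$ in terms of these quantities.

Lemma 19. Pick any $\delta \in (0, 1)$. With probability at least $1 - \delta$, for all pairs $\pi, \pi' \in \Pi$ and $t \ge t_0$, we have
\[
\bigl| (\eta_t(\pi) - \eta_t(\pi')) - (\eta_D(\pi) - \eta_D(\pi')) \bigr| \le 2 \sqrt{\frac{(V_{\max,t}(\pi) + V_{\max,t}(\pi'))\, C_t}{t}}. \tag{D.2}
\]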

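Proof. Fix any $t \ge t_0$ and $\pi, \pi' \in \Pi$. Let $\delta_t := \exp(-C_t)$. Pick any $\tau \le t$. Let
\[
Z_\tau(\pi) := \frac{r_\tau(a_\tau)\, I(\pi(x_\tau) = a_\tau)}{W'_\tau(a_\tau)}
\]
so $\eta_t(\pi) = t^{-1} \sum_{\tau=1}^{t} Z_\tau(\pi)$. It is easy to see that
\[
\mathbb{E}_{(x_\tau, r_\tau) \sim D,\ a_\tau \sim W'_\tau} \bigl[ Z_\tau(\pi) - Z_\tau(\pi') \bigr] = \eta_D(\pi) - \eta_D(\pi')
\]
and
\[
\mathbb{E}_{(x_\tau, r_\tau) \sim D,\ a_\tau \sim W'_\tau} \bigl[ (Z_\tau(\pi) - Z_\tau(\pi'))^2 \bigr] \le \mathbb{E}_{x_\tau \sim D_X} \left[ \frac{1}{W'_\tau(\pi(x_\tau))} + \frac{1}{W'_\tau(\pi'(x_\tau))} \right] \le V_{\max,t}(\pi) + V_{\max,t}(\pi').
\]
Moreover, with probability 1,
\[
|Z_\tau(\pi) - Z_\tau(\pi')| \le \frac{1}{\mu_\tau}.
\]
Now, note that since $t \ge t_0$, $\mu_t = \sqrt{C_t/(2Kt)}$, so that $t = C_t/(2K\mu_t^2)$. Further, both $V_{\max,t}(\pi)$ and $V_{\max,t}(\pi')$ are at least $K$. Using these bounds we get
\[
\sqrt{\frac{t\, (V_{\max,t}(\pi) + V_{\max,t}(\pi'))}{\log(1/\delta_t)}} \ge \sqrt{\frac{C_t}{2K\mu_t^2} \cdot \frac{2K}{C_t}} = \frac{1}{\mu_t} \ge \frac{1}{\mu_\tau}
\]
for all $\tau \le t$, since the $\mu_\tau$'s are non-increasing. Therefore, by Freedman's inequality (Theorem 13), we have
\[
\Pr\left[ \bigl| (\eta_t(\pi) - \eta_t(\pi')) - (\eta_D(\pi) - \eta_D(\pi')) \bigr| > 2 \sqrt{\frac{(V_{\max,t}(\pi) + V_{\max,t}(\pi')) \log(1/\delta_t)}{t}} \right] \le 2\delta_t.
\]
The conclusion follows by taking a union bound over $t_0 \le t \le T$ and all pairs $\pi, \pi' \in \Pi$.

D.3 Variance Analysis

We define the following condition, which will be assumed by most of the subsequent lemmas in this section.

Condition 1. The deviation bound (D.1) holds for all $\pi \in \Pi$ and $t \ge t_1$, and the deviation bound (D.2) holds for all pairs $\pi, \pi' \in \Pi$ and $t \ge t_0$.

The next two lemmas relate the $V_t(\pi)$ to the $\Delta_t(\pi)$.

Lemma 20. Assume Condition 1. For any $t > t_1$ and $\pi \in \Pi$, if $V_t(\pi) > \theta K$, then
\[
\Delta_{t-1}(\pi) \ge \sqrt{\frac{72\, V_t(\pi)\, C_{t-1}}{t-1}}.
\]

Proof. By Lemma 18, the fact $V_t(\pi) > \theta K$ implies that
\[
\hat{\mathbb{E}}_{x \sim h_{t-1}} \left[ \frac{1}{(1 - K\mu_t) W_{P_t}(x, \pi(x)) + \mu_t} \right] > \frac{1}{1 + \epsilon} \Bigl(1 - \frac{\rho + 1}{\theta}\Bigr) V_t(\pi) = \frac{1}{2} V_t(\pi).
\]
Since $V_t(\pi) > \theta K \ge 5K$, Lemma 17 implies that in order for $P_t$ to satisfy the optimization constraint in (4) corresponding to $\pi$ (with slack $K$), it must be the case that
\[
\Delta_{t-1}(\pi) \ge \sqrt{\frac{144\, C_{t-1}}{t-1} \cdot \hat{\mathbb{E}}_{x \sim h_{t-1}} \left[ \frac{1}{(1 - K\mu_t) W_{P_t}(x, \pi(x)) + \mu_t} \right]}.
\]
Combining with the above, we obtain
\[
\Delta_{t-1}(\pi) \ge \sqrt{\frac{72\, V_t(\pi)\, C_{t-1}}{t-1}}.
\]

Lemma 21. Assume Condition 1. For all $t$, $V_{\max,t}(\pi_{\max}) \le \theta K$ and $V_{\max,t}(\pi_t) \le \theta K$.

Proof. By induction on $t$. The claim for all $t \le t_1$ follows from Lemma 18. So take $t > t_1$, and assume as the (strong) inductive hypothesis that $V_{\max,\tau}(\pi_{\max}) \le \theta K$ and $V_{\max,\tau}(\pi_\tau) \le \theta K$ for $\tau \in \{1, \dots, t-1\}$. Suppose for sake of contradiction that $V_t(\pi_{\max}) > \theta K$. By Lemma 20,
\[
\Delta_{t-1}(\pi_{\max}) \ge \sqrt{\frac{72\, V_t(\pi_{\max})\, C_{t-1}}{t-1}}.
\]
However, by the deviation bounds, we have
\begin{align*}
\Delta_{t-1}(\pi_{\max}) + \Delta_D(\pi_{t-1})
&\le 2 \sqrt{\frac{(V_{\max,t-1}(\pi_{t-1}) + V_{\max,t-1}(\pi_{\max}))\, C_{t-1}}{t-1}} \\
&\le 2 \sqrt{\frac{2\, V_t(\pi_{\max})\, C_{t-1}}{t-1}} < \sqrt{\frac{72\, V_t(\pi_{\max})\, C_{t-1}}{t-1}}.
\end{align*}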

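The second inequality follows from our assumption and the induction hypothesis:
\[
V_t(\pi_{\max}) > \theta K \ge V_{\max,t-1}(\pi_{t-1}),\ V_{\max,t-1}(\pi_{\max}).
\]
Since $\Delta_D(\pi_{t-1}) \ge 0$, we have a contradiction, so it must be that $V_t(\pi_{\max}) \le \theta K$. This proves that $V_{\max,t}(\pi_{\max}) \le \theta K$.

It remains to show that $V_{\max,t}(\pi_t) \le \theta K$. So suppose for sake of contradiction that the inequality fails, and let $t_1 < \tau \le t$ be any round for which $V_\tau(\pi_t) = V_{\max,t}(\pi_t) > \theta K$. By Lemma 20,
\[
\Delta_{\tau-1}(\pi_t) \ge \sqrt{\frac{72\, V_\tau(\pi_t)\, C_{\tau-1}}{\tau-1}}. \tag{D.3}
\]
On the other hand,
\begin{align*}
\Delta_{\tau-1}(\pi_t)
&\le \Delta_D(\pi_{\tau-1}) + \Delta_{\tau-1}(\pi_t) + \Delta_t(\pi_{\max}) \\
&= \bigl( \Delta_D(\pi_{\tau-1}) + \Delta_{\tau-1}(\pi_{\max}) \bigr) + \bigl( \eta_{\tau-1}(\pi_{\max}) - \eta_{\tau-1}(\pi_t) - \Delta_D(\pi_t) \bigr) + \bigl( \Delta_D(\pi_t) + \Delta_t(\pi_{\max}) \bigr).
\end{align*}
The parenthesized terms can be bounded using the deviation bounds, so we have
\begin{align*}
\Delta_{\tau-1}(\pi_t)
&\le 2 \sqrt{\frac{(V_{\max,\tau-1}(\pi_{\tau-1}) + V_{\max,\tau-1}(\pi_{\max}))\, C_{\tau-1}}{\tau-1}} + 2 \sqrt{\frac{(V_{\max,\tau-1}(\pi_t) + V_{\max,\tau-1}(\pi_{\max}))\, C_{\tau-1}}{\tau-1}} \\
&\qquad + 2 \sqrt{\frac{(V_{\max,t}(\pi_t) + V_{\max,t}(\pi_{\max}))\, C_t}{t}} \\
&\le 2 \sqrt{\frac{2\, V_\tau(\pi_t)\, C_{\tau-1}}{\tau-1}} + 2 \sqrt{\frac{2\, V_\tau(\pi_t)\, C_{\tau-1}}{\tau-1}} + 2 \sqrt{\frac{2\, V_\tau(\pi_t)\, C_t}{t}} \\
&< \sqrt{\frac{72\, V_\tau(\pi_t)\, C_{\tau-1}}{\tau-1}},
\end{align*}
where the second inequality follows from the following facts:

1. By induction hypothesis, we have $V_{\max,\tau-1}(\pi_{\tau-1}), V_{\max,\tau-1}(\pi_{\max}), V_{\max,t}(\pi_{\max}) \le \theta K$, and $V_\tau(\pi_t) > \theta K$,

2. $V_\tau(\pi_t) \ge V_{\max,t}(\pi_t)$, and

3. since $\tau$ is a round that achieves $V_{\max,t}(\pi_t)$, we have $V_\tau(\pi_t) \ge V_{\max,\tau-1}(\pi_t)$.

This contradicts the inequality in (D.3), so it must be that $V_{\max,t}(\pi_t) \le \theta K$.

Corollary 22. Under the assumptions of Lemma 21, for all $t \ge t_0$,
\[
\Delta_D(\pi_t) + \Delta_t(\pi_{\max}) \le 2 \sqrt{\frac{2 \theta K C_t}{t}}.
\]

Proof. Immediate from Lemma 21 and the deviation bounds from (D.2).

The following lemma shows that if a policy $\pi$ has large $V_\tau(\pi)$ in some round $\tau$, then $\Delta_t(\pi)$ remains large in later rounds $t > \tau$.

Lemma 23. Assume Condition 1. Pick any $\pi \in \Pi$ and $t \ge t_1$. If $V_{\max,t}(\pi) > \theta K$, then
\[
\Delta_t(\pi) > 2 \sqrt{\frac{2\, V_{\max,t}(\pi)\, C_t}{t}}.
\]

Proof. Let $\tau \le t$ be any round in which $V_\tau(\pi) = V_{\max,t}(\pi) > \theta K$. We have
\begin{align*}
\Delta_t(\pi)
&\ge \Delta_t(\pi) - \Delta_t(\pi_{\max}) - \Delta_D(\pi_{\tau-1}) \\
&= \Delta_{\tau-1}(\pi) + \bigl( \eta_t(\pi_{\max}) - \eta_t(\pi) - \Delta_D(\pi) \bigr) + \bigl( \eta_D(\pi_{\tau-1}) - \eta_D(\pi) - \Delta_{\tau-1}(\pi) \bigr) \\
&\ge \sqrt{\frac{72\, V_\tau(\pi)\, C_{\tau-1}}{\tau-1}} - 2 \sqrt{\frac{(V_{\max,t}(\pi) + V_{\max,t}(\pi_{\max}))\, C_t}{t}} - 2 \sqrt{\frac{(V_{\max,\tau-1}(\pi) + V_{\max,\tau-1}(\pi_{\tau-1}))\, C_{\tau-1}}{\tau-1}} \\
&> \sqrt{\frac{72\, V_{\max,t}(\pi)\, C_{\tau-1}}{\tau-1}} - 2 \sqrt{\frac{2\, V_{\max,t}(\pi)\, C_t}{t}} - 2 \sqrt{\frac{2\, V_{\max,t}(\pi)\, C_{\tau-1}}{\tau-1}} \\
&\ge 2 \sqrt{\frac{2\, V_{\max,t}(\pi)\, C_t}{t}},
\end{align*}
where the second inequality follows from Lemma 20 and the deviation bounds, and the third inequality follows from Lemma 21 and the facts that $V_\tau(\pi) = V_{\max,t}(\pi) > \theta K \ge V_{\max,t}(\pi_{\max}),\ V_{\max,\tau-1}(\pi_{\tau-1})$, and $V_{\max,t}(\pi) \ge V_{\max,\tau-1}(\pi)$.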

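D.4 Regret Analysis

We now bound the value of the optimization problem (4), which then leads to our regret bound. The next lemma shows the existence of a feasible solution with a certain structure based on the non-uniform constraints. Recall from Section 5 that solving the optimization problem A, i.e. constraints (5.1, 5.2, 5.3), for the smallest feasible value of $s$ is equivalent to solving the RUCB optimization problem (4). Recall that
\[
\beta_t = \frac{t-1}{180\, C_{t-1}}.
\]

Lemma 24. There is a point $W \in \mathbb{R}^{(t-1)K}$ such that
\begin{align*}
&\Delta_{t-1}(W) \le 4 \sqrt{K/\beta_t}, \\
&W \in C, \\
&\forall Z \in C:\ \hat{\mathbb{E}}_{x \sim h_{t-1}} \left[ \sum_a \frac{Z(x, a)}{W'(x, a)} \right] \le \max\{4K,\ \beta_t\, \Delta_{t-1}(Z)^2\}.
\end{align*}
In particular, the value of the optimization problem (4), $\mathrm{OPT}_t$, is bounded by $8 \sqrt{K/\beta_t} \le 110 \sqrt{K C_{t-1}/(t-1)}$.

Proof. Define the sets $\{C_i : i = 1, 2, \dots\}$ such that
\[
C_i := \{ Z \in C : 2^{i+1} \kappa \le \Delta_{t-1}(Z) \le 2^{i+2} \kappa \},
\]
where $\kappa = \sqrt{K/\beta_t}$. Note that since $\Delta_{t-1}(Z)$ is a linear function of $Z$, each $C_i$ is a closed, convex, compact set. Also, define $C_0 = \{ Z \in C : \Delta_{t-1}(Z) \le 4\kappa \}$. This is also a closed, convex, compact set. Note that $C = \bigcup_{i \ge 0} C_i$.

Let $I = \{ i : C_i \ne \emptyset \}$. For $i \in I \setminus \{0\}$, define $w_i = 4^{-i}$, and let $w_0 = 1 - \sum_{i \in I \setminus \{0\}} w_i$. Note that $w_0 \ge 2/3$.

By Lemma 1, for each $i \in I$, there is a point $W_i \in C_i$ such that for all $Z \in C_i$, we have
\[
\hat{\mathbb{E}}_{x \sim h_{t-1}} \left[ \sum_a \frac{Z(x, a)}{W'_i(x, a)} \right] \le 2K.
\]
Here we use the fact that $K\mu_t \le 1/2$ to upper bound $K/(1 - K\mu_t)$ by $2K$. Now consider the point $W = \sum_{i \in I} w_i W_i$. Since $C$ is convex, $W \in C$.

Now fix any $i \in I$. For any $(x, a)$, we have $W'(x, a) \ge w_i W'_i(x, a)$, so that for all $Z \in C_i$, we have
\[
\hat{\mathbb{E}}_{x \sim h_{t-1}} \left[ \sum_a \frac{Z(x, a)}{W'(x, a)} \right] \le \frac{1}{w_i} \cdot 2K \le 4^{i+1} K \le \max\{4K,\ \beta_t\, \Delta_{t-1}(Z)^2\},
\]
so the constraint for $Z$ is satisfied.

Finally, since for all $i \in I$, we have $w_i \le 4^{-i}$ and $\Delta_{t-1}(W_i) \le 2^{i+2} \kappa$, we get
\[
\Delta_{t-1}(W) = \sum_{i \in I} w_i\, \Delta_{t-1}(W_i) \le \sum_{i=0}^{\infty} 4^{-i} \cdot 2^{i+2} \kappa \le 8\kappa.
\]

The value of the optimization problem (4) can be related to the expected instantaneous regret of a policy drawn randomly from the distribution $P_t$.

Lemma 25. Assume Condition 1. Then
\[
\sum_{\pi \in \Pi} P_t(\pi)\, \Delta_D(\pi) \le \bigl( 220 + 4\sqrt{2\theta} \bigr) \sqrt{\frac{K C_{t-1}}{t-1}} + 2 \varepsilon_{\mathrm{opt},t}
\]
for all $t > t_1$.

Proof. Fix any $\pi \in \Pi$ and $t > t_1$. By the deviation bounds, we have
\begin{align*}
\eta_D(\pi_{t-1}) - \eta_D(\pi)
&\le \Delta_{t-1}(\pi) + 2 \sqrt{\frac{(V_{\max,t-1}(\pi) + V_{\max,t-1}(\pi_{t-1}))\, C_{t-1}}{t-1}} \\
&\le \Delta_{t-1}(\pi) + 2 \sqrt{\frac{(V_{\max,t-1}(\pi) + \theta K)\, C_{t-1}}{t-1}},
\end{align*}
by Lemma 21. By Corollary 22 we have
\[
\Delta_D(\pi_{t-1}) \le 2 \sqrt{\frac{2 \theta K C_{t-1}}{t-1}}.
\]
Thus, we get
\begin{align*}
\Delta_D(\pi) &= \bigl( \eta_D(\pi_{t-1}) - \eta_D(\pi) \bigr) + \Delta_D(\pi_{t-1}) \\
&\le \Delta_{t-1}(\pi) + 2 \sqrt{\frac{(V_{\max,t-1}(\pi) + \theta K)\, C_{t-1}}{t-1}} + 2 \sqrt{\frac{2 \theta K C_{t-1}}{t-1}}.
\end{align*}
If $V_{\max,t-1}(\pi) \le \theta K$, then we have
\[
\Delta_D(\pi) \le \Delta_{t-1}(\pi) + 4 \sqrt{\frac{2 \theta K C_{t-1}}{t-1}}.
\]
Otherwise, Lemma 23 implies that
\[
V_{\max,t-1}(\pi) \le \frac{(t-1)\, \Delta_{t-1}(\pi)^2}{8\, C_{t-1}},
\]
so
\begin{align*}
\Delta_D(\pi) &\le \Delta_{t-1}(\pi) + 2 \sqrt{\frac{\Delta_{t-1}(\pi)^2}{8} + \frac{\theta K C_{t-1}}{t-1}} + 2 \sqrt{\frac{2 \theta K C_{t-1}}{t-1}} \\
&\le 2 \Delta_{t-1}(\pi) + 4 \sqrt{\frac{2 \theta K C_{t-1}}{t-1}}.
\end{align*}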

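Therefore
\begin{align*}
\sum_{\pi \in \Pi} P_t(\pi)\, \Delta_D(\pi)
&\le 2 \sum_{\pi \in \Pi} P_t(\pi)\, \Delta_{t-1}(\pi) + 4 \sqrt{\frac{2 \theta K C_{t-1}}{t-1}} \\
&\le 2 \bigl( \mathrm{OPT}_t + \varepsilon_{\mathrm{opt},t} \bigr) + 4 \sqrt{\frac{2 \theta K C_{t-1}}{t-1}},
\end{align*}
where $\mathrm{OPT}_t$ is the value of the optimization problem (4). The conclusion follows from Lemma 24.

We can now finally prove the main regret bound for RUCB.

Proof of Theorem 5. The regret through the first $t_1$ rounds is trivially bounded by $t_1$. In the event that Condition 1 holds, we have for all $t > t_1$,
\[
\sum_{a \in A} W'_t(a)\, r_t(a) \ge (1 - K\mu_t) \sum_{a \in A} W_{P_t}(x_t, a)\, r_t(a) \ge \sum_{a \in A} W_{P_t}(x_t, a)\, r_t(a) - K\mu_t = \sum_{\pi \in \Pi} P_t(\pi)\, r_t(\pi(x_t)) - K\mu_t,
\]
and therefore
\[
\mathbb{E}_{(x, r) \sim D,\ a \sim W'_t} [r(a)] \ge \sum_{\pi \in \Pi} P_t(\pi)\, \eta_D(\pi) - K\mu_t \ge \eta_D(\pi_{\max}) - O\left( \sqrt{\frac{K C_t}{t}} + \varepsilon_{\mathrm{opt},t} \right),
\]
where the last inequality follows from Lemma 25. Summing the bound from $t = t_1 + 1, \dots, T$ gives
\[
\sum_{t=1}^{T} \mathbb{E}_{(x, r) \sim D,\ a \sim W'_t} [r(a)] \ge T\, \eta_D(\pi_{\max}) - t_1 - O\bigl( \sqrt{T K \log(NT/\delta)} \bigr).
\]
By Azuma's inequality, the probability that $\sum_{t=1}^{T} r_t(a_t)$ deviates from its mean by more than $O(\sqrt{T \log(1/\delta)})$ is at most $\delta$. Finally, the probability that Condition 1 does not hold is at most $2\delta$ by Lemma 19, Theorem 6, and a union bound. The conclusion follows by a final union bound.

E Details of Oracle-based Algorithm

We show how to (approximately) solve A using the ellipsoid algorithm with AMO. Fix a time period $t$. To avoid clutter, (only) in this section we drop the subscript $t-1$ from $\eta_{t-1}(\cdot)$, $\Delta_{t-1}(\cdot)$, and $h_{t-1}$ so that they become $\eta(\cdot)$, $\Delta(\cdot)$, and $h$ respectively.

In order to use the ellipsoid algorithm, we need to relax the program a little bit in order to ensure that the feasible region has a non-negligible volume. To do this, we need to obtain some perturbation bounds for the constraints of A. The following lemma gives such bounds. For any $\delta > 0$, we define $C_\delta$ to be the set of all points within a distance of $\delta$ from $C$.

Lemma 26. Let $\delta \le b/4$ be a parameter. Let $U, W \in C_{2\delta}$ be points such that $\|U - W\| \le \delta$. Then we have
\begin{align*}
&|\Delta(U) - \Delta(W)| \le \gamma, \tag{E.1} \\
&\forall Z \in C_1:\ \left| \hat{\mathbb{E}}_{x \sim h} \left[ \sum_a \frac{Z(x, a)}{U'(x, a)} \right] - \hat{\mathbb{E}}_{x \sim h} \left[ \sum_a \frac{Z(x, a)}{W'(x, a)} \right] \right| \le \epsilon, \tag{E.2}
\end{align*}
where $\epsilon = 8\delta/\mu_t^2$ and $\gamma = \delta/\mu_t$.

Proof. First, we have
\[
|\eta(U) - \eta(W)| = \left| \frac{1}{t} \sum_{(x, a, r, p) \in h} \frac{r\, (U(x, a) - W(x, a))}{p} \right| \le \frac{\delta}{\mu_t} = \gamma,
\]
which implies (E.1). Next, for any $Z \in C_1$, we have
\[
\left| \hat{\mathbb{E}}_{x \sim h} \left[ \sum_a \frac{Z(x, a)}{U'(x, a)} \right] - \hat{\mathbb{E}}_{x \sim h} \left[ \sum_a \frac{Z(x, a)}{W'(x, a)} \right] \right| \le \hat{\mathbb{E}}_{x \sim h} \left[ \sum_a \frac{|Z(x, a)|\, |U'(x, a) - W'(x, a)|}{U'(x, a)\, W'(x, a)} \right] \le \frac{8\delta}{\mu_t^2} = \epsilon.
\]
In the last inequality, we use the Cauchy-Schwarz inequality, and use the following facts (here, $Z(x, \cdot)$ denotes the vector $\langle Z(x, a) \rangle_a$, etc.):

1. $\|Z(x, \cdot)\| \le 2$ since $Z \in C_1$,

2. $\|U'(x, \cdot) - W'(x, \cdot)\| \le \|U(x, \cdot) - W(x, \cdot)\| \le \delta$, and

3. $U'(x, a) \ge (1 - bK)(-2\delta) + b \ge b/2$, for $\delta \le b/4$, and similarly $W'(x, a) \ge b/2$.

This implies (E.2).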

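We now consider the following relaxed form of A. Here, $\delta \in (0, b/4)$ is a parameter. We want to find a point $W \in \mathbb{R}^{(t-1)K}$ such that
\begin{align*}
&\Delta(W) \le s + \gamma, \tag{E.3} \\
&W \in C_\delta, \tag{E.4} \\
&\forall Z \in C_{2\delta}:\ \hat{\mathbb{E}}_{x \sim h} \left[ \sum_a \frac{Z(x, a)}{W'(x, a)} \right] \le \max\{4K,\ \beta\, \Delta(Z)^2\} + \epsilon, \tag{E.5}
\end{align*}
where $\epsilon$ and $\gamma$ are as defined in Lemma 26. Call this relaxed program A'.

We apply the ellipsoid method to A' rather than A. Recall the requirements of Lemma 8: we need an enclosing ball of bounded radius for the feasible region, and the radius of an enclosed ball in the feasible region. The following lemma gives this.

Lemma 27. The feasible region for A' is contained in $B(0, \sqrt{t} + \delta)$, and if A is feasible, then it contains a ball of radius $\delta$.

Proof. Note that for any $W \in C_\delta$, we have $\|W\| \le \sqrt{t} + \delta$, so the feasible region lies in $B(0, \sqrt{t} + \delta)$.

Next, if A is feasible, let $W^\star \in C$ be any feasible solution to A. Consider the ball $B(W^\star, \delta)$. Let $U$ be any point in $B(W^\star, \delta)$. Clearly $U \in C_\delta$. By Lemma 26, assuming $\delta \le 1/2$, we have for all $Z \in C_{2\delta}$,
\[
\hat{\mathbb{E}}_{x \sim h} \left[ \sum_a \frac{Z(x, a)}{U'(x, a)} \right] \le \hat{\mathbb{E}}_{x \sim h} \left[ \sum_a \frac{Z(x, a)}{W^{\star\prime}(x, a)} \right] + \epsilon \le \max\{4K,\ \beta\, \Delta(Z)^2\} + \epsilon.
\]
Also
\[
\Delta(U) \le \Delta(W^\star) + \gamma \le s + \gamma.
\]
Thus, $U$ is feasible for A', and hence the entire ball $B(W^\star, \delta)$ is feasible for A'.

We now give the construction of a separation oracle for the feasible region of A' by checking for violations of the constraints. In the following, we use the word "iteration" to indicate one step of either the ellipsoid algorithm or the perceptron algorithm. Each such iteration involves one call to AMO, and additional $O(t^2 K^2)$ processing time.

Let $W \in \mathbb{R}^{(t-1)K}$ be a candidate point that we want to check for feasibility for A'. We can check for violation of the constraint (E.3) easily, and since it is a linear constraint in $W$, it automatically yields a separating hyperplane if it is violated.

The harder constraints are (E.4) and (E.5). Recall that Lemma 9 shows that AMO allows us to do linear optimization over $C$ efficiently. This immediately gives us the following useful corollary:

Corollary 28. Given a vector $w \in \mathbb{R}^{(t-1)K}$ and $\delta > 0$, we can compute $\arg\max_{Z \in C_\delta} w \cdot Z$ using one invocation of AMO.

Proof. This follows directly from the following fact:
\[
\arg\max_{Z \in C_\delta} w \cdot Z = \frac{\delta}{\|w\|}\, w + \arg\max_{Z \in C} w \cdot Z.
\]

Now we show how to use AMO to check for constraint (E.4):

Lemma 29. Suppose we are given a point $W$. Then in $O(t/\delta^2)$ iterations, if $W \notin C_{2\delta}$, we can construct a hyperplane separating $W$ from $C_\delta$. Otherwise, we declare correctly that $W \in C_{2\delta}$. In the latter case, we can find an explicit distribution $P$ over policies in $\Pi$ such that $W_P$ satisfies $\|W_P - W\| \le 2\delta$.

Proof. We run the perceptron algorithm with the origin at $W$ and all points in $C_\delta$ being positive examples. The goal of the perceptron algorithm then is to find a hyperplane going through $W$ that puts all of $C_\delta$ (strictly) on one side. In each iteration of the perceptron algorithm, we have a weight vector $w$ that is the normal to a candidate hyperplane, and we need to find a point $Z \in C_\delta$ such that $w \cdot (Z - W) \le 0$ (note that we have shifted the origin to $W$). To do this, we use AMO as in Lemma 9 to find $Z^\star = \arg\max_{Z \in C_\delta} -w \cdot Z$. If $w \cdot (Z^\star - W) \le 0$, we use $Z^\star$ to update $w$ using the perceptron update rule, $w \leftarrow w + (Z^\star - W)$. Otherwise, we have $w \cdot (Z - W) > 0$ for all $Z \in C_\delta$, and hence we have found our separating hyperplane.

Now suppose that $W \notin C_{2\delta}$, i.e. the distance of $W$ from $C_\delta$ is more than $\delta$. Since $\|Z - W\| = O(\sqrt{t})$ for all $Z \in C_\delta$ (assuming $\delta = O(\sqrt{t})$), the perceptron convergence guarantee implies that in $O(t/\delta^2)$ iterations we find a separating hyperplane.

If in $k = O(t/\delta^2)$ iterations we haven't found a separating hyperplane, then $W \in C_{2\delta}$. In fact the perceptron algorithm gives a stronger guarantee: if the $k$ policies found in the run of the perceptron algorithm are $\pi_1, \pi_2, \dots, \pi_k \in \Pi$, then $W$ is within a distance of $2\delta$ from their convex hull, $C' = \mathrm{conv}(\pi_1, \pi_2, \dots, \pi_k)$. This is because a run of the perceptron algorithm on $C'_\delta$ would be identical to that on $C_\delta$ for $k$ steps. We can then compute the explicit distribution over policies $P$ by computing the Euclidean projection of $W$ on $C'$ in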

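$\mathrm{poly}(k)$ time using a convex quadratic program:
\begin{align*}
\min\ & \Bigl\| W - \sum_{i=1}^{k} P_i \pi_i \Bigr\|^2 \\
\text{s.t.}\ & \sum_i P_i = 1, \\
& \forall i:\ P_i \ge 0.
\end{align*}
Solving this quadratic program, we get a distribution $P$ over the policies $\{\pi_1, \pi_2, \dots, \pi_k\}$ such that $\|W_P - W\| \le 2\delta$.

Finally, we show how to check constraint (E.5):

Lemma 30. Suppose we are given a point $W$. In $O\bigl( (t^3 K^2/\delta^2) \log(tK/\delta) \bigr)$ iterations, we can either find a point $Z \in C_{2\delta}$ such that
\[
\hat{\mathbb{E}}_{x \sim h} \left[ \sum_a \frac{Z(x, a)}{W'(x, a)} \right] \ge \max\{4K,\ \beta\, \Delta(Z)^2\} + 2\epsilon,
\]
or else we conclude correctly that for all $Z \in C$, we have
\[
\hat{\mathbb{E}}_{x \sim h} \left[ \sum_a \frac{Z(x, a)}{W'(x, a)} \right] \le \max\{4K,\ \beta\, \Delta(Z)^2\} + 3\epsilon.
\]

Proof. We first rewrite $\eta(W)$ as $\eta(W) = w \cdot W$, where $w$ is a vector defined as
\[
w(x, a) = \frac{1}{t} \sum_{(x', a', r, p) \in h:\ x' = x,\ a' = a} \frac{r}{p}.
\]
Thus, $\Delta(Z) = v - w \cdot Z$, where $v = \max_\pi \eta(\pi) = \max_\pi w \cdot \pi$, which can be computed by using AMO once.

Next, using the candidate point $W$, compute the vector $u$ defined as $u(x, a) = \frac{n_x/t}{W'(x, a)}$, where $n_x$ is the number of times $x$ appears in $h$, so that $\hat{\mathbb{E}}_{x \sim h} \bigl[ \sum_a Z(x, a)/W'(x, a) \bigr] = u \cdot Z$. Now, the problem reduces to finding a point $Z \in C$ which violates the constraint
\[
u \cdot Z \le \max\{4K,\ \beta\, (w \cdot Z - v)^2\} + 3\epsilon.
\]
Define
\[
f(Z) = \max\{4K,\ \beta\, (w \cdot Z - v)^2\} + 3\epsilon - u \cdot Z.
\]
Note that $f$ is a convex function of $Z$. Checking for violation of the above constraint is equivalent to solving the following (convex) program:
\begin{align*}
f(Z) &\le 0, \tag{E.6} \\
Z &\in C. \tag{E.7}
\end{align*}
To do this, we again apply the ellipsoid method, but on the relaxed program
\begin{align*}
f(Z) &\le \epsilon, \tag{E.8} \\
Z &\in C_\delta. \tag{E.9}
\end{align*}
To run the ellipsoid algorithm, we need a separation oracle for the program. Given a candidate solution $Z$, we run the algorithm of Lemma 29, and if $Z \notin C_{2\delta}$, we construct a hyperplane separating $Z$ from $C_\delta$. Now suppose we conclude that $Z \in C_{2\delta}$. Then we construct a separation oracle for (E.6) as follows. If $f(Z) > \epsilon$, then since $f$ is a convex function of $Z$, we can construct a separating hyperplane as in Lemma 10.

Now we can run the ellipsoid algorithm with the starting ellipsoid being $B(0, \sqrt{t})$. If there is a point $Z^\star \in C$ such that $f(Z^\star) \le 0$, then consider the ball $B\bigl(Z^\star, \frac{4\delta}{5K\beta}\bigr)$. For any $Y \in B\bigl(Z^\star, \frac{4\delta}{5K\beta}\bigr)$, we have
\[
|u \cdot Z^\star - u \cdot Y| \le \|u\|\, \|Z^\star - Y\| \le \frac{\epsilon}{2},
\]
since $\|u\| \le \sqrt{K}/\mu_t$. Also,
\begin{align*}
\beta\, \bigl| (w \cdot Z^\star - v)^2 - (w \cdot Y - v)^2 \bigr|
&= \beta\, |w \cdot Z^\star - w \cdot Y|\, |w \cdot Z^\star + w \cdot Y - 2v| \\
&\le \beta\, \|w\|\, \|Z^\star - Y\|\, \bigl( \|w\|\, (\|Z^\star\| + \|Y\|) + 2|v| \bigr) \le \frac{\epsilon}{2},
\end{align*}
since $\|w\| \le 1/\mu_t$, $\|Z^\star\|, \|Y\| \le \sqrt{t} + 2\delta$, and $|v| \le \|w\| \sqrt{t} \le \sqrt{t}/\mu_t$. Thus, $f(Y) \le f(Z^\star) + \epsilon \le \epsilon$, so the entire ball $B\bigl(Z^\star, \frac{4\delta}{5K\beta}\bigr)$ is feasible for the relaxed program.

By Lemma 8, in $O(t^2 K^2 \log(tK/\delta))$ iterations of the ellipsoid algorithm, we obtain one of the following:

1. we either find a point $Z \in C_{2\delta}$ such that $f(Z) \le \epsilon$, i.e.
\[
\hat{\mathbb{E}}_{x \sim h} \left[ \sum_a \frac{Z(x, a)}{W'(x, a)} \right] \ge \max\{4K,\ \beta\, \Delta(Z)^2\} + 2\epsilon,
\]

2. or else we conclude that the original convex program (E.6, E.7) is infeasible, i.e. for all $Z \in C$, we have
\[
\hat{\mathbb{E}}_{x \sim h} \left[ \sum_a \frac{Z(x, a)}{W'(x, a)} \right] \le \max\{4K,\ \beta\, \Delta(Z)^2\} + 3\epsilon.
\]

The total number of iterations is bounded by
\[
O\bigl( t^2 K^2 \log(tK/\delta) \bigr) \cdot O\bigl( t/\delta^2 \bigr) = O\Bigl( \frac{t^3 K^2}{\delta^2} \log\frac{tK}{\delta} \Bigr).
\]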
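The perceptron-based separation step of Lemma 29 can be illustrated on a toy problem. The sketch below is only a simplified stand-in: it replaces AMO with a brute-force argmax over a small explicit list of points (so the convex set is their hull), ignores the $\delta$-expansion of the set, and uses a fixed iteration budget. The names `linear_oracle` and `perceptron_separate` are hypothetical and not from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def linear_oracle(points, direction):
    """Stand-in for AMO: maximize direction . z over an explicit finite set."""
    return max(points, key=lambda z: dot(direction, z))


def perceptron_separate(points, target, max_iters):
    """Search for w with w . (z - target) > 0 for every z in `points`.

    Returns (w, visited) when a separating normal is found, and
    (None, visited) if the iteration budget runs out; `visited` lists
    the points used in updates, mirroring the policies pi_1..pi_k.
    """
    w = [0.0] * len(target)
    visited = []
    for _ in range(max_iters):
        # The point minimizing w . z is the most violated candidate.
        z = linear_oracle(points, [-wi for wi in w])
        shifted = [zi - ti for zi, ti in zip(z, target)]
        if dot(w, shifted) > 0:
            return w, visited  # every point lies strictly on one side
        visited.append(z)
        # Perceptron update with the origin shifted to `target`.
        w = [wi + si for wi, si in zip(w, shifted)]
    return None, visited


if __name__ == "__main__":
    triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
    print(perceptron_separate(triangle, (4.0, 4.0), max_iters=50)[0])
    print(perceptron_separate(triangle, (1.0, 1.0), max_iters=50)[0])
```

When the target lies inside the hull no separating normal exists, so the budget is exhausted; in the lemma this is the case where the visited points are then passed to the quadratic program to recover an explicit distribution.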

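Lemma 31. Suppose we are given a point $Z \in C_{2\delta}$ such that
\[
\hat{\mathbb{E}}_{x \sim h} \left[ \sum_a \frac{Z(x, a)}{W'(x, a)} \right] \ge \max\{4K,\ \beta\, \Delta(Z)^2\} + 2\epsilon.
\]
Then we can construct a hyperplane separating $W$ from all feasible points for A'.

Proof. For notational convenience, define the function
\[
f_Z(W) := \hat{\mathbb{E}}_{x \sim h} \left[ \sum_a \frac{Z(x, a)}{W'(x, a)} \right] - \max\{4K,\ \beta\, \Delta(Z)^2\} - 2\epsilon.
\]
Note that it is a convex function of $W$. Note that for any point $U$ that is feasible for A', we have $f_Z(U) \le -\epsilon$, whereas $f_Z(W) \ge 0$. Thus, by Lemma 10, we can construct the desired separating hyperplane.

We can finally prove Theorem 11:

Proof of Theorem 11. We run the ellipsoid algorithm starting with the ball $B(0, \sqrt{t} + \delta)$. At each point, we are given a candidate solution $W$ for program A'. We check for violation of constraint (E.3) first. If it is violated, the constraint, being linear, gives us a separating hyperplane. Else, we use Lemma 29 to check for violation of constraint (E.4). If $W \notin C_{2\delta}$, then we can construct a separating hyperplane. Else, we use Lemmas 30 and 31 to check for violation of constraint (E.5). If there is a $Z \in C$ such that
\[
\hat{\mathbb{E}}_{x \sim h} \left[ \sum_a \frac{Z(x, a)}{W'(x, a)} \right] \ge \max\{4K,\ \beta\, \Delta(Z)^2\} + 3\epsilon,
\]
then we can find a separating hyperplane. Else, we conclude that the current point $W$ satisfies the following constraints:
\begin{align*}
&\Delta(W) \le s + \gamma, \\
&\forall Z \in C:\ \hat{\mathbb{E}}_{x \sim h} \left[ \sum_a \frac{Z(x, a)}{W'(x, a)} \right] \le \max\{4K,\ \beta\, \Delta(Z)^2\} + 3\epsilon, \\
&W \in C_{2\delta}.
\end{align*}
We can then use the perceptron-based algorithm of Lemma 29 to "round" $W$ to an explicit distribution $P$ over policies in $\Pi$ such that $W_P$ satisfies $\|W_P - W\| \le 2\delta$. Then Lemma 26 implies the stated bounds for $W_P$.

By Lemma 8, in $O(t^2 K^2 \log(t/\delta))$ iterations of the ellipsoid algorithm, we find the point $W$ satisfying the constraints given above, or declare correctly that A is infeasible. In the worst case, we might have to run the algorithm of Lemma 30 in every iteration, leading to an upper bound of
\[
O\Bigl( t^2 K^2 \log\frac{t}{\delta} \Bigr) \times O\Bigl( \frac{t^3 K^2}{\delta^2} \log\frac{tK}{\delta} \Bigr) = O\Bigl( \frac{t^5 K^4}{\delta^2} \log^2\frac{tK}{\delta} \Bigr)
\]
on the number of iterations.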


More information

ON NEW INEQUALITIES OF SIMPSON S TYPE FOR FUNCTIONS WHOSE SECOND DERIVATIVES ABSOLUTE VALUES ARE CONVEX

ON NEW INEQUALITIES OF SIMPSON S TYPE FOR FUNCTIONS WHOSE SECOND DERIVATIVES ABSOLUTE VALUES ARE CONVEX Journl of Applied Mhemics, Sisics nd Informics JAMSI), 9 ), No. ON NEW INEQUALITIES OF SIMPSON S TYPE FOR FUNCTIONS WHOSE SECOND DERIVATIVES ABSOLUTE VALUES ARE CONVEX MEHMET ZEKI SARIKAYA, ERHAN. SET

More information

arxiv: v1 [math.pr] 24 Sep 2015

arxiv: v1 [math.pr] 24 Sep 2015 RENEWAL STRUCTURE OF THE BROWNIAN TAUT STRING EMMANUEL SCHERTZER rxiv:59.7343v [mh.pr] 24 Sep 25 Absrc. In recen pper [LS5], M. Lifshis nd E. Seerqvis inroduced he u sring of Brownin moion w, defined s

More information

PART V. Wavelets & Multiresolution Analysis

PART V. Wavelets & Multiresolution Analysis Wveles 65 PART V Wveles & Muliresoluion Anlysis ADDITIONAL REFERENCES: A. Cohen, Numericl Anlysis o Wvele Mehods, Norh-Hollnd, (003) S. Mll, A Wvele Tour o Signl Processing, Acdemic Press, (999) I. Dubechies,

More information

Hermite-Hadamard-Fejér type inequalities for convex functions via fractional integrals

Hermite-Hadamard-Fejér type inequalities for convex functions via fractional integrals Sud. Univ. Beş-Bolyi Mh. 6(5, No. 3, 355 366 Hermie-Hdmrd-Fejér ype inequliies for convex funcions vi frcionl inegrls İmd İşcn Asrc. In his pper, firsly we hve eslished Hermie Hdmrd-Fejér inequliy for

More information

3 Motion with constant acceleration: Linear and projectile motion

3 Motion with constant acceleration: Linear and projectile motion 3 Moion wih consn ccelerion: Liner nd projecile moion cons, In he precedin Lecure we he considered moion wih consn ccelerion lon he is: Noe h,, cn be posiie nd neie h leds o rie of behiors. Clerl similr

More information

Chapter 2. First Order Scalar Equations

Chapter 2. First Order Scalar Equations Chaper. Firs Order Scalar Equaions We sar our sudy of differenial equaions in he same way he pioneers in his field did. We show paricular echniques o solve paricular ypes of firs order differenial equaions.

More information

Chapter Direct Method of Interpolation

Chapter Direct Method of Interpolation Chper 5. Direc Mehod of Inerpolion Afer reding his chper, you should be ble o:. pply he direc mehod of inerpolion,. sole problems using he direc mehod of inerpolion, nd. use he direc mehod inerpolns o

More information

3D Transformations. Computer Graphics COMP 770 (236) Spring Instructor: Brandon Lloyd 1/26/07 1

3D Transformations. Computer Graphics COMP 770 (236) Spring Instructor: Brandon Lloyd 1/26/07 1 D Trnsformions Compuer Grphics COMP 770 (6) Spring 007 Insrucor: Brndon Lloyd /6/07 Geomery Geomeric eniies, such s poins in spce, exis wihou numers. Coordines re nming scheme. The sme poin cn e descried

More information

EXERCISE - 01 CHECK YOUR GRASP

EXERCISE - 01 CHECK YOUR GRASP UNIT # 09 PARABOLA, ELLIPSE & HYPERBOLA PARABOLA EXERCISE - 0 CHECK YOUR GRASP. Hin : Disnce beween direcri nd focus is 5. Given (, be one end of focl chord hen oher end be, lengh of focl chord 6. Focus

More information

Version 001 test-1 swinney (57010) 1. is constant at m/s.

Version 001 test-1 swinney (57010) 1. is constant at m/s. Version 001 es-1 swinne (57010) 1 This prin-ou should hve 20 quesions. Muliple-choice quesions m coninue on he nex column or pge find ll choices before nswering. CubeUniVec1x76 001 10.0 poins Acubeis1.4fee

More information

Application on Inner Product Space with. Fixed Point Theorem in Probabilistic

Application on Inner Product Space with. Fixed Point Theorem in Probabilistic Journl of Applied Mhemics & Bioinformics, vol.2, no.2, 2012, 1-10 ISSN: 1792-6602 prin, 1792-6939 online Scienpress Ld, 2012 Applicion on Inner Produc Spce wih Fixed Poin Theorem in Probbilisic Rjesh Shrivsv

More information

Honours Introductory Maths Course 2011 Integration, Differential and Difference Equations

Honours Introductory Maths Course 2011 Integration, Differential and Difference Equations Honours Inroducory Mhs Course 0 Inegrion, Differenil nd Difference Equions Reding: Ching Chper 4 Noe: These noes do no fully cover he meril in Ching, u re men o supplemen your reding in Ching. Thus fr

More information

How to Prove the Riemann Hypothesis Author: Fayez Fok Al Adeh.

How to Prove the Riemann Hypothesis Author: Fayez Fok Al Adeh. How o Prove he Riemnn Hohesis Auhor: Fez Fok Al Adeh. Presiden of he Srin Cosmologicl Socie P.O.Bo,387,Dmscus,Sri Tels:963--77679,735 Emil:hf@scs-ne.org Commens: 3 ges Subj-Clss: Funcionl nlsis, comle

More information

Reinforcement Learning. Markov Decision Processes

Reinforcement Learning. Markov Decision Processes einforcemen Lerning Mrkov Decision rocesses Mnfred Huber 2014 1 equenil Decision Mking N-rmed bi problems re no good wy o model sequenil decision problem Only dels wih sic decision sequences Could be miiged

More information

Think of the Relationship Between Time and Space Again

Think of the Relationship Between Time and Space Again Repor nd Opinion, 1(3),009 hp://wwwsciencepubne sciencepub@gmilcom Think of he Relionship Beween Time nd Spce Agin Yng F-cheng Compny of Ruid Cenre in Xinjing 15 Hongxing Sree, Klmyi, Xingjing 834000,

More information

MTH 146 Class 11 Notes

MTH 146 Class 11 Notes 8.- Are of Surfce of Revoluion MTH 6 Clss Noes Suppose we wish o revolve curve C round n is nd find he surfce re of he resuling solid. Suppose f( ) is nonnegive funcion wih coninuous firs derivive on he

More information

Neural assembly binding in linguistic representation

Neural assembly binding in linguistic representation Neurl ssembly binding in linguisic represenion Frnk vn der Velde & Mrc de Kmps Cogniive Psychology Uni, Universiy of Leiden, Wssenrseweg 52, 2333 AK Leiden, The Neherlnds, vdvelde@fsw.leidenuniv.nl Absrc.

More information

T-Match: Matching Techniques For Driving Yagi-Uda Antennas: T-Match. 2a s. Z in. (Sections 9.5 & 9.7 of Balanis)

T-Match: Matching Techniques For Driving Yagi-Uda Antennas: T-Match. 2a s. Z in. (Sections 9.5 & 9.7 of Balanis) 3/0/018 _mch.doc Pge 1 of 6 T-Mch: Mching Techniques For Driving Ygi-Ud Anenns: T-Mch (Secions 9.5 & 9.7 of Blnis) l s l / l / in The T-Mch is shun-mching echnique h cn be used o feed he driven elemen

More information

Optimality of Myopic Policy for a Class of Monotone Affine Restless Multi-Armed Bandit

Optimality of Myopic Policy for a Class of Monotone Affine Restless Multi-Armed Bandit Univeriy of Souhern Cliforni Opimliy of Myopic Policy for Cl of Monoone Affine Rele Muli-Armed Bndi Pri Mnourifrd USC Tr Jvidi UCSD Bhkr Krihnmchri USC Dec 0, 202 Univeriy of Souhern Cliforni Inroducion

More information

Magnetostatics Bar Magnet. Magnetostatics Oersted s Experiment

Magnetostatics Bar Magnet. Magnetostatics Oersted s Experiment Mgneosics Br Mgne As fr bck s 4500 yers go, he Chinese discovered h cerin ypes of iron ore could rc ech oher nd cerin mels. Iron filings "mp" of br mgne s field Crefully suspended slivers of his mel were

More information

1 Review of Zero-Sum Games

1 Review of Zero-Sum Games COS 5: heoreical Machine Learning Lecurer: Rob Schapire Lecure #23 Scribe: Eugene Brevdo April 30, 2008 Review of Zero-Sum Games Las ime we inroduced a mahemaical model for wo player zero-sum games. Any

More information

Asymptotic relationship between trajectories of nominal and uncertain nonlinear systems on time scales

Asymptotic relationship between trajectories of nominal and uncertain nonlinear systems on time scales Asympoic relionship beween rjecories of nominl nd uncerin nonliner sysems on ime scles Fim Zohr Tousser 1,2, Michel Defoor 1, Boudekhil Chfi 2 nd Mohmed Djemï 1 Absrc This pper sudies he relionship beween

More information

ANSWERS TO EVEN NUMBERED EXERCISES IN CHAPTER 2

ANSWERS TO EVEN NUMBERED EXERCISES IN CHAPTER 2 ANSWERS TO EVEN NUMBERED EXERCISES IN CHAPTER Seion Eerise -: Coninuiy of he uiliy funion Le λ ( ) be he monooni uiliy funion defined in he proof of eisene of uiliy funion If his funion is oninuous y hen

More information

Procedia Computer Science

Procedia Computer Science Procedi Compuer Science 00 (0) 000 000 Procedi Compuer Science www.elsevier.com/loce/procedi The Third Informion Sysems Inernionl Conference The Exisence of Polynomil Soluion of he Nonliner Dynmicl Sysems

More information

Reinforcement Learning

Reinforcement Learning Reiforceme Corol lerig Corol polices h choose opiml cios Q lerig Covergece Chper 13 Reiforceme 1 Corol Cosider lerig o choose cios, e.g., Robo lerig o dock o bery chrger o choose cios o opimize fcory oupu

More information

Notes on online convex optimization

Notes on online convex optimization Noes on online convex opimizaion Karl Sraos Online convex opimizaion (OCO) is a principled framework for online learning: OnlineConvexOpimizaion Inpu: convex se S, number of seps T For =, 2,..., T : Selec

More information

Journal of Mathematical Analysis and Applications. Two normality criteria and the converse of the Bloch principle

Journal of Mathematical Analysis and Applications. Two normality criteria and the converse of the Bloch principle J. Mh. Anl. Appl. 353 009) 43 48 Conens liss vilble ScienceDirec Journl of Mhemicl Anlysis nd Applicions www.elsevier.com/loce/jm Two normliy crieri nd he converse of he Bloch principle K.S. Chrk, J. Rieppo

More information

Tax Audit and Vertical Externalities

Tax Audit and Vertical Externalities T Audi nd Vericl Eernliies Hidey Ko Misuyoshi Yngihr Ngoy Keizi Universiy Ngoy Universiy 1. Inroducion The vericl fiscl eernliies rise when he differen levels of governmens, such s he federl nd se governmens,

More information

Math 2142 Exam 1 Review Problems. x 2 + f (0) 3! for the 3rd Taylor polynomial at x = 0. To calculate the various quantities:

Math 2142 Exam 1 Review Problems. x 2 + f (0) 3! for the 3rd Taylor polynomial at x = 0. To calculate the various quantities: Mah 4 Eam Review Problems Problem. Calculae he 3rd Taylor polynomial for arcsin a =. Soluion. Le f() = arcsin. For his problem, we use he formula f() + f () + f ()! + f () 3! for he 3rd Taylor polynomial

More information

MAT 266 Calculus for Engineers II Notes on Chapter 6 Professor: John Quigg Semester: spring 2017

MAT 266 Calculus for Engineers II Notes on Chapter 6 Professor: John Quigg Semester: spring 2017 MAT 66 Clculus for Engineers II Noes on Chper 6 Professor: John Quigg Semeser: spring 7 Secion 6.: Inegrion by prs The Produc Rule is d d f()g() = f()g () + f ()g() Tking indefinie inegrls gives [f()g

More information

Flow Networks Alon Efrat Slides courtesy of Charles Leiserson with small changes by Carola Wenk. Flow networks. Flow networks CS 445

Flow Networks Alon Efrat Slides courtesy of Charles Leiserson with small changes by Carola Wenk. Flow networks. Flow networks CS 445 CS 445 Flow Nework lon Efr Slide corey of Chrle Leieron wih mll chnge by Crol Wenk Flow nework Definiion. flow nework i direced grph G = (V, E) wih wo diingihed erice: orce nd ink. Ech edge (, ) E h nonnegie

More information

On the Pseudo-Spectral Method of Solving Linear Ordinary Differential Equations

On the Pseudo-Spectral Method of Solving Linear Ordinary Differential Equations Journl of Mhemics nd Sisics 5 ():136-14, 9 ISS 1549-3644 9 Science Publicions On he Pseudo-Specrl Mehod of Solving Liner Ordinry Differenil Equions B.S. Ogundre Deprmen of Pure nd Applied Mhemics, Universiy

More information

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions Muli-Period Sochasic Models: Opimali of (s, S) Polic for -Convex Objecive Funcions Consider a seing similar o he N-sage newsvendor problem excep ha now here is a fixed re-ordering cos (> 0) for each (re-)order.

More information

How to prove the Riemann Hypothesis

How to prove the Riemann Hypothesis Scholrs Journl of Phsics, Mhemics nd Sisics Sch. J. Phs. Mh. S. 5; (B:5-6 Scholrs Acdemic nd Scienific Publishers (SAS Publishers (An Inernionl Publisher for Acdemic nd Scienific Resources *Corresonding

More information

P441 Analytical Mechanics - I. Coupled Oscillators. c Alex R. Dzierba

P441 Analytical Mechanics - I. Coupled Oscillators. c Alex R. Dzierba Lecure 3 Mondy - Deceber 5, 005 Wrien or ls upded: Deceber 3, 005 P44 Anlyicl Mechnics - I oupled Oscillors c Alex R. Dzierb oupled oscillors - rix echnique In Figure we show n exple of wo coupled oscillors,

More information

1. Consider a PSA initially at rest in the beginning of the left-hand end of a long ISS corridor. Assume xo = 0 on the left end of the ISS corridor.

1. Consider a PSA initially at rest in the beginning of the left-hand end of a long ISS corridor. Assume xo = 0 on the left end of the ISS corridor. In Eercise 1, use sndrd recngulr Cresin coordine sysem. Le ime be represened long he horizonl is. Assume ll ccelerions nd decelerions re consn. 1. Consider PSA iniilly res in he beginning of he lef-hnd

More information

ON NEW INEQUALITIES OF SIMPSON S TYPE FOR FUNCTIONS WHOSE SECOND DERIVATIVES ABSOLUTE VALUES ARE CONVEX.

ON NEW INEQUALITIES OF SIMPSON S TYPE FOR FUNCTIONS WHOSE SECOND DERIVATIVES ABSOLUTE VALUES ARE CONVEX. ON NEW INEQUALITIES OF SIMPSON S TYPE FOR FUNCTIONS WHOSE SECOND DERIVATIVES ABSOLUTE VALUES ARE CONVEX. MEHMET ZEKI SARIKAYA?, ERHAN. SET, AND M. EMIN OZDEMIR Asrc. In his noe, we oin new some ineuliies

More information

A new model for solving fuzzy linear fractional programming problem with ranking function

A new model for solving fuzzy linear fractional programming problem with ranking function J. ppl. Res. Ind. Eng. Vol. 4 No. 07 89 96 Journl of pplied Reserch on Indusril Engineering www.journl-prie.com new model for solving fuzzy liner frcionl progrmming prolem wih rning funcion Spn Kumr Ds

More information

CBSE 2014 ANNUAL EXAMINATION ALL INDIA

CBSE 2014 ANNUAL EXAMINATION ALL INDIA CBSE ANNUAL EXAMINATION ALL INDIA SET Wih Complee Eplnions M Mrks : SECTION A Q If R = {(, y) : + y = 8} is relion on N, wrie he rnge of R Sol Since + y = 8 h implies, y = (8 ) R = {(, ), (, ), (6, )}

More information

LAPLACE TRANSFORM OVERCOMING PRINCIPLE DRAWBACKS IN APPLICATION OF THE VARIATIONAL ITERATION METHOD TO FRACTIONAL HEAT EQUATIONS

LAPLACE TRANSFORM OVERCOMING PRINCIPLE DRAWBACKS IN APPLICATION OF THE VARIATIONAL ITERATION METHOD TO FRACTIONAL HEAT EQUATIONS Wu, G.-.: Lplce Trnsform Overcoming Principle Drwbcks in Applicion... THERMAL SIENE: Yer 22, Vol. 6, No. 4, pp. 257-26 257 Open forum LAPLAE TRANSFORM OVEROMING PRINIPLE DRAWBAKS IN APPLIATION OF THE VARIATIONAL

More information

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details!

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details! MAT 257, Handou 6: Ocober 7-2, 20. I. Assignmen. Finish reading Chaper 2 of Spiva, rereading earlier secions as necessary. handou and fill in some missing deails! II. Higher derivaives. Also, read his

More information

Factorized Decision Forecasting via Combining Value-based and Reward-based Estimation

Factorized Decision Forecasting via Combining Value-based and Reward-based Estimation Fcorized Decision Forecsing vi Combining Vlue-bsed nd Rewrd-bsed Esimion Brin D. Ziebr Crnegie Mellon Universiy Pisburgh, PA 15213 bziebr@cs.cmu.edu Absrc A powerful recen perspecive for predicing sequenil

More information

Integral Transform. Definitions. Function Space. Linear Mapping. Integral Transform

Integral Transform. Definitions. Function Space. Linear Mapping. Integral Transform Inegrl Trnsform Definiions Funcion Spce funcion spce A funcion spce is liner spce of funcions defined on he sme domins & rnges. Liner Mpping liner mpping Le VF, WF e liner spces over he field F. A mpping

More information

FURTHER GENERALIZATIONS. QI Feng. The value of the integral of f(x) over [a; b] can be estimated in a variety ofways. b a. 2(M m)

FURTHER GENERALIZATIONS. QI Feng. The value of the integral of f(x) over [a; b] can be estimated in a variety ofways. b a. 2(M m) Univ. Beogrd. Pul. Elekroehn. Fk. Ser. M. 8 (997), 79{83 FUTHE GENEALIZATIONS OF INEQUALITIES FO AN INTEGAL QI Feng Using he Tylor's formul we prove wo inegrl inequliies, h generlize K. S. K. Iyengr's

More information

Online Convex Optimization Example And Follow-The-Leader

Online Convex Optimization Example And Follow-The-Leader CSE599s, Spring 2014, Online Learning Lecure 2-04/03/2014 Online Convex Opimizaion Example And Follow-The-Leader Lecurer: Brendan McMahan Scribe: Sephen Joe Jonany 1 Review of Online Convex Opimizaion

More information

TIMELINESS, ACCURACY, AND RELEVANCE IN DYNAMIC INCENTIVE CONTRACTS

TIMELINESS, ACCURACY, AND RELEVANCE IN DYNAMIC INCENTIVE CONTRACTS TIMELINESS, ACCURACY, AND RELEVANCE IN DYNAMIC INCENTIVE CONTRACTS by Peer O. Chrisensen Universiy of Souhern Denmrk Odense, Denmrk Gerld A. Felhm Universiy of Briish Columbi Vncouver, Cnd Chrisin Hofmnn

More information

ECE Microwave Engineering. Fall Prof. David R. Jackson Dept. of ECE. Notes 10. Waveguides Part 7: Transverse Equivalent Network (TEN)

ECE Microwave Engineering. Fall Prof. David R. Jackson Dept. of ECE. Notes 10. Waveguides Part 7: Transverse Equivalent Network (TEN) EE 537-635 Microwve Engineering Fll 7 Prof. Dvid R. Jcson Dep. of EE Noes Wveguides Pr 7: Trnsverse Equivlen Newor (N) Wveguide Trnsmission Line Model Our gol is o come up wih rnsmission line model for

More information

PARABOLA. moves such that PM. = e (constant > 0) (eccentricity) then locus of P is called a conic. or conic section.

PARABOLA. moves such that PM. = e (constant > 0) (eccentricity) then locus of P is called a conic. or conic section. wwwskshieducioncom PARABOLA Le S be given fixed poin (focus) nd le l be given fixed line (Direcrix) Le SP nd PM be he disnce of vrible poin P o he focus nd direcrix respecively nd P SP moves such h PM

More information

Chapter 2. Motion along a straight line. 9/9/2015 Physics 218

Chapter 2. Motion along a straight line. 9/9/2015 Physics 218 Chper Moion long srigh line 9/9/05 Physics 8 Gols for Chper How o describe srigh line moion in erms of displcemen nd erge elociy. The mening of insnneous elociy nd speed. Aerge elociy/insnneous elociy

More information

Reinforcement learning

Reinforcement learning CS 75 Mchine Lening Lecue b einfocemen lening Milos Huskech milos@cs.pi.edu 539 Senno Sque einfocemen lening We wn o len conol policy: : X A We see emples of bu oupus e no given Insed of we ge feedbck

More information

Bipartite Matching. Matching. Bipartite Matching. Maxflow Formulation

Bipartite Matching. Matching. Bipartite Matching. Maxflow Formulation Mching Inpu: undireced grph G = (V, E). Biprie Mching Inpu: undireced, biprie grph G = (, E).. Mching Ern Myr, Hrld äcke Biprie Mching Inpu: undireced, biprie grph G = (, E). Mflow Formulion Inpu: undireced,

More information

HUI-HSIUNG KUO, ANUWAT SAE-TANG, AND BENEDYKT SZOZDA

HUI-HSIUNG KUO, ANUWAT SAE-TANG, AND BENEDYKT SZOZDA Communicions on Sochsic Anlysis Vol 6, No 4 2012 603-614 Serils Publicions wwwserilspublicionscom THE ITÔ FORMULA FOR A NEW STOCHASTIC INTEGRAL HUI-HSIUNG KUO, ANUWAT SAE-TANG, AND BENEDYKT SZOZDA Absrc

More information

Advanced Calculus: MATH 410 Notes on Integrals and Integrability Professor David Levermore 17 October 2004

Advanced Calculus: MATH 410 Notes on Integrals and Integrability Professor David Levermore 17 October 2004 Advnced Clculus: MATH 410 Notes on Integrls nd Integrbility Professor Dvid Levermore 17 October 2004 1. Definite Integrls In this section we revisit the definite integrl tht you were introduced to when

More information

Hamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t

Hamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t M ah 5 2 7 Fall 2 0 0 9 L ecure 1 0 O c. 7, 2 0 0 9 Hamilon- J acobi Equaion: Explici Formulas In his lecure we ry o apply he mehod of characerisics o he Hamilon-Jacobi equaion: u + H D u, x = 0 in R n

More information