Online Markov Decision Processes under Bandit Feedback


Gergely Neu, András György, Csaba Szepesvári, András Antos

Abstract: We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary. The goal of the learning agent is to compete with the best stationary policy in hindsight in terms of the total reward received. Specifically, in each time step the agent observes the current state and the reward associated with the last transition; however, the agent does not observe the rewards associated with other state-action pairs. The agent is assumed to know the transition probabilities. The state-of-the-art result for this setting is an algorithm with an expected regret of O(T^{2/3} ln T). In this paper, assuming that stationary policies mix uniformly fast, we show that after T time steps, the expected regret of this algorithm (more precisely, a slightly modified version thereof) is O(T^{1/2} ln T), giving the first rigorously proven, essentially tight regret bound for the problem.

I. INTRODUCTION

In this paper we consider online learning in finite stochastic Markovian environments where in each time step a new reward function may be chosen by an oblivious adversary. The interaction between the learner and the environment is shown in Figure 1. The environment is split into two parts: one part has controlled Markovian dynamics, while the other one has unrestricted, uncontrolled autonomous dynamics. In each discrete time step t, the learning agent receives the state of the Markovian environment x_t ∈ X and some information y_t ∈ Y about the previous state of the autonomous dynamics. The learner then makes a decision about the next action a_t ∈ A, which is sent to the environment. In response, the environment makes a transition: the next state x_{t+1} of the Markovian part is drawn from the transition probability kernel P(·|x_t, a_t), while the other part makes a transition in an autonomous fashion. In the meanwhile, the agent incurs the reward r_t = r(x_t, a_t, y
_t) ∈ [0, 1], which depends on the complete state of the environment and the chosen action; then the process continues with the next step. The goal of the learner is to collect as much reward as possible. The agent knows the transition probability kernel P and the reward function r; however, it does not know the sequence y_t in advance. We call this problem online learning in Markov Decision Processes (MDPs). We take the viewpoint that the uncontrolled dynamics might be very complex, and thus modeling it based on the available limited information might be hopeless.

This research was supported in part by the National Development Agency of Hungary from the Research and Technological Innovation Fund (KTIA-OTKA CNK 77782), the Alberta Innovates Technology Futures, and the Natural Sciences and Engineering Research Council (NSERC) of Canada. Parts of this work have been published at the Twenty-Fourth Annual Conference on Neural Information Processing Systems (NIPS 2010) [16].
G. Neu is with the Department of Computer Science and Information Theory, Budapest University of Technology and Economics, Budapest, Hungary, and with the Computer and Automation Research Institute of the Hungarian Academy of Sciences, Budapest, Hungary (email: neu.gergely@gmail.com).
A. György is with the Department of Computing Science, University of Alberta, Edmonton, Canada (email: gy@cs.bme.hu). During parts of this work he was with the Machine Learning Research Group of the Computer and Automation Research Institute of the Hungarian Academy of Sciences.
Cs. Szepesvári is with the Department of Computing Science, University of Alberta, Edmonton, Canada (email: szepesva@ualberta.ca).
A. Antos is with the Budapest University of Technology and Economics, Budapest, Hungary (email: antos@cs.bme.hu). During parts of this work he was with the Machine Learning Research Group of the Computer and Automation Research Institute of the Hungarian Academy of Sciences, Budapest, Hungary.

Fig. 1. The interaction between the learning agent and the environment. Here q^{-1} denotes a unit delay, that is, any information sent through such a box is received at the beginning of the next time step.
Equivalently, we assume that whatever can be modeled about the environment is modeled in the Markovian, controlled part. As a result, when evaluating the performance of the learner, the total reward of the learner will be compared to that of the best stochastic stationary policy in hindsight that assigns actions to the states of the Markovian part in a random manner. This stationary policy is thus selected as the policy that maximizes the total reward given the sequence of reward functions r_t(·, ·) = r(·, ·, y_t), t = 1, 2, .... Given a horizon T > 0, any policy π and initial distribution uniquely determines a distribution over the sequence space (X × A)^T. Noting that the expected total reward of π is then a linear function of the distribution of π, and that the space of distributions is a convex polytope with vertices corresponding to distributions of deterministic policies, we see that there will always exist a deterministic policy that maximizes the total expected reward in T time steps. Hence, it is enough to consider deterministic policies only as reference. To make the objective more precise, for a given stationary deterministic policy π : X → A let (x_t^π, a_t^π) denote the state-action pair that would have been visited in time step t had one used policy π from the beginning of time (the initial state being fixed). Then, the goal can be expressed as keeping the expected regret,
$$\hat L_T = \max_\pi E\left[\sum_{t=1}^T r_t(x_t^\pi, a_t^\pi)\right] - E\left[\sum_{t=1}^T r_t(x_t, a_t)\right],$$
small, regardless of the sequence of reward functions {r_t}_{t=1}^T. In particular, a sublinear regret growth, \hat L_T = o(T), means that the average reward collected by the learning agent approaches that of the best policy in hindsight. Naturally, a smaller growth rate is more desirable.

The motivation to study this problem is manifold. One viewpoint

1 It is worth noting that the problem can be defined without referring to the uncontrolled, unmodelled dynamics, by starting with an arbitrary sequence of reward functions {r_t}. That the two problems are equivalent follows because there is no restriction on the range of {y_t} or its dynamics.
2 Following previous works in the area, in this paper we only consider the regret relative
to a fixed stationary policy. However, as usual in online learning, our results and algorithms can also be extended to less restricted sets of reference policies, such as the class of sequences of stationary policies with a restricted number of switches. We discuss such extensions in Section IV-D.

is that a learning agent achieving sublinear regret growth shows robustness in the face of arbitrarily assigned rewards; thus, the model provides a useful generalization of learning and acting in Markov Decision Processes. Some examples where the need for such robustness arises naturally are discussed below. Another viewpoint is that this problem is a useful generalization of online learning problems studied in the machine learning literature (e.g., [5]). In particular, in this literature, the problems studied are so-called prediction problems that involve an oblivious environment that chooses a sequence of loss functions. The learner's predictions are elements in the common domain of these loss functions, and the goal is to keep the regret small as compared with the best fixed prediction in hindsight. Identifying losses with negative rewards, we may notice that this problem coincides exactly with our model with |X| = 1; that is, our problem is indeed a generalization of this problem, where the reward functions have a memory represented by multiple states subject to the Markovian control.

Let us now consider some examples that fit the above model. Generally, since our approach assumes that the hard-to-model, uncontrolled part influences the rewards only, the examples concern cases where the reward is difficult to model. This is the case, for example, in various production- and resource-allocation problems, where the major source of difficulty is to model the prices that influence the rewards. Indeed, the prices in these problems tend to depend on external, generally unobserved factors, and thus the dynamics of the prices might be hard to model. Other examples include problems coming from computer science, such as the k-server problem, paging problems, or web-optimization (e.g., ad-allocation problems with delayed information) [see, e.g., 7, 22].

Previous results that concern online learning in MDPs with known transition probability kernels are summarized in Table I.
  paper                  algorithm   feedback          loops  regret bound
  Even-Dar et al [6, 7]  MDP-E       full information  yes    Õ(T^{1/2})
  Yu et al [22]          Lazy-FPL¹   full information  yes    Õ(T^{3/4+ε}), ε > 0
  Yu et al [22]          Q-FPL²      bandit            yes    o(T)
  Neu et al [13]         —           bandit            no     O(T^{1/2})
  Neu et al [16]         MDP-EXP3    bandit            yes    Õ(T^{2/3})
  this paper             MDP-EXP3    bandit            yes    Õ(T^{1/2})

TABLE I. Summary of previous results. Previous works concerned problems with either full-information or bandit feedback, and problems where the MDP dynamics may or may not have loops. (To be more precise, in Neu et al [13] we considered episodic MDPs with restarts.) For each paper, the order of the obtained regret bound in terms of the time horizon T is given.
1 The Lazy-FPL algorithm has a smaller computational complexity than MDP-E.
2 The stochastic regret of Q-FPL was shown to be sublinear almost surely, not only in expectation.

In the current paper we study the problem with recurrent Markovian dynamics, while assuming that the only information received about the uncontrolled part is in the form of the actual reward r_t. In particular, in our model the agent does not receive y_t, while in most previous works it was assumed that y_t is observed [6, 7, 22]. Following the terminology used in the online learning literature [2], when y_t is available (equivalently, the agent receives the reward function r_t : X × A → R in every time step), we say that learning happens under full information, while in our case we say that learning happens under bandit feedback (note that Even-Dar et al [7] suggested as an open problem to address the bandit situation studied here). In an earlier version of this paper [16], we provided an algorithm, MDP-EXP3, for learning in MDPs with recurrent dynamics under bandit feedback, and showed that it achieves a regret of order Õ(T^{2/3}).³ In this paper we improve upon the analysis of [16] and prove an Õ(T^{1/2})-regret bound for the same algorithm. As it follows from a lower bound proven by Auer et al [2] for bandit problems, apart from logarithmic and constant terms the rate obtained is unimprovable. The improvement compared to [16] is achieved by a more elaborate proof technique that builds on the (perhaps novel) observation that the so-called exponential weights technique that our algorithm builds upon changes its weights slowly. As in previous works where loopy Markovian
dynamics were considered, our main assumption on the MDP transition probability kernel will be that stationary policies mix uniformly fast. In addition, we shall assume that the stationary distributions of these policies are bounded away from zero. These assumptions will be discussed later. We also mention here that Yu and Mannor [20, 21] considered the related problem of online learning in MDPs where the transition probabilities may also change arbitrarily after each transition. This problem is significantly more difficult than the case where only the reward function is allowed to change. Accordingly, the algorithms proposed in these papers do not achieve sublinear regret. Unfortunately, these papers also have gaps in the proofs, as discussed in detail in [13]. Finally, we note in passing that the contextual bandit problem considered by Lazaric and Munos [12] can also be regarded as a simplified version of our online learning problem where the states are generated in an i.i.d. fashion (though we do not consider the problem of competing with the best policy in a restricted subset of stationary policies). For regret bounds concerning learning in purely stochastic, unknown MDPs, see the work of Jaksch et al [10] and the references therein. Learning in adversarial MDPs without loops was also considered by György et al [8] for deterministic transitions under bandit feedback, and under full information but with unknown transition probability kernels in our recent paper [14].

The rest of the paper is organized as follows: The problem is laid out in Section II, which is followed by a section that makes our assumptions precise (Section III). The algorithm and the main result are given and discussed in Section IV, with the proofs presented in Section V.

II. NOTATION AND PROBLEM DEFINITION

The purpose of this section is to provide the formal definition of our problem and to set the goals. We start with some preliminaries, in particular by reviewing the language we use in connection to Markov Decision Processes (MDPs). This will be followed by the definition of the online learning problem. We assume that the reader is familiar with the
concepts necessary to study MDPs; our purpose here is to introduce the notation only. For more background about MDPs, consult Puterman [17]. We define a finite Markov Decision Process (MDP) M by a finite state space X, a finite action set A, a transition probability kernel P : X × A × X → [0, 1], and a reward function r : X × A → [0, 1]. At time t ∈ {1, 2, ...}, based on the sequence of past states, observed rewards, and actions,
$$(x_1, a_1, r(x_1, a_1), x_2, a_2, \ldots, x_{t-1}, a_{t-1}, r(x_{t-1}, a_{t-1}), x_t) \in (X \times A \times \mathbb{R})^{t-1} \times X,$$
an agent acting in the MDP M chooses an action a_t ∈ A to be executed.⁴ As a result, the process moves to state x_{t+1} ∈ X with probability

3 Here, Õ(g(s)) denotes the class of functions f : N → R⁺ satisfying sup_{s∈N} f(s)/(g(s) ln^α s) < ∞ for some α ≥ 0.
4 Throughout the paper we will use boldface letters to denote random variables.
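The finite MDP just defined is straightforward to represent and simulate numerically. The sketch below is purely illustrative (the function name, the toy kernel, and the reward tables are our own made-up inputs, not anything from the paper): P is stored as an |X| × |A| × |X| array whose slice P[x, a] is the next-state distribution, and one trajectory of a fixed stochastic policy is rolled out step by step, exactly following the interaction protocol.

```python
import numpy as np

def rollout(P, reward_fns, pi, x0, rng):
    """Simulate the interaction protocol: at each step t the agent draws
    a_t ~ pi(.|x_t), receives r_t(x_t, a_t), and the state moves to
    x_{t+1} ~ P(.|x_t, a_t).  Returns the realised total reward.

    P          : transition kernel, shape (X, A, X)
    reward_fns : one reward table of shape (X, A) per time step
    pi         : stochastic stationary policy, shape (X, A)
    """
    x, total = x0, 0.0
    for r_t in reward_fns:
        a = rng.choice(P.shape[1], p=pi[x])    # a_t ~ pi(.|x_t)
        total += r_t[x, a]                     # reward in [0, 1]
        x = rng.choice(P.shape[2], p=P[x, a])  # x_{t+1} ~ P(.|x_t, a_t)
    return total
```

With deterministic (one-hot) rows in P and pi, the trajectory and total reward are fully determined, which makes the helper easy to sanity-check before using random inputs.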

P(x_{t+1}|x_t, a_t), and the agent incurs the reward r(x_t, a_t). We note in passing that, at the price of an increased notational load but with essentially no change to the contents, we could consider the case where the set of actions available at time step t is restricted to a nonempty subset A(x_t) of all actions, where the set-system (A(x))_{x∈X} is known to the agent. However, for simplicity, in the rest of the paper we stick to the case A(x) = A. In an MDP the goal of the agent is to maximize the long-term reward. In particular, in the so-called average-reward problem, the goal of the agent is to maximize the long-run average reward.

In what follows, the symbols x, x′ will be reserved to denote a state in X, while a, a′, b will be reserved to denote an action in A. In expressions involving sums over X, the domain of x, x′ will be suppressed to avoid clutter. The same holds for sums involving actions. Before defining the learning problem, let us introduce some more notation. We use ‖v‖_p to denote the L_p-norm of a function or vector. In particular, for p = ∞ the supremum norm of a function v : S → R is defined as ‖v‖_∞ = sup_{s∈S} |v(s)|, and for 1 ≤ p < ∞ and for any vector u = (u_1, ..., u_d) ∈ R^d, ‖u‖_p = (∑_{i=1}^d |u_i|^p)^{1/p}. We use e_1, ..., e_d to denote the row vectors of the canonical basis of the Euclidean space R^d. Since we will identify X with the integers {1, ..., |X|}, we will also use the notation e_x for x ∈ X. We will use ln to denote the natural logarithm function.

A. Online learning in MDPs

In this paper we consider a so-called online learning problem where the reward function is allowed to change arbitrarily in every time step. That is, instead of a single reward function r, a sequence of reward functions {r_t} is given. This sequence is assumed to be fixed ahead of time, and, for simplicity, we assume that r_t(x, a) ∈ [0, 1] for all (x, a) ∈ X × A and t ∈ {1, 2, ...}. No other assumptions are made about this sequence. The learning agent is assumed to know the transition probabilities P, but is not given the sequence {r_t}. The protocol of interaction with the environment is unchanged: At time step t the agent selects an action a_t based on the information available to it, which
is sent to the environment. In response, the reward r_t(x_t, a_t) and the next state x_{t+1} are communicated to the agent. The initial state x_1 is generated from a fixed distribution P_1, which may or may not be known. Let the expected total reward collected by the agent up to time T be denoted by
$$R_T = E\left[\sum_{t=1}^T r_t(x_t, a_t)\right].$$
As before, the goal of the agent is to make this sum as large as possible. In classical approaches to learning one would assume some kind of regularity of r_t and then derive bounds on how much reward the learning agent loses as compared to the agent that knew about the regularity of the rewards and acted optimally from the beginning of time. The loss, or regret, measured in terms of the difference of the total expected rewards of the two agents, quantifies the learner's efficiency. In this paper, following the recent trend in the machine learning literature [5], while keeping the regret criterion, we will avoid making any assumption on how the reward sequence is generated, and take a worst-case viewpoint. The potential benefit is that the results will be more generally applicable and the algorithms will enjoy added robustness, while, generalizing from results available for supervised learning [4, 8], the algorithms can also be shown to avoid being too pessimistic.

The concept of regret in our case is defined as follows: We shall consider algorithms which are competitive with stochastic stationary policies. Fix a stochastic stationary policy π : X × A → [0, 1] and let {(x_t, a_t)} be the trajectory that results from following policy π from x_1 ∼ P_1 (in particular, a_t ∼ π(x_t, ·), where π(x_t, a) denotes the probability of action a in state x_t). The expected total reward of π over the first T time steps is defined as
$$R_T^\pi = E\left[\sum_{t=1}^T r_t(x_t, a_t)\right].$$
Now, the expected regret (or expected relative loss) of the learning agent relative to the class of stationary policies is defined as
$$\hat L_T = \sup_\pi R_T^\pi - R_T,$$
where the supremum is taken over all stochastic stationary policies in M. Note that the policy maximizing the total expected reward is chosen in hindsight, that is, based on the knowledge of the reward functions r_1, ..., r_T. Thus, the regret measures how well the learning agent is able to
generalize from its moment-to-moment knowledge of the rewards to the sequence r_1, ..., r_T. If the regret of an agent grows sublinearly with T, then it can be said to act as well as the best stochastic stationary policy in the long run (i.e., the average expected reward of the agent in the limit is equal to that of the best policy). In this paper our main result will show that there exists an algorithm such that, if that algorithm is followed by the learning agent, then the learning agent's regret will be bounded by C √T ln T, where C > 0 is a constant that depends on the transition probability kernel, but is independent of the sequence of rewards {r_t}.

III. ASSUMPTIONS ON THE TRANSITION PROBABILITY KERNEL

Before describing our assumptions, a few more definitions are needed: First of all, for brevity, in what follows we will call stochastic stationary policies just policies. Further, without loss of generality, we shall identify the states with the first |X| integers and assume that X = {1, 2, ..., |X|}. Now, take a policy π and define the Markov kernel
$$P^\pi(x'|x) = \sum_a \pi(a|x)\, P(x'|x, a).$$
The identification of X with the first |X| integers makes it possible to view P^π as a matrix: (P^π)_{x,x'} = P^π(x'|x). In what follows, we will also take this view when convenient. In general, distributions will also be treated as row vectors. Hence, for a distribution µ over X, µP^π is the distribution over X that results from using policy π for one step after a state is sampled from µ (i.e., the next-state distribution under π). Finally, a stationary distribution of a policy π is a distribution µ_st that satisfies µ_st P^π = µ_st. In what follows we assume that every stochastic stationary policy π has a well-defined unique stationary distribution µ_st^π. This ensures that the average reward underlying any stationary policy is a well-defined single real number. It is well known that in this case the convergence to the stationary distribution is exponentially fast. Following Even-Dar et al [7], we consider the following stronger, uniform mixing condition, which implies the existence of the unique stationary distributions:

Assumption A1: There exists a number τ ≥ 0
such that for any policy π and any pair of distributions µ and µ′ over X,
$$\|(\mu - \mu') P^\pi\|_1 \le e^{-1/\tau}\, \|\mu - \mu'\|_1.$$
As Even-Dar et al [7], we call the smallest τ satisfying this assumption the mixing time of the transition probability kernel P. Together with the existence and uniqueness of the stationary distributions, the next assumption ensures that every state is visited eventually, no matter what policy is chosen:

Assumption A2: The stationary distributions are uniformly bounded away from zero: inf_{π,x} µ_st^π(x) ≥ α for some α > 0.
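For a concrete kernel, both assumptions can be probed numerically. The sketch below is an illustration under our own assumptions (the helper names and the toy kernel are ours, not the paper's): it builds the Markov kernel P^π induced by a policy, solves µ_st P^π = µ_st for the stationary distribution (Assumption A2 asks that its smallest entry stay bounded away from zero uniformly over policies), and evaluates the Markov-Dobrushin ergodicity coefficient, which is below one exactly when P^π is scrambling.

```python
import numpy as np

def policy_kernel(P, pi):
    """P^pi(x'|x) = sum_a pi(a|x) P(x'|x, a) as an |X|-by-|X| row-stochastic matrix.

    P  : transition kernel, shape (X, A, X)
    pi : stochastic stationary policy, shape (X, A), rows summing to 1
    """
    return np.einsum('xa,xay->xy', pi, P)

def stationary_distribution(P_pi):
    """Solve mu P^pi = mu with sum(mu) = 1 as an overdetermined linear system."""
    X = P_pi.shape[0]
    A = np.vstack([P_pi.T - np.eye(X), np.ones(X)])  # fixed-point + normalisation
    b = np.zeros(X + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu

def dobrushin_coefficient(P_pi):
    """m(P^pi) = 1 - min_{x,x'} sum_y min(P^pi(y|x), P^pi(y|x')).

    m(P^pi) < 1 iff P^pi is scrambling (any two rows overlap in some column)."""
    X = P_pi.shape[0]
    overlap = min(np.minimum(P_pi[x], P_pi[y]).sum()
                  for x in range(X) for y in range(X))
    return 1.0 - overlap
```

Since mixing of all deterministic policies implies mixing of all stochastic ones, in practice it suffices to run such a check over the finitely many deterministic policies.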

Note that e^{-1/τ} is the supremum over all policies π of the Markov-Dobrushin coefficient of ergodicity, defined as
$$m(P^\pi) = \sup_{\mu \ne \mu'} \frac{\|(\mu - \mu')P^\pi\|_1}{\|\mu - \mu'\|_1}$$
for the transition probability kernel P^π; see, e.g., [9]. It is also known that m(P^π) = 1 − min_{x,x'∈X} ∑_{y∈X} min{P^π(y|x), P^π(y|x′)} [9]. Since m(P^π) is a continuous function of π and the set of policies is compact, there is a policy π* with m(P^{π*}) = sup_π m(P^π). These facts imply that Assumption A1 is satisfied, that is, sup_π m(P^π) < 1, if and only if for every π, m(P^π) < 1, that is, P^π is a scrambling matrix (P^π is a scrambling matrix if any two rows of P^π share some column in which they both have a positive element). Furthermore, if P^π is a scrambling matrix for every deterministic policy π, then it is also a scrambling matrix for any stochastic policy. Thus, to guarantee Assumption A1 it is enough to verify mixing for deterministic policies only. The assumptions will be further discussed in Section IV-D.

IV. LEARNING IN ONLINE MDPS UNDER BANDIT FEEDBACK

In this section we shall first introduce some additional, standard MDP concepts that we will need. (That these concepts are well defined follows from our assumptions on P and from standard results to be found, for example, in the book by Puterman [17].) After the definitions, we specify our algorithm. The section is finished by the statement of our main result concerning the performance of the proposed algorithm.

A. Preliminaries

Fix an arbitrary policy π and t ≥ 1. Let {(x_s, a_s)} be a random trajectory generated by π and the transition probability kernel P (and an arbitrary everywhere positive initial distribution over the states). We will use q_t^π to denote the action-value function underlying π and the immediate reward r_t, while we will use v_t^π to denote the corresponding state value function.⁵ That is, for (x, a) ∈ X × A,
$$q_t^\pi(x, a) = E\left[\sum_{s=1}^\infty \big(r_t(x_s, a_s) - \rho_t^\pi\big) \,\Big|\, x_1 = x,\, a_1 = a\right],$$
$$v_t^\pi(x) = E\left[\sum_{s=1}^\infty \big(r_t(x_s, a_s) - \rho_t^\pi\big) \,\Big|\, x_1 = x\right],$$
where ρ_t^π is the average reward per stage corresponding to π:
$$\rho_t^\pi = \lim_{S \to \infty} \frac{1}{S} \sum_{s=1}^S E[r_t(x_s, a_s)].$$
The average reward per stage can be expressed as ρ_t^π = ∑_x µ_st^π(x) ∑_a π(a|x) r_t(x, a), where µ_st^π is the stationary
distribution underlying policy π. Under our assumptions stated in the previous section, up to a shift by a constant function, the value functions q_t^π, v_t^π are the unique solutions to the Bellman equations
$$q_t^\pi(x, a) = r_t(x, a) - \rho_t^\pi + \sum_{x'} P(x'|x, a)\, v_t^\pi(x'), \quad (1)$$
$$v_t^\pi(x) = \sum_a \pi(a|x)\, q_t^\pi(x, a), \quad (2)$$
which hold simultaneously for all (x, a) ∈ X × A (Corollary 8.2.7 of [17]). We will use q_t^* to denote the optimal action-value function,

5 Most sources would call these functions differential action- and state-value functions. We omit this adjective for brevity.

that is, the action-value function underlying a policy that maximizes the average reward in the MDP specified by (P, r_t). We will also need these concepts for an arbitrary reward function r : X × A → R. In such a case, we will use v^π, q^π, and ρ^π to denote the respective value function, action-value function, and average reward of policy π. Now, consider the trajectory {(x_t, a_t)} followed by the learning agent, with x_1 ∼ P_1. For any t ≥ 1, define
$$u_t = (x_1, a_1, r_1(x_1, a_1), \ldots, x_t, a_t, r_t(x_t, a_t)) \quad (3)$$
and introduce the policy followed in time step t, π_t(a|x) = P[a_t = a | u_{t-1}, x_t = x], where u_0 (and, more generally, u_s for all s ≤ 0) is defined to be the empty sequence. Note that π_t is computed based on past information and is therefore random. We introduce the following notation: q_t = q_t^{π_t}, v_t = v_t^{π_t}, ρ_t = ρ_t^{π_t}. With this, we see that the following equations hold simultaneously for all (x, a) ∈ X × A:
$$q_t(x, a) = r_t(x, a) - \rho_t + \sum_{x'} P(x'|x, a)\, v_t(x'), \qquad v_t(x) = \sum_a \pi_t(a|x)\, q_t(x, a). \quad (4)$$

B. The algorithm

Our algorithm, MDP-EXP3, shown as Algorithm 1, is inspired by that of Even-Dar et al [7], while also borrowing ideas from the EXP3 algorithm (the exponential weights algorithm for exploration and exploitation) of Auer et al [2]. The main idea of the algorithm is to

Algorithm 1 MDP-EXP3: an algorithm for online learning in MDPs
  Set N ≥ 1, w_1(x, a) = w_2(x, a) = ... = w_{2N}(x, a) = 1, γ ∈ (0, 1), η ∈ (0, γ).
  For t = 1, 2, ... repeat:
  1. Set
       π_t(a|x) = (1 − γ) w_t(x, a) / ∑_b w_t(x, b) + γ/|A|
     for all (x, a) ∈ X × A.
  2. Draw an action a_t ∼ π_t(·|x_t).
  3. Receive reward r_t(x_t, a_t) and observe x_{t+1}.
  4. If t ≥ N:
     a. Compute µ_t^N(x) for all x ∈ X using (8).
     b. Construct estimates ˆr_t using (6) and compute ˆq_t using (5).
     c. Set w_{t+N}(x, a) = w_{t+N−1}(x, a) e^{η ˆq_t(x, a)} for all
(x, a) ∈ X × A.

construct estimates {ˆq_t} of the action-value functions {q_t}, which are then used to determine the action-selection probabilities π_t(·|x) in each state x in each time step t. In particular, the probability of selecting action a in state x at time step t is computed as a mixture of the uniform distribution (which encourages exploring actions irrespective of what the algorithm has learned about the action-values) and a Gibbs distribution, the mixture parameter being γ > 0. Given state x, the Gibbs distribution defines the probability of choosing action a at time step t to be proportional to exp(η ∑_{s=N}^{t−N} ˆq_s(x, a)).⁶

6 In the algorithm the Gibbs action-selection probabilities are computed in an incremental fashion with the help of the weights w_t(x, a). Note that a numerically stable implementation would calculate the action-selection probabilities based on the relative value differences, η ∑_{s=N}^{t−N} ˆq_s(x, a) − max_{a'∈A} η ∑_{s=N}^{t−N} ˆq_s(x, a'). These relative value differences can also be updated incrementally. The form shown in Algorithm 1 is preferred for mathematical clarity.
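The numerical-stability remark of footnote 6 can be sketched as follows (an illustrative toy under our own naming, not the paper's implementation): the mixture probabilities are computed from the relative value differences, so the exponentials never overflow even when the cumulative action-value estimates grow linearly with t.

```python
import numpy as np

def gibbs_policy(q_cumsum, eta, gamma):
    """Action probabilities of a Gibbs/uniform mixture of the MDP-EXP3 kind,
    computed stably by subtracting the per-state maximum before exponentiating.

    q_cumsum : shape (X, A), cumulative sums of past action-value estimates
    eta      : learning-rate parameter (> 0)
    gamma    : exploration parameter in (0, 1)
    """
    z = eta * q_cumsum
    z -= z.max(axis=1, keepdims=True)            # relative value differences
    w = np.exp(z)                                # all entries in (0, 1]
    gibbs = w / w.sum(axis=1, keepdims=True)
    A = q_cumsum.shape[1]
    return (1.0 - gamma) * gibbs + gamma / A     # uniform-exploration mixture
```

By construction every probability is at least γ/|A|, which is exactly the property used later to keep the importance-sampling estimator well defined.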

Here, η > 0 and N > 0 are further parameters of the algorithm. Note that for the single-state setting with N = 1, MDP-EXP3 is equivalent to the EXP3 algorithm of Auer et al [2].

It is interesting to discuss how the Gibbs policy (i.e., w_t(x, a)/∑_b w_t(x, b)) is related to what is known as the Boltzmann-exploration policy in the reinforcement learning literature [e.g., 19]. Remember that, given state x, the Boltzmann-exploration policy would select action a at time step t with probability proportional to exp(η ˆq*_t(x, a)) for some estimate ˆq*_t of the optimal action-value function in the MDP (P, ˆr_t), where {ˆr_t} is the sequence of estimated reward functions. Thus, we can see a couple of differences between the Boltzmann exploration and our Gibbs policy. The first difference is that the Gibbs policy in our algorithm uses the accumulated sum of the estimates of action-values, while the Boltzmann policy uses only the last estimate. By depending on the sum, the Gibbs policy will rely less on the last estimate. This reduces how fast the policies can change, making the learning smoother. Another difference is that in our Gibbs policy the sum of previous action-values runs only up to step t − N, instead of using the sum that runs up to the last step t − 1. The reasons for doing this will be explained below. Finally, the Gibbs policy uses the action-value function estimates in the MDPs {(P, ˆr_s)} of the policies {π_s} selected by the algorithm, as opposed to using an estimate of the optimal action-value function. This makes our algorithm closer in spirit to modified policy iteration than to value iteration and is again expected to reduce the variance of the learning process.

The reason the Gibbs policy does not use the last N estimates is to allow the construction of a reasonable estimate ˆq_t of the action-value function q_t. If r_t was available, one could compute q_t based on r_t (cf. (4)) and the sum could then run up to t − 1, resulting in the algorithm of Even-Dar et al [7]. Since in our problem r_t is not available, we estimate it using an importance-sampling estimator ˆr_t (below; from now on, t ≥ N). Given this ˆr_t, the estimate ˆq_t of the
action-value function q_t is defined as the action-value function underlying policy π_t in the average-reward MDP given by the transition probability kernel P and reward function ˆr_t. Thus, ˆq_t, up to a shift by a constant function, can be computed as the solution to the Bellman equations corresponding to (P, ˆr_t) (cf. (4)):
$$\hat q_t(x, a) = \hat r_t(x, a) - \hat\rho_t + \sum_{x'} P(x'|x, a)\, \hat v_t(x'), \qquad \hat v_t(x) = \sum_a \pi_t(a|x)\, \hat q_t(x, a), \qquad \hat\rho_t = \sum_{x, a} \mu_{st}^{\pi_t}(x)\, \pi_t(a|x)\, \hat r_t(x, a), \quad (5)$$
which hold simultaneously for all (x, a) ∈ X × A. Since π_t is invariant to constant shifts of ˆq_t, any of the solutions of these equations leads to the same sequence of policies. Hence, in what follows, without loss of generality we assume that the algorithm uses ˆq_t, i.e., the value function of π_t in the average-reward MDP defined by (P, ˆr_t). To define the estimator ˆr_t, define µ_t^N(x) as the probability of visiting state x at time step t, conditioned on the history u_{t−N} up to time step t − N, including x_{t−N} and a_{t−N} (cf. (3) for the definition of {u_t}):
$$\mu_t^N(x) \stackrel{\text{def}}{=} P[x_t = x \mid u_{t-N}], \quad x \in X.$$
Then, the estimate of r_t is constructed using
$$\hat r_t(x, a) = \begin{cases} \dfrac{r_t(x, a)}{\pi_t(a|x)\, \mu_t^N(x)}, & \text{if } (x, a) = (x_t, a_t); \\ 0, & \text{otherwise.} \end{cases} \quad (6)$$
The importance-sampling estimator (6) is well defined only if, for x = x_t,
$$\mu_t^N(x) > 0 \quad (7)$$
holds almost surely (by construction, π_t(a_t|x_t) ≥ γ/|A| > 0). To see the intuitive reason of why (7) holds, it is instructive to look into how the distribution µ_t^N can be computed. When t = N, it should be clear from the definition of µ_t^N that, viewing µ_t^N as a row vector, µ_N^N = P_1 P^{π_1} ⋯ P^{π_{N−1}}. Now let t > N. Denote by P^a the transition probability matrix of the policy that selects action a in every state, and recall that e_x denotes the x-th unit row vector of the canonical basis of the |X|-dimensional Euclidean space. We may write
$$\mu_t^N = e_{x_{t-N}}\, P^{a_{t-N}}\, P^{\pi_{t-N+1}} \cdots P^{\pi_{t-1}}, \quad t > N. \quad (8)$$
This holds because for any t ≥ N, π_t is entirely determined by the history u_{t−N}, while for t > N the history u_{t−N} also includes (and thus determines) x_{t−N}, a_{t−N}. Using the notation z ∈ σ(u_{t−N}) to denote that the random variable z is measurable with respect to the sigma-algebra generated by the history u_{t−N}, the above fact can be
stated as
$$x_{t-N}, a_{t-N} \in \sigma(u_{t-N}) \ \text{ for } t > N, \qquad \pi_t \in \sigma(u_{t-N}) \ \text{ for } t \ge N. \quad (9)$$
Consequently, we also have that π_{t−1}, ..., π_{t−N+1} ∈ σ(u_{t−N}), and therefore (8) follows from the law of total probability. Note also that
$$P[a_t = a \mid x_t = x, u_{t-N}] = P[a_t = a \mid x_t = x, u_{t-1}] = \pi_t(a|x), \quad (10)$$
where the last equality follows from the definition of π_t and a_t.

The algorithm as presented needs to know P_1 to compute µ_t^N at step t = N. When P_1 is unknown, instead of starting the computation of the weights at time step t = N, we can start the computation at time step t = N + 1 (i.e., change t ≥ N of step 4 to t ≥ N + 1). Clearly, in the worst case, the regret can only increase by a constant amount (the magnitude of the largest reward) as a result of this change.

An essential step of the proof of our main result is to show that inequality (7) indeed holds, that is, µ_t^N(x) is bounded away from zero. In fact, we will show that this inequality holds almost surely⁷ for all x ∈ X provided that N is large enough, which explains why the sum in the definition of the Gibbs policy runs from time N. This will be done by first showing that the policies π_t (especially during the last N steps) change sufficiently slowly (this is where it becomes useful that the Gibbs policy is defined using a sum of previous action values). Consequently, π_{t−N+1}, ..., π_t will all be quite close to the policy of the last time step. Then, the expression on the right-hand side of (8) can be seen to be close to the N-step state distribution of π_t when starting from (x_{t−N}, a_{t−N}), which, if N is large enough, will be shown to be close to the stationary distribution of π_t, thanks to Assumption A1. Since, by Assumption A2, min_{x∈X} µ_st^{π_t}(x) ≥ α > 0, then, by choosing the algorithm's parameters appropriately, we can show that µ_t^N(x) ≥ α/2 > 0 holds for all x ∈ X; that is, inequality (7) follows. This is shown in Lemma 3.

It remains to be seen that the estimate ˆr_t is meaningful. In this regard, we claim that
$$E[\hat r_t(x, a) \mid u_{t-N}] = r_t(x, a) \quad (11)$$

7 In what follows, for the sake of brevity, unless otherwise stated, we will omit the modifier "almost surely" from probabilistic statements. It is worth mentioning that the finiteness
of X and A allows several statements concerning conditional expectations to hold always, instead of almost surely.
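The two computations of step 4b can be sketched in a few lines. The code below is illustrative only (the function names, the normalisation µ_st·v = 0, and the toy inputs are our own choices, not the paper's): `reward_estimate` builds the one-point importance-sampling table of the kind in (6), and `evaluate` solves average-reward Bellman equations of the kind in (5) for a fixed policy via a direct linear solve.

```python
import numpy as np

def reward_estimate(r_obs, x_t, a_t, pi_t, mu_t):
    """One-point importance-sampling estimate of the full reward table:
    nonzero only at the visited pair (x_t, a_t), rescaled by the
    probability pi_t(a_t|x_t) * mu_t(x_t) of visiting that pair."""
    r_hat = np.zeros_like(pi_t)
    r_hat[x_t, a_t] = r_obs / (pi_t[x_t, a_t] * mu_t[x_t])
    return r_hat

def evaluate(P, r, pi):
    """Differential action values of policy pi in the average-reward MDP (P, r):
    a solution (q, v, rho) of the Bellman equations, normalised so mu_st @ v = 0.

    P : kernel, shape (X, A, X);  r : rewards, shape (X, A);  pi : shape (X, A)
    """
    X = P.shape[0]
    P_pi = np.einsum('xa,xay->xy', pi, P)            # P^pi
    M = np.vstack([P_pi.T - np.eye(X), np.ones(X)])  # stationary distribution
    b = np.zeros(X + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(M, b, rcond=None)
    r_pi = (pi * r).sum(axis=1)                      # expected reward per state
    rho = mu @ r_pi                                  # average reward
    # (I - P^pi + 1 mu) v = r_pi - rho has a unique solution, with mu @ v = 0
    v = np.linalg.solve(np.eye(X) - P_pi + np.outer(np.ones(X), mu),
                        r_pi - rho)
    q = r - rho + P @ v                              # q = r - rho + sum_x' P v
    return q, v, rho
```

As a sanity check, the claimed unbiasedness of the estimator can be verified exactly by summing the one-point tables over the joint law of the visited pair, and the returned q can be checked against both Bellman equations.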

holds for all (x, a) ∈ X × A. First note that
$$E[\hat r_t(x, a) \mid u_{t-N}] = \frac{r_t(x, a)}{\pi_t(a|x)\, \mu_t^N(x)}\, E\big[\mathbb{I}_{\{(x,a)=(x_t,a_t)\}} \mid u_{t-N}\big],$$
where we have exploited that π_t, µ_t^N ∈ σ(u_{t−N}). Now,
$$E\big[\mathbb{I}_{\{(x,a)=(x_t,a_t)\}} \mid u_{t-N}\big] = P[a_t = a \mid x_t = x, u_{t-N}]\; P[x_t = x \mid u_{t-N}].$$
By definition, P[x_t = x | u_{t−N}] = µ_t^N(x) and, by (10), P[a_t = a | x_t = x, u_{t−N}] = π_t(a|x). Putting together the equalities obtained, we get (11). By linearity of expectation, and since π_t, µ_st^{π_t} ∈ σ(u_{t−N}), it then follows from (5) and (11) that E[ˆρ_t | u_{t−N}] = ρ_t, and hence, by the linearity of the Bellman equations and by our assumption that ˆq_t is the value function underlying the MDP (P, ˆr_t) and policy π_t, we have, for all (x, a) ∈ X × A,
$$E[\hat q_t(x, a) \mid u_{t-N}] = q_t(x, a), \qquad E[\hat v_t(x) \mid u_{t-N}] = v_t(x). \quad (12)$$
As a consequence, we also have, for all (x, a) ∈ X × A, t ≥ N,
$$E[\hat\rho_t] = E[\rho_t], \qquad E[\hat q_t(x, a)] = E[q_t(x, a)], \qquad E[\hat v_t(x)] = E[v_t(x)]. \quad (13)$$

Let us finally comment on the computational complexity of our algorithm. Due to the delay in updating the policies based on the weights, the algorithm needs to store N policies (or, equivalently, N sets of weights). Thus, the memory requirement of MDP-EXP3 scales with N|X||A| in the real-number model. The computational complexity of the algorithm is dominated by the cost of computing ˆr_t and, in particular, by the cost of computing µ_t^N, plus the cost of solving the Bellman equations (5). The cost of this is O(|X|²(N + |A|)) in the worst case, for each time step; however, it can be much smaller for specific practical cases, such as when the number of possible next-states is limited.

C. Main result

Our main result is the following bound concerning the performance of MDP-EXP3.

Theorem 1 (Regret under bandit feedback): Let the transition probability kernel P satisfy Assumptions A1 and A2. Let T > 0, let N = ⌈1 + τ ln T⌉, and let h(y) = 2y ln y for y > 0. Then, for an appropriate choice of the parameters η and γ (which depend on |A|, T, α, and τ), for any sequence of reward functions {r_t} taking values in [0, 1], provided that T is larger than a threshold depending polynomially on τ, 1/α, and ln|A| (through universal constants c_1, c_2 and the function h) and that τ ≥ 1/8, the regret of the algorithm MDP-EXP3 can be bounded as
$$\hat L_T \le C\, \frac{\tau^3}{\alpha}\, \sqrt{T |A| \ln|A|}\; \ln T \;+\; C'\, \frac{\tau^2 |A|}{\alpha}\, \ln T$$
for some universal constants c_1, c_2, C, C′ > 0. Note that with the
specific choice of prmeters the totl cost of the lgorithm for time horizon of T is O T X 2 τ lnt + X + 8 The choice of the lower bound on τ is rbitrry, but the constnts in the theorem depend on it Furthermore, with some extr work, our proof lso gives rise to bound for the cse when τ 0, but for simplicity we decided to leve out this nlysis The proof is presented in the next section For comprison, we give now the nlogue result for the lgorithm of Even-Dr et l [7 tht ws developed for the full-informtion cse when the lgorithm is given r t in ech time step As hinted on before, our lgorithm reduces to this lgorithm if we set N =, ˆr t = r t nd γ = 0 We cll this lgorithm MDP-E fter Even-Dr et l [7 The following regret bound holds for this lgorithm: Theorem 2 Regret under full-informtion feedbck: Fix T > 0 Let the trnsition probbility kernel P stisfy Assumption A Then, for n pproprite choice of the prmeter which depends on, T, τ, for ny sequence of rewrd functions {r t} tking vlues in [0,, the regret of the lgorithm MDP-E cn be bounded s ˆL T 4τ + + 2T 2τ + 32τ 2 + 6τ + 5 ln 4 For pedgogicl resons, we shll present the proof in the next section, too Note tht the constnts in this bound re different from those presented in Theorem 5 of Even-Dr et l [7 In prticulr, the leding term here is 2τ 3/2 2T ln, while their leding term is 4τ 2 T ln The bove bound both corrects some smll mistkes in their clcultions nd improves the result t the sme time 9 As Even-Dr et l [7 note, the regret bound 4 does not depend directly on the number of sttes, X, but the dependence ppers implicitly through τ only Even-Dr et l [7 lso note tht tighter bound, where only the mixing times of the ctul policies chosen pper, cn be derived However, it is uncler whether in the worstcse this could be used to improve the bound Similrly to 4, our bound depends on X through other constnts In the bndit cse, these re nd τ Compring the theorems it seems tht the min price of not seeing the rewrds is the ppernce of 
insted of ln typicl difference between the bndit nd full observtion cses nd the ppernce of / term in the bound D Discussion nd future work In this pper, we hve presented n online lerning lgorithm, MDP-EXP3 for dversril MDPs, tht is, finite stochstic Mrkovin decision environments where the rewrd function my chnge fter ech trnsition This is the first lgorithm for this setting tht hs rigorously proved O T ln T bound on its regret We discuss the fetures of the lgorithm, long with future reserch directions below Extensions: We considered the expected regret reltive to the best fixed policy selected in hindsight A typicl extension is to prove high probbility bound on the regret, which we think cn be done in stndrd wy using concentrtion inequlities Note, however, tht the extension is more complicted thn for the bndit problems becuse the mixing property hs to be used together with the mrtingle resoning Another potentil extension is to compete with lrger policy clsses, such s with sequences of policies with bounded number of policy-switches Similrly to Neu et l [3, 5, the MDP-EXP3 lgorithm should then be modified by replcing EXP3 with the EXP3S lgorithm of Auer et l [2, specificlly designed to compete with switching experts in plce of EXP3 Note tht, gin, the nlysis will be more complicted thn in the bndit cse, nd requires to bound the mximum regret of EXP3S reltive to ny fixed policy over ny time window When compred to policy with C switches, the resulting regret bound is expected to be C times 9 One of the mistkes is in the proof of Theorem 4 of Even-Dr et l [7 where they filed to notice tht q π t t cn tke on negtive vlues Thus, their Assumption 3 is not met by {q π t t } one needs to extend the upper bound given in their Lemm 22 with lower bound nd chnge Assumption 3 As result, Assumption 3 cnnot be used to show tht the inequlity in the proof of Theorem 4 holds This mistke, s well s the others, cn esily be corrected, s we show it here

larger than that of Theorem 1, while the algorithm would not need to know the number of switches C.

(b) Tuning and complexity: Setting up and running the algorithm MDP-EXP3 may actually be computationally demanding. Setting the parameters of the algorithm (η and γ) requires a known lower bound α on the visitation probabilities, such that inf_{π,x} µ^π_st(x) ≥ α > 0, and also the knowledge of an upper bound τ on the mixing time. While these quantities can, in principle, be determined from the transition probability kernel P, it is not clear how to compute the minimum over all policies efficiently. Computational issues also arise while running the algorithm: as discussed in Section IV-B, each step of the MDP-EXP3 algorithm requires O( |X|²τ ln T + |X||A| ) computations, which may be too demanding if, e.g., the size of the state space is large. It is an interesting problem to design a more efficient method that achieves similar performance guarantees.

(c) Assumptions on the Markovian dynamics: We believe that it should be possible to extend our main result beyond Assumption A1, requiring only the existence of a unique stationary distribution for any policy π (we will refer to this latter assumption as the unichain assumption). Using that the distribution of any unichain Markov chain converges exponentially fast to its stationary distribution, and that it is enough to verify Assumption A1 for deterministic policies only, one can easily show that if P satisfies the unichain assumption, then there exists an integer K > 0 such that (P^π)^K is a scrambling matrix for any policy π. Then, we conjecture that the MDP-EXP3 algorithm will work as it is, except that the regret will be increased. The key to proving this result is to generalize Lemmas 4 and 5 to this case. Finally, one may also consider the case when the Markov chains corresponding to P^π are periodic. We speculate that this case may be dealt with by using occupancy probabilities and Cesàro averages instead of the stationary and state distributions, respectively.

V. PROOFS

In this section we present the proofs of Theorem 1 and Theorem 2. We start with the proof of Theorem 2, as it is the simpler result. The proof of this result is presented partly for the sake of completeness and partly so that we can be more specific about the corrections required to fix the main result (Theorem 5.2) of Even-Dar et al. [7]. Further, the proof will also serve as a starting point for the proof of our main result, Theorem 1. Nevertheless, the impatient reader may skip the next section and jump immediately to the proof of Theorem 1, which, apart from referring to some general lemmas developed in the next subsection, is entirely self-contained.

A. Proof of Theorem 2

Throughout this section we consider the MDP-E algorithm given by Algorithm 1 with N = 1, r̂_t = r_t and γ = 0, and we suppose that P satisfies Assumption A1. Let π_t denote the policy used in step t of the algorithm. Note that π_t is not random, since by assumption the reward function is available at all states (not just the visited ones). Hence, the sequence of policies chosen does not depend on the states visited by the algorithm, but is deterministic. Remember that ρ_t = ρ^{π_t}_t denotes the average reward of policy π_t measured with respect to the reward function r_t. Following Even-Dar et al. [7], fix some policy π and consider the decomposition of the regret relative to π:

R^π_T − R_T = ( R^π_T − Σ_{t=1}^T ρ^π_t ) + Σ_{t=1}^T ( ρ^π_t − ρ_t ) + ( Σ_{t=1}^T ρ_t − R_T ).  (15)

The first and the last terms measure the difference between the sum of asymptotic average rewards and the actual expected reward. The mixing assumption (Assumption A1) ensures that these differences are not large. In particular, in the case of a fixed policy, this difference is bounded by a constant of order τ:

Lemma 1: For any T and any policy π, it holds that

| R^π_T − Σ_{t=1}^T ρ^π_t | ≤ 2(τ+1).  (16)

This lemma is also stated in [7]. We give the proof for completeness, and also to correct slight inaccuracies of the proof given in [7].

Proof: Let {(x_t, a_t)} be the trajectory obtained when π is followed. Note that the difference between R^π_T and Σ_{t=1}^T ρ^π_t is caused by the difference between the initial distribution of x_1 and the stationary distribution of π. To quantify the difference, write

R^π_T − Σ_{t=1}^T ρ^π_t = Σ_{t=1}^T Σ_{x,a} ( ν^π_t(x) − µ^π_st(x) ) π(a|x) r_t(x,a),

where ν^π_t(x) = P[x_t = x] is the state distribution at time step t. Viewing ν^π_t as a row vector, we have ν^π_t = ν^π_{t−1} P^π. Consider the t-th term of the above difference. Using r_t(x,a) ∈ [0,1] and Assumption A1, we get

| Σ_{x,a} ( ν^π_t(x) − µ^π_st(x) ) π(a|x) r_t(x,a) | ≤ ‖ ν^π_t − µ^π_st ‖₁ = ‖ ν^π_{t−1} P^π − µ^π_st P^π ‖₁ ≤ e^{−1/τ} ‖ ν^π_{t−1} − µ^π_st ‖₁ ≤ ⋯ ≤ e^{−(t−1)/τ} ‖ ν^π_1 − µ^π_st ‖₁ ≤ 2 e^{−(t−1)/τ}.¹⁰

This, together with the elementary inequality Σ_{t=1}^T e^{−(t−1)/τ} ≤ 1 + ∫₀^∞ e^{−t/τ} dt = 1 + τ, gives the desired bound. ∎

Consider now the second term of (15) and, in particular, its t-th term, ρ^π_t − ρ_t = ρ^π_t − ρ^{π_t}_t. This term is the difference of the average rewards obtained by π and π_t. The following lemma shows that this difference can be rewritten in terms of the state-wise action-disadvantages underlying π_t:

Lemma 2 (Performance difference lemma): Consider an MDP specified by the transition probability kernel P and a reward function r. Let π, π̂ be two stochastic stationary policies in the MDP. Assume that µ^π_st, ρ^π̂ and q^π̂ are well-defined.¹¹ Then,

ρ^π − ρ^π̂ = Σ_{x,a} µ^π_st(x) π(a|x) [ q^π̂(x,a) − v^π̂(x) ].

This lemma appeared as Lemma 4 in [7], but similar statements have been known for a while; for example, the book of Cao [3] also puts performance difference statements at the center of the theory of MDPs. For the sake of completeness, we include the easy proof. Note that the statement of the lemma continues to hold even when q^π̂ and v^π̂ are shifted by the same constant function.

Proof: We have

Σ_{x,a} µ^π_st(x) π(a|x) q^π̂(x,a) = Σ_{x,a} µ^π_st(x) π(a|x) [ r(x,a) − ρ^π̂ + Σ_{x′} P(x′|x,a) v^π̂(x′) ] = ρ^π − ρ^π̂ + Σ_{x′} µ^π_st(x′) v^π̂(x′),

where the second equality holds since Σ_{x,a} µ^π_st(x) π(a|x) P(x′|x,a) = µ^π_st(x′). Reordering the terms gives the desired result. ∎

¹⁰ Even-Dar et al. [7] mistakenly use ‖ν^π_t − µ^π_st‖₁ ≤ e^{−t/τ} ‖ν^π_1 − µ^π_st‖₁ in their paper; t = 1 immediately shows that this can be false. See, e.g., the proofs of their Lemmas 2.2 and 5.2.

¹¹ This lemma does not need Assumption A1 and, in fact, the assumptions we make could be further relaxed with a slight change to the claim.
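Lemma 2 is also easy to check numerically. The sketch below is our own illustration, not a construction from the paper: the random MDP, the state and action counts, and the iterative solvers (power iteration for the stationary distribution, relative value iteration for the differential values) are all assumptions made for the demo.

```python
import random

# Numerical spot-check of the performance difference lemma (Lemma 2):
#   rho(pi) - rho(pihat) = sum_{x,a} mu_pi(x) pi(a|x) [ q_pihat(x,a) - v_pihat(x) ]
# on a small random MDP. Everything below (sizes, solvers, random seed) is an
# illustrative choice of ours, not the paper's construction.

nX, nA = 4, 3
rng = random.Random(1)

def simplex(n):
    """A random probability vector of length n with strictly positive entries."""
    w = [rng.random() + 0.1 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

# Random transition kernel P[x][a] (strictly positive, hence fast mixing)
# and reward function r[x][a].
P = [[simplex(nX) for _ in range(nA)] for _ in range(nX)]
r = [[rng.random() for _ in range(nA)] for _ in range(nX)]

def P_pi(pol):
    """State-transition matrix induced by a stochastic policy pol[x][a]."""
    return [[sum(pol[x][a] * P[x][a][y] for a in range(nA)) for y in range(nX)]
            for x in range(nX)]

def stationary(pol):
    """Stationary distribution of P^pol via power iteration."""
    mu, Pp = [1.0 / nX] * nX, P_pi(pol)
    for _ in range(3000):
        mu = [sum(mu[x] * Pp[x][y] for x in range(nX)) for y in range(nX)]
    return mu

def avg_reward(pol, mu):
    return sum(mu[x] * pol[x][a] * r[x][a] for x in range(nX) for a in range(nA))

def diff_values(pol):
    """Average reward rho and differential values (v, q) of pol,
    obtained by relative value iteration (v is centered to zero mean)."""
    mu = stationary(pol)
    rho = avg_reward(pol, mu)
    Pp = P_pi(pol)
    rp = [sum(pol[x][a] * r[x][a] for a in range(nA)) for x in range(nX)]
    v = [0.0] * nX
    for _ in range(5000):
        v = [rp[x] - rho + sum(Pp[x][y] * v[y] for y in range(nX))
             for x in range(nX)]
        m = sum(v) / nX
        v = [vx - m for vx in v]  # v is only defined up to an additive constant
    q = [[r[x][a] - rho + sum(P[x][a][y] * v[y] for y in range(nX))
          for a in range(nA)] for x in range(nX)]
    return rho, v, q

pi = [simplex(nA) for _ in range(nX)]
pihat = [simplex(nA) for _ in range(nX)]
mu_pi = stationary(pi)
rho_hat, v_hat, q_hat = diff_values(pihat)
lhs = avg_reward(pi, mu_pi) - rho_hat
rhs = sum(mu_pi[x] * pi[x][a] * (q_hat[x][a] - v_hat[x])
          for x in range(nX) for a in range(nA))
gap = abs(lhs - rhs)
```

Since the identity is exact, the two sides agree up to numerical precision. Centering v in every sweep is harmless here: as remarked after Lemma 2, the identity is insensitive to shifting q and v by a common constant.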

Because of this lemma,

ρ^π_t − ρ_t = Σ_{x,a} µ^π_st(x) π(a|x) [ q^{π_t}_t(x,a) − v^{π_t}_t(x) ].

Thus, by exchanging the sum that runs over time with the one that runs over the state-action pairs, we get

Σ_{t=1}^T ( ρ^π_t − ρ_t ) = Σ_{x,a} µ^π_st(x) π(a|x) Σ_{t=1}^T [ q^{π_t}_t(x,a) − v^{π_t}_t(x) ].

Thus, it suffices to bound, for a fixed state-action pair (x,a), the sum Σ_{t=1}^T [ q^{π_t}_t(x,a) − v^{π_t}_t(x) ]. By construction, π_t(a|x) ∝ exp( η Σ_{s=1}^{t−1} q^{π_s}_s(x,a) ) (recall that γ = 0 in this version of the algorithm), which means that this sum is the regret of the so-called exponential weights algorithm (EWA) against action a when the algorithm is used on the sequence { q^{π_t}_t(x, ·) }. Assume for a moment that K > 0 is such that ‖q^{π_t}_t‖_∞ ≤ K holds for 1 ≤ t ≤ T. Then, since q^{π_t}_t takes its values in an interval of length 2K, Theorem 2.2 in [5] implies that the regret of EWA can be bounded by

ln|A| / η + η K² T / 2.  (17)

Notice that { q^{π_t}_t } is a sequence that is sequentially generated from { r_t }. It is Lemma 4 of [5] that shows that the bound of Theorem 2.2 of [5] continues to hold for such sequentially generated functions. Putting the inequalities together, we obtain

Σ_{t=1}^T ( ρ^π_t − ρ_t ) ≤ ln|A| / η + η K² T / 2.  (18)

According to the next lemma, an appropriate value for K is 2τ+3. The lemma is stated in greater generality than what is needed here, because the more general form will be used later.

Lemma 3: Pick any policy π in an MDP (P, r). Assume that the mixing time of π is τ in the sense of (1). If | Σ_a π(a|x) r(x,a) | ≤ R holds for any x ∈ X, then |v^π(x)| ≤ 2R(τ+1) holds for all x ∈ X. Furthermore, for any (x,a) ∈ X × A, q^π(x,a) ≤ R(2τ+3) + r(x,a) and, if, in addition, r(x,a) ≥ 0 for any (x,a) ∈ X × A, then q^π(x,a) ≥ −(2τ+3) ‖r‖_∞.

Proof: As is well known and easy to see from the definitions, the differential value of policy π at state x can be written as

v^π(x) = Σ_{s=1}^∞ Σ_{x′,a} ( ν^π_{s,x}(x′) − µ^π_st(x′) ) π(a|x′) r(x′,a),

where ν^π_{s,x} = e_x (P^π)^{s−1} is the state distribution when following π for s steps starting from state x. The triangle inequality and then the bound on | Σ_a π(a|x′) r(x′,a) | give

| v^π(x) | ≤ R Σ_{s=1}^∞ ‖ ν^π_{s,x} − µ^π_st ‖₁ ≤ 2R(τ+1),

where in the second inequality we used ‖ν^π_{s,x} − µ^π_st‖₁ ≤ 2e^{−(s−1)/τ} and Σ_{s=1}^∞ e^{−(s−1)/τ} ≤ τ+1 (cf. the proof of Lemma 1). This proves the first inequality. The inequalities on q^π(x,a) follow from the first part and the Bellman equation:

q^π(x,a) = r(x,a) − ρ^π + Σ_{x′} P(x′|x,a) v^π(x′) ≤ r(x,a) + |ρ^π| + 2R(τ+1) ≤ R(2τ+3) + r(x,a),
q^π(x,a) = r(x,a) − ρ^π + Σ_{x′} P(x′|x,a) v^π(x′) ≥ −(2τ+3) ‖r‖_∞.

Here, in the first chain of inequalities, we used that |ρ^π| = | Σ_{x,a} µ^π_st(x) π(a|x) r(x,a) | ≤ R, while the second holds since r(x,a) ≥ 0, ρ^π ∈ [0, ‖r‖_∞] and R ≤ ‖r‖_∞. ∎

Let us now consider the third term of (15), Σ_{t=1}^T ρ_t − R_T. The t-th term of this difference is the difference between the average reward of π_t and the expected reward obtained in step t. If ν_t(x) is the distribution of the states in time step t, then

Σ_{t=1}^T ρ_t − R_T = Σ_{t=1}^T Σ_{x,a} ( µ^{π_t}_st(x) − ν_t(x) ) π_t(a|x) r_t(x,a) ≤ Σ_{t=1}^T ‖ µ^{π_t}_st − ν_t ‖₁,  (19)

and so it remains to bound the l₁-distances between the distributions µ^{π_t}_st and ν_t. For this, we will use two general lemmas that will again come useful later. For f : X × A → R, introduce the mixed norm ‖f‖_{∞,1} = max_x ‖f_x‖₁, where f_x is identified with f(x, ·). Clearly,

‖ ν P^π − ν P^π̂ ‖₁ ≤ ‖ π − π̂ ‖_{∞,1}

holds for any two policies π, π̂ and any distribution ν (cf. Lemma 5.1 in [7]). The first lemma shows that the map π ↦ µ^π_st, as a map from the space of stationary policies equipped with the mixed norm ‖·‖_{∞,1} to the space of distributions equipped with the l₁-norm, is (τ+1)-Lipschitz:

Lemma 4: Let P be a transition probability kernel over X × A such that the mixing time of P is τ < ∞. For any two policies π, π̂, it holds that

‖ µ^π_st − µ^π̂_st ‖₁ ≤ (τ+1) ‖ π − π̂ ‖_{∞,1}.

Proof: The statement follows from solving

‖ µ^π_st − µ^π̂_st ‖₁ ≤ ‖ µ^π_st P^π − µ^π̂_st P^π ‖₁ + ‖ µ^π̂_st P^π − µ^π̂_st P^π̂ ‖₁ ≤ e^{−1/τ} ‖ µ^π_st − µ^π̂_st ‖₁ + ‖ π − π̂ ‖_{∞,1}

for ‖µ^π_st − µ^π̂_st‖₁ and using

1 / ( 1 − e^{−1/τ} ) ≤ τ + 1.  (20)  ∎

The next lemma allows us to compare an n-step state distribution under a policy sequence with the stationary distribution of the sequence's last policy:

Lemma 5: Let P be a transition probability kernel over X × A such that the mixing time of P is τ < ∞. Take any probability distribution ν over X, an integer n ≥ 1 and policies π_1, …, π_n, and consider the distribution ν_n = ν P^{π_1} ⋯ P^{π_n}. Then, it holds that

‖ ν_n − µ^{π_n}_st ‖₁ ≤ 2 e^{−(n−1)/τ} + (τ+1)² max_{1≤t≤n} ‖ π_t − π_{t−1} ‖_{∞,1},

where, for convenience, we have introduced π_0 = π_1.

Proof: If n = 1, the result is obtained from ‖ν_1 − µ^{π_1}_st‖₁ ≤ 2. Thus, in what follows we assume n ≥ 2. Let c = max_{1≤t≤n} ‖π_t − π_{t−1}‖_{∞,1}. By the triangle inequality,

‖ ν_n − µ^{π_n}_st ‖₁ = ‖ ν_{n−1} P^{π_n} − µ^{π_n}_st P^{π_n} ‖₁ ≤ e^{−1/τ} ‖ ν_{n−1} − µ^{π_n}_st ‖₁ ≤ e^{−1/τ} ( ‖ ν_{n−1} − µ^{π_{n−1}}_st ‖₁ + ‖ µ^{π_{n−1}}_st − µ^{π_n}_st ‖₁ ) ≤ e^{−1/τ} ‖ ν_{n−1} − µ^{π_{n−1}}_st ‖₁ + (τ+1) c,

where we used that, by the previous lemma, ‖µ^{π_{n−1}}_st − µ^{π_n}_st‖₁ ≤ (τ+1) ‖π_n − π_{n−1}‖_{∞,1} ≤ (τ+1)c. Continuing recursively, we get

‖ ν_n − µ^{π_n}_st ‖₁ ≤ e^{−1/τ} ( e^{−1/τ} ‖ ν_{n−2} − µ^{π_{n−2}}_st ‖₁ + (τ+1) c ) + (τ+1) c ≤ ⋯ ≤ e^{−(n−1)/τ} ‖ ν_1 − µ^{π_1}_st ‖₁ + (τ+1) c ( 1 + e^{−1/τ} + ⋯ + e^{−(n−2)/τ} ) ≤ 2 e^{−(n−1)/τ} + (τ+1)² c,

where we bounded the geometric series by 1/(1 − e^{−1/τ}) and used (20). ∎
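Two elementary ingredients of the last two proofs can be spot-checked numerically: the inequality ‖νP^π − νP^π̂‖₁ ≤ ‖π − π̂‖_{∞,1}, and the geometric-series bound (20). The sketch below is our own illustration; the random instances and the problem sizes are assumptions made for the demo.

```python
import math
import random

# Numerical spot-check of two ingredients behind Lemmas 4 and 5:
#  (i)  || nu P^pi - nu P^pihat ||_1  <=  || pi - pihat ||_{inf,1}
#  (ii) 1 / (1 - exp(-1/tau))  <=  tau + 1      (bound (20))
# The random instances below are illustrative choices of ours.

rng = random.Random(2)
nX, nA = 5, 4

def simplex(n):
    """A random probability vector of length n with strictly positive entries."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

violations = 0
for _ in range(200):
    P = [[simplex(nX) for _ in range(nA)] for _ in range(nX)]
    nu = simplex(nX)
    pi = [simplex(nA) for _ in range(nX)]
    pihat = [simplex(nA) for _ in range(nX)]

    def push(pol):
        """Row vector nu multiplied by the transition matrix P^pol."""
        return [sum(nu[x] * pol[x][a] * P[x][a][y]
                    for x in range(nX) for a in range(nA)) for y in range(nX)]

    l1 = sum(abs(u - w) for u, w in zip(push(pi), push(pihat)))
    mixed = max(sum(abs(pi[x][a] - pihat[x][a]) for a in range(nA))
                for x in range(nX))
    if l1 > mixed + 1e-12:  # inequality (i) should never fail
        violations += 1

for tau in (0.1, 0.5, 1.0, 3.0, 10.0, 100.0):
    if 1.0 / (1.0 - math.exp(-1.0 / tau)) > tau + 1.0:  # bound (ii) should never fail
        violations += 1
```

Both facts are theorems (the second follows from e^x ≥ 1 + x), so the check should record no violations on any instance.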

Applying this lemma to ν_t, we get

‖ ν_t − µ^{π_t}_st ‖₁ ≤ 2 e^{−(t−1)/τ} + (τ+1)² K′,

where K′ is a bound on max_{2≤t≤T} ‖π_t − π_{t−1}‖_{∞,1}.¹² Therefore, by (19), we have

Σ_{t=1}^T ρ_t − R_T ≤ Σ_{t=1}^T ( 2 e^{−(t−1)/τ} + (τ+1)² K′ ) ≤ 2(τ+1) + (τ+1)² K′ T.

Thus, it remains to find an appropriate value for K′. It is a well-known property of EWA that consecutive distributions are close to each other. Indeed, applying Pinsker's inequality and Hoeffding's lemma (see Section A.2 and Lemma A.6 in Cesa-Bianchi and Lugosi [5]), we get, for any x ∈ X,

‖ π_{t+1}(·|x) − π_t(·|x) ‖₁ ≤ √( 2 D( π_t(·|x) ‖ π_{t+1}(·|x) ) ) = √( 2 [ ln Σ_b π_t(b|x) e^{η q^{π_t}_t(x,b)} − η Σ_b π_t(b|x) q^{π_t}_t(x,b) ] ) ≤ η ‖ q^{π_t}_t ‖_∞,

where, for two distributions v, v′, D(v‖v′) = Σ_i v(i) ln( v(i)/v′(i) ) denotes the Kullback–Leibler divergence of the distributions v and v′, and the last step follows from Hoeffding's lemma. Thus, ‖π_{t+1} − π_t‖_{∞,1} ≤ η ‖q^{π_t}_t‖_∞. Now, by Lemma 3, ‖q^{π_t}_t‖_∞ ≤ 2τ+3, showing that K′ = η(2τ+3) is suitable. Putting together the inequalities obtained, we get

Σ_{t=1}^T ρ_t − R_T ≤ 2(τ+1) + η (2τ+3)(τ+1)² T.

Combining (16), (18) and this last bound, we obtain

R^π_T − R_T ≤ 4(τ+1) + ln|A|/η + η T (2τ+3)(2τ²+6τ+5)/2.

Setting

η = √( 2 ln|A| / ( T (2τ+3)(2τ²+6τ+5) ) ),

we get the bound stated in Theorem 2.

B. Proof of Theorem 1

Throughout this section we consider the MDP-EXP3 algorithm and suppose that both Assumptions A1 and A2 hold for P. We start from the decomposition (15), which is repeated here to emphasize that some of the terms are now random:

R^π_T − R̂_T = ( R^π_T − Σ_{t=1}^T ρ^π_t ) + Σ_{t=1}^T ( ρ^π_t − ρ_t ) + ( Σ_{t=1}^T ρ_t − R̂_T ).  (21)

As before, Lemma 1 shows that the first term is bounded by 2(τ+1). Thus, it remains to bound the expectation of the other two terms. This is done in the following two propositions, whose proofs are deferred to the next subsections:

¹² Lemma 5.2 of Even-Dar et al. [7] gives a bound on ‖ν_t − µ^{π_t}_st‖₁ with a slightly different technique. However, there are multiple mistakes in the proof; once the mistakes are removed, their bounding technique gives the same result as ours. One of the mistakes is that their Assumption 3 states that K′ = √(ln|A|/T), whereas, since the range of the action-value functions scales with τ, K′ should also scale with τ. Unfortunately, in [6] we committed the same mistake, which we correct here. We chose to present an alternate proof, as we find it somewhat cleaner and it also gave us the opportunity to
present Lemma 4.

Proposition 1: Let

L = 2(2τ+3), V_q̂ = 2( |A|/(αγ) + 2τ+2 ), U_v̂ = 4(τ+1), U^π_q̂ = 4(τ+1)², U_q = 2τ+3, U′_q = 2τ+4, e₁ = e − 1, e₂ = e − 2,

c = ( e₁ U_v̂ + L + γ V_q̂ ) / ( 2 e₂ N V_q̂ ),  c′ = ( e₁ ( L + γ V_q̂ ) + e₂ ( U^π_q̂ + U_v̂ U′_q ) ) / ( 2 e₂ (N+1) V_q̂ ),

and assume that γ ∈ (0,1), c(τ+2)² < 1/2, N ≥ 1 + τ ln(4/γ) and 0 < η < γ / ( 2e(N+1) + γ(2τ+2) ). Then, for any policy π, we have

Σ_{t=1}^T E[ ρ^π_t − ρ_t ] ≤ ln|A|/η + η N ( U_q + U′_q ) + η T ( 2(N+2) c + e₁ V_q̂ + γ U′_q + e₂ U^π_q̂ U′_q ).

Proposition 2: Assume that the conditions of Proposition 1 hold. Then,

Σ_{t=1}^T E[ ρ_t ] − R̂_T ≤ 2(N+1) + c′ η (τ+1)² (N+1) T + 2 T e^{−(N−1)/τ}.  (22)

Note that setting N = ⌈1 + τ ln T⌉, as suggested in Theorem 1, the last term on the right-hand side of (22) becomes O(1), while for T sufficiently large all the conditions of the last two propositions will be satisfied. This leads to the proof of Theorem 1:

Proof of Theorem 1: If |A| = 1 then, due to L̂_T = 0, the statement is trivial, so we assume |A| ≥ 2 from now on. Define

α₁ = 2( e₁ U_v̂ + L + γ V_q̂ ),  α₂ = 2{ e₁( L + γ V_q̂ ) + e₂( U^π_q̂ + U_v̂ U′_q ) },

so that c = α₁ / ( 4 e₂ N V̄_q ) and c′ = α₂ / ( 4 e₂ (N+1) V̄_q ), where V̄_q = V_q̂ / 2 = |A|/(αγ) + 2τ + 2. In the following we will use the notation f ≍ g, for two positive-valued functions f, g : D → R₊ defined on the same domain D, to denote that they are equivalent up to a constant factor, that is, sup_{x∈D} max{ f(x)/g(x), g(x)/f(x) } < ∞. With this notation, on |A| ≥ 2, τ ≥ 1 and as long as γ ≤ 1, we have

α₁ ≍ 1 + τ and α₂ ≍ τ²,  (23)

independently of the value of α and of the choice of η, γ and N. In what follows, all the equivalences will be stated for the domain |A| ≥ 2, τ ≥ 1. We now show how to choose η, γ and N so as to achieve a small regret bound. In order to do so, we will choose these constants so that the conditions of Propositions 1 and 2 are satisfied. For simplicity, we add the constraint γ ≤ 1/2, which we will also show to hold. Under this additional constraint, the inequality η < γ / ( 2e(N+1) + γ(2τ+2) ) will be satisfied if we choose γ = ( 8e(N+1) + 4(τ+1) ) η. Indeed, the said inequality holds since it is equivalent to D := γ − η( 2e(N+1) + γ(2τ+2) ) > 0, and

D ≥ γ − 2e(N+1)η − (τ+1)η ≥ γ − γ/4 − γ/4 = γ/2 > 0,

where the first inequality holds because γ ≤ 1/2 and the second holds by the definition of γ. Since c ≤ 2α₁/D and c′ =


More information

Best Approximation. Chapter The General Case

Best Approximation. Chapter The General Case Chpter 4 Best Approximtion 4.1 The Generl Cse In the previous chpter, we hve seen how n interpolting polynomil cn be used s n pproximtion to given function. We now wnt to find the best pproximtion to given

More information

Riemann Sums and Riemann Integrals

Riemann Sums and Riemann Integrals Riemnn Sums nd Riemnn Integrls Jmes K. Peterson Deprtment of Biologicl Sciences nd Deprtment of Mthemticl Sciences Clemson University August 26, 2013 Outline 1 Riemnn Sums 2 Riemnn Integrls 3 Properties

More information

Abstract inner product spaces

Abstract inner product spaces WEEK 4 Abstrct inner product spces Definition An inner product spce is vector spce V over the rel field R equipped with rule for multiplying vectors, such tht the product of two vectors is sclr, nd the

More information

Riemann Sums and Riemann Integrals

Riemann Sums and Riemann Integrals Riemnn Sums nd Riemnn Integrls Jmes K. Peterson Deprtment of Biologicl Sciences nd Deprtment of Mthemticl Sciences Clemson University August 26, 203 Outline Riemnn Sums Riemnn Integrls Properties Abstrct

More information

Lecture 1: Introduction to integration theory and bounded variation

Lecture 1: Introduction to integration theory and bounded variation Lecture 1: Introduction to integrtion theory nd bounded vrition Wht is this course bout? Integrtion theory. The first question you might hve is why there is nything you need to lern bout integrtion. You

More information

arxiv:math/ v2 [math.ho] 16 Dec 2003

arxiv:math/ v2 [math.ho] 16 Dec 2003 rxiv:mth/0312293v2 [mth.ho] 16 Dec 2003 Clssicl Lebesgue Integrtion Theorems for the Riemnn Integrl Josh Isrlowitz 244 Ridge Rd. Rutherford, NJ 07070 jbi2@njit.edu Februry 1, 2008 Abstrct In this pper,

More information

Bases for Vector Spaces

Bases for Vector Spaces Bses for Vector Spces 2-26-25 A set is independent if, roughly speking, there is no redundncy in the set: You cn t uild ny vector in the set s liner comintion of the others A set spns if you cn uild everything

More information

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies Stte spce systems nlysis (continued) Stbility A. Definitions A system is sid to be Asymptoticlly Stble (AS) when it stisfies ut () = 0, t > 0 lim xt () 0. t A system is AS if nd only if the impulse response

More information

ODE: Existence and Uniqueness of a Solution

ODE: Existence and Uniqueness of a Solution Mth 22 Fll 213 Jerry Kzdn ODE: Existence nd Uniqueness of Solution The Fundmentl Theorem of Clculus tells us how to solve the ordinry differentil eqution (ODE) du = f(t) dt with initil condition u() =

More information

A recursive construction of efficiently decodable list-disjunct matrices

A recursive construction of efficiently decodable list-disjunct matrices CSE 709: Compressed Sensing nd Group Testing. Prt I Lecturers: Hung Q. Ngo nd Atri Rudr SUNY t Bufflo, Fll 2011 Lst updte: October 13, 2011 A recursive construction of efficiently decodble list-disjunct

More information

NUMERICAL INTEGRATION. The inverse process to differentiation in calculus is integration. Mathematically, integration is represented by.

NUMERICAL INTEGRATION. The inverse process to differentiation in calculus is integration. Mathematically, integration is represented by. NUMERICAL INTEGRATION 1 Introduction The inverse process to differentition in clculus is integrtion. Mthemticlly, integrtion is represented by f(x) dx which stnds for the integrl of the function f(x) with

More information

Reversals of Signal-Posterior Monotonicity for Any Bounded Prior

Reversals of Signal-Posterior Monotonicity for Any Bounded Prior Reversls of Signl-Posterior Monotonicity for Any Bounded Prior Christopher P. Chmbers Pul J. Hely Abstrct Pul Milgrom (The Bell Journl of Economics, 12(2): 380 391) showed tht if the strict monotone likelihood

More information

Lecture 3. Limits of Functions and Continuity

Lecture 3. Limits of Functions and Continuity Lecture 3 Limits of Functions nd Continuity Audrey Terrs April 26, 21 1 Limits of Functions Notes I m skipping the lst section of Chpter 6 of Lng; the section bout open nd closed sets We cn probbly live

More information

20 MATHEMATICS POLYNOMIALS

20 MATHEMATICS POLYNOMIALS 0 MATHEMATICS POLYNOMIALS.1 Introduction In Clss IX, you hve studied polynomils in one vrible nd their degrees. Recll tht if p(x) is polynomil in x, the highest power of x in p(x) is clled the degree of

More information

Heat flux and total heat

Heat flux and total heat Het flux nd totl het John McCun Mrch 14, 2017 1 Introduction Yesterdy (if I remember correctly) Ms. Prsd sked me question bout the condition of insulted boundry for the 1D het eqution, nd (bsed on glnce

More information

Online Supplements to Performance-Based Contracts for Outpatient Medical Services

Online Supplements to Performance-Based Contracts for Outpatient Medical Services Jing, Png nd Svin: Performnce-bsed Contrcts Article submitted to Mnufcturing & Service Opertions Mngement; mnuscript no. MSOM-11-270.R2 1 Online Supplements to Performnce-Bsed Contrcts for Outptient Medicl

More information

Week 10: Line Integrals

Week 10: Line Integrals Week 10: Line Integrls Introduction In this finl week we return to prmetrised curves nd consider integrtion long such curves. We lredy sw this in Week 2 when we integrted long curve to find its length.

More information

221B Lecture Notes WKB Method

221B Lecture Notes WKB Method Clssicl Limit B Lecture Notes WKB Method Hmilton Jcobi Eqution We strt from the Schrödinger eqution for single prticle in potentil i h t ψ x, t = [ ] h m + V x ψ x, t. We cn rewrite this eqution by using

More information

Chapter 0. What is the Lebesgue integral about?

Chapter 0. What is the Lebesgue integral about? Chpter 0. Wht is the Lebesgue integrl bout? The pln is to hve tutoril sheet ech week, most often on Fridy, (to be done during the clss) where you will try to get used to the ides introduced in the previous

More information

Notes on length and conformal metrics

Notes on length and conformal metrics Notes on length nd conforml metrics We recll how to mesure the Eucliden distnce of n rc in the plne. Let α : [, b] R 2 be smooth (C ) rc. Tht is α(t) (x(t), y(t)) where x(t) nd y(t) re smooth rel vlued

More information

Improper Integrals, and Differential Equations

Improper Integrals, and Differential Equations Improper Integrls, nd Differentil Equtions October 22, 204 5.3 Improper Integrls Previously, we discussed how integrls correspond to res. More specificlly, we sid tht for function f(x), the region creted

More information

New data structures to reduce data size and search time

New data structures to reduce data size and search time New dt structures to reduce dt size nd serch time Tsuneo Kuwbr Deprtment of Informtion Sciences, Fculty of Science, Kngw University, Hirtsuk-shi, Jpn FIT2018 1D-1, No2, pp1-4 Copyright (c)2018 by The Institute

More information

Math Lecture 23

Math Lecture 23 Mth 8 - Lecture 3 Dyln Zwick Fll 3 In our lst lecture we delt with solutions to the system: x = Ax where A is n n n mtrix with n distinct eigenvlues. As promised, tody we will del with the question of

More information

( dg. ) 2 dt. + dt. dt j + dh. + dt. r(t) dt. Comparing this equation with the one listed above for the length of see that

( dg. ) 2 dt. + dt. dt j + dh. + dt. r(t) dt. Comparing this equation with the one listed above for the length of see that Arc Length of Curves in Three Dimensionl Spce If the vector function r(t) f(t) i + g(t) j + h(t) k trces out the curve C s t vries, we cn mesure distnces long C using formul nerly identicl to one tht we

More information

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature CMDA 4604: Intermedite Topics in Mthemticl Modeling Lecture 19: Interpoltion nd Qudrture In this lecture we mke brief diversion into the res of interpoltion nd qudrture. Given function f C[, b], we sy

More information

Frobenius numbers of generalized Fibonacci semigroups

Frobenius numbers of generalized Fibonacci semigroups Frobenius numbers of generlized Fiboncci semigroups Gretchen L. Mtthews 1 Deprtment of Mthemticl Sciences, Clemson University, Clemson, SC 29634-0975, USA gmtthe@clemson.edu Received:, Accepted:, Published:

More information

Exam 2, Mathematics 4701, Section ETY6 6:05 pm 7:40 pm, March 31, 2016, IH-1105 Instructor: Attila Máté 1

Exam 2, Mathematics 4701, Section ETY6 6:05 pm 7:40 pm, March 31, 2016, IH-1105 Instructor: Attila Máté 1 Exm, Mthemtics 471, Section ETY6 6:5 pm 7:4 pm, Mrch 1, 16, IH-115 Instructor: Attil Máté 1 17 copies 1. ) Stte the usul sufficient condition for the fixed-point itertion to converge when solving the eqution

More information

Lecture Note 9: Orthogonal Reduction

Lecture Note 9: Orthogonal Reduction MATH : Computtionl Methods of Liner Algebr 1 The Row Echelon Form Lecture Note 9: Orthogonl Reduction Our trget is to solve the norml eution: Xinyi Zeng Deprtment of Mthemticl Sciences, UTEP A t Ax = A

More information

Acceptance Sampling by Attributes

Acceptance Sampling by Attributes Introduction Acceptnce Smpling by Attributes Acceptnce smpling is concerned with inspection nd decision mking regrding products. Three spects of smpling re importnt: o Involves rndom smpling of n entire

More information

Lecture 3 ( ) (translated and slightly adapted from lecture notes by Martin Klazar)

Lecture 3 ( ) (translated and slightly adapted from lecture notes by Martin Klazar) Lecture 3 (5.3.2018) (trnslted nd slightly dpted from lecture notes by Mrtin Klzr) Riemnn integrl Now we define precisely the concept of the re, in prticulr, the re of figure U(, b, f) under the grph of

More information

19 Optimal behavior: Game theory

19 Optimal behavior: Game theory Intro. to Artificil Intelligence: Dle Schuurmns, Relu Ptrscu 1 19 Optiml behvior: Gme theory Adversril stte dynmics hve to ccount for worst cse Compute policy π : S A tht mximizes minimum rewrd Let S (,

More information

Stuff You Need to Know From Calculus

Stuff You Need to Know From Calculus Stuff You Need to Know From Clculus For the first time in the semester, the stuff we re doing is finlly going to look like clculus (with vector slnt, of course). This mens tht in order to succeed, you

More information

The final exam will take place on Friday May 11th from 8am 11am in Evans room 60.

The final exam will take place on Friday May 11th from 8am 11am in Evans room 60. Mth 104: finl informtion The finl exm will tke plce on Fridy My 11th from 8m 11m in Evns room 60. The exm will cover ll prts of the course with equl weighting. It will cover Chpters 1 5, 7 15, 17 21, 23

More information

Advanced Calculus: MATH 410 Uniform Convergence of Functions Professor David Levermore 11 December 2015

Advanced Calculus: MATH 410 Uniform Convergence of Functions Professor David Levermore 11 December 2015 Advnced Clculus: MATH 410 Uniform Convergence of Functions Professor Dvid Levermore 11 December 2015 12. Sequences of Functions We now explore two notions of wht it mens for sequence of functions {f n

More information

Credibility Hypothesis Testing of Fuzzy Triangular Distributions

Credibility Hypothesis Testing of Fuzzy Triangular Distributions 666663 Journl of Uncertin Systems Vol.9, No., pp.6-74, 5 Online t: www.jus.org.uk Credibility Hypothesis Testing of Fuzzy Tringulr Distributions S. Smpth, B. Rmy Received April 3; Revised 4 April 4 Abstrct

More information

FUNDAMENTALS OF REAL ANALYSIS by. III.1. Measurable functions. f 1 (

FUNDAMENTALS OF REAL ANALYSIS by. III.1. Measurable functions. f 1 ( FUNDAMNTALS OF RAL ANALYSIS by Doğn Çömez III. MASURABL FUNCTIONS AND LBSGU INTGRAL III.. Mesurble functions Hving the Lebesgue mesure define, in this chpter, we will identify the collection of functions

More information

Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17

Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17 EECS 70 Discrete Mthemtics nd Proility Theory Spring 2013 Annt Shi Lecture 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion,

More information

Czechoslovak Mathematical Journal, 55 (130) (2005), , Abbotsford. 1. Introduction

Czechoslovak Mathematical Journal, 55 (130) (2005), , Abbotsford. 1. Introduction Czechoslovk Mthemticl Journl, 55 (130) (2005), 933 940 ESTIMATES OF THE REMAINDER IN TAYLOR S THEOREM USING THE HENSTOCK-KURZWEIL INTEGRAL, Abbotsford (Received Jnury 22, 2003) Abstrct. When rel-vlued

More information

1.9 C 2 inner variations

1.9 C 2 inner variations 46 CHAPTER 1. INDIRECT METHODS 1.9 C 2 inner vritions So fr, we hve restricted ttention to liner vritions. These re vritions of the form vx; ǫ = ux + ǫφx where φ is in some liner perturbtion clss P, for

More information

STEP FUNCTIONS, DELTA FUNCTIONS, AND THE VARIATION OF PARAMETERS FORMULA. 0 if t < 0, 1 if t > 0.

STEP FUNCTIONS, DELTA FUNCTIONS, AND THE VARIATION OF PARAMETERS FORMULA. 0 if t < 0, 1 if t > 0. STEP FUNCTIONS, DELTA FUNCTIONS, AND THE VARIATION OF PARAMETERS FORMULA STEPHEN SCHECTER. The unit step function nd piecewise continuous functions The Heviside unit step function u(t) is given by if t

More information

Lecture notes. Fundamental inequalities: techniques and applications

Lecture notes. Fundamental inequalities: techniques and applications Lecture notes Fundmentl inequlities: techniques nd pplictions Mnh Hong Duong Mthemtics Institute, University of Wrwick Emil: m.h.duong@wrwick.c.uk Februry 8, 207 2 Abstrct Inequlities re ubiquitous in

More information

Quantum Physics II (8.05) Fall 2013 Assignment 2

Quantum Physics II (8.05) Fall 2013 Assignment 2 Quntum Physics II (8.05) Fll 2013 Assignment 2 Msschusetts Institute of Technology Physics Deprtment Due Fridy September 20, 2013 September 13, 2013 3:00 pm Suggested Reding Continued from lst week: 1.

More information

Theoretical foundations of Gaussian quadrature

Theoretical foundations of Gaussian quadrature Theoreticl foundtions of Gussin qudrture 1 Inner product vector spce Definition 1. A vector spce (or liner spce) is set V = {u, v, w,...} in which the following two opertions re defined: (A) Addition of

More information

Chapters 4 & 5 Integrals & Applications

Chapters 4 & 5 Integrals & Applications Contents Chpters 4 & 5 Integrls & Applictions Motivtion to Chpters 4 & 5 2 Chpter 4 3 Ares nd Distnces 3. VIDEO - Ares Under Functions............................................ 3.2 VIDEO - Applictions

More information

Vyacheslav Telnin. Search for New Numbers.

Vyacheslav Telnin. Search for New Numbers. Vycheslv Telnin Serch for New Numbers. 1 CHAPTER I 2 I.1 Introduction. In 1984, in the first issue for tht yer of the Science nd Life mgzine, I red the rticle "Non-Stndrd Anlysis" by V. Uspensky, in which

More information

2008 Mathematical Methods (CAS) GA 3: Examination 2

2008 Mathematical Methods (CAS) GA 3: Examination 2 Mthemticl Methods (CAS) GA : Exmintion GENERAL COMMENTS There were 406 students who st the Mthemticl Methods (CAS) exmintion in. Mrks rnged from to 79 out of possible score of 80. Student responses showed

More information

Bernoulli Numbers Jeff Morton

Bernoulli Numbers Jeff Morton Bernoulli Numbers Jeff Morton. We re interested in the opertor e t k d k t k, which is to sy k tk. Applying this to some function f E to get e t f d k k tk d k f f + d k k tk dk f, we note tht since f

More information

Properties of the Riemann Integral

Properties of the Riemann Integral Properties of the Riemnn Integrl Jmes K. Peterson Deprtment of Biologicl Sciences nd Deprtment of Mthemticl Sciences Clemson University Februry 15, 2018 Outline 1 Some Infimum nd Supremum Properties 2

More information

Section 6.1 INTRO to LAPLACE TRANSFORMS

Section 6.1 INTRO to LAPLACE TRANSFORMS Section 6. INTRO to LAPLACE TRANSFORMS Key terms: Improper Integrl; diverge, converge A A f(t)dt lim f(t)dt Piecewise Continuous Function; jump discontinuity Function of Exponentil Order Lplce Trnsform

More information

UNIT 1 FUNCTIONS AND THEIR INVERSES Lesson 1.4: Logarithmic Functions as Inverses Instruction

UNIT 1 FUNCTIONS AND THEIR INVERSES Lesson 1.4: Logarithmic Functions as Inverses Instruction Lesson : Logrithmic Functions s Inverses Prerequisite Skills This lesson requires the use of the following skills: determining the dependent nd independent vribles in n exponentil function bsed on dt from

More information

Euler, Ioachimescu and the trapezium rule. G.J.O. Jameson (Math. Gazette 96 (2012), )

Euler, Ioachimescu and the trapezium rule. G.J.O. Jameson (Math. Gazette 96 (2012), ) Euler, Iochimescu nd the trpezium rule G.J.O. Jmeson (Mth. Gzette 96 (0), 36 4) The following results were estblished in recent Gzette rticle [, Theorems, 3, 4]. Given > 0 nd 0 < s

More information

The Henstock-Kurzweil integral

The Henstock-Kurzweil integral fculteit Wiskunde en Ntuurwetenschppen The Henstock-Kurzweil integrl Bchelorthesis Mthemtics June 2014 Student: E. vn Dijk First supervisor: Dr. A.E. Sterk Second supervisor: Prof. dr. A. vn der Schft

More information

Numerical Analysis: Trapezoidal and Simpson s Rule

Numerical Analysis: Trapezoidal and Simpson s Rule nd Simpson s Mthemticl question we re interested in numericlly nswering How to we evlute I = f (x) dx? Clculus tells us tht if F(x) is the ntiderivtive of function f (x) on the intervl [, b], then I =

More information

Chapter 14. Matrix Representations of Linear Transformations

Chapter 14. Matrix Representations of Linear Transformations Chpter 4 Mtrix Representtions of Liner Trnsformtions When considering the Het Stte Evolution, we found tht we could describe this process using multipliction by mtrix. This ws nice becuse computers cn

More information