Off-policy TD(λ) with a true online equivalence

Hado van Hasselt, A. Rupam Mahmood, Richard S. Sutton
Reinforcement Learning and Artificial Intelligence Laboratory
University of Alberta, Edmonton, AB T6G 2E8, Canada

Abstract

Van Seijen and Sutton (2014) recently proposed a new version of the linear TD(λ) learning algorithm that is exactly equivalent to an online forward view and that empirically performed better than its classical counterpart in both prediction and control problems. However, their algorithm is restricted to on-policy learning. In the more general case of off-policy learning, in which the policy whose outcome is predicted and the policy used to generate data may be different, their algorithm cannot be applied. One reason for this is that the algorithm bootstraps and thus is subject to instability problems when function approximation is used. A second reason true online TD(λ) cannot be used for off-policy learning is that the off-policy case requires sophisticated importance sampling in its eligibility traces. To address these limitations, we generalize their equivalence result and use this generalization to construct the first online algorithm to be exactly equivalent to an off-policy forward view. We show that this algorithm, named true online GTD(λ), empirically outperforms GTD(λ) (Maei, 2011), which was derived from the same objective as our forward view but lacks the exact online equivalence. In the general theorem that allows us to derive this new algorithm, we encounter a new general eligibility-trace update.

1 Temporal difference learning

Eligibility traces improve learning in temporal-difference (TD) algorithms by efficiently propagating credit for later observations back to update earlier predictions (Sutton, 1988), and can speed up learning significantly. A good way to interpret these traces, the extent of which is regulated by a trace parameter λ ∈ [0, 1], is to consider the eventual updates to each prediction. For λ = 1 the update for the prediction at time t is similar to a Monte Carlo update towards the full return following t. For λ = 0 the prediction is updated toward only the immediate (reward) signal, and the rest of the return is estimated with the prediction at the next state. Such an interpretation is called a forward view, because it considers the effect of future observations on the updates. In practice, learning is often fastest for intermediate values of λ (Sutton & Barto, 1998).

Traditionally, the equivalence to a forward view was known to hold only when the predictions are updated offline. In practice TD algorithms are more commonly used online, during learning, but then this equivalence was only approximate. Recently, van Seijen and Sutton (2014) developed true online TD(λ), the first algorithm to be exactly equivalent to a forward view under online updating. For λ = 1 the updates by true online TD(λ) eventually become exactly equivalent to a Monte Carlo update towards the full return. As demonstrated by van Seijen and Sutton, such an online equivalence is more than a theoretical curiosity, and it leads to lower prediction errors than the traditional TD(λ) algorithm, which only achieves an offline equivalence. In this paper, we generalize this result and show that exact online equivalences are possible for a wide range of forward views, leading to computationally efficient online algorithms that exploit a new generic trace update.

A limitation of the true online TD(λ) algorithm by van Seijen and Sutton (2014) is that it is only applicable to on-policy learning, when the learned predictions correspond to the policy that is used to generate the data. Off-policy learning is important to be able to learn from demonstrations, to learn about many things at the same time (Sutton et al., 2011), and ultimately to learn about the unknown optimal policy. A natural next step is therefore to apply our general equivalence result to an off-policy forward view. We construct such a forward view and derive an equivalent new off-policy gradient TD algorithm, which we call true online GTD(λ). This algorithm is constructed to be equivalent for λ = 0, by design, to the existing GTD(λ) algorithm (Maei, 2011). We demonstrate empirically that for higher λ the new algorithm is much better behaved due to its exact equivalence to a desired forward view.

In addition to the practical potential of the new algorithm, this demonstrates the usefulness of our general equivalence result and the resulting new trace update.

2 Problem setting

We consider a learning agent in an unknown environment where at each time step t the agent performs an action A_t, after which the environment transitions from the current state S_t to the next state S_{t+1}. We do not assume the state itself can be observed; the agent instead observes a feature vector φ_t ∈ R^n, which is typically a function of the state S_t such that φ_t = φ(S_t). The agent selects its actions according to a behavior policy b, such that b(a|S_t) denotes the probability of selecting action A_t = a in state S_t. Typically b(a|s) depends on s through φ(s). After performing A_t, the agent observes a scalar (reward) signal R_{t+1} and the process can either terminate or continue. We allow for soft terminations, defined by a potentially time-varying state-dependent termination factor γ_t ∈ [0, 1] (cf. Sutton, Mahmood, Precup & van Hasselt, 2014). With weight 1 − γ_{t+1} the process terminates at time t+1 and R_{t+1} is considered the last reward in this episode. With weight γ_{t+1} we continue to the next state and observe φ_{t+1} = φ(S_{t+1}). The agent then selects a new action A_{t+1} and this process repeats. A special case is the episodic setting where γ_t = 1 for all non-terminating times and γ_T = 0 when the episode ends at time T. The termination factors are commonly called discount factors, because they discount the effect of later rewards.

The goal is to predict the sum of future rewards, discounted by the probabilities of termination, under a target policy π. The optimal prediction is thus defined for each state s by

    v_π(s) = E_π[ Σ_{t=1}^∞ ( Π_{k=1}^{t−1} γ_k ) R_t | S_0 = s ],

where E_π[·] = E[· | A_t ∼ π(·|S_t), ∀t] denotes the expectation conditional on the policy π. We estimate the values v_π(s) with a parameterized function of the observed features. In particular, we consider linear functions of the features, such that θ⊤φ_t ≈ v_π(S_t) is the estimated value of the state at time t according to a weight vector θ. The goal is then to improve the predictions by updating θ.

We desire online algorithms with a constant O(n) per-step complexity, where n is the number of features in φ_t. Such computational considerations are important in settings with a lot of data or when φ_t is a large vector. For instance, we want our algorithms to be able to run on a robot with many sensors and limited on-board processing power.

3 General online equivalence between forward and backward views

We can think about what the ideal update would be for a prediction after observing all relevant future rewards and states. Such an update is called a forward view, because it depends on observations from the future. A concrete example is the on-policy Monte Carlo return, consisting of the discounted sum of all future rewards. In practice, full Monte Carlo updates can have high variance. It can be better to augment the return with the then-current predictions at the visited states. When we continue after some time step t, with weight γ_{t+1}, we replace a portion 1 − λ_{t+1} of the remaining return with our current prediction of this return at S_{t+1}. Making use of later predictions to update earlier predictions in this way is called bootstrapping. The process then continues to the next action and reward with total weight γ_{t+1} λ_{t+1}, where again we terminate with 1 − γ_{t+2} and then bootstrap with 1 − λ_{t+2}, and so on. When λ_{t+1} = 0 we get the usual one-step TD return R_{t+1} + γ_{t+1} φ_{t+1}⊤θ. If λ_t = 1 for all t, we obtain a full (discounted) Monte Carlo return. In the on-policy setting, when we do not have to worry about deviations from the target policy, we can then update the prediction made at time t towards the on-policy λ-return defined by

    G_t^λ = R_{t+1} + γ_{t+1} [ (1 − λ_{t+1}) φ_{t+1}⊤θ + λ_{t+1} G_{t+1}^λ ].

The discount factors γ_t are normally considered a property of the problem, but the bootstrap parameters λ_t can be considered tunable parameters. The full return (obtained for λ = 1) is an unbiased estimate of the value of the behavior policy, but its variance can be high. The value estimates are typically not unbiased, but can be considerably less variable. As such, one can interpret the λ parameters as trading off bias and variance. Typically, learning is fastest for intermediate values of λ.

If termination never occurs, G_t^λ is never fully defined. To construct a well-defined forward view, we can truncate the recursion at the current data horizon (van Seijen & Sutton, 2014; Sutton et al., 2014) to obtain interim λ-returns. If we have data up to time h, all returns are truncated as if λ_h = 0 and we bootstrap on the most recent value estimate φ_h⊤θ_{h−1} of the current state. This gives us, for each t < h,

    G_t^{λ,h} = R_{t+1} + γ_{t+1} [ (1 − λ_{t+1}) φ_{t+1}⊤θ_t + λ_{t+1} G_{t+1}^{λ,h} ],   with   G_h^{λ,h} = φ_h⊤θ_{h−1}.

In this definition of G_t^{λ,h}, for each time step j with t < j ≤ h the value of state S_j is estimated using φ_j⊤θ_{j−1}, because θ_{j−1} is the most up-to-date weight vector at the moment we reach this state.
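
To make the recursion concrete, the following sketch computes an interim λ-return directly from this definition. It is a minimal reference implementation under our own naming and indexing conventions (the function name and argument layout are ours, not from the paper), useful mainly for checking an efficient implementation against the forward view.

```python
import numpy as np

def interim_lambda_return(t, h, R, phi, gamma, lam, thetas):
    """On-policy interim lambda-return G_t^{lambda,h}, computed by its recursion.

    Indexing convention (ours): R[k] = R_{k+1}, gamma[k] = gamma_{k+1},
    lam[k] = lambda_{k+1}, phi[k] = feature vector at time k, and
    thetas[k] = weight vector theta_k (theta_{-1} is taken to be theta_0).
    """
    if t == h:
        # Truncation: bootstrap on the most recent estimate of the state at the horizon.
        return phi[h] @ thetas[max(h - 1, 0)]
    # Continue with weight gamma_{t+1}; bootstrap a fraction (1 - lambda_{t+1}) on the
    # estimate of S_{t+1} and keep a fraction lambda_{t+1} of the remaining return.
    return R[t] + gamma[t] * (
        (1.0 - lam[t]) * (phi[t + 1] @ thetas[t])
        + lam[t] * interim_lambda_return(t + 1, h, R, phi, gamma, lam, thetas))
```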

Using these interim returns, we can construct an interim forward view which, in contrast to conventional forward views, can be computed before an episode has concluded, or even if the episode never fully terminates. For instance, when we have data up to time h, the following set of linear updates for all times t < h is an interim forward view:

    θ_{t+1}^h = θ_t^h + α_t ( G_t^{λ,h} − φ_t⊤θ_t^h ) φ_t,   t < h,   (1)

where θ_0^h = θ_0 is the initial weight vector. The subscript on θ_t^h (first index on G_t^{λ,h}) corresponds to the state for the t-th update; the superscript (second index on G_t^{λ,h}) denotes the current data horizon. The forward view (1) is well-defined and computable at every time h, but it is not very computationally efficient. For each new observation, when h increments to h+1, we potentially have to recompute all the updates, as G_t^{λ,h+1} might differ from G_t^{λ,h} for arbitrarily many t. The resulting computational complexity grows with h, which is problematic when h becomes large. Therefore, forward views are not meant to be implemented as is. They serve as a conceptual update, in which we formulate what we want to achieve after observing the relevant data. In the next theorem, we prove that for many forward views an efficient and fully equivalent backward view exists that exploits eligibility traces to construct online updates that use only O(n) computation per time step, but that still result in exactly the same weight vectors. The theorem is constructive, allowing us to find such backward views automatically for a given forward view.

Theorem 1 (Equivalence between forward and backward views). Consider any forward view that updates towards some interim targets Y_t^h with

    θ_{t+1}^h = θ_t^h + η_t ( Y_t^h − φ_t⊤θ_t^h ) φ_t + x_t,   0 ≤ t < h,

where θ_0^h = θ_0 for some initial θ_0 and where x_t ∈ R^n is any vector that does not depend on h. Assume that the temporal differences Y_t^{h+1} − Y_t^h for different t are related through

    Y_t^{h+1} − Y_t^h = c_t ( Y_{t+1}^{h+1} − Y_{t+1}^h ),   ∀ t < h,   (2)

where c_t is a scalar that does not depend on h. Then, the final weights θ_t^t at each t are equal to the weights θ_t as defined by e_0 = η_0 φ_0 and the backward view

    θ_{t+1} = θ_t + ( Y_t^{t+1} − Y_t^t ) e_t + η_t ( Y_t^t − φ_t⊤θ_t ) φ_t + x_t,
    e_t = c_{t−1} e_{t−1} + η_t ( 1 − c_{t−1} φ_t⊤e_{t−1} ) φ_t,   t > 0.   (3)

Proof. We introduce the fading matrix F_t = I − η_t φ_t φ_t⊤, such that θ_{t+1}^h = F_t θ_t^h + η_t Y_t^h φ_t + x_t. Assume, as induction hypothesis, that θ_t^t = θ_t; this holds at t = 0 because θ_0^h = θ_0 for all h. We subtract θ_t^t from θ_{t+1}^{t+1} to find the change when t increments. Expanding θ_{t+1}^{t+1}, we get

    θ_{t+1}^{t+1} − θ_t^t = F_t θ_t^{t+1} − θ_t^t + η_t Y_t^{t+1} φ_t + x_t
                          = F_t ( θ_t^{t+1} − θ_t^t ) + η_t Y_t^{t+1} φ_t + ( F_t − I ) θ_t^t + x_t
                          = F_t ( θ_t^{t+1} − θ_t^t ) + η_t Y_t^{t+1} φ_t − η_t φ_t φ_t⊤θ_t^t + x_t
                          = F_t ( θ_t^{t+1} − θ_t^t ) + η_t ( Y_t^{t+1} − φ_t⊤θ_t^t ) φ_t + x_t.   (4)

We now repeatedly expand θ_t^{t+1} − θ_t^t to get

    θ_t^{t+1} − θ_t^t = F_{t−1} ( θ_{t−1}^{t+1} − θ_{t−1}^t ) + η_{t−1} ( Y_{t−1}^{t+1} − Y_{t−1}^t ) φ_{t−1}
                      = F_{t−1} F_{t−2} ( θ_{t−2}^{t+1} − θ_{t−2}^t ) + η_{t−2} ( Y_{t−2}^{t+1} − Y_{t−2}^t ) F_{t−1} φ_{t−2} + η_{t−1} ( Y_{t−1}^{t+1} − Y_{t−1}^t ) φ_{t−1}
                      = ⋯   (expand until reaching θ_0^{t+1} − θ_0^t = 0)
                      = Σ_{k=0}^{t−1} η_k F_{t−1} ⋯ F_{k+1} ( Y_k^{t+1} − Y_k^t ) φ_k
                      = Σ_{k=0}^{t−1} η_k ( Π_{j=k}^{t−1} c_j ) F_{t−1} ⋯ F_{k+1} φ_k ( Y_t^{t+1} − Y_t^t )   (applying (2) repeatedly)
                      = c_{t−1} e_{t−1} ( Y_t^{t+1} − Y_t^t ),   (5)

where we defined e_t = Σ_{k=0}^{t} η_k ( Π_{j=k}^{t−1} c_j ) F_t ⋯ F_{k+1} φ_k. This vector can be computed with the recursion

    e_t = c_{t−1} F_t Σ_{k=0}^{t−1} η_k ( Π_{j=k}^{t−2} c_j ) F_{t−1} ⋯ F_{k+1} φ_k + η_t φ_t
        = c_{t−1} F_t e_{t−1} + η_t φ_t
        = c_{t−1} e_{t−1} + η_t ( 1 − c_{t−1} φ_t⊤e_{t−1} ) φ_t.

We plug (5) back into (4) and, using c_{t−1} F_t e_{t−1} = e_t − η_t φ_t, obtain

    θ_{t+1}^{t+1} − θ_t^t = ( e_t − η_t φ_t ) ( Y_t^{t+1} − Y_t^t ) + η_t ( Y_t^{t+1} − φ_t⊤θ_t^t ) φ_t + x_t
                          = ( Y_t^{t+1} − Y_t^t ) e_t + η_t ( Y_t^t − φ_t⊤θ_t^t ) φ_t + x_t,

which is exactly the backward-view update (3) applied to θ_t = θ_t^t. Because θ_0^h = θ_0 for all h, the desired result follows by induction.
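
The backward view (3) is generic: it only needs the per-step quantities η_t, c_{t−1}, x_t, and the two interim targets Y_t^{t+1} and Y_t^t. The sketch below is a minimal illustration of this generic update with our own class and argument names (not code from the paper); specific algorithms such as true online TD(λ) are obtained by plugging in the appropriate targets and scalars.

```python
import numpy as np

class DutchTraceBackwardView:
    """Generic backward view of Theorem 1 with the trace update (3)."""

    def __init__(self, theta0):
        self.theta = np.array(theta0, dtype=float)
        self.e = None  # eligibility trace; initialized on the first step

    def step(self, phi, eta, c_prev, Y_new, Y_old, x=0.0):
        """One online update.

        phi    : feature vector phi_t
        eta    : step size eta_t
        c_prev : the scalar c_{t-1} relating successive interim targets
        Y_new  : Y_t^{t+1}, the interim target at the new data horizon
        Y_old  : Y_t^{t},   the interim target at the previous horizon
        x      : the extra vector x_t (zero for true online TD(lambda))
        """
        if self.e is None:
            self.e = eta * phi                      # e_0 = eta_0 * phi_0
        else:
            # Shrink the trace by c_{t-1}, then update the trace of the
            # current features towards one with step size eta_t.
            self.e = c_prev * self.e + eta * (1.0 - c_prev * (phi @ self.e)) * phi
        self.theta = (self.theta
                      + (Y_new - Y_old) * self.e
                      + eta * (Y_old - phi @ self.theta) * phi
                      + x)
        return self.theta
```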

The theorem shows that under condition (2) we can turn a general forward view into an equivalent online algorithm that only uses O(n) computation per time step. Compared to previous work on forward/backward equivalences, this grants us two important things. First, the obtained equivalence is both online and exact; most previous equivalences were only exact under offline updating, when the weights are not updated during learning (Sutton & Barto, 1998; Sutton et al., 2014). Second, the theorem is constructive, and gives an equivalent backward view directly from a desired forward view, rather than having to prove such an equivalence in hindsight (as in, e.g., van Seijen & Sutton, 2014). This is perhaps the main benefit of the theorem: rather than relying on insight and intuition to construct efficient online algorithms, Theorem 1 can be used to derive an exact backward view directly from a desired forward view. We exploit this in Section 6 when we turn a desired off-policy forward view into an efficient new online off-policy algorithm.

We refer to traces of the general form (3) as dutch traces. The trace update can be interpreted as first shrinking the traces with c_{t−1}, for instance c_{t−1} = γλ, and then updating the trace of the current state, φ_t⊤e_t, towards one with a step size of η_t. In contrast, traditional accumulating traces, defined by e_t = c_{t−1} e_{t−1} + φ_t, add to the trace value of the current state rather than updating it toward one. This can cause the accumulating traces to grow large, potentially resulting in high-variance updates.

To demonstrate one advantage of Theorem 1, we apply it to the on-policy TD(λ) forward view defined by (1).

Theorem 2 (Equivalence for true online TD(λ)). Define θ_0^h = θ_0 for all h. Then, θ_t^t as defined by (1) equals θ_t as defined by the backward view

    δ_t = R_{t+1} + γ_{t+1} φ_{t+1}⊤θ_t − φ_t⊤θ_{t−1},
    e_t = γ_t λ_t e_{t−1} + α_t ( 1 − γ_t λ_t φ_t⊤e_{t−1} ) φ_t,   with e_0 = α_0 φ_0,
    θ_{t+1} = θ_t + δ_t e_t + α_t ( φ_t⊤θ_{t−1} − φ_t⊤θ_t ) φ_t.

Proof. In Theorem 1, we substitute x_t = 0, c_t = γ_{t+1} λ_{t+1} and Y_t^h = G_t^{λ,h}, such that Y_t^{t+1} − Y_t^t = δ_t and Y_t^t = φ_t⊤θ_{t−1}. The desired result follows immediately.

The backward view in Theorem 2 is true online TD(λ), as proposed by van Seijen and Sutton (2014). Using Theorem 1, we have proved equivalence to its forward view with a few simple substitutions, whereas the original proof is much longer and more complex.
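
As an illustration, the backward view of Theorem 2 translates into the following per-step routine. This is a sketch with our own naming, written for scalar, time-constant α, γ, and λ (so the per-step subscripts above do not have to be threaded through); it is not the authors' reference implementation.

```python
import numpy as np

class TrueOnlineTD:
    """True online TD(lambda) (Theorem 2), on-policy, linear function approximation."""

    def __init__(self, n, alpha):
        self.theta = np.zeros(n)
        self.e = np.zeros(n)
        self.alpha = alpha
        # v_old holds phi_t' theta_{t-1}; with theta initialized to zeros this starts at 0.
        self.v_old = 0.0

    def update(self, phi, R, gamma, lam, phi_next):
        v = phi @ self.theta              # phi_t' theta_t
        v_next = phi_next @ self.theta    # phi_{t+1}' theta_t
        delta = R + gamma * v_next - self.v_old
        # Dutch trace; the first call yields e_0 = alpha * phi_0.
        self.e = (gamma * lam * self.e
                  + self.alpha * (1.0 - gamma * lam * (phi @ self.e)) * phi)
        self.theta = self.theta + delta * self.e + self.alpha * (self.v_old - v) * phi
        self.v_old = v_next               # becomes phi_{t+1}' theta_t for the next step
        return delta
```
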
4 Off-policy learning

In this section, we turn to off-policy learning with function approximation. In constructing an off-policy forward view, two issues arise that are not present in the on-policy setting. First, we need to estimate the value of a policy that is different from the one used to obtain the observations. Second, using a forward view such as (1) under off-policy sampling can cause it to be unstable, potentially resulting in divergence of the weights (Sutton et al., 2008). These issues can be avoided by constructing our off-policy algorithms to minimize a mean-squared projected Bellman error (MSPBE) with gradient descent (Sutton et al., 2009; Maei & Sutton, 2010; Maei, 2011). The MSPBE was previously used to derive GTD(λ) (Maei, 2011), which is an online algorithm that can be used to learn off-policy predictions. GTD(λ) was not constructed to be exactly equivalent to any forward view, and it is a natural question whether the algorithm can be improved by having such an equivalence, just as was the case with TD(λ) and true online TD(λ). In this section, we introduce an off-policy MSPBE and show how GTD(λ) can be derived. In the next section, we use the same MSPBE to construct a new off-policy forward view from which we will derive an exactly equivalent online backward view.

To obtain estimates for one distribution when the samples are generated under another distribution, we can weight the observations by the relative probabilities of these observations occurring under the target policy, as compared to the behavior distribution. This is called importance sampling (Rubinstein, 1981; Precup, Sutton & Singh, 2000). Recall that b(a|s) and π(a|s) denote the probabilities of selecting action a in state s according to the behavior policy and the target policy, respectively. After selecting an action A_t in a state S_t according to b, we observe a reward R_{t+1}. The expected value of this reward is E_b[R_{t+1}], but if we multiply the reward with the importance-sampling ratio ρ_t = π(A_t|S_t)/b(A_t|S_t) the expected value is

    E_b[ ρ_t R_{t+1} | S_t ] = Σ_a b(a|S_t) (π(a|S_t)/b(a|S_t)) E[ R_{t+1} | S_t, A_t = a ]
                             = Σ_a π(a|S_t) E[ R_{t+1} | S_t, A_t = a ]
                             = E_π[ R_{t+1} | S_t ].

Therefore ρ_t R_{t+1} is an unbiased sample of the reward under the target policy. This technique can be applied to all the rewards and value estimates in a given λ-return.
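
This identity is easy to verify numerically. The snippet below is a small sanity check with made-up action probabilities and expected rewards (all numbers are hypothetical), confirming that weighting by ρ = π(a|s)/b(a|s) turns a behavior-policy expectation into the target-policy expectation.

```python
import numpy as np

# Hypothetical two-action example in a single state.
r = np.array([1.0, 3.0])    # E[R_{t+1} | S_t = s, A_t = a] for a in {0, 1}
b = np.array([0.7, 0.3])    # behavior policy b(a|s)
pi = np.array([0.2, 0.8])   # target policy pi(a|s)
rho = pi / b                # importance-sampling ratios

# The importance-weighted expectation under b equals the expectation under pi.
print((b * rho * r).sum(), (pi * r).sum())   # both print 2.6
```
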

For instance, if we want to obtain an unbiased sample of the reward under the target policy n steps after the current state S_t, the total weight applied to this reward should be ρ_t ρ_{t+1} ⋯ ρ_{t+n−1}. An off-policy λ-return starting from state S_t is given by

    G_t^{λρ}(θ) = ρ_t ( R_{t+1} + γ_{t+1} [ (1 − λ_{t+1}) φ_{t+1}⊤θ + λ_{t+1} G_{t+1}^{λρ}(θ) ] ).   (6)

In contrast to G_t^λ, this return is defined as a function of a single weight vector θ. This is useful later, when we wish to determine the gradient of this return with respect to θ.

When using function approximation it is generally not possible to estimate the value of each state with full accuracy or, equivalently, to reduce the conditional expected TD error for each state to zero at the same time. More formally, let v_θ be a parameterized value function defined by v_θ(s) = θ⊤φ(s) and let T_π^λ be a parameterized Bellman operator defined, for any v : S → R, by

    (T_π^λ v)(s) = E_π[ R_1 + γ_1 (1 − λ_1) v(S_1) + γ_1 λ_1 (T_π^λ v)(S_1) | S_0 = s ].

In general, we then cannot achieve v_θ = T_π^λ v_θ, because T_π^λ v_θ is not guaranteed to be a function that we can represent with our chosen function approximation. It is, however, possible to find the fixed point defined by

    v_θ = Π T_π^λ v_θ,   (7)

where Π v is a projection of v into the space of representable functions {v_θ | θ ∈ R^n}. Let d be the steady-state distribution of states under the behavior policy. The projection of any v is then defined by Π v = v_{θ_v}, where θ_v = arg min_θ ‖ v_θ − v ‖_d^2, and where ‖·‖_d^2 is a norm defined by ‖f‖_d^2 = Σ_s d(s) f(s)^2. Following Maei (2011), the projection is defined in terms of the steady-state distribution resulting from the behavior policy, which means that d(s) = lim_{t→∞} P(S_t = s | A_j ∼ b(·|S_j), ∀j). This implies our objective weights the importance of the accuracy of the prediction in each state according to the relative frequency with which this state occurs under the behavior policy, which is a natural choice for online learning.

The fixed point in (7) can be found by minimizing the MSPBE defined by (Maei, 2011)

    J(θ) = ‖ v_θ − Π T_π^λ v_θ ‖_d^2   (8)
         = E_b[ δ_t^π(θ) φ_t ]⊤ E_b[ φ_t φ_t⊤ ]^{−1} E_b[ δ_t^π(θ) φ_t ],

where δ_t^π(θ) = (T_π^λ v_θ)(S_t) − v_θ(S_t) and where the expectations are with respect to the steady-state distribution d, as induced by the behavior policy b. The ideal gradient update for time step t is then

    θ_{t+1} = θ_t − (1/2) α_t ∇_θ J(θ)|_{θ_t},   (9)

where

    −(1/2) ∇_θ J(θ)|_{θ_t} = −E_b[ ∇_θ δ_t^π(θ) φ_t⊤ ] E_b[ φ_t φ_t⊤ ]^{−1} E_b[ δ_t^π(θ_t) φ_t ]
                            = E_b[ ( φ_t − ∇_θ G_t^{λρ}(θ) ) φ_t⊤ ] E_b[ φ_t φ_t⊤ ]^{−1} E_b[ δ_t^π(θ_t) φ_t ]
                            = E_b[ δ_t^π(θ_t) φ_t ] − E_b[ ∇_θ G_t^{λρ}(θ) φ_t⊤ ] w,   (10)

with G_t^{λρ} as defined in (6), and where w = E_b[ φ_t φ_t⊤ ]^{−1} E_b[ δ_t^π(θ_t) φ_t ]. Update (9) can be interpreted as an expected forward view.

The derivation of the GTD(λ) algorithm proceeds by exploiting the expected equivalences (Maei, 2011)

    E_b[ ∇_θ G_t^{λρ}(θ) φ_t⊤ ] = E_b[ ρ_t γ_{t+1} (1 − λ_{t+1}) φ_{t+1} φ_t⊤ ] + E_b[ ρ_t γ_{t+1} λ_{t+1} ∇_θ G_{t+1}^{λρ}(θ) φ_t⊤ ]
                                = E_b[ ρ_t γ_{t+1} (1 − λ_{t+1}) φ_{t+1} φ_t⊤ ] + E_b[ ρ_{t−1} γ_t λ_t ∇_θ G_t^{λρ}(θ) φ_{t−1}⊤ ]
                                = ⋯   (repeat until we reach φ_0)
                                = E_b[ γ_{t+1} (1 − λ_{t+1}) φ_{t+1} Σ_{j=0}^{t} ( Π_{i=j+1}^{t} ρ_{i−1} γ_i λ_i ) ρ_t φ_j⊤ ]
                                = E_b[ γ_{t+1} (1 − λ_{t+1}) φ_{t+1} ē_t⊤ ],   (11)

and, similarly, E_b[ δ_t^π(θ_t) φ_t ] = E_b[ δ_t(θ_t) ē_t ], where

    ē_t = ρ_t ( γ_t λ_t ē_{t−1} + φ_t ),   (12)
    δ_t(θ) = R_{t+1} + γ_{t+1} φ_{t+1}⊤θ − φ_t⊤θ.

The auxiliary vector w can be updated with least mean squares (LMS) (Sutton et al., 2009; Maei, 2011), using the sample δ_t(θ_t) ē_t, because E_b[ δ_t(θ_t) ē_t ] = E_b[ δ_t^π(θ_t) φ_t ], and the update

    w_{t+1} = w_t + β_t δ_t(θ_t) ē_t − β_t ( φ_t⊤w_t ) φ_t.

The complete GTD(λ) algorithm¹ is then defined by

    δ_t = R_{t+1} + γ_{t+1} φ_{t+1}⊤θ_t − φ_t⊤θ_t,
    ē_t = ρ_t ( γ_t λ_t ē_{t−1} + φ_t ),
    θ_{t+1} = θ_t + α_t δ_t ē_t − α_t γ_{t+1} (1 − λ_{t+1}) ( w_t⊤ē_t ) φ_{t+1},
    w_{t+1} = w_t + β_t δ_t ē_t − β_t ( φ_t⊤w_t ) φ_t.

¹ Dann, Neumann and Peters (2014) call this algorithm TDC(λ), but we use the original name by Maei (2011).
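
The following sketch transcribes these four updates directly (our own class and parameter names; the per-step quantities γ_t, λ_t, γ_{t+1}, λ_{t+1}, and ρ_t are passed in by the caller, and the step sizes are held constant):

```python
import numpy as np

class GTDLambda:
    """GTD(lambda) (Maei, 2011) with linear function approximation; a sketch."""

    def __init__(self, n, alpha, beta):
        self.theta = np.zeros(n)
        self.w = np.zeros(n)
        self.e = np.zeros(n)   # accumulating trace, Eq. (12)
        self.alpha, self.beta = alpha, beta

    def update(self, phi, R, phi_next, gamma, lam, gamma_next, lam_next, rho):
        # TD error with both value estimates computed from the current theta.
        delta = R + gamma_next * (phi_next @ self.theta) - phi @ self.theta
        # Importance-weighted accumulating trace.
        self.e = rho * (gamma * lam * self.e + phi)
        # Main weights: TD update along the trace plus the gradient-correction term.
        self.theta += (self.alpha * delta * self.e
                       - self.alpha * gamma_next * (1.0 - lam_next)
                         * (self.w @ self.e) * phi_next)
        # Auxiliary weights: LMS update towards the expected TD error.
        self.w += self.beta * delta * self.e - self.beta * (phi @ self.w) * phi
        return delta
```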

5 An off-policy forward view

In this section, we define an off-policy forward view which we turn into a fully equivalent backward view in the next section, using Theorem 1. GTD(λ) is derived by first turning an expected forward view into an expected backward view, and then sampling. We propose instead to sample the expected forward view directly and then invert the sampled forward view into an equivalent online backward view. This way we obtain an exact equivalence between forward and backward views instead of the expected equivalence of GTD(λ). This was previously not known to be possible, but it has the advantage that we can use the precise (potentially discounted and bootstrapped) sample returns consisting of all future rewards and state values in each update. This can result in more accurate predictions, as confirmed by our experiments in Section 7.

The new forward view derives from the MSPBE, as defined in (8), and more specifically from the gradient update defined by (9) and (10). To find an implementable interim forward view, we need sampled estimates of all three parts in (10). We discuss each of these parts separately. Our interim forward view is defined in terms of a data horizon h, so the gradient of the MSPBE is taken at the horizon weights rather than at θ_t. Furthermore, δ_t^π is defined as the error between a λ-return and a current estimate, and therefore we need to construct an interim λ-return. To estimate the first term of (10), we therefore need a sampled estimate of E_b[ δ_t^π(θ) φ_t ], based on some suitably defined interim return G_t^{λρ,h}.

The variance of off-policy updates is often lower when we weight the errors (that is, the difference between the return and the current estimate) with the importance-sampling ratios, rather than weighting the returns (Sutton et al., 2014). Let δ_t = R_{t+1} + γ_{t+1} φ_{t+1}⊤θ_t − φ_t⊤θ_{t−1} denote a one-step TD error. The on-policy return used in the forward view (1) can then be written as a sum of such errors:

    G_t^{λ,h} = φ_t⊤θ_{t−1} + Σ_{j=t}^{h−1} ( Π_{i=t+1}^{j} γ_i λ_i ) δ_j.

We apply the importance-sampling weights to the one-step TD errors, rather than just to the reward and bootstrapped value estimate.² This does not affect the expected value, because E_b[ ρ_t φ_t⊤θ_{t−1} | S_t ] = E_b[ φ_t⊤θ_{t−1} | S_t ], but it can have a beneficial effect on the variance of the resulting updates. A sampled off-policy error is then

    G_t^{λρ,h} − ρ_t φ_t⊤θ_{t−1},   (13)

where

    G_t^{λρ,h} = ρ_t φ_t⊤θ_{t−1} + ρ_t Σ_{j=t}^{h−1} ( Π_{i=t+1}^{j} γ_i λ_i ρ_i ) δ_j.

An equivalent recursive definition for G_t^{λρ,h} is

    G_t^{λρ,h} = ρ_t ( R_{t+1} + γ_{t+1} [ (1 − λ_{t+1} ρ_{t+1}) φ_{t+1}⊤θ_t + λ_{t+1} G_{t+1}^{λρ,h} ] )   for t < h,   (14)

and G_h^{λρ,h} = ρ_h φ_h⊤θ_{h−1}. In the on-policy case, when ρ_t = 1 for all t, G_t^{λρ,h} reduces exactly to G_t^{λ,h}, as used in the forward view (1) for true online TD(λ). Furthermore, E_b[ G_t^{λρ,h} | S_t = s ] = E_π[ G_t^{λ,h} | S_t = s ] for any s.

For the second term in (10), which can be thought of as the gradient-correction term, we need an estimate of w. As in the derivation of GTD(λ), we use an LMS update. Assuming we have data up to time h, the ideal forward-view update for w is then

    w_{t+1}^h = w_t^h + β_t ( δ_t^{λρ,h} − φ_t⊤w_t^h ) φ_t,   (15)

for some appropriate sample δ_t^{λρ,h} of E_b[ δ_t^π(θ_t) ]. A natural interim estimate is defined by

    δ_t^{λρ,h} = ρ_t ( δ_t + γ_{t+1} λ_{t+1} δ_{t+1}^{λρ,h} ),   with δ_h^{λρ,h} = 0,   (16)

where δ_t in (16) is the TD error R_{t+1} + γ_{t+1} θ_t⊤φ_{t+1} − θ_t⊤φ_t, in which both value estimates use θ_t. This is not the only possible way to estimate w, but this choice ensures the resulting algorithm is equivalent to GTD(0) when λ = 0, allowing us to investigate the effects of the true online equivalence and the resulting new trace updates in some isolation, without having to worry about other potential differences between the algorithms. In the next section we construct an equivalent backward view for (15) to compute the sequence {w_t}, where w_t = w_t^t.

² For the PTD(λ) and PQ(λ) algorithms, Sutton et al. (2014) propose another weighting, based on weighting flat return errors containing multiple rewards. In contrast, our weighting is chosen to be consistent with GTD(λ). True online versions of PTD(λ) and PQ(λ) exist, but we do not consider them further in this paper.
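
As with the on-policy case, the recursion (14) gives a direct (if inefficient) way to compute the interim off-policy return, which is handy for testing a backward-view implementation against the forward view. Below is a sketch under the same hypothetical indexing convention as before (our naming; rho[k] = ρ_k).

```python
import numpy as np

def interim_off_policy_return(t, h, R, phi, gamma, lam, rho, thetas):
    """Off-policy interim lambda-return G_t^{lambda rho, h} of Eq. (14) (our naming).

    Indexing convention (ours): R[k] = R_{k+1}, gamma[k] = gamma_{k+1},
    lam[k] = lambda_{k+1}, rho[k] = rho_k, thetas[k] = theta_k
    (theta_{-1} is taken to be theta_0).
    """
    if t == h:
        # Truncation at the data horizon.
        return rho[h] * (phi[h] @ thetas[max(h - 1, 0)])
    cont = ((1.0 - lam[t] * rho[t + 1]) * (phi[t + 1] @ thetas[t])
            + lam[t] * interim_off_policy_return(t + 1, h, R, phi, gamma, lam, rho, thetas))
    return rho[t] * (R[t] + gamma[t] * cont)
```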

Finally, we use the expected equivalence proved in (11), and then sample to obtain

    γ_{t+1} (1 − λ_{t+1}) φ_{t+1} ē_t⊤ ≈ E_b[ ∇_θ G_t^{λρ}(θ) φ_t⊤ ],   (17)

with ē_t as defined in (12). We now have all the pieces to state the off-policy forward view for θ. We approximate the expected forward view as defined by (9) and (10) by using the sampled estimates (13) and (17) and w_t = w_t^t, with w_t^h as defined by (15). This gives us the interim forward view

    θ_{t+1}^h = θ_t^h + α_t ( G_t^{λρ,h} − ρ_t φ_t⊤θ_t^h ) φ_t − α_t γ_{t+1} (1 − λ_{t+1}) ( w_t⊤ē_t ) φ_{t+1},   (18)

with G_t^{λρ,h} as defined in (14).

6 Backward view: true online GTD(λ)

In this section, we apply Theorem 1 to convert the off-policy forward view given by (18) into an efficient online backward view. First, we consider w_t.

Theorem 3 (Auxiliary vectors). The vector w_t^t, as defined by the forward view in (15), is equal to w_t as defined by the backward view

    e_t^w = ρ_{t−1} γ_t λ_t e_{t−1}^w + β_t ( 1 − ρ_{t−1} γ_t λ_t φ_t⊤e_{t−1}^w ) φ_t,
    w_{t+1} = w_t + ρ_t δ_t e_t^w − β_t ( φ_t⊤w_t ) φ_t,

where e_0^w = β_0 φ_0, w_0 = w_0^h, and δ_t = R_{t+1} + γ_{t+1} φ_{t+1}⊤θ_t − φ_t⊤θ_t.

Proof. We apply Theorem 1 by substituting θ → w, η_t = β_t, x_t = 0 and Y_t^h = δ_t^{λρ,h}, as defined in (16). Then

    δ_t^{λρ,h+1} − δ_t^{λρ,h} = ρ_t γ_{t+1} λ_{t+1} ( δ_{t+1}^{λρ,h+1} − δ_{t+1}^{λρ,h} ),

which implies c_t = ρ_t γ_{t+1} λ_{t+1}. Finally, Y_t^t = δ_t^{λρ,t} = 0 and Y_t^{t+1} − Y_t^t = δ_t^{λρ,t+1} = ρ_t δ_t. Inserting these substitutions into the backward view in Theorem 1 immediately yields the backward view in the current theorem.

Theorem 4 (True online GTD(λ)). For any t, the weight vector θ_t^t as defined by the forward view in (18) is equal to θ_t as defined by the backward view

    e_t = ρ_t ( γ_t λ_t e_{t−1} + α_t ( 1 − ρ_t γ_t λ_t φ_t⊤e_{t−1} ) φ_t ),
    ē_t = ρ_t ( γ_t λ_t ē_{t−1} + φ_t ),
    θ_{t+1} = θ_t + δ_t e_t + ( e_t − α_t ρ_t φ_t ) ( (θ_t − θ_{t−1})⊤φ_t ) − α_t γ_{t+1} (1 − λ_{t+1}) ( w_t⊤ē_t ) φ_{t+1},

with w_t and δ_t as defined in Theorem 3.

Proof. Again, we apply Theorem 1. Substitute η_t = ρ_t α_t, x_t = −α_t γ_{t+1} (1 − λ_{t+1}) ( w_t⊤ē_t ) φ_{t+1}, Y_t^t = θ_{t−1}⊤φ_t and

    Y_t^h = R_{t+1} + γ_{t+1} [ (1 − λ_{t+1} ρ_{t+1}) θ_t⊤φ_{t+1} + λ_{t+1} G_{t+1}^{λρ,h} ].

This last substitution implies Y_t^{h+1} − Y_t^h = γ_{t+1} λ_{t+1} ρ_{t+1} ( Y_{t+1}^{h+1} − Y_{t+1}^h ), so that c_t = γ_{t+1} λ_{t+1} ρ_{t+1}. Furthermore,

    Y_t^{t+1} − Y_t^t = R_{t+1} + γ_{t+1} θ_t⊤φ_{t+1} − θ_{t−1}⊤φ_t = δ_t + ( θ_t − θ_{t−1} )⊤φ_t.

Applying Theorem 1 with these substitutions, and replacing w_t^t with the equivalent w_t, yields the backward view

    θ_{t+1} = θ_t + ( δ_t + (θ_t − θ_{t−1})⊤φ_t ) e_t + α_t ρ_t ( θ_{t−1}⊤φ_t − θ_t⊤φ_t ) φ_t − α_t γ_{t+1} (1 − λ_{t+1}) ( w_t⊤ē_t ) φ_{t+1}
            = θ_t + δ_t e_t + ( e_t − α_t ρ_t φ_t ) ( (θ_t − θ_{t−1})⊤φ_t ) − α_t γ_{t+1} (1 − λ_{t+1}) ( w_t⊤ē_t ) φ_{t+1},

where e_0 = α_0 ρ_0 φ_0 and

    e_t = ρ_t γ_t λ_t e_{t−1} + α_t ρ_t ( 1 − ρ_t γ_t λ_t φ_t⊤e_{t−1} ) φ_t = ρ_t ( γ_t λ_t e_{t−1} + α_t ( 1 − ρ_t γ_t λ_t φ_t⊤e_{t−1} ) φ_t ).

The true online GTD(λ) algorithm is then defined by

    δ_t = R_{t+1} + γ_{t+1} φ_{t+1}⊤θ_t − φ_t⊤θ_t,
    e_t = ρ_t ( γ_t λ_t e_{t−1} + α_t ( 1 − ρ_t γ_t λ_t φ_t⊤e_{t−1} ) φ_t ),
    ē_t = ρ_t ( γ_t λ_t ē_{t−1} + φ_t ),
    e_t^w = ρ_{t−1} γ_t λ_t e_{t−1}^w + β_t ( 1 − ρ_{t−1} γ_t λ_t φ_t⊤e_{t−1}^w ) φ_t,
    θ_{t+1} = θ_t + δ_t e_t + ( e_t − α_t ρ_t φ_t ) ( (θ_t − θ_{t−1})⊤φ_t ) − α_t γ_{t+1} (1 − λ_{t+1}) ( w_t⊤ē_t ) φ_{t+1},
    w_{t+1} = w_t + ρ_t δ_t e_t^w − β_t ( φ_t⊤w_t ) φ_t.

The traces e_t and e_t^w are dutch traces. The trace ē_t is an accumulating trace that follows from the gradient correction, as discussed in Section 4. It might be possible to adapt the forward view to replace ē_t with e_t. This is already possible in practice, and in preliminary experiments the resulting algorithm performed similarly to true online GTD(λ). A more detailed investigation of this possibility is left for future work. For λ = 0 the algorithm reduces to

    θ_{t+1} = θ_t + α_t ρ_t δ_t φ_t − α_t ρ_t γ_{t+1} ( w_t⊤φ_t ) φ_{t+1},
    w_{t+1} = w_t + β_t ρ_t δ_t φ_t − β_t ( φ_t⊤w_t ) φ_t,

which is precisely GTD(0).³

³ The on-policy variant of this algorithm, with ρ_t = 1 for all t, is known as TDC (Sutton et al., 2009; Maei, 2011).
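
For concreteness, here is a sketch of these six updates as a single per-step routine. All names are ours; the per-step quantities γ_t, λ_t, γ_{t+1}, λ_{t+1}, and ρ_t are supplied by the caller, the step sizes are held constant, and the traces, previous weights, and ρ_{t−1} are carried between calls.

```python
import numpy as np

class TrueOnlineGTD:
    """True online GTD(lambda): sketch of the backward view in Section 6 (our naming)."""

    def __init__(self, n, alpha, beta):
        self.theta = np.zeros(n)
        self.theta_prev = np.zeros(n)   # theta_{t-1}
        self.w = np.zeros(n)
        self.e = np.zeros(n)            # dutch trace e_t
        self.e_bar = np.zeros(n)        # accumulating trace, Eq. (12)
        self.e_w = np.zeros(n)          # dutch trace for the auxiliary vector w
        self.rho_prev = 0.0             # rho_{t-1}; e_w shrinks with rho_{t-1} gamma_t lambda_t
        self.alpha, self.beta = alpha, beta

    def update(self, phi, R, phi_next, gamma, lam, gamma_next, lam_next, rho):
        alpha, beta = self.alpha, self.beta
        delta = R + gamma_next * (phi_next @ self.theta) - phi @ self.theta
        # Dutch trace for theta (first call gives e_0 = alpha * rho_0 * phi_0).
        self.e = rho * (gamma * lam * self.e
                        + alpha * (1.0 - rho * gamma * lam * (phi @ self.e)) * phi)
        # Accumulating trace used in the gradient-correction term.
        self.e_bar = rho * (gamma * lam * self.e_bar + phi)
        # Dutch trace for w (note the shift: it shrinks with rho_{t-1}).
        c_w = self.rho_prev * gamma * lam
        self.e_w = c_w * self.e_w + beta * (1.0 - c_w * (phi @ self.e_w)) * phi
        # Main weight update, including the extra term involving theta_{t-1}.
        new_theta = (self.theta + delta * self.e
                     + (self.e - alpha * rho * phi) * ((self.theta - self.theta_prev) @ phi)
                     - alpha * gamma_next * (1.0 - lam_next) * (self.w @ self.e_bar) * phi_next)
        # Auxiliary weight update.
        self.w += rho * delta * self.e_w - beta * (phi @ self.w) * phi
        self.theta_prev, self.theta = self.theta, new_theta
        self.rho_prev = rho
        return delta
```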

7 Experiments

We compare true online GTD(λ) to GTD(λ) empirically in various settings. The main goal of the experiments is to test the intuition that true online GTD(λ) should be more robust to high step sizes and high λ, due to its true online equivalence and better-behaved traces. This was shown to be the case for true online TD(λ) (van Seijen & Sutton, 2014), and the experiments serve to verify that this extends to the off-policy setting with true online GTD(λ). This is relevant because it implies true online GTD(λ) should then be easier to tune in practice, and because these parameters can affect the limiting performance of the algorithms as well.

Both algorithms optimize the MSPBE, as given in (8), which is a function of λ. When the state representation is of poor quality, the solution that minimizes the MSPBE can still have a high mean-squared error (MSE): ‖ v_θ − v_π ‖_d^2. This means that with a low λ we are not always guaranteed to reach a low MSE, even asymptotically. The closer λ is to one, the closer the MSPBE becomes to the MSE, with equality for λ = 1. In practice this implies that sometimes we need a high λ to be able to obtain sufficiently accurate predictions, even if we run the algorithms a long time.

To illustrate these points, we investigate a fairly simple problem. The problem setting is a random walk consisting of 15 states that can be thought of as lying on a horizontal line. In each state we have two actions: move one state to the left, or one state to the right. If we move left in the left-most state, s_1, we bounce back into that state. If we move right in the right-most state, s_15, the episode ends and we get a reward of +1. On all other time steps, the reward is zero. Each episode starts in s_1, which is the left-most state. This problem setting is similar to the one used by van Seijen and Sutton (2014), with three differences. First, we use 15 rather than 11 states, but this makes little difference to the conclusions. Second, we turn it into an off-policy learning problem, as we describe in a moment. Third, we use different state representations. This last point is because we want to test the performance of the algorithm not just with features that can accurately represent the value function, as used by van Seijen and Sutton, but also with features that cannot reduce the MSE all the way to zero.

In the original problem, there was a 0.9 probability of moving right in each state (van Seijen & Sutton, 2014). Here, we interpret these probabilities as being due to a behavior policy that selects the right action with probability 0.9. Then, we formulate a target policy that moves right more often, with probability 0.95. The stochastic target policy demonstrates that our algorithm is applicable to arbitrary off-policy learning tasks, and that the results do not depend on the target policy being deterministic. We did also test the performance for a deterministic policy that always moves right, and the results are similar to those given below. Because this is an episodic task, γ = 1.

[Figure 1: The MSE on the random walk of GTD(λ) (left column) and true online GTD(λ) (right column). The x-axis shows α, and the different lines are for different λ, with λ = 0 in blue and λ = 1 in orange. The top row is for 15 tabular features, the middle row for 4 binary features, and the bottom row for 2 monotonic features. The MSE is minimized over β.]

As stated above, we define three different state representations. In the first task, we use tabular features, such that φ(s_i) is a vector of 15 elements, with the i-th element equal to one and all other elements equal to zero. In the second task the state number is turned into a binary representation, such that φ(s_1) = (0, 0, 0, 1), φ(s_2) = (0, 0, 1, 0), φ(s_3) = (0, 0, 1, 1), and so on up to φ(s_15) = (1, 1, 1, 1). The features are then normalized to be unit vectors, such that for instance φ(s_3) = (0, 0, 1/√2, 1/√2) and φ(s_15) = (1/2, 1/2, 1/2, 1/2). In our final representation, we use one monotonically increasing feature and one monotonically decreasing feature, such that φ(s_i) = ((15 − i)/14, (i − 1)/14) for all i. These features were not normalized.
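
These three representations are straightforward to reproduce. The sketch below is our own code following the description above; the exact constants in the monotonic features are reconstructed from the text, so treat them as an assumption.

```python
import numpy as np

def tabular_features(n_states=15):
    # One-hot feature vector per state.
    return np.eye(n_states)

def binary_features(n_states=15, n_bits=4):
    # State i (1-based) written in binary, then normalized to a unit vector.
    phi = np.array([[(i >> k) & 1 for k in reversed(range(n_bits))]
                    for i in range(1, n_states + 1)], dtype=float)
    return phi / np.linalg.norm(phi, axis=1, keepdims=True)

def monotonic_features(n_states=15):
    # One monotonically decreasing and one increasing feature (not normalized).
    i = np.arange(1, n_states + 1)
    return np.stack([(n_states - i) / (n_states - 1),
                     (i - 1) / (n_states - 1)], axis=1)
```
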
For α the range of parameters was from 2^{−8} to 1 with steps of 0.25 in the exponent, so that α ∈ {2^{−8}, 2^{−7.75}, …, 1}. The secondary step size β was varied over the same range, with the addition of β = 0. The trace parameter λ was varied from 0 to 1 with steps of 1 in the exponent of the distance to one, and with the addition of λ = 1, such that λ ∈ {0, 1 − 2^{−1}, …, 1 − 2^{−9}, 1}. The MSE (averaged over 20 repetitions) after 10 episodes for all three representations is shown in Figure 1.

The left graphs all correspond to GTD(λ) and the plots on the right are for true online GTD(λ). Each graph shows the MSE as a function of α, with different lines for different values of λ, of which the extremes are highlighted (λ = 0 is blue; λ = 1 is orange). In all cases, the MSE was minimized over β, but this secondary step size had little impact on the performance in these problems. Note that the blue lines in the pair of graphs in each row are exactly equal, because by design the algorithms are equivalent for λ = 0.

In the top plots, the tabular representation was used and we see that especially with high λ both algorithms reach low prediction errors. This demonstrates that indeed learning can be faster with higher λ. When using function approximation, in the middle and bottom graphs, the benefit of having an online equivalence to a well-defined forward view becomes apparent. For both representations, the performance of GTD(λ) with higher λ begins to deteriorate around α = 0.2. In contrast, true online GTD(λ) performs well even for α = λ = 1. Note the log scale of the y-axis; the difference in MSE is many orders of magnitude.

In practice it is not always possible to fully tune the algorithmic parameters, and therefore the robustness of true online GTD(λ) to different settings is important. However, it is still interesting to see what the best performance could be for a fully tuned algorithm. Therefore, in Figure 2 we show the MSE as a function of λ when minimized over both α and β. For all λ, true online GTD(λ) outperforms GTD(λ).

[Figure 2: The MSE on the random walk for different λ of GTD(λ) and true online GTD(λ), for optimized α and β and binary features.]

8 Discussion

The main theoretical contribution of this paper is a general theorem for equivalences between forward and backward views. The theorem allows us to find an efficient, fully equivalent online algorithm for a desired forward view. The theorem is as general as required and as specific as possible for all applications of it in this paper, and in its current form it is limited to forward views for which an O(n) backward view exists. The theorem can be generalized further, to include recursive (off-policy) LSTD(λ) (Boyan, 1999) and other algorithms that can be formulated in terms of forward views (cf. Geist & Scherrer, 2014; Dann, Neumann & Peters, 2014), but we did not investigate these extensions.

We used Theorem 1 to construct a new off-policy algorithm named true online GTD(λ), which is the first TD algorithm to have an exact online equivalence to an off-policy forward view. We constructed this forward view to maintain equivalence to the existing GTD(λ) algorithm for λ = 0. The forward view we proposed is not the only one possible, and in particular it will be interesting to investigate different methods of importance sampling. We could for instance use the importance sampling as proposed by Sutton et al. (2014). We did construct the resulting online algorithm, and in preliminary tests its performance was similar to true online GTD(λ). Likewise, if desired, it is possible to obtain a full online equivalence to off-policy Monte Carlo for λ = 1 by constructing a forward view that achieves this. For instance, we could use a similar forward view as used in this paper, but then apply the importance-sampling ratios only to the returns rather than to the errors. For now, it remains an open question what the best off-policy forward view is.

True online GTD(λ) is limited to state-value estimates. It is straightforward to construct a corresponding algorithm for action values, similar to the correspondence between GTD(λ) and GQ(λ) (Maei & Sutton, 2010; Maei, 2011) and between PTD(λ) and PQ(λ) (Sutton et al., 2014). We leave such an extension for future work.

Acknowledgments

The authors thank Joseph Modayil, Harm van Seijen and Adam White for fruitful discussions that helped improve the quality of this work. This work was supported by grants from Alberta Innovates Technology Futures, the National Science and Engineering Research Council of Canada, and the Alberta Innovates Centre for Machine Learning.

References

Boyan, J. A. (1999). Least-squares temporal difference learning. In Proceedings of the 16th International Conference on Machine Learning.

Dann, C., Neumann, G., & Peters, J. (2014). Policy evaluation with temporal differences: A survey and comparison. Journal of Machine Learning Research, 15.

Geist, M., & Scherrer, B. (2014). Off-policy learning with eligibility traces: A survey. Journal of Machine Learning Research, 15.

Maei, H. R., & Sutton, R. S. (2010). GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In Proceedings of the Third Conference on Artificial General Intelligence. Atlantis Press.

Maei, H. R. (2011). Gradient Temporal-Difference Learning Algorithms. PhD thesis, University of Alberta.

Precup, D., Sutton, R. S., & Singh, S. (2000). Eligibility traces for off-policy policy evaluation. In Proceedings of the 17th International Conference on Machine Learning. Morgan Kaufmann.

Rubinstein, R. Y. (1981). Simulation and the Monte Carlo Method. New York: Wiley.

Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44.

Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.

Sutton, R. S., Mahmood, A. R., Precup, D., & van Hasselt, H. (2014). A new Q(λ) with interim forward view and Monte Carlo equivalence. In Proceedings of the 31st International Conference on Machine Learning. JMLR W&CP 32(2).

Sutton, R. S., Maei, H. R., Precup, D., Bhatnagar, S., Silver, D., Szepesvári, Cs., & Wiewiora, E. (2009). Fast gradient-descent methods for temporal-difference learning with linear function approximation. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM.

Sutton, R. S., Modayil, J., Delp, M., Degris, T., Pilarski, P. M., White, A., & Precup, D. (2011). Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems.

Sutton, R. S., Szepesvári, Cs., & Maei, H. R. (2008). A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. In Advances in Neural Information Processing Systems 21. MIT Press.

van Seijen, H., & Sutton, R. S. (2014). True online TD(λ). In Proceedings of the 31st International Conference on Machine Learning. JMLR W&CP 32(1).


More information

Kriging Models Predicting Atrazine Concentrations in Surface Water Draining Agricultural Watersheds

Kriging Models Predicting Atrazine Concentrations in Surface Water Draining Agricultural Watersheds 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Kriging Models Predicing Arazine Concenraions in Surface Waer Draining Agriculural Waersheds Paul L. Mosquin, Jeremy Aldworh, Wenlin Chen Supplemenal Maerial Number

More information

2. Nonlinear Conservation Law Equations

2. Nonlinear Conservation Law Equations . Nonlinear Conservaion Law Equaions One of he clear lessons learned over recen years in sudying nonlinear parial differenial equaions is ha i is generally no wise o ry o aack a general class of nonlinear

More information

ACE 562 Fall Lecture 4: Simple Linear Regression Model: Specification and Estimation. by Professor Scott H. Irwin

ACE 562 Fall Lecture 4: Simple Linear Regression Model: Specification and Estimation. by Professor Scott H. Irwin ACE 56 Fall 005 Lecure 4: Simple Linear Regression Model: Specificaion and Esimaion by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Simple Regression: Economic and Saisical Model

More information

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle Chaper 2 Newonian Mechanics Single Paricle In his Chaper we will review wha Newon s laws of mechanics ell us abou he moion of a single paricle. Newon s laws are only valid in suiable reference frames,

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION SUPPLEMENTARY INFORMATION DOI: 0.038/NCLIMATE893 Temporal resoluion and DICE * Supplemenal Informaion Alex L. Maren and Sephen C. Newbold Naional Cener for Environmenal Economics, US Environmenal Proecion

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Noes for EE7C Spring 018: Convex Opimizaion and Approximaion Insrucor: Moriz Hard Email: hard+ee7c@berkeley.edu Graduae Insrucor: Max Simchowiz Email: msimchow+ee7c@berkeley.edu Ocober 15, 018 3

More information

Unit Root Time Series. Univariate random walk

Unit Root Time Series. Univariate random walk Uni Roo ime Series Univariae random walk Consider he regression y y where ~ iid N 0, he leas squares esimae of is: ˆ yy y y yy Now wha if = If y y hen le y 0 =0 so ha y j j If ~ iid N 0, hen y ~ N 0, he

More information

Principle of Least Action

Principle of Least Action The Based on par of Chaper 19, Volume II of The Feynman Lecures on Physics Addison-Wesley, 1964: pages 19-1 hru 19-3 & 19-8 hru 19-9. Edwin F. Taylor July. The Acion Sofware The se of exercises on Acion

More information

) were both constant and we brought them from under the integral.

) were both constant and we brought them from under the integral. YIELD-PER-RECRUIT (coninued The yield-per-recrui model applies o a cohor, bu we saw in he Age Disribuions lecure ha he properies of a cohor do no apply in general o a collecion of cohors, which is wha

More information

Lecture 2 October ε-approximation of 2-player zero-sum games

Lecture 2 October ε-approximation of 2-player zero-sum games Opimizaion II Winer 009/10 Lecurer: Khaled Elbassioni Lecure Ocober 19 1 ε-approximaion of -player zero-sum games In his lecure we give a randomized ficiious play algorihm for obaining an approximae soluion

More information

Unsteady Flow Problems

Unsteady Flow Problems School of Mechanical Aerospace and Civil Engineering Unseady Flow Problems T. J. Craf George Begg Building, C41 TPFE MSc CFD-1 Reading: J. Ferziger, M. Peric, Compuaional Mehods for Fluid Dynamics H.K.

More information

5.1 - Logarithms and Their Properties

5.1 - Logarithms and Their Properties Chaper 5 Logarihmic Funcions 5.1 - Logarihms and Their Properies Suppose ha a populaion grows according o he formula P 10, where P is he colony size a ime, in hours. When will he populaion be 2500? We

More information

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé Bias in Condiional and Uncondiional Fixed Effecs Logi Esimaion: a Correcion * Tom Coupé Economics Educaion and Research Consorium, Naional Universiy of Kyiv Mohyla Academy Address: Vul Voloska 10, 04070

More information

EXERCISES FOR SECTION 1.5

EXERCISES FOR SECTION 1.5 1.5 Exisence and Uniqueness of Soluions 43 20. 1 v c 21. 1 v c 1 2 4 6 8 10 1 2 2 4 6 8 10 Graph of approximae soluion obained using Euler s mehod wih = 0.1. Graph of approximae soluion obained using Euler

More information

Single and Double Pendulum Models

Single and Double Pendulum Models Single and Double Pendulum Models Mah 596 Projec Summary Spring 2016 Jarod Har 1 Overview Differen ypes of pendulums are used o model many phenomena in various disciplines. In paricular, single and double

More information

GMM - Generalized Method of Moments

GMM - Generalized Method of Moments GMM - Generalized Mehod of Momens Conens GMM esimaion, shor inroducion 2 GMM inuiion: Maching momens 2 3 General overview of GMM esimaion. 3 3. Weighing marix...........................................

More information

Deep Learning: Theory, Techniques & Applications - Recurrent Neural Networks -

Deep Learning: Theory, Techniques & Applications - Recurrent Neural Networks - Deep Learning: Theory, Techniques & Applicaions - Recurren Neural Neworks - Prof. Maeo Maeucci maeo.maeucci@polimi.i Deparmen of Elecronics, Informaion and Bioengineering Arificial Inelligence and Roboics

More information

( ) a system of differential equations with continuous parametrization ( T = R + These look like, respectively:

( ) a system of differential equations with continuous parametrization ( T = R + These look like, respectively: XIII. DIFFERENCE AND DIFFERENTIAL EQUATIONS Ofen funcions, or a sysem of funcion, are paramerized in erms of some variable, usually denoed as and inerpreed as ime. The variable is wrien as a funcion of

More information

A Reinforcement Learning Approach for Collaborative Filtering

A Reinforcement Learning Approach for Collaborative Filtering A Reinforcemen Learning Approach for Collaboraive Filering Jungkyu Lee, Byonghwa Oh 2, Jihoon Yang 2, and Sungyong Park 2 Cyram Inc, Seoul, Korea jklee@cyram.com 2 Sogang Universiy, Seoul, Korea {mrfive,yangjh,parksy}@sogang.ac.kr

More information

Lecture 4 Kinetics of a particle Part 3: Impulse and Momentum

Lecture 4 Kinetics of a particle Part 3: Impulse and Momentum MEE Engineering Mechanics II Lecure 4 Lecure 4 Kineics of a paricle Par 3: Impulse and Momenum Linear impulse and momenum Saring from he equaion of moion for a paricle of mass m which is subjeced o an

More information

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model Modal idenificaion of srucures from roving inpu daa by means of maximum likelihood esimaion of he sae space model J. Cara, J. Juan, E. Alarcón Absrac The usual way o perform a forced vibraion es is o fix

More information

10. State Space Methods

10. State Space Methods . Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he

More information

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details!

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details! MAT 257, Handou 6: Ocober 7-2, 20. I. Assignmen. Finish reading Chaper 2 of Spiva, rereading earlier secions as necessary. handou and fill in some missing deails! II. Higher derivaives. Also, read his

More information

1 Differential Equation Investigations using Customizable

1 Differential Equation Investigations using Customizable Differenial Equaion Invesigaions using Cusomizable Mahles Rober Decker The Universiy of Harford Absrac. The auhor has developed some plaform independen, freely available, ineracive programs (mahles) for

More information

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence Supplemen for Sochasic Convex Opimizaion: Faser Local Growh Implies Faser Global Convergence Yi Xu Qihang Lin ianbao Yang Proof of heorem heorem Suppose Assumpion holds and F (w) obeys he LGC (6) Given

More information

Ordinary dierential equations

Ordinary dierential equations Chaper 5 Ordinary dierenial equaions Conens 5.1 Iniial value problem........................... 31 5. Forward Euler's mehod......................... 3 5.3 Runge-Kua mehods.......................... 36

More information

18 Biological models with discrete time

18 Biological models with discrete time 8 Biological models wih discree ime The mos imporan applicaions, however, may be pedagogical. The elegan body of mahemaical heory peraining o linear sysems (Fourier analysis, orhogonal funcions, and so

More information

Chapter 8 The Complete Response of RL and RC Circuits

Chapter 8 The Complete Response of RL and RC Circuits Chaper 8 The Complee Response of RL and RC Circuis Seoul Naional Universiy Deparmen of Elecrical and Compuer Engineering Wha is Firs Order Circuis? Circuis ha conain only one inducor or only one capacior

More information

Notes for Lecture 17-18

Notes for Lecture 17-18 U.C. Berkeley CS278: Compuaional Complexiy Handou N7-8 Professor Luca Trevisan April 3-8, 2008 Noes for Lecure 7-8 In hese wo lecures we prove he firs half of he PCP Theorem, he Amplificaion Lemma, up

More information

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17 EES 16A Designing Informaion Devices and Sysems I Spring 019 Lecure Noes Noe 17 17.1 apaciive ouchscreen In he las noe, we saw ha a capacior consiss of wo pieces on conducive maerial separaed by a nonconducive

More information

IMPLICIT AND INVERSE FUNCTION THEOREMS PAUL SCHRIMPF 1 OCTOBER 25, 2013

IMPLICIT AND INVERSE FUNCTION THEOREMS PAUL SCHRIMPF 1 OCTOBER 25, 2013 IMPLICI AND INVERSE FUNCION HEOREMS PAUL SCHRIMPF 1 OCOBER 25, 213 UNIVERSIY OF BRIISH COLUMBIA ECONOMICS 526 We have exensively sudied how o solve sysems of linear equaions. We know how o check wheher

More information

Lecture 4 Notes (Little s Theorem)

Lecture 4 Notes (Little s Theorem) Lecure 4 Noes (Lile s Theorem) This lecure concerns one of he mos imporan (and simples) heorems in Queuing Theory, Lile s Theorem. More informaion can be found in he course book, Bersekas & Gallagher,

More information

= ( ) ) or a system of differential equations with continuous parametrization (T = R

= ( ) ) or a system of differential equations with continuous parametrization (T = R XIII. DIFFERENCE AND DIFFERENTIAL EQUATIONS Ofen funcions, or a sysem of funcion, are paramerized in erms of some variable, usually denoed as and inerpreed as ime. The variable is wrien as a funcion of

More information